Eight Google researchers published 'Attention Is All You Need.' Within five years it reorganized the tech industry around a single architecture and gave OpenAI the building blocks for ChatGPT.
On 12 June 2017, eight researchers at Google Brain and Google Research uploaded a paper to arXiv titled 'Attention Is All You Need.' It described a new neural-network architecture, the Transformer, that replaced the recurrent and convolutional layers powering language models with a single mechanism: self-attention. The paper was technical, modest in its claims, and competed for attention with hundreds of other ML papers that month. Five years later, every major AI breakthrough (GPT-3, GPT-4, Gemini, Claude, LLaMA, Stable Diffusion, AlphaFold 2) was built on the Transformer. By 2023 the paper had over 100,000 citations, making it one of the most-cited computer-science papers ever published. The eight authors had nearly all left Google by then, founding or joining OpenAI, Cohere, Adept, Character.ai, Sakana AI, Inceptive, Essential AI, and NEAR Protocol, companies that between them raised over $20 billion. The paper that remade the world's most valuable industry came from a team that no longer worked together.
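To make 'self-attention' concrete, here is a minimal single-head sketch of the scaled dot-product attention the paper is built around, written in plain NumPy. The function name, matrix sizes, and toy example are illustrative choices, not code from the paper; a real Transformer adds multiple heads, masking, feed-forward layers, and positional encodings.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project each token into query, key, value vectors
    scores = q @ k.T / np.sqrt(k.shape[-1])          # every token scores every other token: (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax turns scores into attention weights
    return weights @ v                               # each output is a weighted mix of all value vectors

# Toy usage: 4 tokens, 8-dimensional embeddings (sizes chosen only for the demo)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # -> (4, 8)
```

The key property is visible in the `scores` line: every token compares itself against every other token in one matrix multiply, which is what lets training parallelize across the whole sequence (unlike an RNN) and what makes the cost grow quadratically with sequence length.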
The paper credits eight authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Łukasz Kaiser, and Illia Polosukhin, listed in 'random order' per a footnote. The actual work happened over about six months in 2017 inside Google Brain. Uszkoreit had been pushing self-attention since 2014; Shazeer (later co-founder of Character.ai) was the architecture's main optimizer; Vaswani led the experiments; Parmar handled the language-model implementation; Jones suggested the title, a nod to the Beatles; Polosukhin (later co-founder of NEAR Protocol) worked on the earliest prototypes before leaving Google that year. The internal codename was 'Transformer', as in 'transforming sequences'. The paper was submitted to NeurIPS 2017 (the field's top conference), accepted, and presented in December 2017 in Long Beach. Within a year, BERT, the first widely deployed Transformer, was already in training at Google.
By late 2023, all eight original authors had left Google. Vaswani and Parmar co-founded Adept (which raised $415M) and then Essential AI. Shazeer co-founded Character.ai (which raised over $190M before Google licensed back its technology in 2024 for roughly $2.7B). Aidan Gomez founded Cohere (over $1.1B raised). Llion Jones founded Sakana AI in Tokyo (a Series A of $200M+). Polosukhin co-founded NEAR Protocol. Kaiser joined OpenAI in 2021 and worked on GPT-4. Uszkoreit founded Inceptive, an mRNA-design startup that uses Transformers for vaccine design. The diaspora is the story: a striking share of the prominent LLM companies founded since 2020 count a Transformer co-author among their founders.
In 2017, the Transformer paper described a base model with 65 million parameters trained on a machine-translation dataset. State-of-the-art language AI meant recurrent networks, statistical methods, and narrow, task-specific models. ChatGPT did not exist. In 2026, the Transformer architecture underlies models with trillions of parameters that write code, draft legal documents, generate images, compose music, and assist in drug discovery. Microsoft has committed $80 billion to AI data centers in 2025 alone. India has 400+ AI startups building on Transformer-based models. The cost of inference (running a trained model) has fallen roughly a thousandfold in five years, putting AI within reach of small businesses and solo developers. What changed was not just capability but accessibility: an Indian startup in 2026 can build on the same architectural foundation as OpenAI, at a fraction of the cost.
The Transformer's dominance is starting to crack at the seams. Mamba (Gu and Dao, 2023) and other state-space models are showing competitive results with linear, rather than quadratic, complexity in sequence length: the Transformer's attention mechanism scales as O(n²) with context, so growing the context from a thousand tokens to a million multiplies the attention cost roughly a million-fold. Mixture of Experts (MoE), used openly in Mixtral and reportedly in GPT-4, keeps the Transformer skeleton but activates only a fraction of the parameters for each token, cutting inference cost. Diffusion language models (Inception Labs' Mercury, 2025) are exploring whether the next architectural step is non-autoregressive entirely. Whether one of these dethrones the Transformer, or they all merge into hybrids, is the most-watched architecture question in the field. But every contender still uses attention somewhere: the original paper's core idea has outlived the exact architecture it introduced.
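A back-of-envelope calculation shows why that quadratic term is the pressure point. The constants below are illustrative assumptions (one layer, one head, a head dimension of 128, a small recurrent state), not measurements of any real model; they only show how the two cost curves diverge as context grows.

```python
# Rough cost scaling for one layer, one head (illustrative constants, not benchmarks).
D = 128  # assumed head dimension

def attention_flops(n, d=D):
    # QK^T and the weighted sum over values each cost ~n^2 * d multiply-adds: quadratic in context length.
    return 2 * n * n * d

def linear_scan_flops(n, d=D, state=16):
    # A state-space / recurrent-style scan touches each token once: linear in context length.
    return n * d * state

for n in (1_000, 100_000, 1_000_000):
    print(f"n={n:>9,}  attention~{attention_flops(n):.2e}  linear~{linear_scan_flops(n):.2e}")

# Growing the context 1000x (1k -> 1M tokens) multiplies the attention term ~1,000,000x,
# while the linear-scan term grows only ~1,000x.
```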
The 2017 Transformer paper is the clearest modern example of how a single scientific contribution can rearrange an entire industry's competitive landscape. The paper itself was Google research, published openly. Google had every advantage: the talent, the compute, the data, the deployment surfaces. And yet the company that built the first breakout consumer LLM (OpenAI, with ChatGPT in 2022) was a competitor that read the same paper. This is the inverse of the 1990s 'first-mover advantage' lesson: when the underlying breakthrough is published openly, execution and risk tolerance matter more than a research lead. For India and other countries trying to enter the AI race, the implication is sharp: training frontier models from scratch needs hundreds of millions of dollars in compute, but the architectural ideas are already public. The next breakthrough, the one that displaces the Transformer, is likely already on arXiv. Whoever recognizes it first wins the next decade. The future of AI belongs to those who act on public knowledge faster than anyone else.
Chronology
1997: Hochreiter and Schmidhuber publish Long Short-Term Memory (LSTM), the recurrent architecture that becomes the dominant language-model building block for the next 20 years.
2014: Bahdanau, Cho, and Bengio publish 'Neural Machine Translation by Jointly Learning to Align and Translate', the first practical attention mechanism, bolted onto an RNN.
2017: Vaswani et al. submit the Transformer paper. The title is deliberately mild; the architecture replaces recurrence with self-attention and parallelizes training across the full sequence.
2018: Google releases BERT, and within months every major NLP benchmark is broken. By late 2020, almost every English Google Search query passes through a BERT model.
2020: OpenAI publishes GPT-3 with 175B parameters. The model exhibits zero-shot and few-shot capabilities not present at smaller scales: the first hint that scale alone produces qualitatively new behavior.
2022: ChatGPT, OpenAI's chat product built on GPT-3.5, hits 100 million users in two months, the fastest consumer-product adoption seen to that point. The Transformer paper, five years old, is suddenly part of household conversation.
2023: OpenAI releases GPT-4, with an estimated training cost of $50-100M. The Transformer architecture has scaled more than four orders of magnitude in parameter count in six years; every modern frontier system is still a Transformer.
2024: Noam Shazeer, one of the original eight authors, returns to Google as part of a roughly $2.7B deal to license Character.ai's technology. The diaspora has come full circle: Google paid billions to bring back an author of the architecture it had published for free.