Choose a depth level and language. Each of the four versions is a full standalone page that follows the paper's original structure from motivation to results; all share the same backbone and differ only in depth and language.
The Transformer replaces recurrence and convolution with attention-only blocks, enabling parallel training and strong translation quality.
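For orientation, here is a minimal NumPy sketch of the scaled dot-product attention the paper builds on, softmax(QKᵀ/√d_k)·V; the function name and toy dimensions are illustrative choices, not taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V are (seq_len, d) arrays; shapes here are only illustrative."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                     # weighted sum of value vectors

# Toy usage with random vectors (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)               # (4, 8) context vectors
```

Because every position attends to every other position in one matrix product, the whole sequence can be processed in parallel, which is what removes the sequential bottleneck of recurrence.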