WAP PAPER HUB

Attention Is All You Need

Choose a depth level and language. Each version is a full standalone page that follows the paper's original structure from motivation to results.

Paper Snapshot
Authors
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
Venue
NeurIPS (NIPS) 2017
Focus
Sequence transduction, machine translation, self-attention
arXiv / DOI
1706.03762 / 10.48550/arXiv.1706.03762

Choose a version

All four pages share the same backbone but differ in depth and language.

At a glance

The Transformer replaces recurrence and convolution with attention-only blocks, enabling parallel training and strong translation quality.

Core idea
Multi-head self-attention plus position-wise feed-forward layers, stacked in an encoder-decoder (see the sketch after this block).
Training scale
WMT14 En-De (4.5M) and En-Fr (36M) sentence pairs; 8x P100 GPUs.
Results
28.4 BLEU on En-De and 41.8 BLEU on En-Fr; the big En-Fr model trains in 3.5 days on 8 GPUs.
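
To make the core idea concrete, here is a minimal NumPy sketch of scaled dot-product attention and multi-head self-attention. The weight matrices, head count, and toy dimensions are illustrative assumptions, not the paper's trained parameters or its reference implementation (Tensor2Tensor).

# Minimal multi-head self-attention sketch (illustrative assumptions, not the reference code).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_self_attention(x, W_q, W_k, W_v, W_o, num_heads):
    # x: (seq_len, d_model); each weight matrix: (d_model, d_model)
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    heads = scaled_dot_product_attention(Q, K, V)           # (num_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                      # final output projection

# Toy usage with random weights (hypothetical sizes: d_model=512, 8 heads, seq_len=10).
rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 512, 8, 10
x = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(4))
out = multi_head_self_attention(x, W_q, W_k, W_v, W_o, num_heads)
print(out.shape)  # (10, 512)

Dividing the attention scores by sqrt(d_k) keeps the softmax in a well-scaled regime, which is the paper's motivation for calling it "scaled" dot-product attention.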

Resources

arXiv abstract: 1706.03762
Paper PDF: download
NeurIPS proceedings: NIPS 2017
Reference code: Tensor2Tensor