Choose a depth level and language. Each of the four versions is a full standalone page that follows the paper's original structure from motivation to results; all share the same backbone and differ only in depth and language.
The Transformer replaces recurrence and convolution with attention-only blocks, enabling parallel training and strong translation quality.
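For orientation, here is a minimal NumPy sketch of the scaled dot-product attention the paper builds on, softmax(QKᵀ/√d_k)·V; the function name and toy dimensions are illustrative choices, not taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V are (seq_len, d) arrays; shapes here are only illustrative."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                     # weighted sum of value vectors

# Toy usage with random vectors (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)               # (4, 8) context vectors
```

Because every position attends to every other position in one matrix product, the whole sequence can be processed in parallel, which is what removes the sequential bottleneck of recurrence.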