Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

The paper introduces the Transformer, a novel neural network architecture based solely on attention mechanisms, which outperforms previous models in machine translation and generalizes well to other NLP tasks, while being more efficient to train.
The Transformer architecture eliminates recurrence and convolutions, achieving superior translation quality and training efficiency, and demonstrates versatility across different NLP tasks.
- Achieved 28.4 BLEU on WMT 2014 English-German translation.
- Set a new single-model state-of-the-art BLEU score of 41.8 on WMT 2014 English-French translation.
- Requires significantly less training time than previous models (3.5 days on eight GPUs for the English-French model).
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a…
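The abstract describes a model built "solely on attention mechanisms." As an illustration, here is a minimal sketch of the paper's scaled dot-product attention, softmax(QKᵀ/√d_k)V, for a single head with no masking, written in pure Python so it runs without dependencies. The helper names (`softmax`, `matmul`, `transpose`, `scaled_dot_product_attention`) are hypothetical, and this is not the authors' implementation: multi-head projections, masking, and batching are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m: Matrix) -> Matrix:
    # Swap rows and columns.
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain (n x k) @ (k x p) matrix product.
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V, without masking.

    q: one row per query position (each of width d_k)
    k: one row per key position (each of width d_k)
    v: one row per key position (each of width d_v)
    """
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    # Similarity of every query with every key, scaled by 1/sqrt(d_k).
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    # Each query's scores become a probability distribution over keys.
    weights = [softmax(row) for row in scores]
    # Each output row is the weighted average of the value rows.
    return matmul(weights, v)


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    # The query matches the first key more closely, so the output
    # leans toward the first value row.
    print(scaled_dot_product_attention(q, k, v))
```

Each output row depends only on its own query and the full set of keys and values, not on a previous position's hidden state, so all rows can be computed in parallel; this is one reason the abstract describes the model as more parallelizable than recurrent networks.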
- nvidia/canary-qwen-2.5b (model) · 144k downloads · 404 likes
- nvidia/diar_sortformer_4spk-v1 (model) · 5.3k downloads · 137 likes
- nvidia/diar_streaming_sortformer_4spk-v2 (model) · 23k downloads · 111 likes
- google/paligemma-3b-pt-224 (model) · 86k downloads · 426 likes
- google/paligemma-3b-mix-448 (model) · 2.9k downloads · 116 likes
- nvidia/canary-180m-flash (model) · 1.3k downloads · 97 likes
- nvidia/canary-1b-v2 (model) · 123k downloads · 371 likes
- ysdede/canary-180m-flash-onnx (model) · 24 downloads · 1 like
- HamidRezaAttar/gpt2-product-description-generator (model) · 1.5k downloads · 14 likes
- keras-io/denoising-diffusion-implicit-models (model) · 27 downloads · 10 likes
- Attention Is All You Need (YouTube)
- PhD Bodybuilder Predicts The Future of AI (97% Certain) [Dr. Mike Israetel] (YouTube)
- Bold AI Predictions From Cohere Co-founder (YouTube)
- Jay Alammar on LLMs, RAG, and AI Engineering (YouTube)
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
