pathtostaff.com

Self-Attention Solved the Sequential Bottleneck

Covers 10 stories including Attention is all you need (2017). Covered by tldr.tech. Discussed on Hacker News.

Covers 10 related stories

Attention is all you need (2017)

Discussed on Hacker News and DEV

DeepSeek-V3 Technical Report

Discussed on Hacker News

Language models are few-shot learners (2020)

Discussed on Hacker News

Current LLMs are the future? No ways man! Look at Mamba: Selective State Spaces

Discussed on r/LLM

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Discussed on Hacker News

transformer-circuits.pub

In-context Learning and Induction Heads

YaRN: Efficient Context Window Extension of Large Language Models

[2401.04088] Mixtral of Experts

[1409.0473] Neural Machine Translation by Jointly Learning to Align and Translate

Let's build GPT: from scratch, in code, spelled out.

Covered in 1 article

GLM 5.2 comparisons ⚔️, use AI for code review 👀, Deno Desktop 🖥