
Attention is all you need (2017)

arxiv.org · DEV, Hacker News

Covered in 37 articles

How to Tame AI’s Voracious Appetite for Energy

nautil.us · Hacker News

Emerging Patterns in Building GenAI Products

martinfowler.com · Hacker News

Vision LLMs are PDF Parsers Too: Reading Charts and Diagrams for RAG

towardsdatascience.com

Parse PDFs for RAG Locally with Docling: Rich Tables, No Cloud Upload

towardsdatascience.com

Stop Returning Flat Text from a PDF: The Relational Shape RAG Needs

towardsdatascience.com

Beyond extract_text: The Two Layers of a PDF That Drive RAG Quality

towardsdatascience.com

RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem

towardsdatascience.com

Embeddings Aren’t Magic: The Predictable Failure Modes of RAG Retrieval

towardsdatascience.com

Baseline Enterprise RAG, From PDF to Highlighted Answer

towardsdatascience.com

Unpacking AI: The Hardware Behind AI

pathtostaff.com · Hacker News

5 Fun Papers That Explain LLMs Clearly

kdnuggets.com

AI Coding Tip 024 - Force a Criteria Check Before the Task Ends

91. The Transformer Architecture: The Invention That Changed AI

The usual implementation of attention transformers (SDPA) is kind of bad, actually

gist.github.com · Hacker News

How LLMs Work, Part 1: How LLMs Process Text

shbhmrzd.github.io · r/programming

FareedKhan-dev/train-llm-from-scratch: A straightforward method for training your LLM, from downloading data to generating text.

wisnunugroho21/nugie-jax-nemotron: A simple, minimalistic, and explainable code implementation of Nemotron 3 Nano in JAX

github.com · r/learnmachinelearning

Language Models Struggle to Keep a Secret

Current AI Model Inadequacies: Implications for the Global South

orfonline.org

AI Paper Review: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

freecodecamp.org

Evaluating the role of pretraining dataset size and diversity on single-cell foundation model performance


The AI Consciousness Debate Is Happening at the Wrong Level

recursiveintelligence.io · r/cogsci, r/neurophilosophy

A deep dive into the Transformer architecture

blog.algomaster.io

The Memo - 13/Jun/2026

lifearchitect.substack.com

The Anatomy of an LLM | Interactive Visual Guide to How Language Models Work

royvanrijn.com · Hacker News

How LLMs Actually Work: A Friendly Map for Humans • oreoro

oreoro.github.io · Hacker News

Understanding KV Cache: The Hidden Memory Cost of Serving LLMs

melchi.me · Hacker News

Sebastian Mallaby, Biographer of Demis Hassabis — Lessons from 100+ AI Insiders on The Race to Superintelligence, The Religion of AI, and Spotting Breakthroughs...

Thread by @KyeGomezB on Thread Reader App

threadreaderapp.com

How LLM Inference Works

arpitbhayani.me · Hacker News

AI 101: Your Ultimate Guide to Attention: Mechanism, QKV, and KV Cache

turingpost.com

Inside the Transformer: The Life of a Token

aleksagordic.com · Hacker News

Self Attention


Give your agents disposable environments in Go

tigrisdata.com

"Agentic AI" Is a Bonfire of the Tokens While Fab Capacity, Power Grids, and P&Ls Are the brakes: (NOT THE) READ OF THE DAY

braddelong.substack.com · Substack

In other languages

Someone is taking the Transformer apart: Memory Caching and CTM each carried off half (translated from Chinese)

Attention is all you need (translated from Dutch: "Aandacht is alles wat je nodig hebt")

janvandenberg.blog