Scour
⚡ Transformers
transformer architecture, attention mechanism, self-attention
Scoured 149,488 posts in 11.0 ms
Automated Attention Pattern Discovery at Scale in Large Language Models
💬 LLMs · arxiv.org · 3d
The Lyra Technique: Cognitive Geometry in Transformer KV-Caches — From Metacognition to Misalignment Detection
🔀 LoRA · zenodo.org · 7h · r/artificial
Zero-Shot Alignment: Harm Detection via Incongruent Attention Mechanisms
🔀 LoRA · lesswrong.com · 1d
milanm/AutoGrad-Engine: A complete GPT language model (training and inference) in ~600 lines of pure C#, zero dependencies
💬 LLMs · github.com · 19h · Hacker News
Demystifying FlashAttention: Building v1 from Scratch in Pure PyTorch
⚙️ Inference · medium.com · 2d
The Post-Transformer Era
⚙️ Inference · medium.com · 6d
FABLEN: Fuzzy Adaptive Bilinear Logic Engine Networks
⚙️ Inference · medium.com · 8h
Task Bert
📐 Embeddings · producthunt.com · 1d
30 Days of Building a Small Language Model — Day 6: Picking the Right Attention Mechanism: What…
⚙️ Inference · devopslearning.medium.com · 18h
A Quick Reference on Attention (self-attention, cross-attention, multi-head attention)
🔀 LoRA · habr.com · 2d
🧠 Bidirectional Encoder Representations from Transformers (BERT)
📐 Embeddings · medium.com · 5d
How the Mixture of Experts Architecture Works in AI Models
📊 AI Evals · freecodecamp.org · 2d
Mamba4 Explained: A Faster Alternative to Transformers for Sequential Modeling
⚙️ Inference · analyticsvidhya.com · 6d
Reading Note: Mamba-3 and the State Space Model Renaissance
⚙️ Inference · ngrislain.github.io · 1d · Hacker News
LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss
💬 LLMs · machinelearning.apple.com · 1d
Understanding Positional Embeddings in Transformers (with Intuition and Examples)
📐 Embeddings · pub.towardsai.net · 6d
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
💬 LLMs · arxiv.org · 6h
AI Agentic Systems & Orchestration: The Architecture Behind Intelligent Automation
🕵️ AI Agents · medium.com · 3d
Is Gemma 4 Truly a Native Multimodal Model? — Dissecting the Architecture
🔀 LoRA · medium.com · 5d
CROSSPOST: COSMA SHALIZI: Aware of All Internet Traditions: Large Language Models as Information Retrieval & Synthesis
💬 LLMs · braddelong.substack.com · 3d · Substack