Scour
🤖 Transformer Architecture
Attention, BERT, GPT, Sequence Models
Scoured 171386 posts in 31.3 ms
Attention With Actual Numbers
🤖 Transformers · pub.towardsai.net · 2h
Task Bert
🤖 Transformers · producthunt.com · 6d
Paper page - Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
🤖 Transformers · huggingface.co · 10h
BERT (2018)
🤖 Transformers · litecoder.medium.com · 2d
amitshekhariitbhu/llm-internals: Learn LLM internals step by step - from tokenization to attention to inference optimization.
🤖 LLM Inference · github.com · 1d · Hacker News
Fine-tuning a Summarization model
🔤 Tokenization · sakhawathossenofficial.medium.com · 16h
Fine-tuning a DistilBERT classifier with numerical and text inputs
🔤 Tokenization · engineering.freeagent.com · 4d
All in One for AI Chatbot
✍️ Prompt Engineering · nottoai.com · 1d · Hacker News
Implementing DeepSeek-V2’s Multi-Head Latent Attention (MLA) from Scratch in PyTorch
🔥 PyTorch · medium.com · 1d
GCA-DETR: Global-context-aware-based detection transformer
👁️ Computer Vision · sciencedirect.com · 4d
The Lyra Technique: Cognitive Geometry in Transformer KV-Caches — From Metacognition to Misalignment Detection
🧠 Stacked PKM · zenodo.org · 5d · r/artificial
A single-layer, single-head neural transformer written in PDP-11 assembly language
🧠 Neuromorphic Computing · blog.adafruit.com · 6d
Neural Networks
🧠 Deep Learning · rlj0713.medium.com · 4d
Low-Rank Key Value Attention: Reducing KV Cache Memory and Maintaining Head Diversity
⚡ Quantization · fin.ai · 5d · Hacker News
Understanding BERTopic: From Raw Text to Interpretable Topics
🤖 Transformers · analyticsvidhya.com · 3d
milanm/AutoGrad-Engine: A complete GPT language model (training and inference) in ~600 lines of pure C#, zero dependencies
🧠 LLM · github.com · 5d · Hacker News
Detecting Translation Hallucinations with Attention Misalignment
🧠 LLM · towardsdatascience.com · 6d
tmaselko/paper-attncap: Repository associated with the "Separate and Amplify: Attention's Geometry of Retrieval" paper. Contains TSAR synthetic task, minimal model, training/repro code, and chart/table generation.
🤖 Transformers · github.com · 6d · Hacker News
Neural Networks for Language: How Context Became a Learned Transformation
🧠 LLM · pub.towardsai.net · 4d
SPUTNIKAI/LeechTransformer: Leech-Lila: A Geometric Attention Transformer(Language Model) with the Leech Lattice Attention
🤖 LLM Inference · github.com · 6d · Hacker News