Scour
🤖 Transformers
Specific: Attention Mechanism, BERT, GPT Architecture, Sequence Models
Scoured 170,973 posts in 52.5 ms
How Transformers Work · 💬 Natural Language Processing · medium.com · 2d
On The Application of Linear Attention in Multimodal Transformers · 🤖 TVM · arxiv.org · 17h
From RNNs to Transformers · 🤖 TVM · blog.aeilot.top · 6d
Paper page - Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation · ⚡ LMAX Disruptor · huggingface.co · 1h
perceptrons-to-transformers/08-rnn at main · rnilav/perceptrons-to-transformers · 🔥 PyTorch · github.com · 1d · DEV
Task Bert · 🦗 Pest · producthunt.com · 6d
Token Classification · 💬 Natural Language Processing · medium.com · 4d
INCRT: An Incremental Transformer That Determines Its Own Architecture · ⚡ Incremental Computation · arxiv.org · 17h
Hierarchical Kernel Transformer: Multi-Scale Attention with an Information-Theoretic Approximation Analysis · 🧮 Embeddings · arxiv.org · 1d
milanm/AutoGrad-Engine: A complete GPT language model (training and inference) in ~600 lines of pure C#, zero dependencies · 🤖 TVM · github.com · 5d · Hacker News
Layerwise Dynamics for In-Context Classification in Transformers · 🧮 Embeddings · arxiv.org · 17h
Uncertainty-Aware Transformers: Conformal Prediction for Language Models · 📝 Parser Combinators · arxiv.org · 1d
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation · 📦 Folly · arxiv.org · 17h
EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers · 🤖 TVM · arxiv.org · 1d
Revisiting Anisotropy in Language Transformers: The Geometry of Learning Dynamics · 🧮 Embeddings · arxiv.org · 1d
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers · 🤖 TVM · arxiv.org · 4d
Short Data, Long Context: Distilling Positional Knowledge in Transformers · 🌳 Pratt Parsing · arxiv.org · 6d
On the Geometry of Positional Encodings in Transformers · 🧮 Embeddings · arxiv.org · 6d
Transformer See, Transformer Do: Copying as an Intermediate Step in Learning Analogical Reasoning · 🌳 Pratt Parsing · arxiv.org · 5d
LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces · 🧮 Embeddings · arxiv.org · 6d