Beyond Standard LLMs
magazine.sebastianraschka.com·7h·
Discuss: Hacker News, r/LLM
🎲Bayesian Cognition
Transformer-Based Decoding in Concatenated Coding Schemes Under Synchronization Errors
arxiv.org·15h
🧮Information theory
Detailed Technical Documentation on AI Implementation Logic (Taking Large Language Models as an Example)
nbtab.com·11h·
Discuss: DEV
📝NLP
Everything About Transformers
krupadave.com·5d
📡Information Theory
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
paperium.net·2d·
Discuss: DEV
🎲Bayesian Cognition
Hybrid-Attention models are the future for SLMs
inference.net·18h·
Discuss: Hacker News
🔧Workflow Automation
Spatial Secrets: Unleashing Language Models with Unexpected Masking by Arvind Sundararajan
dev.to·15h·
Discuss: DEV
🎲Bayesian Cognition
Accumulating Context Changes the Beliefs of Language Models
arxiv.org·15h
🎲Bayesian Cognition
Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages Power Retention technique
venturebeat.com·43m
🔧Workflow Automation
Hybrid channel attention network for auditory attention detection
nature.com·1d
🎲Bayesian Cognition
Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization
arxiv.org·15h
🧮Information theory
Post-training methods for language models
developers.redhat.com·13h
📝NLP
AI Summarization Optimization
schneier.com·1d·
Discuss: Hacker News
📝NLP
Open Source Context-Aware PII Classifier
corp.roblox.com·39m·
Discuss: Hacker News
🤖AI
What Are Auto-regressive Models? A Deep Dive and Typical Use Cases
blog.pangeanic.com·1d
🤖AI
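Aside, for readers skimming the Pangeanic explainer above: "auto-regressive" means the model factorizes p(x_1, …, x_T) = ∏_t p(x_t | x_<t) and generates by sampling one token at a time, feeding each sample back in as context. A minimal sketch of that loop, where the toy `next_token_probs` is a hypothetical stand-in for a trained model, not anything from the linked post:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_probs(context):
    # Hypothetical stand-in for a trained LM, which would compute
    # p(next token | context) with a neural network. Here the chance
    # of ending the sequence simply grows with context length.
    p_eos = min(0.1 * len(context), 0.9)
    rest = (1.0 - p_eos) / (len(VOCAB) - 1)
    return [rest] * (len(VOCAB) - 1) + [p_eos]

def generate(prompt, max_new_tokens=10):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)       # p(x_t | x_<t)
        tok = random.choices(VOCAB, weights=probs)[0]
        if tok == "<eos>":
            break
        tokens.append(tok)                     # the sample becomes context
    return " ".join(tokens)

print(generate(["the"]))
```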
An underqualified reading list about the transformer architecture
fvictorio.github.io·5d·
Discuss: Hacker News
🎨Computational Creativity
ParaScopes: What do Language Models Activations Encode About Future Text?
arxiv.org·15h
🧮Information theory
Beyond Broca: The Two Routes to Speaking
psychologytoday.com·19h
🧠Psycholinguistics
Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies
arxiv.org·15h
🤖AI
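On the "local–global alternating" idea named in the title above: the generic version of this trick gives some layers a sliding-window (local) causal attention mask and others a full (global) causal mask. A sketch of the two mask shapes, purely as orientation and not the paper's actual method:

```python
import numpy as np

def causal_local_mask(T, window):
    # Local layer: each query attends only to keys at most
    # `window` positions back (and never to the future).
    q = np.arange(T)[:, None]
    k = np.arange(T)[None, :]
    return (k <= q) & (q - k < window)

def causal_global_mask(T):
    # Global layer: full causal attention over all earlier keys.
    q = np.arange(T)[:, None]
    k = np.arange(T)[None, :]
    return k <= q

# Alternate the two mask types across layers: even layers local,
# odd layers global (one common arrangement; details vary by model).
T, window, n_layers = 8, 3, 4
masks = [causal_local_mask(T, window) if i % 2 == 0 else causal_global_mask(T)
         for i in range(n_layers)]
print(masks[0].astype(int))  # banded: the local, sliding-window pattern
print(masks[1].astype(int))  # lower-triangular: the global pattern
```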
DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection
arxiv.org·15h
📝NLP