Beyond Standard LLMs
magazine.sebastianraschka.com·7h·
Discuss: Hacker News, r/LLM
🎲Bayesian Cognition
Transformer-Based Decoding in Concatenated Coding Schemes Under Synchronization Errors
arxiv.org·15h
🧮Information theory
Detailed Technical Documentation on AI Implementation Logic (Taking Large Language Models as an Example)
nbtab.com·11h·
Discuss: DEV
📝NLP
Everything About Transformers
krupadave.com·5d
📡Information Theory
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
paperium.net·2d·
Discuss: DEV
🎲Bayesian Cognition
Hybrid-Attention models are the future for SLMs
inference.net·18h·
Discuss: Hacker News
🔧Workflow Automation
Spatial Secrets: Unleashing Language Models with Unexpected Masking by Arvind Sundararajan
dev.to·15h·
Discuss: DEV
🎲Bayesian Cognition
Accumulating Context Changes the Beliefs of Language Models
arxiv.org·15h
🎲Bayesian Cognition
Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages Power Retention technique
venturebeat.com·43m
🔧Workflow Automation
Hybrid channel attention network for auditory attention detection
nature.com·1d
🎲Bayesian Cognition
Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization
arxiv.org·15h
🧮Information theory
Post-training methods for language models
developers.redhat.com·13h
📝NLP
AI Summarization Optimization
schneier.com·1d·
Discuss: Hacker News
📝NLP
Open Source Context-Aware PII Classifier
corp.roblox.com·39m·
Discuss: Hacker News
🤖AI
What Are Auto-regressive Models? A Deep Dive and Typical Use Cases
blog.pangeanic.com·1d
🤖AI
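Aside, for readers skimming the Pangeanic explainer above: "auto-regressive" means the model factorizes p(x_1, …, x_T) = ∏_t p(x_t | x_<t) and generates by sampling one token at a time, feeding each sample back in as context. A minimal sketch of that loop, where the toy `next_token_probs` is a hypothetical stand-in for a trained model, not anything from the linked post:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_probs(context):
    # Hypothetical stand-in for a trained LM, which would compute
    # p(next token | context) with a neural network. Here the chance
    # of ending the sequence simply grows with context length.
    p_eos = min(0.1 * len(context), 0.9)
    rest = (1.0 - p_eos) / (len(VOCAB) - 1)
    return [rest] * (len(VOCAB) - 1) + [p_eos]

def generate(prompt, max_new_tokens=10):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)       # p(x_t | x_<t)
        tok = random.choices(VOCAB, weights=probs)[0]
        if tok == "<eos>":
            break
        tokens.append(tok)                     # the sample becomes context
    return " ".join(tokens)

print(generate(["the"]))
```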
An underqualified reading list about the transformer architecture
fvictorio.github.io·5d·
Discuss: Hacker News
🎨Computational Creativity
ParaScopes: What do Language Models Activations Encode About Future Text?
arxiv.org·15h
🧮Information theory
Beyond Broca: The Two Routes to Speaking
psychologytoday.com·19h
🧠Psycholinguistics
Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies
arxiv.org·15h
🤖AI
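On the "local–global alternating" idea named in the title above: the generic version of this trick gives some layers a sliding-window (local) causal attention mask and others a full (global) causal mask. A sketch of the two mask shapes, purely as orientation and not the paper's actual method:

```python
import numpy as np

def causal_local_mask(T, window):
    # Local layer: each query attends only to keys at most
    # `window` positions back (and never to the future).
    q = np.arange(T)[:, None]
    k = np.arange(T)[None, :]
    return (k <= q) & (q - k < window)

def causal_global_mask(T):
    # Global layer: full causal attention over all earlier keys.
    q = np.arange(T)[:, None]
    k = np.arange(T)[None, :]
    return k <= q

# Alternate the two mask types across layers: even layers local,
# odd layers global (one common arrangement; details vary by model).
T, window, n_layers = 8, 3, 4
masks = [causal_local_mask(T, window) if i % 2 == 0 else causal_global_mask(T)
         for i in range(n_layers)]
print(masks[0].astype(int))  # banded: the local, sliding-window pattern
print(masks[1].astype(int))  # lower-triangular: the global pattern
```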
DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection
arxiv.org·15h
📝NLP