Transformers

Claude Mythos Glasswing: Why AI Vuln Discovery Terrifies Me

 🧠LLM  Content type: Blog  Content type: Discussion
tildalice.io

Hasse Diagrams for Attention: A Partial Order Framework for Designing Transformer Masks

 🧠LLM  Content type: Academic
arxiv.org

Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles

 📊Statistics  Content type: Academic
arxiv.org

A Mean-Field Analysis of Multi-Head Self-Attention under Cross-Entropy Training

 📐Optimization Theory  Content type: Academic
arxiv.org

See, Act, Correct: three levers for working with a code agent

 🎮Reinforcement Learning  Content type: Blog

princezuda/-RequiemGPT-: Fully open source and open weights built and trained by fable five with one prompt. An experience in how AI actually works

 🤖AI  Content type: Code
github.com · Hacker News

Parallel Causal Associative Fields: Gated Sparse Memory for Long-Context Language Modeling

 🎛️Control Systems  Content type: Academic
arxiv.org

Introducing the Third Generation of Apple’s Foundation Models

 🤖AI

DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals

 📶Communications  Content type: Academic
arxiv.org

Beyond Item IDs: Scaling Short-Form-Video Recommendation via Semantic-Native Long Sequence Modeling

 💬LLMs  Content type: Academic
arxiv.org

Human-Like Neural Nets by Catapulting

 🧠LLM
gwern.net · Hacker News

History of WYSIWYG editors and CMS: a timeline (2022)

 💾Retro Computing  Content type: Blog
tiny.cloud · Hacker News

Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels

 🧠LLM  Content type: Academic
arxiv.org

Dynamic Linear Attention

 🧠LLM  Content type: Academic
arxiv.org

DeepSeek Made AI Cheap. Now It Needs Billions to Keep It Cheap.

 🚀Startups  Content type: News  Content type: Blog

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

 📊Statistics  Content type: Academic
arxiv.org

Best-Known Sorting Networks

 🗄️Vector Databases

Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages

 🧠LLM  Content type: Academic
arxiv.org

An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection

 🧠LLM  Content type: Academic
arxiv.org

DxPTA: An Architecture Design Space Exploration with Optical Dataflow-guided Strategy for HW/SW Co-Design of Photonic Transformer Accelerators

 📐Semidefinite Programming  Content type: Academic
arxiv.org
