🤖 Transformers - yfff · Scour

Claude Mythos Glasswing: Why AI Vuln Discovery Terrifies Me

🧠LLM Blog Discussion

Hasse Diagrams for Attention: A Partial Order Framework for Designing Transformer Masks

🧠LLM Academic

Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles

📊Statistics Academic

A Mean-Field Analysis of Multi-Head Self-Attention under Cross-Entropy Training

📐Optimization Theory Academic

See, Act, Correct: three levers for working with a code agent

🎮Reinforcement Learning Blog

blog.owulveryck.info··Hacker News

princezuda/-RequiemGPT-: Fully open source and open weights built and trained by fable five with one prompt. An experience in how AI actually works

🤖AI Code

github.com··Hacker News

Parallel Causal Associative Fields: Gated Sparse Memory for Long-Context Language Modeling

🎛️Control Systems Academic

Introducing the Third Generation of Apple’s Foundation Models

machinelearning.apple.com··Hacker News, r/apple

DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals

📶Communications Academic

Beyond Item IDs: Scaling Short-Form-Video Recommendation via Semantic-Native Long Sequence Modeling

💬LLMs Academic

Human-Like Neural Nets by Catapulting

gwern.net··Hacker News

History of WYSIWYG editors and CMS: a timeline (2022)

💾Retro Computing Blog

tiny.cloud··Hacker News

Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels

🧠LLM Academic

Dynamic Linear Attention

🧠LLM Academic

DeepSeek Made AI Cheap. Now It Needs Billions to Keep It Cheap.

🚀Startups News Blog

chinacompany.substack.com··Substack

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

📊Statistics Academic

Best-Known Sorting Networks

🗄️Vector Databases

bertdobbelaere.github.io··Hacker News

Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages

🧠LLM Academic

An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection

🧠LLM Academic

DxPTA: An Architecture Design Space Exploration with Optical Dataflow-guided Strategy for HW/SW Co-Design of Photonic Transformer Accelerators

📐Semidefinite Programming Academic
