Model welfare and open source
lesswrong.com·7h
⚡Incremental Computation
Weak-To-Strong Generalization
lesswrong.com·7h
∘Category Theory
Economics and Transformative AI (by Tom Cunningham)
lesswrong.com·11h
🔍AI Interpretability
Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
lesswrong.com·1d
🔢Homomorphic Encryption
Reflections on 4 years of meta-honesty
lesswrong.com·4h
📮Message Queues
Halfhaven Digest #3
lesswrong.com·1d
📡RSS
Anthropic's Pilot Sabotage Risk Report
lesswrong.com·2d
👁️Observability
Agentic AI and Security
🚀MLOps
Vaccination against ASI
lesswrong.com·23h
📮Message Queues
Freewriting in my head, and overcoming the “twinge of starting”
lesswrong.com·1d
🗃️Zettelkasten
AISLE discovered three new OpenSSL vulnerabilities
lesswrong.com·2d
🦀Rust
Why Civilizations Are Unstable (And What This Means for AI Alignment)
lesswrong.com·3d
🔍AI Interpretability
Introducing Project Telos: Modeling, Measuring, and Intervening on Goal-directed Behavior in AI Systems
lesswrong.com·2d
🎯Reinforcement Learning
Summary and Comments on Anthropic's Pilot Sabotage Risk Report
lesswrong.com·2d
👁️Observability
Uncertain Updates: October 2025
lesswrong.com·3d
⚡Incremental Computation
Me consuming five different forms of media at once to minimize the chance of a thought occurring
lesswrong.com·1h
🌿Digital Gardens