Agentic Monitoring for AI Control
lesswrong.com · 5d
🔍 AI Interpretability
FTL travel and scientific realism
lesswrong.com · 3h
∘ Category Theory
Weak-To-Strong Generalization
lesswrong.com · 7h
∘ Category Theory
Model welfare and open source
lesswrong.com · 7h
⚡ Incremental Computation
Introducing Project Telos: Modeling, Measuring, and Intervening on Goal-directed Behavior in AI Systems
lesswrong.com · 2d
🎯 Reinforcement Learning
Summary and Comments on Anthropic's Pilot Sabotage Risk Report
lesswrong.com · 2d
🔍 AI Interpretability
Economics and Transformative AI (by Tom Cunningham)
lesswrong.com · 11h
🔍 AI Interpretability
Uncertain Updates: October 2025
lesswrong.com · 3d
⚡ Incremental Computation
Agentic AI and Security
martinfowler.com · 4d
🚀 MLOps
Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
lesswrong.com · 1d
🔢 Homomorphic Encryption
Reflections on 4 years of meta-honesty
lesswrong.com · 4h
📮 Message Queues
Decision theory when you can't make decisions
lesswrong.com · 11h
🎯 Reinforcement Learning
Get Ready for Clojure, GPU, and AI in 2026 with CUDA 13.0
dragan.rocks · 2d
Discuss: Hacker News
🦀 Rust
Anthropic's Pilot Sabotage Risk Report
lesswrong.com · 2d
🔧 DevOps
Me consuming five different forms of media at once to minimize the chance of a thought occurring
lesswrong.com · 1h
🌿 Digital Gardens
Why Civilizations Are Unstable (And What This Means for AI Alignment)
lesswrong.com · 3d
🔍 AI Interpretability
An intro to the Tensor Economics blog
lesswrong.com · 3d
🔢 Homomorphic Encryption
Evidence on language model consciousness
lesswrong.com · 1d
🔍 AI Interpretability