Introducing Project Telos: Modeling, Measuring, and Intervening on Goal-directed Behavior in AI Systems
lesswrong.com·3d
🎯Reinforcement Learning
Reflections on 4 years of meta-honesty
lesswrong.com·1d
📮Message Queues
Vaccination against ASI
lesswrong.com·1d
📮Message Queues
Halfhaven Digest #3
lesswrong.com·2d
📡RSS
Why I Transitioned: A Case Study
lesswrong.com·1d
∘Category Theory
LLM Hallucinations: An Internal Tug of War
lesswrong.com·4d
AI Interpretability
Asking Paul Fussell for Writing Advice
lesswrong.com·2d
∘Category Theory
Seattle Secular Solstice 2025 – Dec 20th
lesswrong.com·1d
🌿Digital Gardens
AISLE discovered three new OpenSSL vulnerabilities
lesswrong.com·3d
🦀Rust
Model Parameters as a Steganographic Private Channel
lesswrong.com·6d
🔢Homomorphic Encryption
A Bayesian Explanation of Causal Models
lesswrong.com·6d
🔗Dependent Types
Centralization begets stagnation
lesswrong.com·3d
Distributed Systems
Temporarily Losing My Ego
lesswrong.com·5d
🏠Self-Hosting
Anthropic's Pilot Sabotage Risk Report
lesswrong.com·3d
👁️Observability