GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash
lesswrong.com·29m
Incremental Computation
Open-weight training practices and implications for CoT monitorability
lesswrong.com·6h
🎯Reinforcement Learning
Get Ready for Clojure, GPU, and AI in 2026 with CUDA 13.0
dragan.rocks·5d
🦀Rust
To improve Rationality, create Situations
lesswrong.com·1d
🗃️Zettelkasten
Solving a problem with mindware
lesswrong.com·1d
🗃️Zettelkasten
A toy model of corrigibility
lesswrong.com·1d
Incremental Computation
Introducing Project Telos: Modeling, Measuring, and Intervening on Goal-directed Behavior in AI Systems
lesswrong.com·4d
🎯Reinforcement Learning
Weak-To-Strong Generalization
lesswrong.com·2d
Category Theory
Red Heart
lesswrong.com·23h
🦀Rust
You think you are in control?
lesswrong.com·22h
🏠Self-Hosting
Economics and Transformative AI (by Tom Cunningham)
lesswrong.com·2d
🔍AI Interpretability
How Powerful AI Gets Cheap
lesswrong.com·23h
🔢Homomorphic Encryption