Introducing Project Telos: Modeling, Measuring, and Intervening on Goal-directed Behavior in AI Systems
lesswrong.com·2d
🔍AI Interpretability
Economics and Transformative AI (by Tom Cunningham)
lesswrong.com·1d
🔍AI Interpretability
Weak-To-Strong Generalization
lesswrong.com·20h
∘Category Theory
A Sketch of Helpfulness Theory With Equivocal Principals
lesswrong.com·5d
🗃️Zettelkasten
A toy model of corrigibility
lesswrong.com·4h
⚡Incremental Computation
Decision theory when you can't make decisions
lesswrong.com·1d
⚡Incremental Computation
Reason About Intelligence, Not AI
lesswrong.com·3h
🔍AI Interpretability
Human Values ≠ Goodness
lesswrong.com·3h
🌿Digital Gardens
Model welfare and open source
lesswrong.com·20h
⚡Incremental Computation
Brainstorming 25 Questions I Am Interested In
lesswrong.com·10h
🗃️Zettelkasten
Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
lesswrong.com·1d
🔢Homomorphic Encryption
Evidence on language model consciousness
lesswrong.com·1d
🔍AI Interpretability
Strategy-Stealing Argument Against AI Dealmaking
lesswrong.com·1d
🔍AI Interpretability
25 Que
lesswrong.com·10h
🗃️Zettelkasten
An intro to the Tensor Economics blog
lesswrong.com·4d
🔢Homomorphic Encryption
Doom from a Solution to the Alignment Problem
lesswrong.com·6h
⚡Incremental Computation
A Very Simple Model of AI Dealmaking
lesswrong.com·4d
🔍AI Interpretability
Vaccination against ASI
lesswrong.com·1d
📮Message Queues
Agentic AI and Security
martinfowler.com·5d
🚀MLOps