Introducing Project Telos: Modeling, Measuring, and Intervening on Goal-directed Behavior in AI Systems
lesswrong.com · 2d
AI Interpretability
Economics and Transformative AI (by Tom Cunningham)
lesswrong.com · 1d
AI Interpretability
Weak-To-Strong Generalization
lesswrong.com · 20h
Category Theory
A Sketch of Helpfulness Theory With Equivocal Principals
lesswrong.com · 5d
Zettelkasten
A toy model of corrigibility
lesswrong.com · 4h
Incremental Computation
Decision theory when you can't make decisions
lesswrong.com · 1d
Incremental Computation
Reason About Intelligence, Not AI
lesswrong.com · 3h
AI Interpretability
Human Values ≠ Goodness
lesswrong.com · 3h
Digital Gardens
Model welfare and open source
lesswrong.com · 20h
Incremental Computation
Brainstorming 25 Questions I Am Interested In
lesswrong.com · 10h
Zettelkasten
Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
lesswrong.com · 1d
Homomorphic Encryption
Evidence on language model consciousness
lesswrong.com · 1d
AI Interpretability
Strategy-Stealing Argument Against AI Dealmaking
lesswrong.com · 1d
AI Interpretability
25 Que
lesswrong.com · 10h
Zettelkasten
An intro to the Tensor Economics blog
lesswrong.com · 4d
Homomorphic Encryption
Doom from a Solution to the Alignment Problem
lesswrong.com · 6h
Incremental Computation
A Very Simple Model of AI Dealmaking
lesswrong.com · 4d
AI Interpretability
Vaccination against ASI
lesswrong.com · 1d
Message Queues
Agentic AI and Security
MLOps