A toy model of corrigibility
lesswrong.com · 1d
Incremental Computation
Why Is Printing So Bad?
lesswrong.com · 2d
Writing
You think you are in control?
lesswrong.com · 6h
Self-Hosting
Doom from a Solution to the Alignment Problem
lesswrong.com · 1d
Incremental Computation
Weak-To-Strong Generalization
lesswrong.com · 1d
Category Theory
Me consuming five different forms of media at once to minimize the chance of a thought occurring
lesswrong.com · 1d
Digital Gardens
Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
lesswrong.com · 3d
Homomorphic Encryption
Freewriting in my head, and overcoming the “twinge of starting”
lesswrong.com · 2d
Zettelkasten
FTL travel and scientific realism
lesswrong.com · 1d
Observability
There's some chance oral herpes is pretty bad for you?
lesswrong.com · 18h
Rust
Halfhaven Digest #3
lesswrong.com · 3d
RSS
Evidence on language model consciousness
lesswrong.com · 2d
AI Interpretability
Reflections on 4 years of meta-honesty
lesswrong.com · 1d
Message Queues
Summary and Comments on Anthropic's Pilot Sabotage Risk Report
lesswrong.com · 4d
Observability
Genius is Not About Genius
lesswrong.com · 5d
Category Theory
Vaccination against ASI
lesswrong.com · 2d
Message Queues
Why Civilizations Are Unstable (And What This Means for AI Alignment)
lesswrong.com · 5d
AI Interpretability