A toy model of corrigibility
lesswrong.com·1d
⚡Incremental Computation
Why Is Printing So Bad?
lesswrong.com·2d
✍Writing
You think you are in control?
lesswrong.com·6h
🏠Self-Hosting
Doom from a Solution to the Alignment Problem
lesswrong.com·1d
⚡Incremental Computation
Weak-To-Strong Generalization
lesswrong.com·1d
∘Category Theory
Me consuming five different forms of media at once to minimize the chance of a thought occurring
lesswrong.com·1d
🌿Digital Gardens
Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
lesswrong.com·3d
🔢Homomorphic Encryption
Freewriting in my head, and overcoming the “twinge of starting”
lesswrong.com·2d
🗃️Zettelkasten
FTL travel and scientific realism
lesswrong.com·1d
👁️Observability
There's some chance oral herpes is pretty bad for you?
lesswrong.com·18h
🦀Rust
Halfhaven Digest #3
lesswrong.com·3d
📡RSS
Evidence on language model consciousness
lesswrong.com·2d
🔍AI Interpretability
Reflections on 4 years of meta-honesty
lesswrong.com·1d
📮Message Queues
Summary and Comments on Anthropic's Pilot Sabotage Risk Report
lesswrong.com·4d
👁️Observability
Genius is Not About Genius
lesswrong.com·5d
∘Category Theory
Vaccination against ASI
lesswrong.com·2d
📮Message Queues
Why Civilizations Are Unstable (And What This Means for AI Alignment)
lesswrong.com·5d
🔍AI Interpretability