Model welfare and open source
lesswrong.com · 7h
Incremental Computation
Evidence on language model consciousness
lesswrong.com · 1d
AI Interpretability
Weak-To-Strong Generalization
lesswrong.com · 7h
Category Theory
An intro to the Tensor Economics blog
lesswrong.com · 3d
Homomorphic Encryption
Introducing Project Telos: Modeling, Measuring, and Intervening on Goal-directed Behavior in AI Systems
lesswrong.com · 2d
Reinforcement Learning
Economics and Transformative AI (by Tom Cunningham)
lesswrong.com · 11h
AI Interpretability
FTL travel and scientific realism
lesswrong.com · 3h
Observability
2025 Unofficial LW Community Census, Request for Comments
lesswrong.com · 4h
Digital Gardens
Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
lesswrong.com · 1d
Homomorphic Encryption
Summary and Comments on Anthropic's Pilot Sabotage Risk Report
lesswrong.com · 2d
Observability
LLM Hallucinations: An Internal Tug of War
lesswrong.com · 3d
AI Interpretability
[CS 2881r] Can We Prompt Our Way to Safety? Comparing System Prompt Styles and Post-Training Effects on Safety Benchmarks
lesswrong.com · 5d
Dependent Types
When Will AI Transform the Economy?
lesswrong.com · 4d
AI Interpretability