Temporarily Losing My Ego
lesswrong.com·6d
Self-Hosting
Anthropic's Pilot Sabotage Risk Report
lesswrong.com·3d
Observability
[CS 2881r] Can We Prompt Our Way to Safety? Comparing System Prompt Styles and Post-Training Effects on Safety Benchmarks
lesswrong.com·6d
Dependent Types
Decision theory when you can't make decisions
lesswrong.com·1d
Reinforcement Learning
Seattle Secular Solstice 2025 - Dec 20th
lesswrong.com·2d
Digital Gardens
Brainstorming Food on the Cheap+Healthy+Convenient+Edible Frontier
lesswrong.com·6d
Digital Gardens
Emergent Introspective Awareness in Large Language Models
lesswrong.com·4d
Zettelkasten
Strategy-Stealing Argument Against AI Dealmaking
lesswrong.com·2d
Reinforcement Learning
Transactional method for non-transactional relationship: Relationship as a Common-pool Resource problem
lesswrong.com·6d
Graph Databases
OpenAI Moves To Complete Potentially The Largest Theft In Human History
lesswrong.com·3d
Open Source
No title
lesswrong.com·6d
AI Interpretability
Supervillain Monologues Are Unrealistic
lesswrong.com·2d
Writing
Why Civilizations Are Unstable (And What This Means for AI Alignment)
lesswrong.com·5d
AI Interpretability
Workshop on Post-AGI Economics, Culture, and Governance
lesswrong.com·5d
Open Source
Q2 AI Benchmark Results: Pros Maintain Clear Lead
lesswrong.com·6d
AI Interpretability
Summary and Comments on Anthropic's Pilot Sabotage Risk Report
lesswrong.com·3d
Observability
Verified Relational Alignment: A Framework for Robust AI Safety Through Collaborative Trust
lesswrong.com·6d
AI Interpretability
On The Conservation of Rights
lesswrong.com·4d
Homomorphic Encryption