🛡️ AI Safety - VgfMgscp9fdT · Scour

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

🧠LLMs Academic

Epiplexity

🤖AI Agents Blog

andys.blog · Hacker News

‘We Did Our Best!’ | Meghan O’Gieblyn

✍️Prompt Engineering

Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models

🧠LLMs Academic

Book of Cron Job

✍️Prompt Engineering

lesswrong.com

What Do People Actually Want From AI? Mapping Preference Plurality

🧠LLMs Academic

Contra Dance at LessOnline

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

✍️Prompt Engineering Academic

The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models

🧠LLMs Academic

Installing the Seat on the Machine

✍️Prompt Engineering

cafebedouin.org

Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers

✍️Prompt Engineering Academic

My research agenda and work

lesswrong.com

(VERY PARTIAL) CROSSPOST: ALEX HEATH: SubStack Is Opening Up to AI: Interviewing CEO Chris Best

🧠LLMs News Blog

braddelong.substack.com

Interactions Between Crosscoder Features: A Compact Proofs Perspective

🧠LLMs Academic

Towards a Formal Scientific Epistemology

lesswrong.com

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

🧠LLMs Academic

Coming Around To Political Donations

💰Long-Term Investing

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

🤖AI Agents Academic

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

🧠LLMs Academic

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

🧠LLMs Academic
