🛡️ AI Safety - vabsw · Scour

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

🤖AI Academic

Installing the Seat on the Machine

📜Tech Policy

cafebedouin.org

SecureBio Detection is Hiring Software Engineers

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

👁️Computer Vision Academic

One Year of PauseAI UK

🎬Documentaries

lesswrong.com

Emergent alignment and the projectability of ethical personas

🔄Transformers Academic

DiffoR: A Unified Continuous Generative Framework for Universal Ordinal Regression

👁️Computer Vision Academic

Book of Cron Job

lesswrong.com

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

🤖LLMs Academic

Contra Dance at LessOnline

⚖class politics

Wearable Single-Lead ECG Detects Fine-Grained Structural Heart Disease Through Echo-Report Supervision

🩺Health Academic

Less-relevant results

Towards a Formal Scientific Epistemology

lesswrong.com

Coming Around To Political Donations

⚖class politics

How authoritarian governments twist AI safety to coerce tech companies to comply

📜Tech Policy

fastcompany.com

Temporal Preference Concepts and their Functions in a Large Language Model

🤖LLMs Academic

Towards Evaluating the Robustness of Visual State Space Models

👁️Computer Vision Academic

PerceptTwin: Semantic Scene Reconstruction for Iterative LLM Planning and Verification

🎮Reinforcement Learning Academic

[Paper] Dictionary Learning Identifiability for Understanding SAEs

lesswrong.com

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

🤖LLMs Academic

Unsupervised Pattern Analysis in Japanese Veterinary Toxicology: A Regulatory-Compliant Framework for Cross-Species Risk Assessment

👁️Computer Vision Academic
