🛡️ AI Safety - obaqueiro · Scour

The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably

lesswrong.com·

Sam Altman joins rivals in call to prevent AI-developed bioweapons

the-independent.com·

Diffuse AI Control on Fuzzy Tasks

🤖AI Academic

Anthropic Calls for Frontier AI Freeze to Prevent Self-Building Tech

Lawmakers Are Aiming To Regulate AI-Builds-AI Before AI Gets Entirely Beyond Human Control

OpenAI, Anthropic, and Meta Agree on This 1 Critical Decision About AI Safety

My Data Science Internship Journey at Oasis Infobyte: Building Real-World Machine Learning Projects

⚙️ML Engineering Blog

Making Claude a chemist

anthropic.com··Hacker News, r/singularity

How valuable are weak AI safety regulations?

lesswrong.com·

Anthropic self-improvement, pause

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

⚙️ML Engineering Academic

Trump signs voluntary AI safety order after pushback cuts federal review to 30 days

thecooldown.com·

ChatGPT bypasses safeguards to hallucinate creepy horror images when forced to restore nonexistent photos

🧬Biohacking News

Five Eyes issues unusual warning on China's online recruitment tactics

metacurity.com·

Anthropic urges ‘temporary pause’ on AI development to discuss risks

🔐Security News

theguardian.com··Hacker News

Trajectory Geometry of Transformer Representations Across Layers

🎓Computer Science Academic

Actenon/actenon-kernel: Stop AI agents from taking destructive actions they weren't authorized to take. Actenon gates consequential actions (payments, deletes, deploys, access changes) so nothing executes without a cryptographic proof bound to that exact action. Every decision leaves a verifiable receipt. Open-source, runs locally. No valid proof, no execution.

🏠Self-Hosting Code

github.com··DEV

Iliad is Hiring

✍️Prompt Engineering

lesswrong.com·

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

⚙️ML Engineering Academic

Who Elected Anthropic?

☁️SaaS Blog

vizierprime.substack.com··Substack
