🛡️ AI Safety - bigkevuk · Scour

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

🔍RAG Academic

OpenAI, Anthropic, and Meta Agree on This 1 Critical Decision About AI Safety

⚙️AI Automation

The AI Ethics Brief #192: Canada Has a National AI Strategy. The Hard Questions Come Next.

🤖AI Agents News

brief.montrealethics.ai

Anthropic Calls for Frontier AI Freeze to Prevent Self-Building Tech

Sam Altman joins rivals in call to prevent AI-developed bioweapons

⚙️AI Automation

the-independent.com

How valuable are weak AI safety regulations?

lesswrong.com

Anthropic self-improvement, pause

Trump signs voluntary AI safety order after pushback cuts federal review to 30 days

⚙️AI Automation

thecooldown.com

Anthropic calls for pause of global AI development

⚙️AI Automation

techxplore.com

Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization

✍️Prompt Engineering Academic

Making Claude a chemist

✍️Prompt Engineering

anthropic.com · Hacker News, r/singularity

Anthropic urges a way to pause AI development as risks grow with the tech advances

⚡AI Hardware News

independent.co.uk

You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them

⚙️LLM Fine-tuning

lesswrong.com

ChatGPT bypasses safeguards to hallucinate creepy horror images when forced to restore nonexistent photos

🧠PKM News

Diffuse AI Control on Fuzzy Tasks

🏠Local AI Academic

The crucial human component in computing and AI

⚡AI Hardware Academic

Actenon/actenon-kernel: Stop AI agents from taking destructive actions they weren't authorized to. Actenon gates consequential actions, payments, deletes, deploys, access changes, so nothing executes without a cryptographic proof bound to that exact action. Every decision leaves a verifiable receipt. Open-source, runs locally. No valid proof, no execution.

🤖AI Agents Code

github.com · DEV

The Exploit Always Wins

🤖AI Agents Blog

abhishek-shankar.com

Anthropic proposes global development pause to mitigate recursive AI risks

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

⚙️LLM Fine-tuning Academic
