AI Safety

Scoured 308 posts in 19.4 ms

The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably

 🤖AI
lesswrong.com

Sam Altman joins rivals in call to prevent AI-developed bioweapons

 🤖AI
the-independent.com

Diffuse AI Control on Fuzzy Tasks

 🤖AI  Content type: Academic
arxiv.org

Anthropic Calls for Frontier AI Freeze to Prevent Self-Building Tech

 🔐Security
pymnts.com

Lawmakers Are Aiming To Regulate AI-Builds-AI Before AI Gets Entirely Beyond Human Control

 🤖AI
forbes.com

OpenAI, Anthropic, and Meta Agree on This 1 Critical Decision About AI Safety

 🤖AI
inc.com

My Data Science Internship Journey at Oasis Infobyte: Building Real-World Machine Learning Projects

 ⚙️ML Engineering  Content type: Blog
medium.com

Making Claude a chemist

 🔬Science

How valuable are weak AI safety regulations?

 🤖AI
lesswrong.com

Anthropic self-improvement, pause

 🤖AI
manton.org

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

 ⚙️ML Engineering  Content type: Academic
arxiv.org

Trump signs voluntary AI safety order after pushback cuts federal review to 30 days

 🔐Security
thecooldown.com

ChatGPT bypasses safeguards to hallucinate creepy horror images when forced to restore nonexistent photos

 🧬Biohacking  Content type: News
digg.com

Five Eyes issues unusual warning on China's online recruitment tactics

 🔐Security
metacurity.com

Anthropic urges ‘temporary pause’ on AI development to discuss risks

 🔐Security  Content type: News

Trajectory Geometry of Transformer Representations Across Layers

 🎓Computer Science  Content type: Academic
arxiv.org

Actenon/actenon-kernel: Stop AI agents from taking destructive actions they weren't authorized to take. Actenon gates consequential actions (payments, deletes, deploys, access changes) so that nothing executes without a cryptographic proof bound to that exact action. Every decision leaves a verifiable receipt. Open-source, runs locally. No valid proof, no execution.

 🏠Self-Hosting  Content type: Code
github.com · DEV

Iliad is Hiring

 ✍️Prompt Engineering
lesswrong.com

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

 ⚙️ML Engineering  Content type: Academic
arxiv.org

Who Elected Anthropic?

 ☁️SaaS  Content type: Blog