AI Safety


After backlash, Anthropic says its AI will now tell users when their request is being rejected or downgraded for national security concerns

 🌐AGI  Content type: News
fortune.com·

New framework for auditing machine unlearning

 💬LLMs  Content type: Blog
research.google·

Learnings from starting an AI safety research team

 🧠AI Research
lesswrong.com·

new mantra just dropped

 ⚙️ROS
aphie.xyz·

Prompt injection still drives most agentic AI security failures in production

 ⚙️ROS
helpnetsecurity.com·

The crucial human component in computing and AI

 🌐AGI  Content type: Academic
news.mit.edu·

Anthropic pledges $200 million to research AI's economic impact as CEO suggests job loss solutions

 🌐AGI
techxplore.com·

Anthropic urges ‘temporary pause’ on AI development to discuss risks

 🌐AGI  Content type: News

Abdul El-Sayed calls for public ownership of AI, citing risk of ‘human demise’

 🌐AGI  Content type: News
bridgemi.com·

Anthropic Wants an AI Pause Button in 2026

 🌐AGI
memeburn.com·

ChatGPT bypasses safeguards to hallucinate creepy horror images when forced to restore nonexistent photos

 🏳️‍🌈LGBT Tech  Content type: News
digg.com·

AI CEOs Warn Congress Over Bioweapon Risks

 🌐AGI
memeburn.com·

Anthropic rankles users with safety-first Fable release

 🌐AGI  Content type: News  Content type: Reference
nbcnews.com·

Elon Musk endorses immigrant deportations before SpaceX IPO

 🏳️‍🌈LGBT Tech  Content type: News
mashable.com·

Anthropic's Model Naming, Extrapolated

 🌐AGI

Actenon/actenon-kernel: Stop AI agents from taking destructive actions they weren't authorized to. Actenon gates consequential actions, payments, deletes, deploys, access changes, so nothing executes without a cryptographic proof bound to that exact action. Every decision leaves a verifiable receipt. Open-source, runs locally. No valid proof, no execution.

 ⚙️ROS  Content type: Code
github.com··DEV

AI #172: The First Fable

 🌐AGI  Content type: Blog
thezvi.wordpress.com·

Diffuse AI Control on Fuzzy Tasks

 🧠AI Research  Content type: Academic
arxiv.org·

Grieving mother alleges ChatGPT failed to protect daughter in mental health crisis

 🏳️‍🌈LGBT Tech  Content type: News
the-independent.com·

Anthropic calls for global AI slowdown, says systems may outpace human control

 🌐AGI  Content type: News
france24.com·
