🛡️ AI Safety
AI alignment, AI risk, existential risk, responsible AI
🤖 AI · arXiv · 1 day ago
Affective AI Safety: The Missing Piece in LLM Safety
⚖️ AI Governance · Science · 5 days ago
Researchers caught in the crossfire as companies and government grapple over AI safety
🎮 Gamification · medium.com · 2 days ago
Reward hacking in Reinforcement learning
⚖️ AI Ethics · stevekinney.com · 5 days ago
Some Thoughts on AI Safety
Covers 11 stories, including Goodhart's Law. Discussed on Hacker News.
⚖️ AI Governance · SiliconANGLE · 2 days ago
Nvidia introduces Halos for Robotics to bridge the physical AI safety gap
⚖️ AI Ethics · medium.com · 5 days ago
Ninety Percent of Physicians Trust Their Clinical AI. They Catch a Third of Its Dangerous Errors.
⚖️ AI Regulation · E-International Relations · 2 days ago
Interview – Andrea Miotti
Covers 2 stories, including When AI Builds Itself.
🤖 AI · GitHub · 2 days ago
Open source AI projects from Banco Santander
Covered by Interesting Engineering++. Discussed on Hacker News.
⚖️ AI Regulation · Financial Times · 5 days ago
Letter: Argentina’s AI fix widens the gap it is meant to close
🤖 AI · Forbes · 2 days ago
OpenAI Tricks AI Into Revealing Its True Nature Prior To Being Unleashed Into The Real World
⚖️ Tech Policy · tehnologijaviews.medium.com · 6 days ago
The Trump Administration’s Push to Block AI Jailbreaks: A Safety Measure or Political Theater?
🤖 Machine Learning · arXiv · 19 hours ago
Are Safety Guarantees in Neural Networks Safe? How to Compute Trustworthy Robustness Certifications
🔬 ROM Hacking · medium.com · 4 days ago
What I Learned Studying Whether Fine-Tuning Breaks a Transformer’s “Copy Mechanism”
⚖️ AI Governance · CNBC · 6 days ago
Synthesia CEO: Creating a coalition around an AI code of conduct will help build the AI future we all want
⚖️ AI Regulation · Forward Future · 6 days ago
Anthropic Thinks “FOOM” Is Near
⚖️ AI Ethics · medium.com · 5 days ago
Beyond AI Safety: Alignment Through Positive Psychology
🤖 AI · GitHub · 1 day ago
The Invisible Guardrail: How Commercial LLMs Enforce Algorithmic Paternalism
Discussed on DEV.
🤖 AI · arXiv · 19 hours ago
Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?
♟️ Game Theory · Bentham's Newsletter · 6 days ago
Effective Altruists Are Underestimating Politics
Discussed on Substack.
🤖 LLMs · arXiv · 19 hours ago
Reinforcement Learning Towards Broadly and Persistently Beneficial Models