Scour
🛡️ AI Safety
AI alignment, AI safety, interpretability, AGI risk, RLHF
The Best Politician In A Generation
🗂️ Personal Wikis
Content type: News, Blog
benthams.substack.com · 1 day ago · Substack
ML4Good Summer 2026 Bootcamps - Applications Open!
🎲 Procedural Generation
lesswrong.com · 7 hours ago
The technical community can't be the main character in AI safety anymore
⚙️ History of Technology
substackcdn.com · 3 days ago · Substack
Clearing Up The Confusion About What Anthropic Really Said On Globally Pausing The Unrelenting Race Toward AI That Builds AI
🔍 Interpretability
forbes.com · 2 days ago
AI Scientist Bengio on Engineering Safer Agents
🔍 Interpretability
Content type: News
bloomberg.com · 5 days ago
Microsoft updates AI agent security taxonomy with seven new failure modes
🔌 API Design
4sysops.com · 5 days ago
A Unifying Lens on Reward Uncertainty in RLHF
🔍 Interpretability
Content type: Academic
arxiv.org · 1 day ago
Complex Objects: Why AI Safety Can’t Just Think in Posts
🔍 Interpretability
Content type: Blog
medium.com · 5 days ago
AI Safety — Genuine or Performative?
🔍 Interpretability
Content type: Blog
medium.com · 4 days ago
Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us
⚙️ Backend Dev
microsoft.com · 5 days ago
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
🔍 Information Retrieval
Content type: Academic
arxiv.org · 1 day ago
AI Scientist Bengio: Building Systems We Don't Know How to Control
🔍 Interpretability
Content type: News
bloomberg.com · 5 days ago
I Started an AI Safety Research Org and Think These 7 Things Matter
🛠️ DevOps
lesswrong.com · 3 hours ago
In policy paper, OpenAI diverges from White House on AI safety
🔍 Interpretability
siliconangle.com · 6 days ago
What Will Canada’s AI Strategy Mean for Jobs and Safety?
🏗️ System Design
Content type: News
thetyee.ca · 5 days ago
Diffuse AI Control on Fuzzy Tasks
🌐 Distributed Systems
Content type: Academic
arxiv.org · 1 day ago
AI Red Teaming (OWASP top 10)
🗳️ Consensus Algorithms
Content type: Blog
blog.gopenai.com · 5 days ago
How valuable are weak AI safety regulations?
🔍 Interpretability
lesswrong.com · 2 days ago
Controversial smut as an AI alignment issue
🎲 Procedural Generation
Content type: News, Blog
thingofthings.substack.com · 5 days ago · Substack
Hidden Consensus: Preference-Validity Compression in Human Feedback
🌐 Distributed Systems
Content type: Academic
arxiv.org · 14 hours ago