🛡️ AI Safety
AI alignment, model safety, guardrails, red teaming
Sixteen schemes for AI safety
🛡️ Content Moderation · lesswrong.com · 6 days ago
AI red teaming comes of age
🛡️ Content Moderation · csoonline.com · 3 hours ago
[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"
🏢 LLM Adoption · Content type: Blog · meditationsondigitalminds.substack.com · 1 day ago · Substack
Advanced AI Safety Addendum
🛡️ Content Moderation · cloud.google.com · 17 hours ago · Hacker News
My Oslo Freedom Forum Keynote: Authoritarians and AI
🛡️ Content Moderation · Content type: Blog · redpacket.substack.com · 1 day ago · Substack
Matador-og/huntbot: AI offensive security harness for bug bounty, pentesting, red teaming.
🕸️ Knowledge Graphs · Content type: Code · github.com · 6 hours ago · Hacker News
Autonomous Pentesting vs Autonomous Red Teaming: What's the Difference?
🤖 AI Agents · malware.news · 3 days ago
Criti-hyping is the best thing that happened to Big Tech
🛡️ Content Moderation · reveriesofahuman.com · 1 day ago
Claude Fable 5 and new AI safety fables
🛡️ Content Moderation · Content type: News · interconnects.ai · 13 hours ago · Hacker News
Mechanistic Interpretability: The Key to Trusting Agentic AI
🤖 Agentic AI · Content type: Discussion · bradenkelley.com · 4 days ago
Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO
🎯 RLHF · Content type: Academic · arxiv.org · 1 day ago
Anthropic Launches Claude Fable 5: Mythos-Class AI With Cybersecurity Guardrails
🔓 Open Source AI · securityweek.com · 19 hours ago
Germany to create AI safety agency
🛡️ Content Moderation · techxplore.com · 1 day ago
The Best Politician In A Generation
🛡️ Content Moderation · Content type: News, Blog · benthams.substack.com · 20 hours ago · Substack
The technical community can't be the main character in AI safety anymore
🛡️ Content Moderation · substackcdn.com · 3 days ago · Substack
The Stoic Path to Actual AI Safety: Three Practical Steps for Industry and Individuals
🛡️ Content Moderation · oodaloop.com · 1 day ago
AI Scientist Bengio on Engineering Safer Agents
🛡️ Content Moderation · Content type: News · bloomberg.com · 5 days ago
OpenAI says it will comply with Trump's order to let the government review AI models before release
🛡️ Content Moderation · qz.com · 5 days ago
Meta’s AI Support Hack Is a Warning for Every Team Automating User Access
🤖 LLMs · Content type: Discussion · langprotect.com · 2 days ago · DEV
The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably
🛡️ Content Moderation · lesswrong.com · 2 hours ago