AI Safety
AI alignment, AI safety, interpretability, AGI risk, RLHF
Sixteen schemes for AI safety
LSM Trees · lesswrong.com · 6 days ago
Advanced AI Safety Addendum
API Design · cloud.google.com · 1 day ago · Hacker News
AI red teaming comes of age
Distributed Systems · csoonline.com · 10 hours ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
Mech Interp · Content type: Academic · arxiv.org · 1 day ago
Matador-og/huntbot: AI offensive security harness for bug bounty, pentesting, red teaming.
Formal Verification · Content type: Code · github.com · 13 hours ago · Hacker News
Autonomous Pentesting vs Autonomous Red Teaming: What's the Difference?
Distributed Systems · malware.news · 3 days ago
Musk's xAI accused of illegally firing engineer who raised safety concerns
Operating Systems · Content type: News · ca.finance.yahoo.com · 2 hours ago
My Oslo Freedom Forum Keynote: Authoritarians and AI
Observability · Content type: Blog · redpacket.substack.com · 1 day ago · Substack
Claude Fable 5 and new AI safety fables
Distributed Systems · Content type: News · interconnects.ai · 20 hours ago · Hacker News
Mechanistic Interpretability: The Key to Trusting Agentic AI
Interpretability · Content type: Discussion · bradenkelley.com · 4 days ago
[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"
Formal Verification · Content type: Blog · meditationsondigitalminds.substack.com · 1 day ago · Substack
AI giant says its own models could soon improve themselves - and now it wants a global pause
Interpretability · thecooldown.com · 7 hours ago
Germany to create AI safety agency
Procedural Generation · techxplore.com · 1 day ago
AI policy scholar Dean W. Ball shares a text from his mother recommending he focus on frontier AI policy
Operating Systems · digg.com · 6 days ago
Assessing the Polyglot Chatbot: Multilingual Safety in AI Systems
Formal Verification · cdt.org · 23 hours ago
The Stoic Path to Actual AI Safety: Three Practical Steps for Industry and Individuals
Distributed Systems · oodaloop.com · 2 days ago
From oversight to coercion: How authoritarian governments are twisting AI safety to get tech companies to fall in line
Interpretability · theconversation.com · 6 days ago
Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
Interpretability · Content type: Academic · arxiv.org · 15 hours ago
Criti-hyping is the best thing that happened to Big Tech
Distributed Systems · reveriesofahuman.com · 1 day ago
OpenAI says it will comply with Trump's order to let the government review AI models before release
Formal Verification · qz.com · 5 days ago