🛡️ AI Safety
AI alignment, model safety, guardrails, red teaming
Scoured 76 posts in 15.8 ms
Preference Instability in Reward Models: Detection and Mitigation via Sparse Autoencoders
🎯 RLHF · arxiv.org · 2d
On Humanity & Human Beings
🔓 Open Source AI · mhdempsey.substack.com · 5d · Substack
Document-tuning instills durable animal compassion in LLMs (and generalizes to humans)
🏢 LLM Adoption · lesswrong.com · 1h
Disney's near-perfect 10-part crime thriller saga is so good, you'll finish it in one weekend
📊 LLM Evaluation · polygon.com · 3d
Less-relevant results
AI essays
🤖 AI Agents · rhollick.wordpress.com · 13h
GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction
🕸️ Knowledge Graphs · arxiv.org · 2d
Compositional Adversarial Training for Robust Visual Watermarking
🧠 LLMs · arxiv.org · 2d
AI emotions and aligned behavior
🎯 RLHF · lesswrong.com · 2d
By 20 to 1, Americans Want the White House to Safety Test AI
🔓 Open Source AI · ifstudies.org · 2d · r/OpenAI
Four AI supply-chain attacks in 50 days exposed the release pipeline red teams aren't covering
🤖 AI Agents · venturebeat.com · 2d
rl for red teaming: training models to attack and defend themselves
🎯 RLHF · castform.com · 6d · Hacker News
Stitched Value Model for Diffusion Alignment
⚡ Quantization · arxiv.org · 1d
Automated Alignment is Harder Than You Think
⚙️ Transformers · lesswrong.com · 6d
Confidentiality is not security: Why the real AI runtime crisis Is the Authorization Gap
🤖 AI Agents · techradar.com · 6d
PAIR: Prefix-Aware Internal Reward Model for Multi-Turn Agent Optimization
🎯 RLHF · arxiv.org · 2d
The 'Mythos Moment'
🤖 AI Agents · profserious.substack.com · 3d · Substack
Let's have more partial insiders.
🏢 LLM Adoption · lesswrong.com · 1d
The Anthropic Case: Do We Need an Ethical Framework for Interacting with AI?
🔓 Open Source AI · 421.news · 4d
Pairwise Preference Reward and Group-Based Diversity Enhancement for Superior Open-Ended Generation
🎯 RLHF · arxiv.org · 2d