Scour
LLM Alignment
AI alignment, RLHF, model behavior, interpretability
Scoured 154 posts in 9.7 ms
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🦋 ATProto · turingpost.com · 3 days ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🛡️ AI Safety · Content type: Academic · arxiv.org · 1 day ago
Tracing Eval-Awareness Emergence Through Training of OLMo 3
🛡️ AI Safety · lesswrong.com · 14 hours ago
The Ghost of Alignment — Why AI Should Never Fully Obey Humanity
🛡️ AI Safety · Content type: Blog · medium.com · 2 hours ago
[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"
🛡️ AI Safety · Content type: Blog · meditationsondigitalminds.substack.com · 1 day ago · Substack
Mechanistic Interpretability: The Key to Trusting Agentic AI
🛡️ AI Safety · Content type: Discussion · bradenkelley.com · 4 days ago
Survey reveals 80% would jailbreak their Kindle before letting Amazon win
🔲 Are.na (https://www.are.na) · androidauthority.com · 2 days ago
Jailbreaking the Lululemon Mirror [video]
🔲 Are.na (https://www.are.na) · Content type: Video · youtube.com · 6 days ago · Hacker News
Criti-hyping is the best thing that happened to Big Tech
🛡️ AI Safety · reveriesofahuman.com · 1 day ago
CBA develops new recommendations for banks on minimum data indicators
🛡️ AI Safety · trend.az · 16 hours ago
SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.
🎭 AI Simulators · Content type: Code · github.com · 3 days ago · r/opensource
Anthropic releases Mythos-derived model with cyber guardrails
🎭 AI Simulators · metacurity.com · 10 hours ago
Spotlight On: Dreamplug Technologies Private Limited (CRED), a New Principal Participating Organization
🦋 ATProto · Content type: Blog · blog.pcisecuritystandards.org · 2 days ago
Solsong Chord Updates
🛡️ AI Safety · jefftk.com · 11 hours ago
Mathematical proof reveals why fixed AI guardrails can never block every jailbreak
🛡️ AI Safety · techxplore.com · 8 hours ago
Why Claude Produces High-Quality Output: A Developer’s Guide to Token Efficiency and Hallucination…
🎭 AI Simulators · Content type: Blog · medium.com · 6 days ago
You're doing it wrong
🧩 Cognitive Science · Content type: News · understandably.com · 1 day ago
Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems
🛡️ AI Safety · Content type: Academic · arxiv.org · 20 hours ago
From 1 July, the AP will check the registration of scan cars in the algorithm register
🛡️ AI Safety · autoriteitpersoonsgegevens.nl · 6 days ago
Anthropic’s new model is Mythos on a leash
🎭 AI Simulators · Content type: News · cyberscoop.com · 1 day ago