🧭 LLM Alignment
AI alignment, RLHF, model behavior, interpretability
Scoured 155 posts in 4.7 ms
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🦋 ATProto · turingpost.com · 3 days ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🛡️ AI Safety · Content type: Academic · arxiv.org · 1 day ago
Tracing Eval-Awareness Emergence Through Training of OLMo 3
🛡️ AI Safety · lesswrong.com · 17 hours ago
The Ghost of Alignment — Why AI Should Never Fully Obey Humanity
🛡️ AI Safety · Content type: Blog · medium.com · 5 hours ago
[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"
🛡️ AI Safety · Content type: Blog · meditationsondigitalminds.substack.com · 1 day ago · Substack
Mechanistic Interpretability: The Key to Trusting Agentic AI
🛡️ AI Safety · Content type: Discussion · bradenkelley.com · 5 days ago
AdBreak – Jailbreaking the Kindle
🔲 Are.na (https://www.are.na) · kindlemodding.org · 3 hours ago · Hacker News
Survey reveals 80% would jailbreak their Kindle before letting Amazon win
🔲 Are.na (https://www.are.na) · androidauthority.com · 2 days ago
SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.
🎭 AI Simulators · Content type: Code · github.com · 3 days ago · r/opensource
Criti-hyping is the best thing that happened to Big Tech
🛡️ AI Safety · reveriesofahuman.com · 1 day ago
CBA develops new recommendations for banks on minimum data indicators
🛡️ AI Safety · trend.az · 19 hours ago
Why Claude Produces High-Quality Output: A Developer’s Guide to Token Efficiency and Hallucination…
🎭 AI Simulators · Content type: Blog · medium.com · 6 days ago
Spotlight On: Dreamplug Technologies Private Limited (CRED), a New Principal Participating Organization
🦋 ATProto · Content type: Blog · blog.pcisecuritystandards.org · 2 days ago
Anthropic releases Mythos-derived model with cyber guardrails
🎭 AI Simulators · metacurity.com · 14 hours ago
Solsong Chord Updates
🛡️ AI Safety · jefftk.com · 14 hours ago
Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems
🛡️ AI Safety · Content type: Academic · arxiv.org · 23 hours ago
From 1 July, the AP will check the registration of scan cars in the algorithm register
🛡️ AI Safety · autoriteitpersoonsgegevens.nl · 6 days ago
You're doing it wrong
🧩 Cognitive Science · Content type: News · understandably.com · 1 day ago
Mathematical proof reveals why fixed AI guardrails can never block every jailbreak
🛡️ AI Safety · techxplore.com · 11 hours ago
Anthropic’s new model is Mythos on a leash
🎭 AI Simulators · Content type: News · cyberscoop.com · 1 day ago