🛡️ AI Safety
AI alignment, model safety, guardrails, red teaming
Scoured 78 posts in 9.4 ms

The Anthropic Case: Do We Need an Ethical Framework for Interacting with AI?
🔓 Open Source AI · 421.news · 4d

A civic grammar for AI rights
🏢 LLM Adoption · science.org · 6d

Pairwise Preference Reward and Group-Based Diversity Enhancement for Superior Open-Ended Generation
🎯 RLHF · arxiv.org · 2d

What Do You Actually Want?
🤖 AI Agents · dekodiert.de · 4d · Hacker News

Large language model safety research wins Rath Award at Spring 2026 Graduate Commencement
🗣️ NLP · minesnewsroom.com · 5d

Goal-Conditioned Supervised Learning for LLM Fine-Tuning
🎯 LLM Finetuning · arxiv.org · 2d

Aether Mind – on-chain neural cognitive engine on a quantum-VQE L1
💻 Local AI · huggingface.co · 5d · Hacker News

Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement
💻 Local AI · arxiv.org · 1d

Weak-to-Strong Elicitation via Mismatched Wrong Drafts
🧠 LLMs · arxiv.org · 2d

Mirror Descent-Type Algorithms for the Variational Inequality Problem with Functional Constraints
🚀 LLM Deployment · arxiv.org · 2d

Synthetic Persona Pretraining: Alignment from Token Zero
🧪 Synthetic Data · lesswrong.com · 15h

A No-Defense Defense Against Gradient-Based Adversarial Attacks on ML-NIDS: Is Less More?
🧠 LLMs · arxiv.org · 2d

DEFLECT: Delay-Robust Execution via Flow-matching Likelihood-Estimated Counterfactual Tuning for VLA Policies
💻 Local AI · arxiv.org · 1d

When and Why Adversarial Training Improves PINNs: A Neural Tangent Kernel Perspective
🧠 LLMs · arxiv.org · 3d

ClaHF: A Human Feedback-inspired Reinforcement Learning Framework for Improving Classification Tasks
🎯 RLHF · arxiv.org · 2d

DVMap: Fine-Grained Pluralistic Value Alignment via High-Consensus Demographic-Value Mapping
⚡ Quantization · arxiv.org · 6d

ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization
🧠 LLMs · arxiv.org · 2d

Universal Adversarial Triggers
🔍 RAG · arxiv.org · 2d

PROWL: Prioritized Regret-Driven Optimization for World Model Learning
💻 Local AI · arxiv.org · 1d

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training
🎯 RLHF · arxiv.org · 2d