🛡️ AI Safety
alignment, RLHF, red teaming, AI risk
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
✨ Generative AI
Content type: Academic
arxiv.org · 1 day ago
Mechanistic Interpretability: The Key to Trusting Agentic AI
🤖 Agentic AI
Content type: Discussion
bradenkelley.com · 4 days ago
White House restricts public AI testing to prioritize national security
⚖️ AI Regulation
4sysops.com · 6 hours ago
Sequent: scale and automation for higher confidence in alignment
🧠 AGI
lesswrong.com · 5 hours ago
[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"
🏢 Enterprise AI
Content type: Blog
meditationsondigitalminds.substack.com · 1 day ago · Substack
Model Evaluations: Prove Your Routing Policy Actually Works
🔓 Open Source AI
Content type: Blog
digitalocean.com · 6 days ago
Anthropic releases Mythos-derived model with cyber guardrails
🔓 Open Source AI
metacurity.com · 7 hours ago
Criti-hyping is the best thing that happened to Big Tech
✍️ Prompt Engineering
reveriesofahuman.com · 1 day ago
How To Keep Giant A.I. Robots From Killing Us All
🧠 AGI
dailywire.com · 4 days ago
KiloBench - Because Your Benchmark Score Doesn't Pay the Bill
✍️ Prompt Engineering
Content type: News, Blog
blog.kilo.ai · 2 days ago
SONAR Sitrep: How nuclear verdicts are reshaping carrier economics
⚖️ AI Regulation
freightwaves.com · 1 hour ago
Ask HN: What happens when humans become as dumb as AI?
🤖 Agentic AI
Content type: Discussion
news.ycombinator.com · 6 days ago · Hacker News
Lawmakers Are Aiming To Regulate AI-Builds-AI Before AI Gets Entirely Beyond Human Control
⚖️ AI Regulation
forbes.com · 1 day ago
Controversial smut as an AI alignment issue
🧠 AGI
Content type: News, Blog
thingofthings.substack.com · 5 days ago · Substack
My Data Science Internship Journey at Oasis Infobyte: Building Real-World Machine Learning Projects
👨‍💻 Coding Assistants
Content type: Blog
medium.com · 2 days ago
A new chapter of efficient foundation models for medical imaging
🔓 Open Source AI
techcommunity.microsoft.com · 7 hours ago
Quote of the day by Nvidia CEO, Jensen Huang: "I appreciate that many of us grew up and enjoyed science fiction, but it's not helpful" — on quantifying the existential risks posed by AI
🧠 AGI
techradar.com · 3 days ago
Why LLMs (still) lack taste
✨ Generative AI
beyondtheprior.com · 1 day ago · Hacker News
Hidden Consensus: Preference-Validity Compression in Human Feedback
✨ Generative AI
Content type: Academic
arxiv.org · 16 hours ago
Anish-185/Production-Line-Performance-Checker
🏢 Enterprise AI
Content type: Code
github.com · 2 days ago · r/coding