Scour
🛡️ AI Safety
Specific: alignment, AI risk, RLHF, model safety
Scoured 74 posts in 4.3 ms
FoldSAE: Learning to Steer Protein Folding Through Sparse Representations (🧠 LLMs · Academic · arxiv.org · 1 day ago)
Epiplexity (🤖 AI Agents · Blog · andys.blog · 6 days ago · Hacker News)
‘We Did Our Best!’ | Meghan O’Gieblyn (✍️ Prompt Engineering · nybooks.com · 6 days ago)
Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models (🧠 LLMs · Academic · arxiv.org · 1 day ago)
Book of Cron Job (✍️ Prompt Engineering · lesswrong.com · 6 days ago)
What Do People Actually Want From AI? Mapping Preference Plurality (🧠 LLMs · Academic · arxiv.org · 2 days ago)
Contra Dance at LessOnline (🔷 Ethereum · jefftk.com · 3 days ago)
Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms (✍️ Prompt Engineering · Academic · arxiv.org · 1 day ago)
The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models (🧠 LLMs · Academic · arxiv.org · 17 hours ago)
Installing the Seat on the Machine (✍️ Prompt Engineering · cafebedouin.org · 6 days ago)
Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers (✍️ Prompt Engineering · Academic · arxiv.org · 1 day ago)
My research agenda and work (🧠 LLMs · lesswrong.com · 5 days ago)
(VERY PARTIAL) CROSSPOST: ALEX HEATH: SubStack Is Opening Up to AI: Interviewing CEO Chris Best (🧠 LLMs · News, Blog · braddelong.substack.com · 5 days ago · Substack)
Interactions Between Crosscoder Features: A Compact Proofs Perspective (🧠 LLMs · Academic · arxiv.org · 17 hours ago)
Towards a Formal Scientific Epistemology (🤖 AI Agents · lesswrong.com · 1 day ago)
EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms (🧠 LLMs · Academic · arxiv.org · 6 days ago)
Coming Around To Political Donations (💰 Long-Term Investing · jefftk.com · 4 days ago)
Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics (🤖 AI Agents · Academic · arxiv.org · 2 days ago)
Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers (🧠 LLMs · Academic · arxiv.org · 2 days ago)
Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability (🧠 LLMs · Academic · arxiv.org · 5 days ago)