🛡️ AI Safety
alignment, AI reliability, guardrails, responsible AI
The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
Less-relevant results
Towards a Formal Scientific Epistemology
🕸️ Distributed Systems · lesswrong.com · 3 days ago
Interactions Between Crosscoder Features: A Compact Proofs Perspective
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
Trajectory Geometry of Transformer Representations Across Layers
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
The Chronicles of Radio Frequency Fingerprinting
📐 System Design · Content type: Academic · arxiv.org · 2 days ago
Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
FoldSAE: Learning to Steer Protein Folding Through Sparse Representations
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
Adversarial Attack and Disturbance Detection by Hadamard-Coded Output Representations for Object Detection and Semantic Segmentation
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
AI Will Not Start a Nuclear War, but Humans Might
🤖 AI Engineering · Content type: News, Blog · aifrontiersmedia.substack.com · 3 days ago · Substack
Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics
🤖 AI Engineering · Content type: Academic · arxiv.org · 4 days ago · Cited by 1 article
SciTrace: Trajectory-Aware Safety Reasoning for Scientific Discovery Agents
🤝 AI Agents · Content type: Academic · arxiv.org · 3 days ago
When CLIP Sees More, It Fights Back Harder: Multi-View Guided Adaptive Counterattacks for Test-Time Adversarial Robustness
🧠 LLMs · Content type: Academic · arxiv.org · 4 days ago
Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers
🤖 AI Engineering · Content type: Academic · arxiv.org · 4 days ago
Stain-Aware Wavelet Regularization for Instant Adversarial Purification in Histopathology
🔍 RAG · Content type: Academic · arxiv.org · 3 days ago
Emergent alignment and the projectability of ethical personas
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
DiffoR: A Unified Continuous Generative Framework for Universal Ordinal Regression
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
Wearable Single-Lead ECG Detects Fine-Grained Structural Heart Disease Through Echo-Report Supervision
🕸️ Distributed Systems · Content type: Academic · arxiv.org · 3 days ago
Personal-Values Alignment Tech: Some Initial Motivations
🤝 AI Agents · Content type: News, Blog · blog.danielsosebee.com · 1 day ago · Hacker News