🛡️ AI Safety
Alignment, Interpretability, Adversarial Examples, Ethics
Advanced AI Safety Addendum
⚖️ AI Governance · cloud.google.com · 1 day ago · Hacker News
Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability
🧠 LLM · Content type: Academic · arxiv.org · 5 days ago
Claude Fable 5 and new AI safety fables
🧠 LLM · Content type: News · interconnects.ai · 1 day ago · Hacker News
The technical community can't be the main character in AI safety anymore
⚖️ AI Governance · substackcdn.com · 3 days ago · Substack
The Architecture of Syntropy: A Blueprint for AI, Psychology, and Systems Design
⚖️ AI Ethics · hackernoon.com · 1 day ago
Show HN: GitHub Copilot port of Anthropic's AI vulnerability discovery harness
🔬 Anthropic · Content type: Code · github.com · 2 days ago · Hacker News
ZEC drops 30% after Anthropic AI finds Zcash counterfeit vulnerability
🔬 Anthropic · Content type: News · tradingview.com · 5 days ago · Hacker News
VFUSE: Virulent Feature Understanding with Sparse autoEncoders
🎨 Generative AI · Content type: Academic · arxiv.org · 19 hours ago
When Attribution Patching Lies: Diagnosis and a Second-Order Correction
🔬 Anthropic · Content type: Academic · arxiv.org · 19 hours ago
Interactions Between Crosscoder Features: A Compact Proofs Perspective
🔬 Anthropic · Content type: Academic · arxiv.org · 19 hours ago
Adversarial Attack and Disturbance Detection by Hadamard-Coded Output Representations for Object Detection and Semantic Segmentation
👁️ Multimodal AI · Content type: Academic · arxiv.org · 1 day ago
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
👁️ Multimodal AI · Content type: Academic · arxiv.org · 5 days ago
Diffuse AI Control on Fuzzy Tasks
⚖️ AI Governance · Content type: Academic · arxiv.org · 1 day ago
Trajectory Geometry of Transformer Representations Across Layers
🔬 Anthropic · Content type: Academic · arxiv.org · 1 day ago
Towards Evaluating the Robustness of Visual State Space Models
🛡️ AI Security · Content type: Academic · arxiv.org · 6 days ago
Adversarial Robustness of Activation Steering in Large Language Models
🧠 LLM · Content type: Academic · arxiv.org · 1 day ago
Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers
🔬 Anthropic · Content type: Academic · arxiv.org · 1 day ago
Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads
👁️ Multimodal AI · Content type: Academic · arxiv.org · 5 days ago
Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms
🧠 LLM · Content type: Academic · arxiv.org · 1 day ago
Hybrid Adversarial Defence for Natural Language Understanding Tasks
🛡️ AI Security · Content type: Academic · arxiv.org · 6 days ago