🛡️ AI Security - hop1.ng.1357 · Scour

Mechanistic Steering of LLMs Reveals Layer-wise Feature Vulnerabilities in Adversarial Settings 🕳LLM Vulnerabilities

The Agentic AI Security Company 🔧Agent Tooling

straiker.ai·3d·Hacker News

Musk casts himself as AI's good guy in testimony vs. OpenAI 🤝Human-AI Collaboration

axios.com·19h·Hacker News

AI Wellbeing: Measuring and improving the functional pleasure and pain of AIs 🛡️AI Safety

ai-wellbeing.org·1d·Hacker News

AI-Augmented Social Engineering: When Trust Becomes a Control-Plane Risk 🤝Human-AI Collaboration

zenodo.org·4d·Hacker News

The Pious Little Delete Button 🤔Philosophy of Tech

gpt.gekko.de·2d·Hacker News

Adversarial Robustness of NTK Neural Networks 🛡️AI Safety

Raising AI by Lowering Expectations 🎭Claude

lesswrong.com·6d

Claude for Creative Work 🔌Claude Plugins

anthropic.com·2d·Hacker News, Hacker News

Dario Amodei, hype, AI safety, and the explosion of vibe-coded AI disasters 🕵️AI Agents

garymarcus.substack.com·3d·Substack

One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety 🤖LLM

Hot Research Topics in AI and ML in 2026 and Their Philosophical Connections 🕵️AI Agents

omseeth.github.io·5d·Hacker News

Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines 🛡️AI Safety

Jailbreaking a robot vacuum to run Tailscale and Valetudo 🔌Embedded Systems

tailscale.com·5d·Hacker News

pleasedodisturb/llm-safe-haven: The missing security guide for solo developers running autonomous AI coding agents 💉Prompt Injection

github.com·5h·Hacker News

From Stateless Queries to Autonomous Actions: A Layered Security Framework for Agentic AI Systems 🕹️Agentic AI

Behavioral security for AI agents, OS-level interception 🔧Agent Tooling

quintai.dev·20h·Hacker News

Evaluation of Prompt Injection Defenses in Large Language Models 💉Prompt Injection

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training 🛡️AI Safety

An update on our election safeguards 🛡️Anthropic PBC

anthropic.com·6d·Hacker News
