🛡️ AI Security - emschwartz · Scour

Mechanistic Steering of LLMs Reveals Layer-wise Feature Vulnerabilities in Adversarial Settings 🕳LLM Vulnerabilities

The (In)security Landscape of AI-Powered GitHub Actions (Part 2/2) 🔄GitHub Actions

Malicious AI Prompt Injection Attacks Increasing, but Sophistication Still Low: Google 💉Prompt Injection

securityweek.com·3d

Goodfire’s New Tool Lets Engineers See Inside a Language Model While It Is Still Being Trained and That Changes Everything About AI Safety 🛡️AI Safety

startupfortune.com·7h

6 Lessons Security Leaders Must Learn About AI and APIs 🔎AI Auditing

lab.wallarm.com·2d

AI-Augmented Social Engineering: When Trust Becomes a Control-Plane Risk 🛡️AI Safety

zenodo.org·5d·Hacker News

Free Interactive AI Security Training Library (OWASP-aligned, white-label friendly, SCORM-ready) 🔧Agent Tooling

github.com·2d·r/opensource

ML Safety Newsletter #20: AI Wellbeing, Classifier Jailbreaking and Honest Pushback Benchmarking 🛡️AI Safety

lesswrong.com·2d

Your AI Security Agents Are Forgetting What They Did, And That’s a Massive Vulnerability 🔓Hacking

extrahop.com·3d·r/netsec

AI security capabilities and the human side of vulnerability management 🔓Hacking

securityautopsy.com·2d·r/netsec

The Agentic AI Security Company 💻Coding Agents

straiker.ai·4d·Hacker News

AI companies should publish security assessments 🔎AI Auditing

lesswrong.com·3d

Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines 💻Coding Agents

Third Symposium on AIT & ML: AI Safety Applications 🛡️AI Safety

lesswrong.com·5d

From Stateless Queries to Autonomous Actions: A Layered Security Framework for Agentic AI Systems 💻Coding Agents

AI safety can be a Pascal's mugging even if p(doom) is high 🛡️AI Safety

lesswrong.com·5d

Evaluation of Prompt Injection Defenses in Large Language Models 💉Prompt Injection

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training 🛡️AI Safety

Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents 🛡️AI Safety