🕳 LLM Vulnerabilities - inarcissuss · Scour

🧠LLM arXiv

AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability

📊LLM Evaluation arXiv

REALM: A Unified Red-Teaming Benchmark for Physical-World VLMs

💉Prompt Injection arXiv

PixJail: Self-Evolving Paper-to-Pipeline Reproduction for Text-to-Image Jailbreak Evaluation

📊LLM Evaluation arXiv

OTTER: A Red-Teaming System for Toxicity-Evading Jailbreak Prompt Optimization

🤖LLM, Agent arXiv

LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems

Covered by DEV Community

💉Prompt Injection arXiv

TROPT: An Open Framework for Unifying and Advancing Discrete Text Optimization

🧠LLM arXiv

A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots

💉Prompt Injection arXiv

BELLS-O: Evaluating the Operational Trade-offs of LLM Supervision Systems

💉Prompt Injection arXiv

Scalable Hierarchical Attention Transformers for Multi-Turn Jailbreak Detection in Long Conversations

💉Prompt Injection arXiv

Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems

Covered by DEV Community

💬LLM Prompting role-confusion.github.io

A Theory of Why Prompt Injection Works

Covers 3 stories including Playwright MCP Server – Snapshot based – faster and more reliable than images

Covered by 8 sources including Simon Willison’s Weblog, Schneier on Security

Discussed on Hacker News and Lobsters

🛡️AI Security Schneier on Security

Interesting Paper Exploring Prompt Injection

Covers 3 stories including A Theory of Why Prompt Injection Works
