LLM safety

Zero-Click IP Leak in a Privacy Search Engine: Indirect Prompt Injection & Silent Patching

 🛡️Red Teaming
infosecwriteups.com

Claude Fable 5: The "Safe" Mythos for Everyone

 🛡️Red Teaming
drkpxl.com

Your AI Agent Can Read. That’s the Whole Problem.

 🛡️Red Teaming  Content type: Blog
medium.com

Prompt injection still drives most agentic AI security failures in production

 🛡️Red Teaming
helpnetsecurity.com

How I Gave My Security Blog Its Own AI Agent and an Attitude

 🛡️Red Teaming  Content type: Blog
medium.com

When Your AI Agent’s Memory Becomes a Security Liability

 🤖Agent Architectures  Content type: News  Content type: Blog
blog.checkpoint.com

Autonomous Pentesting vs Autonomous Red Teaming: What's the Difference?

 🛡️Red Teaming
malware.news

Claude Code vulnerability exposes developer credentials via prompt injection

 Automation
4sysops.com

Trust No Skill: Integrity Verification for AI Agent Supply Chains

 🛡️Red Teaming  Content type: Blog

[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"

 🎯AI Alignment  Content type: Blog

ChatGPT's new Lockdown Mode lets you disable web access and more to protect sensitive data from prompt injection

 🛡️Red Teaming
the-decoder.com

JailbreakOPT: Tool-Assisted Iterative Jailbreak Prompt Optimization

 🛡️Red Teaming  Content type: Academic
arxiv.org

Security Flaw in Claude Code Illustrates the Risk of AI in Developer Workflows

 Automation
devops.com

Guardian Angels: LLM Personalization for Productivity and Security

 🎯AI Alignment
gwern.net · Hacker News

ashp15205/guardian-runtime: A zero-latency, local-first runtime firewall for LLMs. Intercept every prompt and response locally to stop data leaks and runaway token costs.

 💭Context Management  Content type: Code

AI researcher claims he's bypassed Anthropic's Fable 5 guardrails

 🛡️Red Teaming
cointelegraph.com · Hacker News

OpenAI adds Lockdown Mode to ChatGPT to block data theft from prompt injection attacks

 🛡️Red Teaming  Content type: News
thenextweb.com

Anthropic makes Fable 5's invisible safeguards visible after backlash

 🛡️Red Teaming
xcancel.com · Hacker News

Meet Hades: The malware that lies to AI security agents

 ✍️Prompt Engineering  Content type: News
infoworld.com · Hacker News

Indirect Prompt Injection remains a fundamental security challenge for AI

 🛡️Red Teaming  Content type: Blog
brave.com
