Breaking Minds, Breaking Systems: Jailbreaking Large Language Models via Human-like Psychological Manipulation
arxiv.org

Abstract: Large Language Models (LLMs) have gained considerable popularity and are protected by increasingly sophisticated safety mechanisms. However, jailbreak attacks continue to pose a critical security threat by inducing models to exhibit policy-violating behaviors. Current paradigms focus on input-level anomalies and overlook the fact that a model's internal psychometric state can be systematically manipulated. To address this, we introduce Psychological Jailbreak, a new jailbreak attack paradigm that exposes a stateful psychological attack surface in LLMs, in which attackers manipulate a model's psychological state across interactions. Building on this insight, we pro…