ShadowLogic: Backdoors in Any Whitebox LLM
arxiv.org·7h

Abstract: Large language models (LLMs) are widely deployed across various applications, often with safeguards to prevent the generation of harmful or restricted content. However, these safeguards can be covertly bypassed through adversarial modifications to the computational graph of a model. This work highlights a critical security vulnerability in computational graph-based LLM formats, demonstrating that widely used deployment pipelines may be susceptible to obscured backdoors. We introduce ShadowLogic, a method for creating a backdoor in a white-box LLM by injecting an uncensoring vector into its computational graph represe…
