Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
arxiv.org·22h

Abstract: While Code Language Models (CLMs) have demonstrated superior performance in software engineering tasks such as code generation and summarization, recent empirical studies reveal a critical privacy vulnerability: these models exhibit unintended memorization of sensitive training data, enabling verbatim reproduction of confidential information when specifically prompted. To address this issue, several approaches, including training data de-duplication and differential privacy augmentation, have been proposed. However, these methods require full-model retraining for deployed CLMs, which incurs substantial computational costs. In this paper, we aim to answer the following …
