REMIND: Input Loss Landscapes Reveal Residual Memorization in Post-Unlearning LLMs
arxiv.org

Abstract: Machine unlearning aims to remove the influence of specific training data from a model without requiring full retraining. This capability is crucial for privacy, safety, and regulatory compliance, so verifying whether a model has truly forgotten the target data is essential to its reliability and trustworthiness. However, existing evaluation methods often assess forgetting only at the level of individual inputs, an approach that may overlook residual influence in semantically similar examples. Such influence can compromise privacy and lead to indirect information leakage. We propose REMIND (Residual Memorization In Neighborhood Dynamics), a nov…