Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation

Large language models (LLMs) have become an effective tool for synthetic data generation, including for low-resource languages, where generated data can improve downstream task performance. Current best-performing approaches typically rely on few-shot prompting with target-language examples, which increases inference costs and may reduce diversity through lexical anchoring. In this work, we investigate activation steering as an alternative for l...
