PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech
machinelearning.apple.com·1d

Authors: Michel Wong, Ali Alshehri, Sophia Kao, Haotian He

Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. Traditional TN systems can exhibit high accuracy, but involve substantial engineering effort, are difficult to scale, and pose challenges to language coverage, particularly in low-resource settings. We propose PolyNorm, a prompt-based approach to TN using Large Language Models (LLMs), aiming to reduce the reliance on manually crafted rules and enable broader linguistic applicability with minimal human intervention. Additionally, we present a language-agnostic pipeline for automatic data curation and evaluation, designed to facilitate scalable experimentation across diverse la…
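To make the idea of prompt-based TN concrete, the following is a minimal sketch of how a few-shot prompt might be assembled from written/spoken example pairs. It is not the authors' implementation: the `Example` type, the `build_prompt` function, the instruction wording, and the sample pairs are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One few-shot demonstration (hypothetical structure, not from the paper)."""
    written: str  # text as it appears on the page
    spoken: str   # canonical spoken form


def build_prompt(language: str, examples: list[Example], text: str) -> str:
    """Assemble a few-shot TN prompt; the wording here is illustrative only."""
    lines = [f"Convert the written {language} text into its spoken form.", ""]
    for ex in examples:
        # Each demonstration is a Written/Spoken pair followed by a blank line.
        lines.append(f"Written: {ex.written}")
        lines.append(f"Spoken: {ex.spoken}")
        lines.append("")
    # The final pair is left open so the model completes the spoken form.
    lines.append(f"Written: {text}")
    lines.append("Spoken:")
    return "\n".join(lines)


# Hypothetical demonstrations; a real system would curate these per language.
shots = [
    Example("It costs $5.", "It costs five dollars."),
    Example("Call me at 3 pm.", "Call me at three p m."),
]
prompt = build_prompt("English", shots, "The meeting is on Jan. 2.")
print(prompt)
```

Because the language name and the demonstrations are plain parameters, extending such a sketch to a new language would only require a new set of example pairs rather than new hand-written rules, which is the kind of reduced engineering effort the abstract describes.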
