Diacritic Restoration for Low-Resource Indigenous Languages: Case Study with Bribri and Cook Islands Māori
arxiv.org·4d


Abstract: We present experiments on diacritic restoration, a form of text normalization essential for natural language processing (NLP) tasks. Our study focuses on two extremely under-resourced languages: Bribri, a Chibchan language spoken in Costa Rica, and Cook Islands Māori, a Polynesian language spoken in the Cook Islands. Specifically, this paper: (i) compares algorithms for diacritic restoration in under-resourced languages, including tonal diacritics, (ii) examines the amount of data required to achieve target performance levels, (iii) contrasts results across varying resource conditions, and (iv) explores the related task of diacritic correction. We find that fi…
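
To make the task concrete, the following is a minimal sketch of how diacritic restoration can be framed, not the method evaluated in the paper. It assumes that training pairs are built by stripping combining marks from correctly written text, and it uses a simple most-frequent-form lookup as a stand-in baseline. The function names (`strip_diacritics`, `train_lookup`, `restore`) and the toy tokens are hypothetical and chosen only for illustration.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text: str) -> str:
    """Remove combining marks (Unicode category Mn) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", bare)


def train_lookup(corpus_tokens):
    """Map each diacritic-free form to its most frequent attested spelling.

    This is an illustrative word-level baseline (an assumption of this sketch),
    not one of the algorithms compared in the paper.
    """
    counts = defaultdict(Counter)
    for token in corpus_tokens:
        token = unicodedata.normalize("NFC", token)
        counts[strip_diacritics(token)][token] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in counts.items()}


def restore(tokens, lookup):
    """Replace each token with its most frequent diacritized form, if known."""
    return [lookup.get(token, token) for token in tokens]


# Toy data for illustration only; not drawn from the paper's corpora.
lookup = train_lookup(["Māori", "Māori", "Maori"])
restored = restore(["Maori", "reo"], lookup)
```

A lookup of this kind cannot handle unseen words or forms whose correct diacritics depend on context, which is one reason such a task is typically studied with more than one algorithm and with varying amounts of training data.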
