Revisiting the Effectiveness of Automatic N-Gram Rule Generation for Spelling Normalization in Filipino

Published in the Workshop on Simple and Efficient Natural Language Processing at EMNLP, 2022

We explore a spelling/slang correction task in Filipino on a curated dataset, and demonstrate that an n-gram model can outperform augmented deep learning methods and Google Translate’s spelling correction feature. The n-gram model has the benefits of (1) requiring little training time and compute power (it “trains” in a second on a CPU!) and (2) being inherently interpretable, allowing users to inspect and troubleshoot the model.
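As a rough illustration of the flavor of approach the paper studies, here is a minimal sketch that mines character n-gram substitution rules from (noisy, standard) word pairs and applies them greedily. The word pairs, the position-wise alignment (which assumes same-length pairs), and the most-frequent-target rule selection are illustrative assumptions, not the paper's actual procedure.

```python
from collections import Counter, defaultdict

def extract_rules(pairs, n=2):
    """Count character n-gram substitutions seen in aligned word pairs
    and keep the most frequent target for each source n-gram.

    Assumes same-length (noisy, clean) pairs so a position-wise alignment
    suffices; the paper's alignment and rule scoring may differ.
    """
    counts = defaultdict(Counter)
    for noisy, clean in pairs:
        for i in range(min(len(noisy), len(clean)) - n + 1):
            src, tgt = noisy[i:i + n], clean[i:i + n]
            if src != tgt:
                counts[src][tgt] += 1
    return {src: tgts.most_common(1)[0][0] for src, tgts in counts.items()}

def normalize(word, rules, n=2):
    """Greedily rewrite a word left to right using the learned rules."""
    out, i = [], 0
    while i < len(word):
        gram = word[i:i + n]
        if gram in rules:
            out.append(rules[gram])
            i += n
        else:
            out.append(word[i])
            i += 1
    return "".join(out)

if __name__ == "__main__":
    # Hypothetical (noisy, standard) Filipino pairs, for illustration only.
    pairs = [("kamusta", "kumusta"), ("lalake", "lalaki")]
    rules = extract_rules(pairs)        # e.g. {'ka': 'ku', 'am': 'um', 'ke': 'ki'}
    print(normalize("kamusta", rules))  # -> kumusta
    print(normalize("lalake", rules))   # -> lalaki
```

The learned rule table is just a dictionary of character n-gram rewrites, which is what makes this style of model cheap to train and easy to inspect by hand.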

Download paper here