A Hybrid Translation Pipeline for Low-Resource Dialects: Translating English to Ahirani Using NLLB and Rule-Based Adaptation
DOI:
https://doi.org/10.70917/ijcisim-2026-2618
Keywords:
Low-resource languages, Machine translation, Ahirani dialect, Rule-based translation, Dictionary-based translation, NLLB, Multilingual models, Neural machine translation, Dialect adaptation, Hybrid approach
Abstract
Machine translation for low-resource languages remains a significant challenge because large-scale parallel corpora are unavailable and linguistic resources are limited. This paper presents an exploratory study that compares two approaches for translating English into Ahirani, a low-resource dialect: a basic rule-based method using a custom English-to-Ahirani dictionary, and a hybrid pipeline built on Meta AI’s pretrained No Language Left Behind (NLLB-200) neural machine translation model. In the hybrid pipeline, NLLB-200 translates English into standard Marathi, and a post-processing step then adapts the output to the dialect through dictionary-based substitutions. This stepwise design lets us compare the two approaches in terms of output quality, grammatical correctness, and contextual accuracy. The results highlight the strengths and limitations of both approaches, offering insight into the feasibility and challenges of applying neural models to dialectal translation in low-resource settings.
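The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the checkpoint name `facebook/nllb-200-distilled-600M` is an assumed choice (the abstract does not say which NLLB-200 variant was used), the function names `translate_en_to_mr` and `adapt_to_dialect` are hypothetical, and the dictionary entries are placeholders rather than real Marathi or Ahirani vocabulary. The substitution step here is a simple whole-token lookup; the paper's actual rules may be richer.

```python
# Punctuation stripped from token edges before dictionary lookup
# (includes the Devanagari danda).
PUNCT = ".,!?;:\"'()।"


def translate_en_to_mr(text, model_name="facebook/nllb-200-distilled-600M"):
    """Stage 1: English -> standard Marathi with a pretrained NLLB-200 model.

    The checkpoint name is an assumption; the import is local so the
    dictionary stage below can be used without transformers installed.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # Force the decoder to start with the Marathi (Devanagari) language tag.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("mar_Deva"),
        max_length=128,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


def adapt_to_dialect(marathi_text, dictionary):
    """Stage 2: replace whole tokens found in a Marathi-to-dialect dictionary.

    Tokens are split on whitespace rather than with a \\w regex, because
    Python's \\w does not match Devanagari vowel signs and would split words.
    """
    adapted = []
    for token in marathi_text.split():
        left_stripped = token.lstrip(PUNCT)
        prefix = token[: len(token) - len(left_stripped)]
        core = left_stripped.rstrip(PUNCT)
        suffix = left_stripped[len(core):]
        # Unknown tokens pass through unchanged.
        adapted.append(prefix + dictionary.get(core, core) + suffix)
    return " ".join(adapted)


# Placeholder entries only; a real dictionary would map Marathi words
# to their Ahirani equivalents.
EXAMPLE_DICTIONARY = {"mr_word_1": "ahr_word_1", "mr_word_2": "ahr_word_2"}
```

With a real dictionary in place, the full pipeline would be `adapt_to_dialect(translate_en_to_mr(sentence), dictionary)`. Because substitution is purely lexical, any word-order or inflection differences between Marathi and the dialect would remain unaddressed in a sketch like this one.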