Accent Classification Using Machine Learning Techniques: A Review
DOI:
https://doi.org/10.70917/ijcisim-2025-0028
Keywords:
Accent classification, Automatic speech recognition, Deep learning, Traditional machine learning
Abstract
An accent is a person's distinctive manner of speaking a particular language. It strongly influences communication by introducing pronunciation variations, which make it difficult for automatic speech recognition (ASR) systems to recognize spoken language accurately. As demand for more accurate speech recognition technology grows, improving the ability of machines to classify and recognize accents has become an essential challenge in speech processing. In response, this paper reviews previous studies on accent classification models. It discusses the principal methodologies used in this research, including datasets, preprocessing techniques, feature extraction, evaluation metrics, and classification methods based on traditional machine learning (TML) and deep learning (DL). The review covers journal articles and conference proceedings published between 2015 and 2025, with an emphasis on recent years. Relevant articles were sourced from leading academic databases and platforms, including Scopus, IEEE, Springer, MDPI, Google Scholar, and ResearchGate. A comparative analysis shows that k-NN is the most effective TML classifier. Among DL models, the pre-trained xResNet18 model outperforms the others on well-structured English accent datasets, while CNN achieves higher accuracy on datasets that contain diverse English accents but are relatively small. Additionally, the fine-tuned transformer Wav2Vec2 achieves higher overall accuracy on a balanced and diverse dataset of six English accents, demonstrating strong performance in accent classification from raw audio. The study concludes by identifying key research gaps and proposing future directions for advancing accent recognition systems, offering guidance for addressing current challenges and exploring new methodologies.
License
Copyright (c) 2025 Sarah Jassim, Husam Ali Abdulmohsin

This work is licensed under a Creative Commons Attribution 4.0 International License.