Accent Classification Using Machine Learning Techniques: A Review

Sarah Jassim; Husam Ali Abdulmohsin

doi:10.70917/ijcisim-2025-0028

Authors

Sarah Jassim, Department of Computer Sciences, College of Science, University of Baghdad
Husam Ali Abdulmohsin, Department of Computer Sciences, College of Science, University of Baghdad

DOI:

https://doi.org/10.70917/ijcisim-2025-0028

Keywords:

Accent classification, Automatic speech recognition, Deep learning, Traditional machine learning

Abstract

An accent is a person's distinctive manner of speaking a particular language. It strongly influences communication by introducing pronunciation variations, which make it difficult for automatic speech recognition (ASR) systems to understand spoken language accurately. As demand for more accurate speech recognition grows, improving the ability of machines to classify and recognize accents has become an essential challenge in speech processing. In response, this paper reviews previous studies on accent classification models. It discusses the principal methodologies used in this research, including datasets, preprocessing techniques, feature extraction, evaluation metrics, and classification methods based on traditional machine learning (TML) and deep learning (DL). The review covers journal articles and conference proceedings published between 2015 and 2025, with an emphasis on recent years. Relevant articles were sourced from leading academic databases and platforms, including Scopus, IEEE, Springer, MDPI, Google Scholar, and ResearchGate. A comparative analysis shows that k-NN is the most effective TML classifier. Among DL models, the pre-trained xResNet18 outperforms the others on well-structured English accent datasets, while CNNs achieve higher accuracy on datasets that cover diverse English accents but are relatively small. Additionally, the fine-tuned transformer Wav2Vec2 achieves higher overall accuracy on a balanced and diverse dataset of six English accents, demonstrating strong performance in raw-audio accent classification. The study concludes by identifying key research gaps and proposing future directions to advance accent recognition systems, offering guidance for addressing current challenges and exploring new methodologies.

