Comparative study of Arabic Word Embeddings: Evaluation and Application
Keywords:
Word Embeddings, Evaluation Methods, NER, Document Classification, POS Tagging, Sentiment Analysis

Abstract
Word Embedding models have achieved impressive results on a variety of NLP tasks in recent years, which has led to growing interest in creating word representations that better capture the semantic and syntactic features of particular languages. This study trains different Arabic Word Embedding models in a supervised framework and investigates the impact of the models' hyperparameters on downstream Arabic NLP tasks and applications. In this paper, we present the cleaning and pre-processing steps followed to create three different training datasets. We provide a detailed description of the steps followed to create 180 different Word Embedding models using Word2Vec with the CBOW architecture. To evaluate the quality of the Word Embeddings, we apply several extrinsic and intrinsic evaluation methods. Preliminary results show that these models can produce meaningful Word Embeddings despite the high morphological complexity of the Arabic language. We conclude that the source of the training dataset significantly influences the type of information captured by the model. Moreover, the hyperparameters of the training architecture and the nature of the NLP task significantly affect its accuracy.
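To illustrate the kind of intrinsic evaluation the abstract refers to, the sketch below scores word pairs by cosine similarity between their vectors, the standard measure used in word-similarity benchmarks. The three-dimensional vectors and the romanized Arabic keys are invented for illustration only; the study's actual models use much higher-dimensional embeddings trained on the datasets described above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings with made-up values (hypothetical, for illustration):
# a trained model would map semantically related words to nearby vectors.
embeddings = {
    "malik":   [0.90, 0.10, 0.30],   # "king"
    "malika":  [0.85, 0.15, 0.35],   # "queen"
    "tuffaha": [0.10, 0.90, 0.20],   # "apple"
}

related = cosine(embeddings["malik"], embeddings["malika"])
unrelated = cosine(embeddings["malik"], embeddings["tuffaha"])
# A useful embedding space ranks the related pair above the unrelated one.
assert related > unrelated
```

Intrinsic benchmarks aggregate such pairwise scores and correlate them with human similarity judgments, whereas the extrinsic evaluations mentioned above (NER, document classification, POS tagging, sentiment analysis) measure downstream task accuracy directly.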
License
Copyright (c) 2023 International Journal of Computer Information Systems and Industrial Management Applications
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.