Comparative study of Arabic Word Embeddings: Evaluation and Application
Keywords:
Word Embeddings, Evaluation Methods, NER, Document Classification, POS Tagging, Sentiment Analysis

Abstract
Word Embedding models have achieved impressive results on a variety of NLP tasks in recent years, which has led to growing interest in creating word representations that better capture the semantic and syntactic features of particular languages. This study trains different Arabic Word Embedding models in a supervised framework and investigates the impact of the models' hyperparameters on downstream Arabic NLP tasks and applications. In this paper, we present the cleaning and pre-processing steps followed to create three different training datasets. We provide a detailed description of the steps followed to create 180 different Word Embedding models using Word2Vec with the CBOW architecture. To evaluate the quality of the Word Embeddings, we apply several extrinsic and intrinsic evaluation methods. Preliminary results show that these models can produce meaningful Word Embeddings despite the high morphological complexity of the Arabic language. We conclude that the source of the training dataset significantly influences the type of information captured by the model. Moreover, the hyperparameters of the training architecture and the nature of the NLP task significantly affect its accuracy.
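To illustrate the kind of intrinsic evaluation the abstract refers to, the sketch below scores word pairs by cosine similarity between their vectors, the standard measure used in word-similarity benchmarks. The three-dimensional vectors and the romanized Arabic keys are invented for illustration only; the study's actual models use much higher-dimensional embeddings trained on the datasets described above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings with made-up values (hypothetical, for illustration):
# a trained model would map semantically related words to nearby vectors.
embeddings = {
    "malik":   [0.90, 0.10, 0.30],   # "king"
    "malika":  [0.85, 0.15, 0.35],   # "queen"
    "tuffaha": [0.10, 0.90, 0.20],   # "apple"
}

related = cosine(embeddings["malik"], embeddings["malika"])
unrelated = cosine(embeddings["malik"], embeddings["tuffaha"])
# A useful embedding space ranks the related pair above the unrelated one.
assert related > unrelated
```

Intrinsic benchmarks aggregate such pairwise scores and correlate them with human similarity judgments, whereas the extrinsic evaluations mentioned above (NER, document classification, POS tagging, sentiment analysis) measure downstream task accuracy directly.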
License
Copyright (c) 2023 International Journal of Computer Information Systems and Industrial Management Applications
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.