RAID: Robust Algorithm for stemmIng text Document

Kabil BOUKHARI; Mohamed Nazih OMRI

Authors

Kabil BOUKHARI, MARS Unit of Research, Department of Computer Sciences, Faculty of Sciences of Monastir, University of Monastir, 5000, Tunisia
Mohamed Nazih OMRI, MARS Unit of Research, Department of Computer Sciences, Faculty of Sciences of Monastir, University of Monastir, 5000, Tunisia

Keywords:

Robust algorithm, Stemming, Document indexing, Information retrieval.

Abstract

In this work, we propose a robust algorithm for the automatic indexing of unstructured documents. It detects the most relevant words in an unstructured document. The algorithm is built on two main modules: the first processes compound words, and the second detects word endings that the approaches presented in the literature do not take into account. The proposed algorithm detects and removes suffixes, and it enriches the suffix base by eliminating the suffixes of compound words. We evaluated our algorithm on two word collections: a standard collection of terms and a medical corpus. The results show that our algorithm is remarkably effective compared with the other approaches presented in related work.
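
To make the two-module idea concrete, the following is a minimal sketch of suffix stripping applied to simple and compound terms. It is not the RAID algorithm itself, whose rules are not given in this abstract. The suffix list, the minimum stem length, the names strip_suffix and stem_term, and the rule that compound words are split on hyphens and spaces are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the RAID algorithm from the paper.
# The suffix list, MIN_STEM_LEN, and the compound-word splitting rule
# are hypothetical choices made for this example.

SUFFIXES = ["ations", "ation", "ness", "ing", "ers", "er", "ed", "es", "s"]
MIN_STEM_LEN = 3  # guard against stripping very short words


def strip_suffix(word, suffixes=SUFFIXES):
    """Remove the longest matching suffix, keeping at least MIN_STEM_LEN characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def stem_term(term):
    """Stem a simple or compound term by stemming each component separately."""
    # Assumption: components of a compound word are separated by hyphens or spaces.
    parts = term.lower().replace("-", " ").split()
    return " ".join(strip_suffix(part) for part in parts)


if __name__ == "__main__":
    for term in ["indexing", "Word-Processing documents", "is"]:
        print(term, "->", stem_term(term))
```

Trying the longest suffix first avoids leaving a partial ending behind (for example, removing only "s" from a word that ends in "ations"), and the length guard leaves short words such as "is" unchanged.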

Published

2016-01-01

How to Cite

Kabil BOUKHARI, & Mohamed Nazih OMRI. (2016). RAID: Robust Algorithm for stemmIng text Document. International Journal of Computer Information Systems and Industrial Management Applications, 8, 12. Retrieved from https://cspub-ijcisim.org/index.php/ijcisim/article/view/324

Issue

Vol. 8 (2016)

Section

Original Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.