Unsupervised Stemmer to Improve Rule Based Morph Analyzer

KVN Sunitha; N.Kalyani

Authors

KVN Sunitha
N.Kalyani

Abstract

Telugu is an Indian language spoken by more than 50 million people in the country. The language is very rich in literature and calls for advances in computational approaches. Applications such as machine translation, speech recognition, speech synthesis, and information retrieval need a powerful morphological generator to produce the morphological forms of nouns and verbs. The existing Telugu morphological analyzer (TMA) is rule based. We improve its performance with a novel approach: an unsupervised stemmer that reports the possible decompositions of a word inflected by many morphemes. Using these decompositions, the root word can be extracted for words that the rule-based morphological analyzer did not initially recognize. The experiment is conducted on the CII Telugu corpus, and the improvement is checked against the rule-based morphological analyzer developed by the LTRC group. The main advantage is an increase in the performance of the rule-based analyzer from 77% to 84.2% on a word set numbering in the hundreds. Performance can be improved further with a larger corpus.
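The fallback idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the scoring rule (stem frequency times suffix frequency over the corpus vocabulary), the function names, and the `rule_based_lookup` callable standing in for the rule-based analyzer are all assumptions made for illustration. The sketch splits on characters; Telugu text would need a deliberate choice of segmentation unit.

```python
from collections import Counter


def candidate_splits(word, min_stem=2):
    """Return every (stem, suffix) split with a non-empty suffix."""
    return [(word[:i], word[i:]) for i in range(min_stem, len(word))]


def build_counts(vocabulary, min_stem=2):
    """Count how many distinct corpus words share each stem and each suffix."""
    stems, suffixes = Counter(), Counter()
    for word in set(vocabulary):
        for stem, suffix in candidate_splits(word, min_stem):
            stems[stem] += 1
            suffixes[suffix] += 1
    return stems, suffixes


def best_split(word, stems, suffixes, min_stem=2):
    """Pick the split whose stem and suffix are both frequent (assumed score)."""
    splits = candidate_splits(word, min_stem)
    if not splits:
        return word, ""
    return max(splits, key=lambda s: stems[s[0]] * suffixes[s[1]])


def analyze(word, rule_based_lookup, stems, suffixes):
    """Try the rule-based analyzer first; fall back to the stemmer's best stem.

    rule_based_lookup is a hypothetical callable returning a root or None.
    """
    root = rule_based_lookup(word)
    if root is not None:
        return root
    stem, _ = best_split(word, stems, suffixes)
    return rule_based_lookup(stem) or stem
```

Here a word the rule-based lookup rejects is decomposed, and the highest-scoring stem is passed back to the lookup, mirroring the recovery step the abstract describes for initially unrecognized words.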

