Clustering Techniques for Establishing Inflectionally Similar Groups of Stems

Authors

  • Zacharias Detorakis, Inst. for Language and Speech Processing, 6 Artemidos & Epidavrou Str.
  • George Tambouratzis, Inst. for Language and Speech Processing, 6 Artemidos & Epidavrou Str.

Keywords:

Agglomerative clustering, Hamming distance, inflectional paradigm, cluster proximity, cluster validity

Abstract

This article presents a hierarchical clustering algorithm for creating groups of stems with similar characteristics. The resulting groups (clusters) are expected to comprise stems belonging to the same inflectional paradigm (e.g. verbs in the passive voice), in order to support the creation of a morphological lexicon. A new metric for calculating the distance between data objects is proposed. It better suits this application by addressing problems that can arise from the limited amount of information available in the data. A series of experimental results is provided that demonstrates the performance of the algorithm, compares the effectiveness of different distance metrics, and helps in choosing appropriate settings for a number of parameters.
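
To make the general idea concrete, here is a minimal sketch of agglomerative clustering of stems. It assumes each stem is represented as a binary vector over a fixed list of endings (1 if the stem was observed with that ending), and it uses plain Hamming distance (one of the listed keywords) with average linkage as the cluster proximity measure. The article's own proposed metric is not described here, so it is not reproduced. The representation, the endings, the stems, the threshold and all function names are hypothetical illustrations, not the authors' method.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def average_linkage(c1, c2, vectors):
    """Cluster proximity (assumed): mean pairwise Hamming distance between members."""
    total = sum(hamming(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(vectors, threshold):
    """Start from singletons and repeatedly merge the closest pair of clusters,
    stopping once the closest pair is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        (a, b), d = min(
            (((p, q), average_linkage(clusters[p], clusters[q], vectors))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations yields a < b
    return clusters

# Hypothetical endings: three active-like, three passive-like.
endings = ["-o", "-eis", "-ei", "-omai", "-esai", "-etai"]
# Hypothetical stems and the endings each was observed with in a corpus.
stems = ["graf", "leg", "skeft", "erx"]
vectors = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],  # sparse data: one ending never observed
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 1],
]

if __name__ == "__main__":
    groups = agglomerate(vectors, threshold=1.5)
    print([[stems[i] for i in c] for c in groups])
    # → [['graf', 'leg'], ['skeft', 'erx']]
```

The second and third vectors illustrate the sparse-data issue mentioned in the abstract: a stem whose ending simply was not observed still counts as a mismatch under plain Hamming distance, which is the kind of problem a better-suited metric would aim to address.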


Published

2012-04-01

How to Cite

Zacharias Detorakis, & George Tambouratzis. (2012). Clustering Techniques for Establishing Inflectionally Similar Groups of Stems. International Journal of Computer Information Systems and Industrial Management Applications, 4, 9. Retrieved from https://cspub-ijcisim.org/index.php/ijcisim/article/view/168

Section

Original Articles