Clustering Techniques for Establishing Inflectionally Similar Groups of Stems


  • Zacharias Detorakis Inst. for Language and Speech Processing, 6 Artemidos & Epidavrou Str.
  • George Tambouratzis Inst. for Language and Speech Processing, 6 Artemidos & Epidavrou Str.


Keywords: Agglomerative clustering, Hamming distance, inflectional paradigm, cluster proximity, cluster validity


This article presents a hierarchical clustering algorithm aimed at creating groups of stems with similar characteristics. The resulting groups (clusters) are expected to comprise stems belonging to the same inflectional paradigm (e.g. verbs in passive voice), in order to support the creation of a morphological lexicon. A new metric for calculating the distance between data objects is proposed that better suits the specific application by addressing problems that may arise from the limited amount of information available in the data. A series of experimental results is provided that demonstrates the performance of the algorithm, compares different distance metrics in terms of their effectiveness, and assists in choosing appropriate approaches for a number of parameters.
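As a rough illustration of the general technique named in the abstract and keywords, the following minimal sketch clusters stems agglomeratively using the plain Hamming distance. It is not the authors' algorithm and does not implement the new metric the article proposes. The binary suffix-occurrence encoding, the average-linkage proximity, the toy English-like data, and the names `hamming`, `average_linkage`, `agglomerate`, `SUFFIXES` and `STEMS` are all assumptions made for the example.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def average_linkage(c1, c2, vectors):
    """Cluster proximity: mean Hamming distance over all cross-cluster pairs."""
    total = sum(hamming(vectors[s], vectors[t]) for s in c1 for t in c2)
    return total / (len(c1) * len(c2))


def agglomerate(vectors, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    # Start with one singleton cluster per stem (sorted for determinism).
    clusters = [[stem] for stem in sorted(vectors)]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical toy data: each stem is described by which suffixes it was
# observed with (1) or not (0). These values are invented for illustration.
SUFFIXES = ["-s", "-ed", "-ing", "-er", "-est"]
STEMS = {
    "walk":  (1, 1, 1, 0, 0),
    "jump":  (1, 1, 1, 0, 0),
    "talk":  (1, 1, 1, 0, 0),
    "play":  (1, 1, 1, 1, 0),
    "tall":  (0, 0, 0, 1, 1),
    "small": (0, 0, 0, 1, 1),
}

if __name__ == "__main__":
    for cluster in agglomerate(STEMS, n_clusters=2):
        print(cluster)
```

On this toy input the verb-like stems and the adjective-like stems end up in separate clusters. With real corpus data many suffixes would go unobserved for a given stem, which is the kind of limited-information problem the abstract says the proposed metric is meant to address.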






How to Cite

Zacharias Detorakis, & George Tambouratzis. (2012). Clustering Techniques for Establishing Inflectionally Similar Groups of Stems. International Journal of Computer Information Systems and Industrial Management Applications, 4, 9. Retrieved from



Original Articles