Clustering Techniques for Establishing Inflectionally Similar Groups of Stems
Keywords:
Agglomerative clustering, Hamming distance, inflectional paradigm, cluster proximity, cluster validity
Abstract
This article presents a hierarchical clustering algorithm that creates groups of stems with similar characteristics. The resulting groups (clusters) are expected to contain stems that belong to the same inflectional paradigm (e.g. verbs in the passive voice), in order to support the creation of a morphological lexicon. A new metric for calculating the distance between data objects is proposed. It suits this application better because it addresses problems that can arise from the limited amount of information available in the data. A series of experimental results is provided that demonstrates the performance of the algorithm, compares the effectiveness of different distance metrics, and helps in choosing appropriate values for a number of parameters.
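The general idea of agglomerative clustering over stems can be illustrated with a minimal sketch. It assumes that each stem is represented as a binary vector recording which inflectional endings are attested with it in a corpus, and it uses plain Hamming distance with average linkage and a fixed merge threshold. These are illustrative assumptions: the stem names, vectors, the average-linkage choice and the threshold value are hypothetical, and the new distance metric proposed in the article is not reproduced here.

```python
from itertools import combinations

def hamming(a, b):
    """Count the positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def average_linkage(c1, c2, vectors):
    """Mean pairwise Hamming distance between members of two clusters (assumed linkage)."""
    total = sum(hamming(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative(vectors, threshold):
    """Start from singleton clusters and repeatedly merge the closest pair
    until the smallest inter-cluster distance exceeds the (hypothetical) threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = average_linkage(clusters[a], clusters[b], vectors)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # closest pair is too far apart: stop merging
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters

# Hypothetical data: 1 = ending attested with the stem, 0 = not observed.
stems = {
    "stem_a": (1, 1, 1, 0, 0, 0),
    "stem_b": (1, 1, 0, 0, 0, 0),
    "stem_c": (0, 0, 0, 1, 1, 1),
    "stem_d": (0, 0, 0, 1, 1, 0),
    "stem_e": (1, 1, 1, 0, 0, 0),
}
names = list(stems)
vectors = [stems[n] for n in names]
groups = sorted(sorted(names[i] for i in c) for c in agglomerative(vectors, threshold=1.5))
print(groups)  # [['stem_a', 'stem_b', 'stem_e'], ['stem_c', 'stem_d']]
```

In this sketch an unobserved ending counts as a mismatch just like a genuinely impossible one, so stems seen only a few times in the corpus can appear farther from their true paradigm than they are. This is one example of the limited-information problem that the abstract says the proposed metric is designed to address.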
License
Copyright (c) 2023 International Journal of Computer Information Systems and Industrial Management Applications
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.