Development of An External Cluster Validity Index using Probabilistic Approach and Min-max Distance
Keywords:
Cluster validity, External cluster validity index, Genetic K-means clustering algorithm, Single linkage clustering
Abstract
Validating a clustering result is a challenging task in real-world applications, and several cluster validity indices have been developed in the literature for this purpose. Cluster validity indices fall into two main categories: external and internal. External indices rely on available supervised information, whereas internal indices use the intrinsic structure of the data. This paper proposes a new external cluster validity index, MMI, and its normalized version, NMMI, based on the Max-Min distance among data points and on prior information about the structure of the data. A new probabilistic approach is used to find the correct correspondence between the true and obtained clusterings. Several possible probabilistic approaches were considered, and attempts were made to rectify their problems. The genetic K-means clustering algorithm (GAK-means) and the single linkage clustering technique serve as the underlying partitioning techniques, with the number of clusters varied over a range; the MMI and NMMI indices are then used to determine the appropriate number of clusters. Results of the proposed index in identifying the true partitioning are reported for six artificial and two real-life data sets. The performance of MMI, its two versions MMI old and MMI new, and its normalized version NMMI is compared with that of the existing external cluster validity indices F-measure, purity, normalized mutual information (NMI), Rand index (RI), and adjusted Rand index (ARI). The proposed MMI index works well for both two-class and multi-class data sets.
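To make the notion of an external index concrete, the following is a minimal sketch of two of the existing baseline indices named in the abstract, purity and the Rand index, computed from a true labeling and an obtained labeling. It does not implement MMI or NMMI, whose definitions are not given here; the function names `purity` and `rand_index` and the toy label vectors are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, pred_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, Counter())[t] += 1
    majority_total = sum(max(c.values()) for c in clusters.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: six points, two true classes, two obtained clusters.
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [1, 1, 0, 0, 0, 0]

print(f"purity     = {purity(true_labels, pred_labels):.4f}")
print(f"rand index = {rand_index(true_labels, pred_labels):.4f}")
```

Both functions depend only on how points are grouped, not on the cluster label names, so relabeling the obtained clusters leaves the scores unchanged.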
License
Copyright (c) 2023 International Journal of Computer Information Systems and Industrial Management Applications
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.