Development of An External Cluster Validity Index using Probabilistic Approach and Min-max Distance

Authors

  • Abhay Kumar Alok, Indian Institute of Technology Patna, Computer Science Engineering, Software Technological Park of India, Patna 800013, India
  • Sriparna Saha, Indian Institute of Technology Patna, Computer Science Engineering, Software Technological Park of India, Patna 800013, India
  • Asif Ekbal, Indian Institute of Technology Patna, Computer Science Engineering, Software Technological Park of India, Patna 800013, India

Keywords:

Cluster validity, External cluster validity index, Genetic K-means clustering algorithm, Single linkage clustering

Abstract

Validating a clustering result is a challenging task in real-world applications, and several cluster validity indices have been developed in the literature for this purpose. Cluster validity indices fall into two main categories: external and internal. External indices rely on available supervised information, whereas internal indices use only the intrinsic structure of the data. In this paper, a new external cluster validity index, MMI, and its normalized version, NMMI, are developed based on the Max-Min distance among data points and on prior information about the structure of the data. A new probabilistic approach is proposed to find the correct correspondence between the true and the obtained clusterings; several alternative probabilistic approaches are also considered, and their shortcomings are addressed. The genetic K-means clustering algorithm (GAK-means) and the single linkage clustering technique are used as the underlying partitioning techniques, with the number of clusters varied over a range, and the MMI and NMMI indices are then used to determine the appropriate number of clusters. Results of the proposed index in identifying the true partitioning are shown for six artificial and two real-life data sets. The performance of MMI, in its two versions (MMI old and MMI new), and of its normalized version NMMI is compared with that of the existing external cluster validity indices F-measure, purity, normalized mutual information (NMI), Rand index (RI), and adjusted Rand index (ARI). The proposed MMI index works well for both two-class and multi-class data sets.
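The abstract benchmarks MMI and NMMI against five existing external indices. As an illustration of those baselines only, the minimal sketch below computes purity, F-measure, RI, ARI and NMI from the contingency table between true class labels and obtained cluster labels. It does not implement MMI or NMMI, whose definitions are not given here. Assumptions to note: the function names are hypothetical, the F-measure is one common class-weighted variant, and NMI is normalized by the arithmetic mean of the two entropies. Other variants exist, so values may differ from those reported in the paper.

```python
from collections import Counter
from math import comb, log


def contingency(true_labels, pred_labels):
    """Hypothetical helper: joint counts n_ij plus class and cluster sizes."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have equal length")
    n_ij = Counter(zip(true_labels, pred_labels))
    return n_ij, Counter(true_labels), Counter(pred_labels)


def purity(true_labels, pred_labels):
    # Each obtained cluster is credited with its most frequent true class.
    n_ij, _, _ = contingency(true_labels, pred_labels)
    best = {}
    for (_, k), count in n_ij.items():
        best[k] = max(best.get(k, 0), count)
    return sum(best.values()) / len(true_labels)


def f_measure(true_labels, pred_labels):
    # Class-weighted F-measure (one common variant, assumed here):
    # each true class is matched to the cluster giving it the best F score.
    n_ij, class_sizes, cluster_sizes = contingency(true_labels, pred_labels)
    n = len(true_labels)
    total = 0.0
    for c, n_c in class_sizes.items():
        best_f = 0.0
        for k, n_k in cluster_sizes.items():
            overlap = n_ij.get((c, k), 0)
            if overlap:
                precision, recall = overlap / n_k, overlap / n_c
                best_f = max(best_f, 2 * precision * recall / (precision + recall))
        total += n_c / n * best_f
    return total


def rand_and_adjusted_rand(true_labels, pred_labels):
    # Pair counting: compare every pair of points under both partitions.
    n_ij, class_sizes, cluster_sizes = contingency(true_labels, pred_labels)
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        raise ValueError("need at least two points")
    same_both = sum(comb(v, 2) for v in n_ij.values())
    same_class = sum(comb(v, 2) for v in class_sizes.values())
    same_cluster = sum(comb(v, 2) for v in cluster_sizes.values())
    # Pairs separated in both partitions.
    diff_both = total_pairs - same_class - same_cluster + same_both
    ri = (same_both + diff_both) / total_pairs
    # ARI corrects the pair-agreement count for chance.
    expected = same_class * same_cluster / total_pairs
    max_index = (same_class + same_cluster) / 2
    if max_index == expected:
        return ri, 1.0
    return ri, (same_both - expected) / (max_index - expected)


def nmi(true_labels, pred_labels):
    # Mutual information divided by the arithmetic mean of the entropies.
    n_ij, class_sizes, cluster_sizes = contingency(true_labels, pred_labels)
    n = len(true_labels)

    def entropy(sizes):
        return -sum(s / n * log(s / n) for s in sizes.values())

    mi = sum(
        v / n * log(n * v / (class_sizes[c] * cluster_sizes[k]))
        for (c, k), v in n_ij.items()
    )
    h_true, h_pred = entropy(class_sizes), entropy(cluster_sizes)
    if h_true + h_pred == 0:
        return 1.0
    return 2 * mi / (h_true + h_pred)


true = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 2, 2]  # one true class split across two clusters
print(purity(true, pred), f_measure(true, pred))
print(rand_and_adjusted_rand(true, pred), nmi(true, pred))
```

All five scores are invariant to relabeling the clusters, which is why they can be compared with an index that must first establish a correspondence between true and obtained clusters.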


Published

2014-04-01

How to Cite

Abhay Kumar Alok, Sriparna Saha, & Asif Ekbal. (2014). Development of An External Cluster Validity Index using Probabilistic Approach and Min-max Distance. International Journal of Computer Information Systems and Industrial Management Applications, 6, 11. Retrieved from https://cspub-ijcisim.org/index.php/ijcisim/article/view/276

Section

Original Articles