A Cross-Entropy Based Feature Selection Method for Binary Valued Data Classification

Authors

  • Zhipeng Wang, Department of Computer Science, College of Information Science and Technology, University of Nebraska at Omaha, Omaha, Nebraska 68182, USA
  • Qiuming Zhu, Department of Computer Science, College of Information Science and Technology, University of Nebraska at Omaha, Omaha, Nebraska 68182, USA

Keywords:

Binary Features, Feature Selection, Cross Entropy, Pattern Classification, Model Verification

Abstract

Feature selection is the process of finding a meaningful subset of attributes from a given set of measurements in order to reveal a coherent relation or causality in an event. It is often indispensable for effective pattern classification, and it is usually performed as a preprocessing step before constructing a machine learning model in big data analytics to improve the accuracy of predictive results. By selecting the most significant features, it can reduce training time and model complexity, avoid overfitting, and help users better understand the source data and the modeling outcomes. Although features are commonly treated as continuous values, many features in real-world machine learning applications are binary valued, i.e., either 1 or 0. Inspired by existing feature selection methods, this paper presents a new framework called FMC_SELECTOR, which specifically addresses the selection of significant binary-valued features from highly imbalanced large datasets. FMC_SELECTOR combines Fisher linear discriminant analysis with a cross-entropy mechanism to create an integrated mapping function for evaluating each individual feature of a given dataset. A new formulation called Mapping Based Cross-Entropy Evaluation (MCE) is derived for quantitative ranking of the features. A Positive Case Prediction Score (PPS) is explored to verify the significance of the selected features in a classification process. The performance of FMC_SELECTOR is compared with two popular feature selection methods, Univariate Importance (UI) and Recursive Feature Elimination (RFM), and FMC_SELECTOR performs better on the datasets tested.
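
The abstract names two ingredients, a Fisher discriminant criterion and a cross-entropy measure, but does not give the MCE formula. The sketch below is therefore not the authors' method; it only illustrates, under stated assumptions, how each ingredient might be computed per feature on binary data. The function names, the clipping constant `eps`, the toy dataset, and the choice to rank by the Fisher score are all illustrative assumptions.

```python
import numpy as np

def bernoulli_rates(X, y, eps=1e-6):
    """Per-class probability that each binary feature equals 1.

    Rates are clipped away from 0 and 1 (eps is an assumed constant)
    so that the logarithms below stay finite.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p_pos = X[y == 1].mean(axis=0)
    p_neg = X[y == 0].mean(axis=0)
    return np.clip(p_pos, eps, 1 - eps), np.clip(p_neg, eps, 1 - eps)

def fisher_score(p_pos, p_neg):
    """Single-feature Fisher criterion: squared gap between class means
    divided by the sum of class variances (Bernoulli variance p * (1 - p))."""
    return (p_pos - p_neg) ** 2 / (p_pos * (1 - p_pos) + p_neg * (1 - p_neg))

def cross_entropy(p, q):
    """Cross-entropy H(p, q) between two Bernoulli distributions, in nats."""
    return -(p * np.log(q) + (1 - p) * np.log(1 - q))

def rank_features(X, y):
    """Return feature indices sorted by descending Fisher score,
    together with the Fisher scores and a cross-entropy-based gap."""
    p_pos, p_neg = bernoulli_rates(X, y)
    fisher = fisher_score(p_pos, p_neg)
    # Excess cross-entropy over the entropy of the positive class,
    # i.e. the KL divergence from the negative to the positive class.
    kl = cross_entropy(p_pos, p_neg) - cross_entropy(p_pos, p_pos)
    order = np.argsort(-fisher)
    return order, fisher, kl

# Toy imbalanced dataset: 4 positive cases, 16 negative cases, 3 binary features.
y = np.array([1] * 4 + [0] * 16)
X = np.zeros((20, 3), dtype=int)
X[[0, 1, 2, 4, 5], 0] = 1               # feature 0: on in 3/4 positives, 2/16 negatives
X[[0, 1] + list(range(4, 12)), 1] = 1   # feature 1: on in 2/4 positives, 8/16 negatives
X[[0, 1, 4, 5, 6, 7], 2] = 1            # feature 2: on in 2/4 positives, 4/16 negatives

order, fisher, kl = rank_features(X, y)
print(order)  # [0 2 1]
```

On this toy data, feature 1 has the same rate in both classes, so both scores are zero for it and it ranks last. Feature 0 separates the classes most strongly and ranks first under either score.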

Published

2022-01-01

How to Cite

Zhipeng Wang, & Qiuming Zhu. (2022). A Cross-Entropy Based Feature Selection Method for Binary Valued Data Classification. International Journal of Computer Information Systems and Industrial Management Applications, 14, 13. Retrieved from https://cspub-ijcisim.org/index.php/ijcisim/article/view/501

Section

Original Articles