Semi-Supervised Learning Approach to Improve Machine Learning Algorithms for Churn Analysis in Telecommunication
Keywords:
Customer Churn, Data Mining, Machine Learning, Semi-Supervised Learning, Supervised Classification
Abstract
In semi-supervised learning, knowledge is acquired from both labeled and unlabeled data. Supervised classification predicts the labels of unseen data using a model trained on labeled data. Obtaining labeled data in sufficient quantity and at low cost is a challenging task. This paper presents a comparative study of six of the most widely used machine learning classifiers for churn prediction in telecommunication. We also propose a pseudo-label semi-supervised learning model to validate the improvement in classifier performance obtained by exploiting a large volume of unlabeled data together with a small amount of labeled data. In the first stage, six supervised algorithms, namely Support Vector Machine (SVM), Random Forest, Logistic Regression, AdaBoost, Gradient Boosting and eXtreme Gradient Boosting (XGBoost), are applied to a telecom dataset for churn prediction and assessed using cross-validation along with external measures. In the second stage, the improvement in the performance of all classifiers is evaluated using semi-supervised learning. Empirical results demonstrate the effectiveness of the proposed model compared with the six baseline classifiers. The best overall classifiers are Gradient Boosting and eXtreme Gradient Boosting with semi-supervised learning, achieving approximately 99.24% and 99.62% accuracy, respectively.
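The abstract does not spell out the pseudo-labeling procedure. The sketch below shows one common variant, confidence-thresholded self-training: fit on the small labeled set, predict the unlabeled pool, adopt predictions whose confidence reaches a threshold as pseudo-labels, refit on the enlarged set, and repeat until no new point qualifies. The nearest-centroid classifier, its softmax confidence, and the names `pseudo_label_train`, `threshold` and `max_rounds` are hypothetical stand-ins; in the paper's setting the classifier would be one of the six baselines (for example Gradient Boosting).

```python
"""Pseudo-label self-training sketch (illustrative, not the paper's code).

Assumptions: a toy nearest-centroid classifier stands in for SVM/RF/GB/XGBoost,
features are numeric tuples, and `threshold` / `max_rounds` are hypothetical
hyperparameters chosen for illustration.
"""
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Dict[int, Point]


def fit_centroids(X: Sequence[Point], y: Sequence[int]) -> Model:
    """Return the mean feature vector of each class label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(s / counts[c] for s in sums[c]) for c in sums}


def predict_proba(model: Model, x: Point) -> Dict[int, float]:
    """Softmax over negative Euclidean distances to the class centroids."""
    scores = {c: -math.dist(x, mu) for c, mu in model.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}


def pseudo_label_train(
    X_lab: Sequence[Point],
    y_lab: Sequence[int],
    X_unlab: Sequence[Point],
    threshold: float = 0.9,
    max_rounds: int = 5,
) -> Tuple[Model, List[Point], List[int]]:
    """Fit on labeled data, then repeatedly adopt confident predictions on
    unlabeled points as pseudo-labels and refit on the enlarged set."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    model = fit_centroids(X_train, y_train)
    for _ in range(max_rounds):
        still_unlabeled = []
        for x in pool:
            proba = predict_proba(model, x)
            label, conf = max(proba.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                X_train.append(x)  # accept the pseudo-label
                y_train.append(label)
            else:
                still_unlabeled.append(x)
        if len(still_unlabeled) == len(pool):
            break  # no remaining point was confident enough
        pool = still_unlabeled
        model = fit_centroids(X_train, y_train)  # retrain on labeled + pseudo-labeled
    return model, X_train, y_train


if __name__ == "__main__":
    # Hypothetical toy data: label 0 = retained customer, 1 = churned customer.
    X_lab = [(0.0, 0.0), (10.0, 10.0)]
    y_lab = [0, 1]
    X_unlab = [(1.0, 0.5), (9.0, 9.5), (5.0, 5.0)]
    model, X_aug, y_aug = pseudo_label_train(X_lab, y_lab, X_unlab)
    print(y_aug)  # [0, 1, 0, 1]; the ambiguous point (5, 5) stays unlabeled
```

Retraining only after each full pass keeps every decision within a round tied to a single model. A lower threshold adopts more pseudo-labels but risks reinforcing the classifier's own mistakes.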
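The first stage scores each classifier with cross-validation. The abstract does not state the number of folds, whether the folds are stratified, or which external measures are reported, so the sketch below assumes plain k-fold with k=5 and accuracy only. The helpers `k_fold_indices`, `cross_val_accuracy`, `fit_1nn` and `predict_1nn` are hypothetical, and the toy 1-nearest-neighbour classifier merely stands in for the six baselines.

```python
"""k-fold cross-validation accuracy sketch (illustrative, not the paper's setup).

Assumptions: plain (unstratified) k-fold with k=5, accuracy as the only metric,
and a toy 1-nearest-neighbour classifier standing in for the six baselines.
"""
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

M = TypeVar("M")
Point = Tuple[float, ...]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(
    X: Sequence[Point],
    y: Sequence[int],
    fit: Callable[[List[Point], List[int]], M],
    predict: Callable[[M, Point], int],
    k: int = 5,
    seed: int = 0,
) -> List[float]:
    """Train on k-1 folds and score accuracy on the held-out fold, k times."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def fit_1nn(X: List[Point], y: List[int]) -> Tuple[List[Point], List[int]]:
    """A 1-nearest-neighbour 'model' simply memorises the training data."""
    return X, y


def predict_1nn(model: Tuple[List[Point], List[int]], x: Point) -> int:
    """Return the label of the closest training point."""
    X_train, y_train = model
    nearest = min(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return y_train[nearest]


if __name__ == "__main__":
    # Hypothetical, well-separated toy data: 0 = retained, 1 = churned.
    X = [(0.1 * i,) for i in range(5)] + [(10.0 + 0.1 * i,) for i in range(5)]
    y = [0] * 5 + [1] * 5
    scores = cross_val_accuracy(X, y, fit_1nn, predict_1nn, k=5)
    print(sum(scores) / len(scores))  # 1.0 for this separable toy data
```

If the churn classes in a real dataset are imbalanced, a stratified split that preserves the class ratio in every fold would be a natural substitute for the plain shuffle used here.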
License
Copyright (c) 2023 International Journal of Computer Information Systems and Industrial Management Applications
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.