The Effect of Anomaly Detection and Data Balancing in Prediction of Diabetes


  • Aruna Devi B.
  • Karthik N.


Diabetes is a chronic condition characterized by elevated blood glucose, commonly called blood sugar, and it is among the most common illnesses in developing countries. Prompt diagnosis and expert medical intervention are crucial to mitigating its effects. The growth of healthcare databases creates many opportunities for artificial intelligence and machine learning. Yet medical errors remain a major problem in the healthcare industry despite the availability of numerous medical devices, and medical data containing anomalous values can lead to wrong decisions. Anomaly detection is frequently applied to datasets to locate and eliminate such values; in addition, identifying and evaluating outlier patterns may improve the precision of a learning algorithm's medical decisions. This paper presents a novel strategy for diabetes prediction based on KNN imputation, hybrid sampling, and anomaly detection, which increases the detection rate of diabetes on the Pima Indian diabetes dataset. Five unsupervised anomaly detection algorithms and five supervised machine learning algorithms were used to perform diabetes prediction. The approach was assessed under four conditions: (1) the unbalanced dataset without anomaly detection, (2) the balanced dataset without anomaly detection, (3) the unbalanced dataset with anomaly detection, and (4) the balanced dataset with anomaly detection. The results show that Isolation Forest combined with Random Forest outperforms the other machine learning models in diabetes prediction, with 99.23% accuracy and a precision of 0.99. All compared methods could detect anomalous data and produced consistent outcomes across the different algorithms. The experiments indicate that the proposed method identifies anomalies more effectively, and they highlight the significance of dataset balancing and anomaly detection in diabetes prediction.
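The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the data are a synthetic stand-in generated with scikit-learn rather than the Pima Indian diabetes dataset, the helper names (`make_stand_in_data`, `hybrid_sample`, `run_pipeline`) are hypothetical, hybrid sampling is approximated by random over- and under-sampling toward the mean class size, and all hyperparameters (`n_neighbors=5`, `contamination=0.05`, `n_estimators=200`) are illustrative guesses. Anomaly filtering and balancing are applied to the training split only, which is also an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

SEED = 0  # fixed seed so repeated runs are comparable


def make_stand_in_data(n_samples=768, n_features=8, seed=SEED):
    """Synthetic, imbalanced stand-in data with ~5% missing values.

    ASSUMPTION: this is NOT the Pima Indian diabetes dataset; it only
    mimics its rough shape (768 rows, 8 features, unequal classes).
    """
    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, n_informative=5,
        weights=[0.65, 0.35], random_state=seed,
    )
    rng = np.random.default_rng(seed)
    X = X.copy()
    X[rng.random(X.shape) < 0.05] = np.nan  # inject missing entries
    return X, y


def hybrid_sample(X, y, seed=SEED):
    """Resample every class to the mean class size.

    ASSUMPTION: a simple random hybrid (oversample the minority with
    replacement, undersample the majority without replacement).
    """
    classes, counts = np.unique(y, return_counts=True)
    target = int(counts.mean())
    X_parts, y_parts = [], []
    for label, count in zip(classes, counts):
        X_class = X[y == label]
        X_parts.append(resample(X_class, replace=bool(count < target),
                                n_samples=target, random_state=seed))
        y_parts.append(np.full(target, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def run_pipeline(X, y, use_anomaly_detection=True, use_balancing=True,
                 seed=SEED):
    """KNN imputation -> optional Isolation Forest filter ->
    optional hybrid sampling -> Random Forest classifier."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # Fit the imputer on training rows only to avoid test-set leakage.
    imputer = KNNImputer(n_neighbors=5)
    X_train = imputer.fit_transform(X_train)
    X_test = imputer.transform(X_test)

    if use_anomaly_detection:
        # fit_predict returns +1 for inliers and -1 for flagged anomalies.
        detector = IsolationForest(contamination=0.05, random_state=seed)
        inliers = detector.fit_predict(X_train) == 1
        X_train, y_train = X_train[inliers], y_train[inliers]

    if use_balancing:
        X_train, y_train = hybrid_sample(X_train, y_train, seed)

    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions),
        "n_train": len(y_train),
    }


if __name__ == "__main__":
    X, y = make_stand_in_data()
    # The four evaluation conditions listed in the abstract.
    for anomaly in (False, True):
        for balanced in (False, True):
            scores = run_pipeline(X, y, anomaly, balanced)
            print(f"anomaly_detection={anomaly} balanced={balanced} {scores}")
```

Scores obtained on the synthetic data say nothing about the figures reported in the abstract; the sketch only shows how the four conditions differ in which preprocessing steps are switched on.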






How to Cite

Aruna Devi B., & Karthik N. (2024). The Effect of Anomaly Detection and Data Balancing in Prediction of Diabetes. International Journal of Computer Information Systems and Industrial Management Applications, 16(2), 13. Retrieved from



Original Articles