Modified Random Forest based Graduates Earning of Higher Education Mining

Authors

  • Tahseen A. Wotaifi
  • Eman S. Al-Shamery

Keywords:

Earning Prediction of Graduates, Fuzzy-Selection Method, Education Data Mining, Random Forest, Linear Regression, Support Vector Regression

Abstract

As more students and families turn to higher education and the labor market changes rapidly, the job opportunities and earnings of graduates receive considerable attention. In line with the principles of contemporary education, educational institutions have significantly changed their policies for preparing and qualifying students to compete for employment. This study aims at 1) identifying the important factors (relevant features) affecting earnings, and 2) designing a system to predict the earnings of alumni in employment. The main contributions of this work are the identification of the most important factors by applying fuzzy logic within filter methods for feature selection, and a prediction model that controls the bootstrap samples selected for building the forest in the random forest algorithm. The proposed system was developed in the context of the higher education system in the United States (US) and implemented on the College Scorecard dataset. This dataset contains nearly 8000 colleges and exactly 1825 features, so relevant factors are selected and irrelevant features discarded using four methods: Fuzzy-Selection Method (FSM), Mean Decrease Impurity (MDI), Drop-Feature Importance (DFI), and Wrapper-Forward Selection (WFS). These methods reduce the number of factors by more than 98%. The Modified Random Forest Regression (MRFR) model is then compared with two other models, linear regression and support vector regression, in predicting the earnings of graduates. The Mean Absolute Error (MAE) values for these models are 0.052, 0.068, and 0.068, respectively. The findings improve on previous works in both the number of factors retained and the MAE.
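The abstract does not spell out how Drop-Feature Importance (DFI) or the MAE are computed, so the following is only a minimal sketch under the common definitions: MAE is the mean of absolute prediction errors, and a feature's drop importance is the increase in validation MAE when the model is refit without that feature. The k-nearest-neighbour regressor is a dependency-free stand-in and is not the paper's MRFR model; the toy data and all function names are hypothetical.

```python
import random

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def knn_predict(X_train, y_train, X_test, k=5):
    """Stand-in regressor (NOT the paper's MRFR): mean target of the k nearest rows."""
    preds = []
    for x in X_test:
        # Rank training rows by squared Euclidean distance to x and keep the k closest.
        nearest = sorted(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
        )[:k]
        preds.append(sum(y_train[i] for i in nearest) / k)
    return preds

def drop_feature_importance(X_train, y_train, X_val, y_val):
    """Increase in validation MAE when each feature is removed and the model is refit."""
    baseline = mae(y_val, knn_predict(X_train, y_train, X_val))
    scores = []
    for j in range(len(X_train[0])):
        # Remove column j from every row of both splits.
        drop = lambda rows: [r[:j] + r[j + 1:] for r in rows]
        err = mae(y_val, knn_predict(drop(X_train), y_train, drop(X_val)))
        scores.append(err - baseline)
    return scores

# Hypothetical toy data: feature 0 drives the target, feature 1 is pure noise.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(250)]
y = [10.0 * row[0] for row in X]
X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

importance = drop_feature_importance(X_train, y_train, X_val, y_val)
print(importance)  # one score per feature; larger means more important
```

A feature whose removal raises the error is worth keeping, while a score near or below zero marks a candidate for removal. Under this reading, a reduction of more than 98% from 1825 features would leave at most 36 features.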

Published

2020-01-01

How to Cite

Tahseen A. Wotaifi, & Eman S. Al-Shamery. (2020). Modified Random Forest based Graduates Earning of Higher Education Mining. International Journal of Computer Information Systems and Industrial Management Applications, 12, 10. Retrieved from https://cspub-ijcisim.org/index.php/ijcisim/article/view/400

Section

Original Articles