Employee Attrition Prediction using Machine Learning in Rolling Stock Manufacturing Company


  • Mu’ammar Itqon
  • Jerry Dwi Trijoyo Purnomo


Logistic Regression, Naive Bayes, TDSP Framework, Random Forest, SHAP Analysis


Employee retention is crucial in human resource management, particularly in the rolling stock industry, which is characterized by a dynamic working environment and fierce competition. High attrition rates raise recruitment costs, reduce productivity, and erode institutional knowledge. To address these concerns, this research builds predictive models that identify employees at elevated risk of attrition. Following the Team Data Science Process (TDSP) framework, three machine learning algorithms were applied to the employee dataset: Logistic Regression (LR), Naive Bayes (NB), and Random Forest (RF). The TDSP procedure covers data acquisition and cleansing, descriptive analysis, dataset partitioning, algorithm deployment, and model evaluation. The variables evaluated are job designation, employment type, marital status, educational status, gender, tenure, commute distance, and age. Model performance was measured by precision, recall, F1-score, and overall accuracy. Random Forest outperformed the other two models, with an accuracy of 93.1%. SHAP (SHapley Additive exPlanations) was used to interpret the model and improve its transparency. This analysis highlighted job role and employment type as pivotal factors in attrition prediction. These insights can help rolling stock firms identify the core determinants of attrition and design effective, data-driven retention strategies.
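As an illustration of the four evaluation metrics named above, the following minimal sketch computes precision, recall, F1-score, and accuracy for a binary attrition label. The function name `classification_metrics` and the label vectors are hypothetical and chosen only for demonstration; they are not taken from the study's data or code, and the study's own implementation may differ.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and accuracy for binary labels.

    Hypothetical helper for illustration; `positive` marks the
    attrition class (assumed here to be encoded as 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up labels: 1 = employee left, 0 = employee stayed.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because attrition datasets are often imbalanced, with far fewer leavers than stayers, accuracy alone can look high even when few leavers are caught, which is one reason to report precision, recall, and F1-score alongside it.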