Obesity Classification Using Domain-Based Decision Tree, XGBoost, and SHAP

Embay Rohaeti
Ani Andriyati
Muhammad Edy Rizal

Abstract

Obesity is a complex public health issue, and effective early identification is needed to mitigate its long-term health impacts. This study classified obesity levels into four categories (Underweight, Normal, Overweight, and Obese) using 14 predictors grouped into three domains: biological, dietary, and physical activity. Beyond developing an accurate predictive model, the study investigated which domain contributes most to obesity classification. Two complementary modeling strategies were applied: domain-specific decision tree models, which evaluate the predictive strength of each domain independently, and a comprehensive eXtreme Gradient Boosting (XGBoost) model trained on all predictors. To address class imbalance, SMOTENC oversampling was applied to the training set, and hyperparameters for both approaches were tuned via cross-validation. On the test set, the XGBoost model outperformed the domain-based decision trees on every performance metric: balanced accuracy, precision, recall, specificity, and F1-score. The decision trees offered domain-level interpretability but lacked the predictive power of the integrated model. SHAP (SHapley Additive Explanations) analysis showed that influential features spanned all three domains, with Age, Vegetable consumption, and Transportation type emerging as the top predictors. These findings indicate that integrating multi-domain behavioral data enhances both the accuracy and the interpretability of obesity classification models, supporting the use of interpretable machine learning for personalized health risk assessment and prevention strategies.
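The evaluation metrics named in the abstract can be illustrated with a minimal sketch for the four-class setting. Everything here is an assumption for illustration: the toy labels are invented, the helper names (`per_class_counts`, `macro_metrics`) are hypothetical, and the abstract does not state how the metrics were averaged across classes. The sketch uses a one-vs-rest view with macro averaging and takes balanced accuracy as the mean of per-class recall; some libraries instead average sensitivity and specificity per class, which gives a different number.

```python
from collections import Counter

# Class labels from the abstract; the observations below are invented toy data.
CLASSES = ["Underweight", "Normal", "Overweight", "Obese"]


def per_class_counts(y_true, y_pred, classes):
    """Return {class: (tp, fp, fn, tn)} using a one-vs-rest view of each class."""
    pairs = Counter(zip(y_true, y_pred))
    total = len(y_true)
    counts = {}
    for c in classes:
        tp = pairs[(c, c)]
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        fp = sum(n for (t, p), n in pairs.items() if t != c and p == c)
        tn = total - tp - fn - fp
        counts[c] = (tp, fp, fn, tn)
    return counts


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_metrics(y_true, y_pred, classes=CLASSES):
    """Macro-averaged metrics (an assumed averaging scheme, not the paper's stated one)."""
    recall, specificity, precision, f1 = [], [], [], []
    for tp, fp, fn, tn in per_class_counts(y_true, y_pred, classes).values():
        r = safe_div(tp, tp + fn)
        p = safe_div(tp, tp + fp)
        recall.append(r)
        precision.append(p)
        specificity.append(safe_div(tn, tn + fp))
        f1.append(safe_div(2 * p * r, p + r))
    k = len(classes)
    return {
        # Balanced accuracy taken here as the mean of per-class recall.
        "balanced_accuracy": sum(recall) / k,
        "precision": sum(precision) / k,
        "recall": sum(recall) / k,
        "specificity": sum(specificity) / k,
        "f1": sum(f1) / k,
    }


# Hypothetical test-set labels and predictions, for illustration only.
y_true = ["Underweight", "Underweight", "Normal", "Normal", "Normal",
          "Normal", "Overweight", "Overweight", "Obese", "Obese"]
y_pred = ["Underweight", "Normal", "Normal", "Normal", "Normal",
          "Overweight", "Overweight", "Obese", "Obese", "Obese"]

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced classes, plain accuracy can be dominated by the largest class, whereas the macro average weights each of the four categories equally; this is one plausible reason to report balanced accuracy alongside the other metrics.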

Article Details

How to Cite
Rohaeti, E., Andriyati, A., & Edy Rizal, M. (2026). Klasifikasi Obesitas Menggunakan Domain-Based Decision Tree, XGBoost, dan SHAP. Limits: Journal of Mathematics and Its Applications, 23(1), 77–96. https://doi.org/10.12962/limits.v23i1.7970
Section
Articles
