Handling Imbalance Data in Classification Model with Nominal Predictors

Authors

  • Kartika Fithriasari Institut Teknologi Sepuluh Nopember
  • Iswari Hariastuti National Family Planning Coordinating Board (BKKBN), East Java, Indonesia
  • Kinanthi Sukma Wening Institut Teknologi Sepuluh Nopember

Keywords:

ADASYN-N, CART, hybrid SMOTE-N, imbalanced data, premarital sex

Abstract

Decision trees, a family of classification methods, can identify the factors that predict an outcome while giving interpretable results. However, when the minority class makes up only a small share of the data, the classifier tends to assign every observation to the majority class, so the class imbalance must be handled. A method often used for data with nominal predictors is SMOTE-N. To improve accuracy, a hybrid of SMOTE-N (SMOTE-N-ENN) and ADASYN-N were developed. In this study, SMOTE-N, SMOTE-N-ENN, and ADASYN-N are compared for handling class imbalance in the classification of premarital sex among adolescents, using CART as the base classifier. ADASYN-N is found to be the best method for handling class imbalance because it gives a higher AUC than SMOTE-N and SMOTE-N-ENN. The best decision tree indicates that the factors predicting premarital sexual relations among adolescents are dating style, knowledge of the fertile period, knowledge of the risks of young marriage, gender, most recent education, and area of residence.
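
As a rough illustration of how oversampling works when every predictor is nominal, the sketch below generates synthetic minority records by majority vote over a record and its nearest minority neighbours, which is the core idea of SMOTE-N. It is not the authors' implementation. The function names, the toy records, and the category labels are hypothetical, and Hamming distance is used as a simplification in place of the Value Difference Metric that SMOTE-N normally relies on.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the features on which two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))


def smote_n_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority records by majority vote.

    Assumption: neighbours are ranked by Hamming distance, a simplification
    of the Value Difference Metric used by SMOTE-N.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base record (excluding itself)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: hamming(base, row))[:k]
        # Each feature takes the most common value among base + neighbours
        group = [base] + neighbours
        new_row = tuple(
            Counter(values).most_common(1)[0][0] for values in zip(*group)
        )
        synthetic.append(new_row)
    return synthetic


# Hypothetical toy records: (dating style, gender, area of residence)
minority = [
    ("style_a", "male", "urban"),
    ("style_a", "male", "rural"),
    ("style_a", "female", "urban"),
    ("style_b", "male", "urban"),
]
new_rows = smote_n_sketch(minority, n_new=5, k=2)
```

Because each synthetic value is chosen by vote, every generated record contains only categories already observed in the minority class, which is why this style of oversampling suits nominal predictors where interpolation between values is not meaningful.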

Published

2020-02-15

How to Cite

Fithriasari, K., Hariastuti, I., & Wening, K. S. (2020). Handling Imbalance Data in Classification Model with Nominal Predictors. (IJCSAM) International Journal of Computing Science and Applied Mathematics, 6(1), 33–37. Retrieved from https://journal.its.ac.id/index.php/ijcsam/article/view/4555

Section

Articles