Handling Imbalance Data in Classification Model with Nominal Predictors

Authors

  • Kartika Fithriasari, Institut Teknologi Sepuluh Nopember
  • Iswari Hariastuti, National Family Planning Coordinating Board (BKKBN), East Java, Indonesia
  • Kinanthi Sukma Wening, Institut Teknologi Sepuluh Nopember

Keywords:

ADASYN-N, CART, hybrid SMOTE-N, imbalanced data, premarital sex

Abstract

The decision tree, a classification method, can be used to identify the factors that predict an outcome, and its results are easy to interpret. However, when the class of interest makes up only a small percentage of the data, the classifier tends to predict only the majority class. Handling the class imbalance is therefore necessary. A method often used for data with nominal predictors is SMOTE-N. To improve accuracy, the hybrid SMOTE-N-ENN and ADASYN-N were developed. In this study, SMOTE-N, SMOTE-N-ENN, and ADASYN-N are compared for handling class imbalance in the classification of premarital sex among adolescents, using CART as the base classifier. ADASYN-N is found to be the best method for handling class imbalance because it gives a higher AUC than SMOTE-N and SMOTE-N-ENN. The best decision tree indicates that the factors that predict adolescents having premarital sexual relations are dating style, knowledge of the fertile period, knowledge of the risks of early marriage, gender, most recent level of education, and area of residence.
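The abstract names SMOTE-N without describing how it works. As a rough illustration only (a sketch, not the authors' implementation), SMOTE-N as proposed by Chawla et al. (2002) finds the nearest minority neighbours of a minority sample using a Value Difference Metric (VDM) and builds each synthetic sample by taking, for every nominal feature, the most common value among the sample and its neighbours. The code assumes a VDM exponent of 1 summed over features with no instance weights, k = 5 neighbours by default, and seeds chosen by cycling through the minority samples; the function names `vdm_tables`, `vdm_distance`, and `smote_n` are hypothetical. The ENN cleaning step and the ADASYN-N weighting are not shown.

```python
from collections import Counter, defaultdict


def vdm_tables(X, y):
    """Per feature, map each nominal value to P(class | value) estimated from (X, y)."""
    classes = sorted(set(y))
    tables = []
    for j in range(len(X[0])):
        counts = defaultdict(Counter)  # value -> class counts for feature j
        for row, label in zip(X, y):
            counts[row[j]][label] += 1
        tables.append({
            value: {c: cnt[c] / sum(cnt.values()) for c in classes}
            for value, cnt in counts.items()
        })
    return tables, classes


def vdm_distance(a, b, tables, classes):
    """VDM with exponent 1, summed over features (assumed: r = 1, no weights)."""
    total = 0.0
    for j, probs in enumerate(tables):
        pa, pb = probs[a[j]], probs[b[j]]
        total += sum(abs(pa[c] - pb[c]) for c in classes)
    return total


def smote_n(X, y, minority, n_new, k=5):
    """Generate n_new synthetic minority rows by SMOTE-N-style majority voting."""
    tables, classes = vdm_tables(X, y)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    synthetic = []
    for i in range(n_new):
        b = i % len(minority_rows)  # cycle through minority samples as seeds
        base = minority_rows[b]
        # k nearest minority neighbours of the seed (seed excluded by index)
        candidates = [r for idx, r in enumerate(minority_rows) if idx != b]
        candidates.sort(key=lambda r: vdm_distance(base, r, tables, classes))
        neighbours = candidates[:k]
        # Each feature takes the most common value among seed + neighbours;
        # Counter.most_common keeps first-seen order on ties, so the seed wins ties.
        new_row = tuple(
            Counter([base[j]] + [r[j] for r in neighbours]).most_common(1)[0][0]
            for j in range(len(base))
        )
        synthetic.append(new_row)
    return synthetic


if __name__ == "__main__":
    # Toy nominal data (hypothetical): class 1 is the minority.
    X = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y"), ("c", "x"), ("c", "y")]
    y = [1, 1, 1, 0, 0, 0]
    print(smote_n(X, y, minority=1, n_new=3, k=2))
```

Majority voting keeps every synthetic value inside the set of observed categories, which is why SMOTE-N suits nominal predictors where interpolating between values has no meaning. In a study like this one, the oversampled training set would then be passed to a CART learner and compared by AUC.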

Published

2020-02-15

How to Cite

Fithriasari, K., Hariastuti, I., & Wening, K. S. (2020). Handling Imbalance Data in Classification Model with Nominal Predictors. (IJCSAM) International Journal of Computing Science and Applied Mathematics, 6(1), 33–37. Retrieved from https://journal.its.ac.id/index.php/ijcsam/article/view/4555

Issue

Vol. 6 No. 1 (2020)

Section

Articles