Selection of Feature Data in KNN Classification Datasets
Abstract
The features in a dataset can affect data processing for better or for worse, and they also affect processing time. It may therefore be necessary to select a subset of features that can represent the whole dataset. This study searches for the feature subset that yields better classification results. Classification is carried out on the Iris dataset with the k-nearest neighbors (KNN) algorithm. The Iris dataset has 4 features (Sepal Length, Sepal Width, Petal Length, Petal Width), and the study determines which combination of them is best suited to this classification. The dataset is broken down into 7 feature variations, which are tested at nine ratios of training data to test data: 90:10, 80:20, 70:30, 60:40, 50:50, 40:60, 30:70, 20:80 and 10:90. The KNN classifier uses 5 neighbors (n_neighbors = 5) and the Minkowski metric. The highest accuracy obtained was 96% and the lowest was 71%. The highest accuracy came from the variation using the Petal Length and Petal Width features, while the lowest came from the variation using the Sepal Length and Sepal Width features.
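The experiment described above can be sketched with scikit-learn. This is a minimal illustration and not the authors' code. The abstract does not list all 7 feature variations, so only the two it names plus the full feature set are shown. The stratified split, the fixed random seed, and the names `feature_subsets`, `train_percentages` and `evaluate` are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Column order in load_iris: 0 = sepal length, 1 = sepal width,
# 2 = petal length, 3 = petal width.
# Only three illustrative subsets; the study's full list of 7 is not given.
feature_subsets = {
    "sepal length + sepal width": [0, 1],
    "petal length + petal width": [2, 3],
    "all four features": [0, 1, 2, 3],
}

# Training share of each split, from 90:10 down to 10:90.
train_percentages = [90, 80, 70, 60, 50, 40, 30, 20, 10]


def evaluate(columns, train_pct, seed=0):
    """Fit KNN on the chosen columns and return accuracy on the held-out part."""
    # Integer test size avoids floating-point rounding in the split.
    n_test = len(y) * (100 - train_pct) // 100
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, columns],
        y,
        test_size=n_test,
        random_state=seed,  # assumption: the abstract does not state a seed
        stratify=y,         # assumption: keep class proportions in both parts
    )
    # Parameters taken from the abstract: 5 neighbors, Minkowski metric.
    model = KNeighborsClassifier(n_neighbors=5, metric="minkowski")
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


results = {
    (name, pct): evaluate(columns, pct)
    for name, columns in feature_subsets.items()
    for pct in train_percentages
}

for (name, pct), accuracy in results.items():
    print(f"{name:28s} {pct}:{100 - pct}  accuracy = {accuracy:.2f}")
```

The accuracies printed by this sketch depend on how the data is split, so they should not be expected to match the 96% and 71% figures reported above exactly.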