THE INFLUENCE OF DATA CATEGORIZATION AND ATTRIBUTE INSTANCES REDUCTION USING THE GINI INDEX ON THE ACCURACY OF THE CLASSIFICATION ALGORITHM MODEL

Authors

  • Willy Fernando, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia
  • Deny Jollyta, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia
  • Dadang Priyanto, Universitas Bumigora, Indonesia
  • Dwi Oktarina, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia

DOI:

https://doi.org/10.21107/kursor.v12i3.372

Keywords:

Categorization, Classification Algorithms, Confusion Matrix, Numerical Data

Abstract

Problems with numerical data typically stem from a failure to understand the data and the results of processing it. To give richer context and a deeper understanding of the facts, numerical data must be transformed into categories. However, changing the data can significantly affect the outcome of the analysis. The purpose of this study is to see how transforming numerical data into categories affects the models produced by classification algorithms. The dataset used in this study is the Maternal Health Risk dataset. The categorization refers to formal arrangements. Categorization is also accomplished by using the Gini Index to limit the number of instances of an attribute. Classification is carried out using the Random Forest (RF), K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM) algorithms to produce a model. The influence of the data modifications on the model is observed through the confusion matrix under five different data splits. The results suggest that converting numerical data to categorical data significantly improved the performance of the SVM model, from 76.92% to 80.77%, at a 95/5 data split.
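
The abstract does not state how the Gini Index is computed, so the following is only a minimal sketch assuming the standard Gini impurity used in decision trees. It bins a numerical attribute into categories and scores the resulting categorical attribute by the weighted Gini impurity of the class labels. The function names, the toy readings, the bin edges, and the category labels are all hypothetical illustrations; they are not taken from the study or from the Maternal Health Risk dataset.

```python
from collections import Counter


def categorize(value, edges, labels):
    """Map a numeric value to a label; edges are ascending upper bounds.

    labels must have one more entry than edges; values above the last
    edge fall into the final label.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


def gini_impurity(classes):
    """Standard Gini impurity: 1 - sum(p_k^2) over the class proportions."""
    n = len(classes)
    if n == 0:
        return 0.0
    counts = Counter(classes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_index(categories, classes):
    """Weighted Gini impurity of the class labels, grouped by category."""
    n = len(classes)
    groups = {}
    for cat, cls in zip(categories, classes):
        groups.setdefault(cat, []).append(cls)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())


# Hypothetical toy data: eight numeric readings and their risk classes.
systolic = [90, 120, 140, 160, 100, 135, 150, 85]
risk = ["low", "low", "high", "high", "low", "mid", "high", "low"]

# Illustrative bin edges and labels only (an assumption, not the paper's).
edges = [119, 139]
labels = ["normal", "elevated", "high"]

categories = [categorize(v, edges, labels) for v in systolic]

print("Impurity before split:", gini_impurity(risk))
print("Gini index of categorized attribute:", gini_index(categories, risk))
```

A lower weighted value means the categories separate the classes more cleanly, which is one plausible criterion for choosing or merging categories of an attribute.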

Published

2024-05-25

Section

Articles
