A Multi-label book genre classification: Comparison of machine learning techniques and problem transformation methods
DOI:
https://doi.org/10.21107/kursor.v13i2.389
Keywords:
Binary Relevance, Label Powerset, Logistic Regression, Multi-label classification, Multinomial Naïve Bayes, SVM
Abstract
Books play an essential role in life as a source of knowledge and information. The growing number of published books makes classification more complex, especially in a multi-label setting where a book may belong to more than one genre. The shift of books to e-book and audiobook formats further increases the need for automatic genre classification. This research applies three machine learning techniques, Support Vector Machine (SVM), Logistic Regression (LR), and Multinomial Naïve Bayes (MNB), to multi-label book genre classification and compares their performance with and without stemming during text preprocessing, using TF-IDF features and K-Fold cross-validation (k = 10). In addition, two problem transformation methods, Binary Relevance (BR) and Label Powerset (LP), are evaluated. The results show that SVM combined with stemming outperforms the other models on all four metrics: accuracy, precision, recall, and F1-score. SVM handles complex and imbalanced data distributions effectively, yielding more accurate and consistent predictions. Stemming contributes positively by reducing word variation, so that the model can focus on word meanings. Of the two problem transformation methods, LP yields better results because it captures relationships between labels more effectively than BR.
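The difference between the two problem transformation methods named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and a toy set of hypothetical genre labels (`fantasy`, `romance`); the function names and data are assumptions made for illustration and are not taken from the study. Binary Relevance builds one independent binary target per genre, whereas Label Powerset treats each distinct combination of genres as a single class, which is how it can retain co-occurrence between labels.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def binary_relevance_targets(
    label_sets: List[Set[str]], all_labels: List[str]
) -> Dict[str, List[int]]:
    """Binary Relevance: one 0/1 target vector per label, built independently."""
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


def label_powerset_targets(
    label_sets: List[Set[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label Powerset: each distinct label combination becomes one class id."""
    class_ids: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_ids:
            # Assign ids in order of first appearance.
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


# Hypothetical toy data: the genre sets of four books (not the study's dataset).
toy_label_sets = [
    {"fantasy", "romance"},
    {"fantasy"},
    {"romance"},
    {"fantasy", "romance"},
]
toy_labels = ["fantasy", "romance"]

br_targets = binary_relevance_targets(toy_label_sets, toy_labels)
lp_targets, lp_classes = label_powerset_targets(toy_label_sets)

print(br_targets)  # two separate binary problems, one per genre
print(lp_targets)  # one multi-class problem over genre combinations
```

Under Binary Relevance, a separate classifier would be trained on each vector in `br_targets`, so no classifier sees that `fantasy` and `romance` co-occur. Under Label Powerset, a single multi-class classifier would be trained on `lp_targets`, where the combination `{fantasy, romance}` is its own class. The trade-off is that the number of classes can grow with the number of distinct label combinations in the data.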
License
Copyright (c) 2025 Eka Mira Novita Subroto, Muhammad Faisal

This work is licensed under a Creative Commons Attribution 4.0 International License.





