COMPARISON OF STEMMING AND SIMILARITY ALGORITHMS IN INDONESIAN TRANSLATED AL-QUR'AN TEXT SEARCH

Authors

  • Ika Oktavia Suzanti, University of Trunojoyo Madura, Indonesia
  • Achmad Jauhari

DOI:

https://doi.org/10.21107/kursor.v11i2.280

Keywords:

Information Retrieval, Enhanced Confix Stripping, Nazief and Adriani, Cosine Similarity, Dice Similarity

Abstract

The long history of information retrieval did not begin with the Internet. Before search engines came into widespread daily public use, information retrieval systems were already found in commercial and intelligence applications in the 1960s. An information retrieval system performs two main stages: preprocessing the text, and calculating the similarity between the terms (words) in a document and the query (keywords) the user searches for. Stemming is the final stage of preprocessing in an information retrieval system. Stemming works by removing affixes (prefixes, suffixes, and infixes) from a word to reduce it to its root form. In this paper we compare search in an information retrieval system without stemming and with the Porter, Nazief & Adriani, and Enhanced Confix Stripping (ECS) stemming algorithms, using cosine similarity and dice similarity as the similarity methods. Based on the test results, text search with dice similarity is faster when stemming is performed with the Porter stemmer or the ECS algorithm, whereas with the Nazief & Adriani algorithm and without stemming, cosine similarity is faster than dice similarity.
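The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It assumes plain term-frequency vectors and the common vector-space forms of both measures: cosine as the dot product divided by the product of the vector norms, and dice as twice the dot product divided by the sum of the squared norms. The abstract does not state the exact weighting or formulas the authors used, so the function names, the whitespace tokenizer, and the sample query and document below are illustrative assumptions only, and no stemming is applied.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector: lowercase, split on whitespace.

    Assumption: tokens are used as-is; a real system would apply
    stopword removal and a stemmer (e.g. Porter, Nazief & Adriani, ECS) first.
    """
    return Counter(text.lower().split())


def dot(a, b):
    """Dot product over the terms the two vectors share."""
    return sum(a[t] * b[t] for t in a if t in b)


def cosine_similarity(a, b):
    """Cosine: dot(a, b) / (|a| * |b|); 0.0 if either vector is empty."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def dice_similarity(a, b):
    """Dice: 2 * dot(a, b) / (|a|^2 + |b|^2); 0.0 if both vectors are empty."""
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0


# Hypothetical query and document, chosen only for illustration.
query = term_vector("sabar dan shalat")
doc = term_vector("jadikan sabar dan shalat sebagai penolong")

print(round(cosine_similarity(query, doc), 4))
print(round(dice_similarity(query, doc), 4))
```

Both measures share the same numerator (the dot product) and differ only in the denominator, so any difference in running time between them in a real system would come from how the denominators are computed and cached, not from the matching of terms itself.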

Published

2022-01-11

Section

Articles
