COMPARISON OF STEMMING AND SIMILARITY ALGORITHMS IN INDONESIAN TRANSLATED AL-QUR'AN TEXT SEARCH

Ika Oktavia Suzanti; Achmad Jauhari

doi:10.21107/kursor.v11i2.280

Authors

Ika Oktavia Suzanti, University of Trunojoyo Madura, Indonesia
Achmad Jauhari

DOI:

https://doi.org/10.21107/kursor.v11i2.280

Keywords:

Information Retrieval, Enhanced Confix Stripping, Nazief and Adriani, Cosine Similarity, Dice Similarity

Abstract

The history of information retrieval did not begin with the Internet. In the 1960s, well before search engines came into widespread daily public use, information retrieval systems were already found in commercial and intelligence applications. An information retrieval system performs its main task in two stages: preprocessing the text, and calculating the similarity between the terms (words) in a document and the query (keywords) entered by the user. Stemming is the final stage of preprocessing. It reduces a word to its base form by removing affixes, namely prefixes, suffixes, and infixes. In this paper we compare search in an information retrieval system without stemming and with the Porter, Nazief & Adriani, and Enhanced Confix Stripping (ECS) stemming algorithms, using cosine similarity and Dice similarity as the similarity measures. Based on the test results, search with Dice similarity is faster when stemming with the Porter stemmer and ECS algorithms, whereas with the Nazief & Adriani algorithm and without stemming, cosine similarity is faster than Dice similarity.
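The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes raw term-frequency weights, whitespace tokenization, and no stemming, and it uses the common vector-space form of the Dice coefficient, 2(a·b) / (|a|² + |b|²). The function names and the sample query and document are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector (assumption: whitespace tokens, no stemming)."""
    return Counter(text.lower().split())


def dot(a, b):
    """Dot product over the terms shared by both vectors."""
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def cosine_similarity(a, b):
    """Cosine similarity: (a . b) / (|a| * |b|)."""
    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def dice_similarity(a, b):
    """Vector-space Dice coefficient: 2 (a . b) / (|a|^2 + |b|^2)."""
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0


# Hypothetical query and document, for illustration only.
query = term_vector("orang sabar")
doc = term_vector("allah beserta orang orang yang sabar")

print(cosine_similarity(query, doc))
print(dice_similarity(query, doc))
```

Both measures share the same numerator and differ only in how they normalize it, so they often rank documents similarly while producing different scores. In a full system the vectors would be built from stemmed terms, and a weighting scheme such as TF-IDF might replace the raw counts.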


