INDONESIAN-TRANSLATED HADITH CONTENT WEIGHTING IN PSEUDO-RELEVANCE FEEDBACK QUERY EXPANSION

  • Ivanda Zevi Amalia, Department of Informatics Engineering, Institut Teknologi Sepuluh Nopember, Surabaya
  • Akbar Noto Ponco Bimantoro
  • Agus Zainal Arifin
  • Maryamah Faisol
  • Rarasmaya Indraswari
  • Riska Wakhidatus Sholikah

Abstract

In general, a hadith consists of an isnad and a matan (content). The matan can be separated into several components, for example a story, the main content, and some additional information. Text other than the main content, such as the isnad and the story, can interfere with the retrieval of relevant documents because most users issue simple queries. In this paper, we therefore propose a component weighting model based on Named Entity Recognition (NER) to improve an Indonesian hadith retrieval system. We test three scenarios: the first (S1) does not separate the hadith into components; the second (S2) separates it into two components, isnad and matan; and the third (S3) separates it into four components: isnad, background story, content, and additional information. The experimental results show that TF-IDF with the Rocchio algorithm for query expansion outperforms DocVec. Separating and weighting the hadith components also affects retrieval performance, because the isnad can be considered noise with respect to a query. Separation into two components gave the best results overall, although separation into four components performed better in some cases, with precision up to 100% and recall of 70%.
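To make the idea concrete, the following is a minimal sketch of component-weighted TF-IDF combined with Rocchio pseudo-relevance feedback. It is an illustration under stated assumptions, not the authors' implementation: the component weights, the smoothed IDF formula, the `alpha`, `beta`, and `top_k` values, the whitespace tokenizer, and the toy English documents are all hypothetical, and the NER-based splitting of a hadith into components is assumed to have been done already.

```python
import math
from collections import Counter

def weighted_tf(doc, weights):
    """Sum term counts over components, scaling each component by its weight."""
    tf = Counter()
    for name, text in doc.items():
        w = weights.get(name, 1.0)  # assumption: unlisted components get weight 1
        for term in text.lower().split():
            tf[term] += w
    return tf

def build_index(docs, weights):
    """Return TF-IDF vectors (dicts) for all documents plus the IDF table."""
    tfs = [weighted_tf(d, weights) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(t for t, v in tf.items() if v > 0)
    n = len(docs)
    # Smoothed IDF (an assumption; the paper's exact variant is not given here).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: v * idf[t] for t, v in tf.items() if v > 0} for tf in tfs]
    return vecs, idf

def query_vector(query, idf):
    """TF-IDF vector of the query; terms unseen in the corpus are dropped."""
    tf = Counter(query.lower().split())
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(qvec, vecs):
    """Document indices sorted by decreasing cosine similarity to the query."""
    return sorted(range(len(vecs)), key=lambda i: cosine(qvec, vecs[i]), reverse=True)

def rocchio_prf(qvec, vecs, top_k=2, alpha=1.0, beta=0.75):
    """Rocchio update using the top_k initial results as pseudo-relevant.

    No non-relevant (negative) term is used, as is common in pseudo-relevance
    feedback; alpha and beta are illustrative values.
    """
    top = rank(qvec, vecs)[:top_k]
    expanded = Counter({t: alpha * v for t, v in qvec.items()})
    for i in top:
        for t, v in vecs[i].items():
            expanded[t] += beta * v / len(top)
    return dict(expanded)

# Toy two-component documents (hypothetical data, S2-style isnad/matan split).
docs = [
    {"isnad": "narrated umar", "matan": "deeds depend on intention"},
    {"isnad": "narrated aisha", "matan": "intention and sincerity in prayer"},
    {"isnad": "narrated umar", "matan": "charity for neighbours"},
]
weights = {"isnad": 0.0, "matan": 1.0}  # isnad treated as noise and ignored

vecs, idf = build_index(docs, weights)
q = query_vector("intention", idf)
q_expanded = rocchio_prf(q, vecs)
print(sorted(q_expanded))      # expanded query terms
print(rank(q_expanded, vecs))  # re-ranked document indices
```

With the isnad weight set to zero, narrator names never enter the document vectors, so they cannot be pulled into the expanded query; raising that weight above zero would let them contribute proportionally.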

Published
2021-07-01
How to Cite
AMALIA, Ivanda Zevi et al. INDONESIAN-TRANSLATED HADITH CONTENT WEIGHTING IN PSEUDO-RELEVANCE FEEDBACK QUERY EXPANSION. Jurnal Ilmiah Kursor, [S.l.], v. 11, n. 1, july 2021. ISSN 2301-6914. Available at: <http://kursorjournal.org/index.php/kursor/article/view/249>. Date accessed: 30 july 2021. doi: https://doi.org/10.21107/kursor.v11i1.249.
Section
Articles