INDONESIAN-TRANSLATED HADITH CONTENT WEIGHTING IN PSEUDO-RELEVANCE FEEDBACK QUERY EXPANSION
DOI:
https://doi.org/10.21107/kursor.v11i1.249
Keywords:
Indonesian translated hadith, Information retrieval system, Named Entity Recognition, Pseudo-Relevance Feedback, Query Expansion
Abstract
In general, a hadith consists of an isnad (chain of narrators) and a matan (content). The matan can be further divided into several components, for example a background story, the main content, and additional information. Text outside the main content, such as the isnad and the story, can interfere with the retrieval of relevant documents because most users issue simple queries. In this paper, we propose a component weighting model based on Named Entity Recognition (NER) to improve an Indonesian hadith retrieval system. We ran three test scenarios: the first (S1) did not separate the hadith into components; the second (S2) separated it into two components, isnad and matan; and the third (S3) separated it into four components: isnad, background story, content, and additional information. The experimental results show that TF-IDF with the Rocchio algorithm for query expansion outperforms DocVec. Separating and weighting the hadith components also affects retrieval performance, because the isnad can act as noise when matching a query. The two-component separation gave the best overall results, although the four-component separation performed better in some cases, reaching up to 100% precision and 70% recall.
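The combination described in the abstract (component-weighted TF-IDF followed by Rocchio-style pseudo-relevance feedback) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English corpus, the component weights (`isnad` 0.2, `matan` 1.0), the smoothed IDF formula, the `top_k`, `alpha`, and `beta` values, and all function names are assumptions chosen only for illustration.

```python
import math
from collections import Counter

def weighted_tf(doc, weights):
    """Sum term counts over components, scaling each by its component weight."""
    tf = Counter()
    for component, tokens in doc.items():
        w = weights.get(component, 0.0)
        for tok in tokens:
            tf[tok] += w
    return tf

def build_index(docs, weights):
    """Return TF-IDF vectors (one dict per document) and the IDF table."""
    tfs = [weighted_tf(d, weights) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(t for t, v in tf.items() if v > 0)
    n = len(docs)
    # Smoothed IDF (an assumed variant; the paper may use another one).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: v * idf[t] for t, v in tf.items() if v > 0} for tf in tfs]
    return vecs, idf

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, vecs):
    """Document indices sorted by descending cosine similarity."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vecs)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]

def rocchio_expand(query_vec, vecs, top_k=2, alpha=1.0, beta=0.75):
    """Pseudo-relevance feedback: treat the top_k results as relevant and
    move the query toward their centroid (no non-relevant term is used)."""
    top = rank(query_vec, vecs)[:top_k]
    expanded = Counter({t: alpha * v for t, v in query_vec.items()})
    for i in top:
        for t, v in vecs[i].items():
            expanded[t] += beta * v / len(top)
    return dict(expanded)

# Hypothetical two-component documents (isnad + matan), not real hadith data.
docs = [
    {"isnad": ["narrated", "abu", "hurairah"], "matan": ["prayer", "ablution", "water"]},
    {"isnad": ["narrated", "aisha"], "matan": ["ablution", "purity", "water"]},
    {"isnad": ["narrated", "abu", "hurairah"], "matan": ["fasting", "ramadan"]},
]
weights = {"isnad": 0.2, "matan": 1.0}  # assumed: down-weight the isnad

vecs, idf = build_index(docs, weights)
query_vec = {t: idf.get(t, 0.0) for t in ["ablution"]}
expanded = rocchio_expand(query_vec, vecs)
print(rank(expanded, vecs))
```

With the isnad down-weighted, narrator names shared between documents contribute little to the expanded query, so the expansion is driven mainly by matan terms such as "water".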