IMAGE CAPTIONING USING TRANSFORMER WITH IMAGE FEATURE EXTRACTION BY XCEPTION AND INCEPTION-V3
DOI:
https://doi.org/10.21107/kursor.v12i3.376
Keywords:
batch_size, image captioning, Inception-V3, Xception, Transformer
Abstract
Image captioning is an image processing task that generates a text description of an image's content. The quality of an image captioning model depends on how well the image is interpreted with respect to its caption, and that interpretation in turn depends on the feature extraction used. This research proposes Xception and Inception-V3 feature extraction for an image captioning model built with a Transformer. Model performance is measured with BLEU and METEOR. Experiments on the Flickr8k dataset show that the best model performance is obtained with Xception feature extraction and batch_size = 256. Compared with Inception-V3, Xception feature extraction increases BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR by 13.15%, 18.03%, 18.71%, 27.27%, and 15.43%, respectively. For Xception feature extraction, batch_size = 256 compared with batch_size = 128 increases BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR by 19.81%, 41.84%, 52.23%, 53.14%, and 31.56%, respectively.
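The abstract reports BLEU-1 through BLEU-4 scores without restating how the metric is computed. The following is a minimal sketch of sentence-level BLEU as it is commonly defined (clipped n-gram precision, uniform weights, brevity penalty, no smoothing). It is not the authors' evaluation code: the function names `ngram_counts` and `bleu` and the example captions are hypothetical, and the paper may have used a library implementation with different smoothing or corpus-level aggregation.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-max_n with uniform weights and no smoothing.

    candidate: list of tokens; references: list of token lists.
    Returns 0.0 as soon as any n-gram order has no match.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


# Hypothetical captions, for illustration only.
reference = "a dog runs on the grass".split()
print(bleu("a dog runs on the grass".split(), [reference]))   # identical caption
print(bleu("a dog runs".split(), [reference], max_n=1))       # short caption, BLEU-1
```

Calling `bleu` with `max_n` set to 1, 2, 3, or 4 gives the BLEU-1 to BLEU-4 variants named in the abstract. A candidate that is shorter than its reference is penalized even when all of its words match.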
License
Copyright (c) 2024 Jasman Pardede, Fandi
This work is licensed under a Creative Commons Attribution 4.0 International License.