• Jasman Pardede, Institut Teknologi Nasional (Itenas) Bandung, Indonesia
• Fandi, Institut Teknologi Nasional (Itenas) Bandung, Indonesia



batch_size, image captioning, Inception-V3, Xception, Transformer


Image captioning is an image processing task that generates a text description of an image's content. The quality of an image captioning model depends on how well the image is interpreted with respect to its caption, and that interpretation depends in turn on the feature extraction used. This research proposes feature extraction with Xception and Inception-V3, combined with a Transformer that generates the captions. Model performance is measured with BLEU and METEOR scores. Experiments on the Flickr8k dataset show that the best-performing model uses Xception feature extraction with batch_size = 256. Compared with Inception-V3, Xception feature extraction improves BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR by 13.15%, 18.03%, 18.71%, 27.27%, and 15.43%, respectively. With Xception feature extraction, batch_size = 256 improves BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR by 19.81%, 41.84%, 52.23%, 53.14%, and 31.56%, respectively, compared with batch_size = 128.
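For readers unfamiliar with the BLEU-n scores reported above, the following is a minimal sketch of a sentence-level BLEU computation (clipped n-gram precision, geometric mean over orders 1 to n, and a brevity penalty). It is an illustration only and not the evaluation code used in this research: the function names `ngrams` and `bleu` are hypothetical, no smoothing is applied, and published results are normally computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-max_n with uniform weights and no smoothing.

    candidate: list of tokens; references: list of token lists.
    Returns 0.0 as soon as any n-gram order has no match.
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # For each n-gram, the highest count seen in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated words are not over-rewarded.
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty, using the reference length closest to the candidate's
    # (ties resolved toward the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical caption and references, for illustration only.
caption = "a dog runs on the grass".split()
refs = ["a dog runs on the green grass".split(),
        "a brown dog is running outside".split()]
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(caption, refs, max_n=n):.4f}")
```

Higher-order scores such as BLEU-4 require matching longer word sequences, so they are typically lower and more sensitive to caption quality than BLEU-1.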




