2
Department of Computer Engineering, Yazd University, Yazd, Iran
3
Department of Electrical Engineering, Yazd University, Yazd, Iran
Abstract
Automatic image captioning is a challenging task in computer vision that aims to generate computer-understandable descriptions of images. Convolutional neural networks (CNNs) play a key role in image caption generation. However, when generating descriptions for an image, CNNs face two major challenges: they do not consider the relationships and spatial hierarchical structures between the objects in the image, and they lack robustness to rotational changes of the image. To address these challenges, this paper presents an improved capsule network that describes image content using natural language processing by considering the relations between the objects. A capsule contains a set of neurons that encode the state parameters of objects in the image, such as size, direction, scale, and the relationships of objects to each other. These capsules focus on extracting meaningful features for generating relevant descriptions for a given set of images. Qualitative tests on the MS-COCO dataset using the capsule network and the ELMo embedding technique yielded a 2-5% improvement in the evaluated metrics compared to existing image captioning models.
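As background for the capsule idea described in the abstract, the following is a minimal sketch of the standard "squash" nonlinearity from the original capsule network formulation (Sabour et al., 2017). It is not the improved network proposed in this paper, whose details the abstract does not give. The function names and the example pose vectors are hypothetical and chosen only for illustration.

```python
import math


def squash(vector, eps=1e-9):
    """Standard capsule squash: keep the direction, map the length into [0, 1).

    Illustrative only; this is the generic formulation, not the paper's
    improved capsule network.
    """
    norm_sq = sum(x * x for x in vector)
    norm = math.sqrt(norm_sq)
    # Scale factor ||v||^2 / (1 + ||v||^2), divided by ||v|| to normalise direction.
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in vector]


def length(vector):
    """Euclidean length, read as the capsule's confidence that an entity is present."""
    return math.sqrt(sum(x * x for x in vector))


# Hypothetical raw capsule outputs (assumed values, not taken from the paper).
weak = squash([0.1, 0.0, 0.0])    # short input: length shrinks toward 0
strong = squash([3.0, 4.0, 0.0])  # long input: length approaches 1
```

In this formulation the orientation of the output vector carries the state parameters (for example size or direction), while its length acts as a presence score. This is why a capsule can, in principle, represent how an object appears and not only whether it appears.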
APA
Javanmardi, S., Latif, A. M., & Sadeghi, M. T. (2023). Automatic image captioning using capsule neural network and ELMo embedding technique. Journal of Machine Vision and Image Processing, 10(1), 75-91.
MLA
Shima Javanmardi; Ali Mohammad Latif; Mohammad Taghi Sadeghi. "Automatic image captioning using capsule neural network and ELMo embedding technique". Journal of Machine Vision and Image Processing, 10, 1, 2023, 75-91.
HARVARD
Javanmardi, S., Latif, A. M., Sadeghi, M. T. (2023). 'Automatic image captioning using capsule neural network and ELMo embedding technique', Journal of Machine Vision and Image Processing, 10(1), pp. 75-91.
VANCOUVER
Javanmardi, S., Latif, A. M., Sadeghi, M. T. Automatic image captioning using capsule neural network and ELMo embedding technique. Journal of Machine Vision and Image Processing, 2023; 10(1): 75-91.