Analyzing Visual Content across Frames to Identify Objects and Generating Textual Descriptions for Images

Authors

  • Soumya Upadhyay
  • Kamal Kumar Gola
  • Ravindra Kothiyal
  • Mridula

Abstract

A generated description of an image is called a caption. Caption generators have been growing in popularity since 2012. Every day, viewers come across many images from different sources that they want to interpret, such as the Internet, TV, news articles, social network sites, and more. Although generating a caption for an image is a challenging task, recent advances in Computer Vision have made it possible for a machine to act as the storyteller of an image. Most of the time, images do not come with descriptions, yet a human can easily understand them; for a machine to produce captions automatically, it must first learn from images paired with captions. Image caption generation models are mostly based on an Encoder-Decoder approach, similar to a language translation model. A language translation model uses an RNN (Recurrent Neural Network) for both encoding and decoding, whereas an image caption generation model uses a CNN (Convolutional Neural Network) for encoding and an RNN for decoding. In this paper, the machine takes an image as input and produces a sequence of words (a caption) as output.
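The encoder-decoder pipeline described above can be sketched in a few lines. The following is a minimal, illustrative NumPy toy (not the authors' model): the "CNN encoder" is a single random convolution with global average pooling, and the "RNN decoder" greedily emits words from a tiny hypothetical vocabulary. All weights are random and untrained, so the output caption is arbitrary; the point is only to show the data flow image → feature vector → word sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; a real system learns one from the training captions.
vocab = ["<start>", "<end>", "a", "dog", "runs"]
V, D, H = len(vocab), 8, 16  # vocab size, feature dim, hidden dim

def cnn_encode(image):
    """Stand-in for a CNN encoder: one random 3x3 convolution
    followed by global average pooling into a D-dim feature vector."""
    k = rng.standard_normal((3, 3))
    h, w = image.shape
    conv = np.array([[np.sum(image[i:i + 3, j:j + 3] * k)
                      for j in range(w - 2)] for i in range(h - 2)])
    pooled = conv.mean()          # global average pool
    return np.full(D, pooled)     # crude D-dimensional feature

# Random (untrained) decoder weights -- a real model learns these.
Wxh = rng.standard_normal((H, D)) * 0.1  # input-to-hidden
Whh = rng.standard_normal((H, H)) * 0.1  # hidden-to-hidden
Why = rng.standard_normal((V, H)) * 0.1  # hidden-to-vocabulary
E   = rng.standard_normal((D, V)) * 0.1  # word embeddings

def rnn_decode(feat, max_len=5):
    """Greedy RNN decoder: initialise the hidden state from the image
    feature, emit the argmax word each step, stop at <end> or max_len."""
    h = np.tanh(Wxh @ feat)
    x = E[:, vocab.index("<start>")]
    caption = []
    for _ in range(max_len):
        h = np.tanh(Wxh @ x + Whh @ h)
        word = vocab[int(np.argmax(Why @ h))]
        if word == "<end>":
            break
        caption.append(word)
        x = E[:, vocab.index(word)]
    return caption

image = rng.standard_normal((8, 8))  # fake grayscale image
caption = rnn_decode(cnn_encode(image))
print(caption)
```

In practice the encoder is a pretrained CNN (e.g. VGG or Inception) and the decoder is an LSTM or GRU trained with teacher forcing on image-caption pairs; only the overall shape of the pipeline is shown here.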

Downloads

Published

2024-07-10

How to Cite

Soumya Upadhyay, Kamal Kumar Gola, Ravindra Kothiyal, & Mridula. (2024). Analyzing Visual Content across Frames to Identify Objects and Generating Textual Descriptions for Images. International Journal of Computer Information Systems and Industrial Management Applications, 16(3), 15. Retrieved from https://cspub-ijcisim.org/index.php/ijcisim/article/view/707

Issue

Section

Original Articles