INSIPR: Image-Text Synthesis Pipeline for Intelligent Retrieval and Generation

Authors

  • Viomesh Singh
  • Prithviraj Jadhav
  • Hritesh Maikap
  • Sarvesh Jadhav
  • Chinmay Ingale
  • Sahil Jadhav

DOI:

https://doi.org/10.70917/ijcisim-2025-0044

Abstract

The INSPIR (Image-Text Synthesis Pipeline for Intelligent Retrieval and Generation) framework addresses image captioning and image retrieval with an ensemble of state-of-the-art models. The proposed method generates descriptive captions from images using an ensemble of BLIP (Bootstrapping Language-Image Pre-training), ViT-GPT2 (Vision Transformer combined with GPT-2), and GIT (Generative Image-to-text Transformer), and employs CLIP (Contrastive Language-Image Pre-training) to rank the generated captions by their relevance to the image. To evaluate the system, the impact of temperature scaling and ensemble weights on caption ranking was analyzed, revealing how these settings balance relevance against diversity. Testing on randomly selected photos from the Flickr8k dataset demonstrated the model's effectiveness, as measured by cosine similarity, BLEU, and METEOR scores. Llama3.1 then uses the top-ranked captions to produce creative outputs tailored to applications such as social media captions and image notes. By integrating multiple modalities within a unified semantic space through contrastive learning, this work aims to advance image captioning beyond conventional classification tasks and to offer model performance that generalizes across the complexities of language and vision. A major application of INSPIR is image retrieval: the system annotates uploaded images and lets users run text-based searches, giving efficient access to relevant visual content.
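
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the combination used here (a temperature-scaled softmax over image-caption cosine similarities, multiplied by a per-model ensemble weight) is an assumption, as are the function names, the model keys, and the toy two-dimensional vectors that stand in for real CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_captions(image_vec, candidates, weights, temperature=1.0):
    """Rank candidate captions for one image (hypothetical scoring rule).

    candidates: list of (model_name, caption, text_vec) tuples.
    weights:    dict mapping model_name to an ensemble weight (default 1.0).
    Returns a list of (score, model_name, caption), best first.
    """
    sims = [cosine(image_vec, vec) for _, _, vec in candidates]

    # Temperature-scaled softmax: a lower temperature sharpens the
    # distribution toward the most similar caption, a higher one flattens it.
    logits = [s / temperature for s in sims]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Assumed ensemble rule: scale each probability by its model's weight.
    scored = [
        (weights.get(model, 1.0) * p, model, caption)
        for (model, caption, _), p in zip(candidates, probs)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy data: these vectors are placeholders, not real CLIP embeddings.
image_vec = [1.0, 0.0]
candidates = [
    ("blip", "a dog runs on the beach", [1.0, 0.0]),
    ("vit_gpt2", "a plate of food", [0.0, 1.0]),
    ("git", "a dog near the water", [1.0, 1.0]),
]

for score, model, caption in rank_captions(image_vec, candidates, {}, 0.5):
    print(f"{score:.3f}  {model:9s} {caption}")
```

In a real pipeline the vectors would come from CLIP's image and text encoders, and the top-ranked caption would be passed on to the language model. Sweeping the temperature and the weights in a sketch like this is one way to observe the relevance and diversity trade-off the abstract mentions.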

Published

2025-12-31

How to Cite

Singh, V., Jadhav, P., Maikap, H., Jadhav, S., Ingale, C., & Jadhav, S. (2025). INSIPR: Image-Text Synthesis Pipeline for Intelligent Retrieval and Generation. International Journal of Computer Information Systems and Industrial Management Applications, 17, 730–745. https://doi.org/10.70917/ijcisim-2025-0044

Section

Original Articles