INSPIR: Image-Text Synthesis Pipeline for Intelligent Retrieval and Generation

Viomesh Singh; Prithviraj Jadhav; Hritesh Maikap; Sarvesh Jadhav; Chinmay Ingale; Sahil Jadhav

doi:10.70917/ijcisim-2025-0044

Abstract

The INSPIR (Image-Text Synthesis Pipeline for Intelligent Retrieval and Generation) framework addresses image captioning and image retrieval with an ensemble of state-of-the-art models. Descriptive captions are generated for each image by BLIP (Bootstrapping Language-Image Pre-training), ViT-GPT2 (Vision Transformer combined with GPT-2), and GIT (Generative Image-to-text Transformer), and CLIP (Contrastive Language-Image Pre-training) then ranks the generated captions by their relevance to the image. To evaluate the system, the effect of temperature scaling and ensemble weights on the caption ranking was analyzed, revealing how these settings balance relevance against diversity. Tests on randomly selected photos from the Flickr8k dataset, scored with cosine similarity, BLEU, and METEOR, demonstrated the model's effectiveness. The top-ranked captions are passed to Llama3.1, which produces creative outputs tailored to various applications, including social media captions and image notes. By integrating multiple modalities within a unified semantic space through contrastive learning, this work aims to advance image captioning beyond conventional classification tasks and to generalize across the complexities of language and vision. A major application of INSPIR is image retrieval: the system annotates uploaded images with captions and lets users run text-based searches over them, giving efficient access to relevant visual content.
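The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how ensemble weights and temperature enter the score, so the sketch assumes each caption's CLIP-style similarity is multiplied by a per-model weight, divided by the temperature, and passed through a softmax. The function name `rank_captions`, the weight values, and the similarity numbers are all hypothetical placeholders, not results from the paper.

```python
import math

def rank_captions(candidates, model_weights, temperature=1.0):
    """Rank captions by a temperature-scaled softmax over weighted similarities.

    candidates: list of (model_name, caption, similarity) tuples, where
        similarity stands in for an image-text score such as CLIP cosine
        similarity (assumed to be computed elsewhere).
    model_weights: dict mapping model_name to an ensemble weight
        (assumption: the weight multiplies the similarity).
    temperature: values below 1 sharpen the distribution, values above 1
        flatten it.
    Returns a list of (model_name, caption, probability), best first.
    """
    if not candidates:
        return []
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Weighted, temperature-scaled logits; unknown models get weight 1.0.
    logits = [
        model_weights.get(model, 1.0) * sim / temperature
        for model, _, sim in candidates
    ]

    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    ranked = sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)
    return [(model, caption, p) for (model, caption, _), p in ranked]


# Hypothetical inputs for illustration only.
example_candidates = [
    ("blip", "a dog runs across a grassy field", 0.31),
    ("vit_gpt2", "a dog in the grass", 0.27),
    ("git", "a brown dog running", 0.29),
]
example_weights = {"blip": 1.0, "vit_gpt2": 1.0, "git": 1.0}
example_ranking = rank_captions(example_candidates, example_weights, temperature=0.05)
```

Under these assumptions, a low temperature concentrates probability on the single most relevant caption, while a higher temperature keeps several captions in play, which is one way to read the relevance and diversity balance mentioned above. The top entry of `example_ranking` would be the caption handed to the downstream language model.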
