Transformer-Based Model for Radiology Text Report Generation from Frontal and Lateral Chest X-ray Images
Abstract
A medical radiology report for a patient's chest X-ray image is a textual description of the normal and abnormal findings in the patient's lungs, heart, and chest. The report renders the radiologist's diagnosis of the image into text. It communicates the diagnosis to non-radiologists and other medical experts and enables tracking of the patient's condition over time. A medical report intended for busy practitioners in a stressful work environment must be comprehensive yet written in simple, grammatically correct language. Writing reports manually is a tedious task that is time-consuming and prone to human error. In this work, we propose a pipeline for building transformer-based models for automatic report generation. We include the experiments that justified our choice of models and configurations for the feature-extractor encoder, the transformer attention layers, and the training parameters. We show that (1) the pretrained ViT feature extractor outperforms the CNN-based encoder (DenseNet121); and (2) dual-view input (frontal and lateral images) yields better results than single-view input (frontal or lateral).