Explicable Multi-Scale Attention-Based Deep Learning Framework for Medical Image Fusion and Diagnostic Feature Safeguarding Analysis in Multi-Modal Radiological Imaging
DOI:
https://doi.org/10.70917/ijcisim-2026-2490
Abstract
Multi-modal radiological image fusion aims to combine complementary anatomical and functional modalities, such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and single-photon emission computed tomography (SPECT), in a diagnostically useful way. Although transform-domain methods and recent deep learning approaches have improved image quality, they remain difficult to interpret, do not preserve diagnostic features well, and provide insufficient multi-scale context modelling. This paper proposes an Explainable Multi-Scale Attention Fusion Network (EMA-FuseNet) for simulation-based medical image fusion and diagnostic feature-preservation analysis. The framework comprises modality-specific pre-processing, affine and deformable registration, multi-scale encoder blocks, channel attention, spatial attention, cross-modal attention, edge-aware reconstruction, and an explainability layer built on Grad-CAM-style saliency, attention heatmaps, and feature-attribution summaries. A hypothetical dataset of 1,000 paired radiological images was simulated for experimental validation, consisting of 700 CT-MRI cases, 300 PET-MRI cases, and 200 CT-SPECT cases, split 70/15/15 into training, validation, and test sets, respectively. Performance was evaluated using peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), mean squared error (MSE), entropy, mutual information, edge preservation index, and feature similarity index, together with the proposed Diagnostic Feature Retention Score (DFRS). In the simulated results, EMA-FuseNet achieved higher SSIM (0.956), PSNR (39.2 dB), and DFRS (94.6) than the PCA, DWT, CNN, DenseFuse, SwinFusion, and generic transformer (TF) baselines. Statistical testing indicated that the improvements over the best transformer baseline were significant.
The study presents a unified, clinically interpretable design that could be tested in future work on large-scale, real, multi-institutional radiological datasets.
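The abstract names its evaluation metrics without giving formulas. As a rough illustration only, the sketch below computes three of them (MSE, PSNR, and Shannon entropy) using their standard textbook definitions. It assumes 8-bit grayscale images stored as NumPy arrays; the function names and the `max_value` and `bins` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def mse(reference, fused):
    """Mean squared error between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    fused = np.asarray(fused, dtype=np.float64)
    return float(np.mean((reference - fused) ** 2))


def psnr(reference, fused, max_value=255.0):
    """Peak signal-to-noise ratio in dB (max_value=255 assumes 8-bit data)."""
    err = mse(reference, fused)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / err))


def shannon_entropy(image, bins=256, value_range=(0, 256)):
    """Shannon entropy (bits) of the intensity histogram of an image."""
    hist, _ = np.histogram(np.asarray(image).ravel(), bins=bins, range=value_range)
    p = hist[hist > 0] / hist.sum()  # drop empty bins to avoid log2(0)
    return float(-np.sum(p * np.log2(p)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
    noise = rng.integers(-5, 6, size=source.shape)
    fused = np.clip(source.astype(np.int64) + noise, 0, 255).astype(np.uint8)
    print("MSE:", mse(source, fused))
    print("PSNR (dB):", psnr(source, fused))
    print("Entropy (bits):", shannon_entropy(fused))
```

Higher PSNR and lower MSE indicate that the fused image stays closer to a reference, while higher entropy is usually read as more information content. SSIM, mutual information, and the proposed DFRS would need their own definitions, which the abstract does not provide.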