Explainable Multimodal Deep Transfer Learning Framework for Real-Time Indoor Navigation and Assistive Guidance of Visually Impaired Individuals
DOI:
https://doi.org/10.70917/ijcisim-2026-2534
Keywords:
Indoor Navigation, Visually Impaired Assistance, Deep Transfer Learning, Explainable Artificial Intelligence (XAI), Object Detection and Depth Estimation, Multimodal Speech–Haptic Guidance
Abstract
Indoor navigation remains a major challenge for visually impaired people, who have limited awareness of obstacles and difficulty estimating object distances, understanding the indoor environment, and selecting safe navigation paths on the fly. Current assistive systems offer limited environmental awareness and interpretability and do not integrate multimodal guidance mechanisms. To address these issues, we introduce an Explainable Multimodal Deep Transfer Learning Framework (EMDTLF) for real-time indoor navigation and assistive guidance of blind people, combining transfer learning-based object detection, monocular depth estimation, intelligent path planning, and explainable artificial intelligence (XAI). The framework uses the NYU Depth V2 and Indoor Location & Navigation datasets to fuse RGB images, depth data, and sensor data for detailed scene understanding and obstacle avoidance. Feature extraction and object detection are performed with EfficientNet-B3 and YOLOv8, respectively, and SHAP-based explainability increases the transparency of decisions. Experimental results show performance superior to previous approaches, with 96.4% precision, 95.7% recall, a 96.0% F1 score, and 97.2% mAP for object detection. The proposed framework also achieves a depth estimation accuracy of 95.8%, a path planning success rate of 97.6%, a navigation accuracy of 96.8%, and an obstacle avoidance performance of 97.3%, all substantially better than conventional methods. The novelty of this work is threefold: explainable multimodal learning, depth-aware navigation integrated into a single framework, and adaptive speech-haptic guidance. The proposed system provides an effective, interpretable, and reliable solution for improving the independent mobility and navigation safety of blind and low-vision people in their home environment.
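As a quick consistency check on the object detection figures quoted in the abstract, the F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The sketch below is illustrative only: the function name `f1_score` and the variable names are assumptions introduced here, not part of the authors' code, and the input values are simply the percentages stated in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, expressed as fractions.
reported_precision = 0.964
reported_recall = 0.957
reported_f1 = 0.960

# Recompute F1 and compare it with the reported value.
f1 = f1_score(reported_precision, reported_recall)
print(f"F1 recomputed from reported precision and recall: {f1:.3f}")
print(f"Matches reported F1 to one decimal place (in %): "
      f"{round(f1 * 100, 1) == round(reported_f1 * 100, 1)}")
```

Under these inputs the recomputed value agrees with the reported 96.0% F1 score to one decimal place, so the three detection metrics are mutually consistent.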