A Multi-Class Classification Architecture for Diabetic Retinopathy Detection Using Hybrid EfficientNetV2-S and Spatial Attention
DOI: https://doi.org/10.70917/ijcisim-2026-2539
Keywords: Diabetic Retinopathy, Deep Learning, EfficientNetV2, Spatial Attention Mechanism, Test-Time Augmentation, Edge Computing
Abstract
Background and Objective: Automated diabetic retinopathy (DR) screening in low-resource settings demands high diagnostic precision without the computational bottlenecks of heavy multi-model ensembles. This study introduces a lightweight, single-weight hybrid architecture engineered to match ensemble-level accuracy while maintaining edge-deployable inference speeds.
Methods: A custom Spatial Attention module was integrated into an EfficientNetV2-S backbone to steer the network toward spatially localized pathological features. To improve ordinal consistency without adding model parameters, a 4-pass geometric Test-Time Augmentation (TTA) consensus wrapper was applied at inference. The model underwent zero-shot validation on the unseen, external APTOS 2019 dataset (N=3,662) to evaluate cross-dataset generalization.
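The 4-pass geometric TTA consensus described in Methods can be sketched as follows. The abstract does not name the four transforms or the consensus rule, so this sketch assumes identity, horizontal flip, vertical flip, and both flips combined, with class probabilities averaged across passes. `predict_proba` and `toy_model` are hypothetical stand-ins for the trained network, and images are plain nested lists to keep the example dependency-free.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # single-channel image as rows of pixel values


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def tta_views(img: Image) -> List[Image]:
    """Return four geometric views (an assumed set, not confirmed by the source)."""
    identity = [row[:] for row in img]
    return [identity, hflip(img), vflip(img), vflip(hflip(img))]


def tta_predict(
    predict_proba: Callable[[Image], Sequence[float]], img: Image
) -> List[float]:
    """Average class probabilities over the four views (assumed consensus rule)."""
    probs = [predict_proba(view) for view in tta_views(img)]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value, i.e. the predicted grade."""
    return max(range(len(values)), key=lambda i: values[i])


def toy_model(img: Image) -> List[float]:
    """Hypothetical 2-class predictor that only looks at the top-left pixel."""
    return [1.0, 0.0] if img[0][0] > 0.5 else [0.0, 1.0]
```

Because the wrapper only re-runs the same weights on transformed inputs, it adds no parameters; the cost is four forward passes per image instead of one.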
Results: On the independent holdout set, the architecture achieved a global accuracy of 90.36% and a Quadratic Weighted Kappa (QWK) of 0.9580, including a triage recall of 0.99 for healthy retinas. Computational benchmarking on an NVIDIA Tesla T4 GPU yielded a per-image inference latency of 67.07 ms (± 2.19 ms) and a throughput of 14.91 FPS. At this rate, a standard 500-patient clinic workload (1,000 images) can be processed in approximately 67 seconds.
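For readers unfamiliar with the metrics above, the following is a minimal sketch of how QWK is conventionally computed and how the workload figure follows from the reported latency. It is not the authors' evaluation code. The function names are illustrative, the five-grade label range in the usage is an assumption based on the standard DR grading scale, and the timing helper assumes images are processed strictly one after another.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> float:
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (K - 1)^2."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * true_hist[i] * pred_hist[j] / n

    if den == 0.0:
        # Degenerate case: every label and prediction is the same class.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return 1.0 - num / den


def frames_per_second(latency_ms: float) -> float:
    """Throughput implied by a per-image latency, assuming sequential processing."""
    return 1000.0 / latency_ms


def workload_seconds(n_images: int, latency_ms: float) -> float:
    """Total time for n_images at a fixed per-image latency, processed sequentially."""
    return n_images * latency_ms / 1000.0
```

With the reported 67.07 ms latency, `frames_per_second(67.07)` rounds to 14.91 and `workload_seconds(1000, 67.07)` gives about 67.07 seconds, matching the throughput and clinic-workload figures stated in the Results.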
Conclusion: By coupling targeted spatial attention with inference-time geometric wrappers, this interpretable framework achieves state-of-the-art ordinal DR grading within real-time execution bounds, satisfying the strict hardware constraints of point-of-care screening.