Combining Multimodal Learning Models to Enhance Contextual Comprehension in English Translation
DOI:
https://doi.org/10.70917/ijcisim-2026-0096

Keywords:
multimodal learning; English translation; context theory; cross-cultural communication; artificial intelligence

Abstract
This study proposes a novel English translation method that integrates multimodal learning with contextual theory to enhance the system's understanding of linguistic context and cultural background. By introducing a multimodal knowledge graph that integrates information sources such as text and images, and employing a relational graph attention network for deep encoding, the method constructs embedding representations that capture semantic associations across modalities. A newly designed cross-modal alignment module improves the semantic alignment between images and text, enabling the system to handle cultural metaphors and professional terminology with greater precision. Experimental results show that the method achieves 87.6% semantic accuracy and 86.3% cultural equivalence, with the fluency score improving from 7.2 to 8.9, demonstrating clear advantages on complex, context-dependent translation tasks. This research not only extends the boundaries of translation technology but also offers a new paradigm for the integrated development of linguistics, artificial intelligence, and cross-cultural studies. The findings are expected to have broad application value in international communication, translation education, and the development of intelligent translation systems.
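To make the cross-modal alignment idea mentioned in the abstract concrete, the following is a minimal, illustrative sketch of such a module: text and image embeddings are projected into a shared space and aligned with a symmetric contrastive (InfoNCE-style) objective. All class names, dimensions, and hyperparameters here are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of a cross-modal alignment module (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalAlignment(nn.Module):
    def __init__(self, text_dim: int = 768, image_dim: int = 512,
                 shared_dim: int = 256, temperature: float = 0.07):
        super().__init__()
        # Separate projections map each modality into a shared semantic space.
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.image_proj = nn.Linear(image_dim, shared_dim)
        self.temperature = temperature

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor):
        # Project and L2-normalize so similarity reduces to a dot product.
        t = F.normalize(self.text_proj(text_emb), dim=-1)
        v = F.normalize(self.image_proj(image_emb), dim=-1)
        # Pairwise similarity matrix between all texts and images in the batch.
        logits = t @ v.t() / self.temperature
        targets = torch.arange(t.size(0), device=t.device)
        # Symmetric contrastive loss: each text should match its paired image
        # and vice versa, pulling aligned pairs together in the shared space.
        loss = 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))
        return loss, t, v


if __name__ == "__main__":
    # Toy batch of 8 paired text/image embeddings, e.g. produced by upstream
    # text and vision encoders feeding the multimodal knowledge graph.
    align = CrossModalAlignment()
    loss, _, _ = align(torch.randn(8, 768), torch.randn(8, 512))
    print(f"alignment loss: {loss.item():.4f}")
```

In this kind of design, the shared projection space is what downstream components (such as a graph-based encoder) can consume, so that culturally loaded expressions grounded in imagery and their textual descriptions end up close together before translation decoding.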
License
Copyright (c) 2026 Rui Shi

This work is licensed under a Creative Commons Attribution 4.0 International License.