College English Translation Intelligent Systems Technology Based on Multi-Source Information Corpus Acquisition and Data Fusion

Authors

  • Wentao Meng, Department of Basic Education, Beihai Campus, Guilin University of Electronic Technology, Beihai 536000, China
  • Lei Yu, School of Sports and Arts, Harbin Sport University, Harbin 230000, China
  • Yunyun Zhu, Department of Economics and Management, Beihai Campus, Guilin University of Electronic Technology, Beihai 536000, China

Abstract

College English translation scenarios are becoming increasingly interdisciplinary, so traditional methods that rely on limited textbook materials can no longer meet students' diverse needs. To address the insufficient coverage of multi-source corpora and the limited adaptability of translation models, a corpus acquisition framework integrating knowledge discovery and domain discrimination was developed, and a translation model based on data fusion was designed. A web crawling system captured bilingual discourse-level corpora in new fields, and sentence segmentation strategies were combined with dynamic programming to optimize alignment accuracy, generating high-quality parallel corpora. In addition, a bidirectional encoder representation model combined with WordPiece was proposed to enhance sequence annotation performance by integrating part-of-speech and syntactic dependency features. The experimental results showed that the proposed model had an accuracy of 0.92 and a processing time of only 12 seconds. For translation, the proposed model reached an accuracy close to 1.0 after 900 iterations, reduced the false positive rate to 0.05, and had a translation time of 16 seconds, significantly better than traditional models. The results indicate that multi-source corpus acquisition and data fusion techniques can effectively enhance the ability of translation systems to handle complex contexts, providing high-precision solutions for interdisciplinary English translation.
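
The abstract mentions combining sentence segmentation with dynamic programming to align bilingual text. The paper's actual cost function and alignment units are not given here, so the following is only a minimal sketch of the general technique, assuming a simple character-length mismatch cost, a fixed set of alignment patterns (1-1, 1-0, 0-1, 2-1, 1-2), and a hypothetical `skip_penalty` parameter. None of these choices are taken from the paper.

```python
from typing import List, Tuple

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

def align_sentences(src: List[str], tgt: List[str],
                    skip_penalty: float = 3.0) -> List[Bead]:
    """Align two sentence lists by minimizing total length-mismatch cost.

    Assumption: translated sentences have roughly proportional character
    lengths, so a relative length difference serves as the matching cost.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]   # best[i][j]: min cost for src[:i], tgt[:j]
    back = [[None] * (m + 1) for _ in range(n + 1)]  # predecessor cell for backtracing
    best[0][0] = 0.0
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]  # allowed alignment patterns

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    # Unmatched sentence on one side: flat penalty.
                    step = skip_penalty
                else:
                    s_len = sum(len(s) for s in src[i:ni])
                    t_len = sum(len(t) for t in tgt[j:nj])
                    step = abs(s_len - t_len) / max(s_len + t_len, 1)
                if best[i][j] + step < best[ni][nj]:
                    best[ni][nj] = best[i][j] + step
                    back[ni][nj] = (i, j)

    # Walk back from the final cell to recover the aligned groups in order.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align_sentences(source_sentences, target_sentences)` returns index groups that can be used to assemble parallel sentence pairs. A production aligner would normally replace the length-only cost with lexical or embedding-based similarity.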

Keywords: Multi-source information; Corpus; Data fusion; Bidirectional encoder; WordPiece

Cite As

W. Meng, L. Yu, Y. Zhu, "College English Translation Intelligent Systems Technology Based on Multi-Source Information Corpus Acquisition and Data Fusion", Engineering Intelligent Systems, vol. 34, no. 1, pp. 109-118, 2026.

Published

2026-01-01