Abstract:
[Objective] To reduce the semantic deviation and loss caused by language differences and text feature selection during text classification, while preserving more textual information. [Methods] First, we used a pre-trained SBERT model for sentence representation. Second, we computed sentence similarity between texts with a Sentence Vectors Rotator's Similarity method and applied sentence weighting within texts to form text vectors. Finally, we combined machine learning and neural network classification methods to achieve cross-lingual text classification. [Results] We conducted experiments on multiple cross-lingual text datasets in Chinese, English, Russian, French, and Spanish, as well as on the multilingual public Reuters dataset. The results demonstrated that the proposed method significantly improved accuracy over existing methods, with recall, precision, and F1 scores also showing gains. [Limitations] The study does not consider the impact of a sentence's position within the text on its weight. [Conclusions] The proposed model reduces semantic deviation and loss, thereby improving the performance of cross-lingual text classification. © 2025 Chinese Academy of Sciences. All rights reserved.
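The pipeline described in the abstract (sentence embeddings, a sentence-similarity measure, and within-text sentence weighting to form one text vector) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random arrays stand in for SBERT sentence embeddings, cosine similarity stands in for the Sentence Vectors Rotator's Similarity method, and the weights are hypothetical.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors (stand-in for the
    paper's Sentence Vectors Rotator's Similarity)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def weighted_text_vector(sentence_vecs, weights) -> np.ndarray:
    """Combine per-sentence embeddings into one text vector via a
    normalized weighted average."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights so they sum to 1
    return (np.asarray(sentence_vecs) * w[:, None]).sum(axis=0)

# Stand-in embeddings; in the paper these come from a pre-trained SBERT model.
rng = np.random.default_rng(0)
doc_a = rng.normal(size=(3, 384))  # 3 sentences, 384-dim vectors
doc_b = rng.normal(size=(4, 384))  # 4 sentences

vec_a = weighted_text_vector(doc_a, [0.5, 0.3, 0.2])       # hypothetical weights
vec_b = weighted_text_vector(doc_b, [0.4, 0.3, 0.2, 0.1])

print(cosine_similarity(vec_a, vec_b))
```

The resulting text vectors could then be fed to any downstream classifier; because SBERT's multilingual checkpoints map different languages into a shared embedding space, the same pipeline applies across languages.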
Source:
Data Analysis and Knowledge Discovery
ISSN: 2096-3467
Year: 2025
Issue: 2
Volume: 9
Page: 39-47