Abstract:
The long-tail effect and the large number of out-of-vocabulary (OOV) words in social media texts cause severe feature sparsity and reduce classification accuracy. To address this problem, a social media text classification method based on character-word feature self-attention learning is proposed. Global features are constructed at the character level to learn the attention weight distribution, and the existing multi-head attention mechanism is improved to reduce the parameter scale and computational complexity. To further analyze character-word feature fusion, an OOV sensitivity index is proposed to measure the impact of OOV words on different types of features. Experiments on several social media text classification tasks indicate that the proposed method fuses word features and character features effectively and clearly improves classification accuracy. Moreover, the quantitative results of the OOV sensitivity index verify the feasibility and effectiveness of the proposed method. © 2020, Science Press. All rights reserved.
Source:
Pattern Recognition and Artificial Intelligence
ISSN: 1003-6059
Year: 2020
Issue: 4
Volume: 33
Page: 287-294