
Author:

Lin, Yuqin [1] | Wang, Longbiao [2] | Dang, Jianwu [3] | Minematsu, Nobuaki [4]

Indexed by:

EI

Abstract:

Speech disorders can significantly impact speakers’ articulation, resulting in large variations in speech. These variations can affect the performance of Automatic Speech Recognition (ASR), limiting the access of individuals with speech disorders to the benefits provided by this technology. Previous research on human speech perception has shown that both auditory and articulatory information play important roles, with the latter being more effective when the input speech is distorted. When a sound is perceived, the brain processes its auditory features and activates neural simulations of the articulatory movements associated with that sound. Throughout this process, acoustic and articulatory information often enhance each other, improving the overall comprehension and processing of the auditory stimulus. Motivated by these findings, this study proposes an Inclusive Gestural Feature Extraction (InGesFE) method and a Multi-Feature Co-Activation Module (MF-CoAct) to address the challenge of large variability in dysarthric ASR. The InGesFE method extracts features using a richness constraint and a phoneme distinctiveness constraint, enabling them to share similar characteristics with articulatory gestures, including: (1) rich aspects of input speech, (2) phonemic distinctiveness, and (3) robustness in conveying intent. Meanwhile, the MF-CoAct facilitates the co-activation of auditory and articulatory (gestural) features through a statistical variable-based activation network. Additionally, a continual pre-training method is designed to support faster and more effective adaptation to highly variable speech. To evaluate the effectiveness of the proposed method, two widely used dysarthria datasets, TORGO and UASpeech, are employed. Across both datasets, our approach led to a relative word error rate reduction (WERR) of 13.75%–15.37% for single-word recognition and 36.48% for multiword recognition compared to the baseline. It outperformed existing methods for speakers with severe dysarthria and very low intelligibility, reaching a word error rate (WER) of 51.41% on the UASpeech dataset. It also demonstrated increased robustness in noisy environments, achieving a 19.16% WERR in single-word recognition and a 38.49% WERR in multiword recognition under noisy conditions. Further analysis indicates that the features extracted by InGesFE capture richer articulatory information beyond auditory features alone, particularly improving the representation of co-articulatory cues. © 2025 Elsevier B.V.
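
For context on the WERR figures quoted above: relative word error rate reduction is conventionally computed from the baseline and proposed-system word error rates as WERR = (WER_baseline - WER_system) / WER_baseline * 100. The short Python sketch below is only an illustration of that conventional calculation; the function names and the numeric inputs are placeholders and are not values taken from the paper.

def word_error_rate(substitutions, deletions, insertions, reference_words):
    # Standard WER definition: (S + D + I) / N, reported as a percentage.
    return 100.0 * (substitutions + deletions + insertions) / reference_words

def relative_werr(baseline_wer, system_wer):
    # Relative word error rate reduction (WERR) with respect to a baseline system.
    return 100.0 * (baseline_wer - system_wer) / baseline_wer

# Placeholder values for illustration only; not figures reported in the paper.
baseline_wer = 60.0  # baseline WER in percent
system_wer = 50.0    # proposed-system WER in percent
print(f"Relative WERR: {relative_werr(baseline_wer, system_wer):.2f}%")  # -> 16.67%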

Keyword:

Audition; Chemical activation; Errors; Extraction; Feature extraction; Speech communication; Speech intelligibility; Speech processing; Speech recognition

Community:

  • [ 1 ] [Lin, Yuqin] College of Computer and Data Science, Fuzhou University, Fujian 350108, China
  • [ 2 ] [Lin, Yuqin] Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
  • [ 3 ] [Wang, Longbiao] Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
  • [ 4 ] [Wang, Longbiao] Huiyan Technology (Tianjin) Co., Ltd., Tianjin 300350, China
  • [ 5 ] [Dang, Jianwu] Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  • [ 6 ] [Minematsu, Nobuaki] University of Tokyo, Tokyo 113-8656, Japan

Reprint's Address:

Email:


Source:

Information Fusion

ISSN: 1566-2535

Year: 2026

Volume: 125

Impact Factor: 14.800 (JCR@2023)

Cited Count:

WoS CC Cited Count:

SCOPUS Cited Count:

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:

Affiliated Colleges:
