
Author:

Lin, Yuqin [1] | Wang, Longbiao [2] | Dang, Jianwu [3] | Minematsu, Nobuaki [4]

Indexed by:

EI

Abstract:

Speech disorders can significantly impact speakers’ articulation, resulting in large variations in speech. These variations can affect the performance of Automatic Speech Recognition (ASR), limiting the access of individuals with speech disorders to the benefits provided by this technology. Previous research on human speech perception has shown that both auditory and articulatory information play important roles, with the latter being more effective when the input speech is distorted. When a sound is perceived, the brain processes its auditory features and activates neural simulations of the articulatory movements associated with that sound. Throughout this process, acoustic and articulatory information often enhance each other, improving the overall comprehension and processing of the auditory stimulus. Motivated by these findings, this study proposes an Inclusive Gestural Feature Extraction (InGesFE) method and a Multi-Feature Co-Activation Module (MF-CoAct) to address the challenge of large variability in dysarthric ASR. The InGesFE method extracts features using a richness constraint and a phoneme distinctiveness constraint, enabling them to share similar characteristics with articulatory gestures, including: (1) rich aspects of input speech, (2) phonemic distinctiveness, and (3) robustness in conveying intent. Meanwhile, the MF-CoAct facilitates the co-activation of auditory and articulatory (gestural) features through a statistical variable-based activation network. Additionally, a continual pre-training method is designed to support faster and more effective adaptation to highly variable speech. To evaluate the effectiveness of the proposed method, two widely used dysarthria datasets, TORGO and UASpeech, are employed. Across both datasets, our approach led to a relative word error rate reduction (WERR) of 13.75%–15.37% for single-word recognition and 36.48% for multiword recognition compared to the baseline. It outperformed existing methods for speakers with severe dysarthria and very low intelligibility, reaching a word error rate (WER) of 51.41% on the UASpeech dataset. It also demonstrated increased robustness in noisy environments, achieving a 19.16% WERR in single-word recognition and a 38.49% WERR in multiword recognition under noisy conditions. Further analysis indicates that the features extracted by InGesFE capture richer articulatory information beyond auditory features alone, particularly improving the representation of co-articulatory cues. © 2025 Elsevier B.V.
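
For context on the WERR figures quoted above: relative word error rate reduction is conventionally computed from the baseline and proposed-system word error rates as WERR = (WER_baseline - WER_system) / WER_baseline * 100. The short Python sketch below is only an illustration of that conventional calculation; the function names and the numeric inputs are placeholders and are not values taken from the paper.

def word_error_rate(substitutions, deletions, insertions, reference_words):
    # Standard WER definition: (S + D + I) / N, reported as a percentage.
    return 100.0 * (substitutions + deletions + insertions) / reference_words

def relative_werr(baseline_wer, system_wer):
    # Relative word error rate reduction (WERR) with respect to a baseline system.
    return 100.0 * (baseline_wer - system_wer) / baseline_wer

# Placeholder values for illustration only; not figures reported in the paper.
baseline_wer = 60.0  # baseline WER in percent
system_wer = 50.0    # proposed-system WER in percent
print(f"Relative WERR: {relative_werr(baseline_wer, system_wer):.2f}%")  # -> 16.67%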

Keyword:

Audition; Chemical activation; Errors; Extraction; Feature extraction; Speech communication; Speech intelligibility; Speech processing; Speech recognition

Community:

  • [ 1 ] [Lin, Yuqin] College of Computer and Data Science, Fuzhou University, Fujian 350108, China
  • [ 2 ] [Lin, Yuqin] Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
  • [ 3 ] [Wang, Longbiao] Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
  • [ 4 ] [Wang, Longbiao] Huiyan Technology (Tianjin) Co., Ltd., Tianjin 300350, China
  • [ 5 ] [Dang, Jianwu] Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  • [ 6 ] [Minematsu, Nobuaki] University of Tokyo, Tokyo 113-8656, Japan

Reprint's Address:

Email:


Source:

Information Fusion

ISSN: 1566-2535

Year: 2026

Volume: 125

Impact Factor: 14.800 (JCR@2023)

Cited Count:

WoS CC Cited Count:

SCOPUS Cited Count:

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:

Affiliated Colleges:
