Abstract:
Vocal pattern recognition provides an objective reference for the recognition, diagnosis, and intervention evaluation of subthreshold depression. In this study, we used different speech modalities (sustained /a:/ vowel phonation, text reading, picture description, and free interview) under three emotional stimuli (positive, neutral, negative). We fused four types of speech features (prosody, voice quality, spectrum, and formants) and extracted 16 feature parameters, including Mel-frequency cepstral coefficients, speech rate, fundamental frequency, and formant frequencies, then built a subthreshold depression risk prediction model with a random forest and compared its performance against other classifiers. Before feature fusion, the recognition rates of picture description and free interview were higher than those of the other speech modalities, with positive stimuli yielding the best predictions at 72.50% and 67.39% accuracy, respectively; after feature-level fusion, the sustained /a:/ vowel and free interview achieved high accuracies of 93.00% and 85.00%, respectively. This indicates that after feature fusion the model learns phonetic information capturing not only the subject's emotional state but also the interrelationships between feature types. The sustained /a:/ vowel and free interview retain more vocal-tract information: the /a:/ phonation is continuous with sustained intensity, while free-interview speech is abundant, rich in features, and close to natural speech. Both are therefore informative for early risk prediction of subthreshold depression. © 2024 Journal of Fudan University (Natural Science). All rights reserved.
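The feature-level fusion and random-forest pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the feature values are synthetic stand-ins for the 16 extracted parameters (MFCC statistics, speech rate, fundamental frequency, formants), and the group sizes and labels are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200  # number of speech samples (hypothetical)

# Synthetic per-utterance feature groups standing in for the four
# fused feature types; the split into 4 + 8 + 4 dimensions is assumed.
prosody  = rng.normal(size=(n, 4))   # e.g. F0, speech rate, energy, pause ratio
spectrum = rng.normal(size=(n, 8))   # e.g. MFCC summary statistics
formants = rng.normal(size=(n, 4))   # e.g. F1-F4 mean frequencies

# Feature-level fusion: concatenate all groups into one 16-dim vector.
X = np.hstack([prosody, spectrum, formants])

# Synthetic binary labels (subthreshold depression vs. control), made
# weakly dependent on two features so the classifier can learn a signal.
y = (X[:, 0] + X[:, 5] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

On real data the feature matrix would come from acoustic analysis of the recordings rather than random draws; the fusion step itself is just this concatenation of feature groups before training.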
Source:
Journal of Fudan University (Natural Science)
ISSN: 0427-7104
Year: 2024
Issue: 3
Volume: 63
Page: 344-350
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0