Author:

Wang, Jiaxin [1] | Yin, Hao [2] | Zhou, Yiding [3] | Xi, Wei [4]

Indexed by:

EI Scopus

Abstract:

As the importance of human-computer interaction (HCI) continues to strengthen and the field of deep learning evolves, numerous models have found their application in the realm of Speech Emotion Recognition (SER), leading to significant advancements in recent years. However, effectively recognizing and processing human emotions through computational systems remains a complex and formidable challenge. This review aims to provide a comprehensive summary of the latest accomplishments in SER, encompassing a diverse range of application scenarios, from education and healthcare to criminal investigation. Additionally, it delves into various models and preprocessing techniques such as Convolutional Neural Networks (CNN), Convolutional Recurrent Neural Networks (CRNN), Long Short-Term Memory (LSTM), and datasets like RAVDESS and RECOLA, which encompass a wide array of scenes and languages. While the recent strides in SER have undeniably achieved impressive accuracy rates, a notable gap exists in research that addresses more intricate emotional contexts, including situations involving irony or sarcasm. Consequently, this review focuses on a comprehensive analysis of the limitations inherent in different feature engineering strategies. Moreover, it investigates the challenge of interpretability posed by complex models, the constraint posed by singular and hard-to-gather datasets, and the expansive scope of potential applications SER could serve. Considering these complexities, a potential pathway to further enhance SER's effectiveness and applicability is proposed. This involves exploring the concept of non-binary emotion classification, harnessing rich contextual information, and integrating datasets that incorporate gesture and textual data. By adapting feature extraction techniques to align with the unique demands of specific scenarios, the performance of SER models could be markedly improved. © 2024 SPIE.

Keyword:

Classification (of information); Complex networks; Convolution; Convolutional neural networks; Emotion Recognition; Human computer interaction; Long short-term memory; Speech recognition

Affiliation:

  • [ 1 ] [Wang, Jiaxin] Maynooth College, Fuzhou University, Fuzhou 350108, China
  • [ 2 ] [Yin, Hao] Sino-British College, University of Shanghai for Science and Technology, Shanghai 200129, China
  • [ 3 ] [Zhou, Yiding] Computer College, Chongqing University of Posts and Telecommunications, Chongqing 400000, China
  • [ 4 ] [Xi, Wei] Physics College, Nanjing University, Nanjing 210008, China

ISSN: 0277-786X

Year: 2024

Volume: 13077

Language: English

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0
