Advancements and challenges in speech emotion recognition: A comprehensive review

Authors：

Wang, Jiaxin [1] | Yin, Hao [2] | Zhou, Yiding [3] | Xi, Wei [4]

Indexed by：

EI Scopus

Abstract：

As the importance of human-computer interaction (HCI) continues to strengthen and the field of deep learning evolves, numerous models have found their application in the realm of Speech Emotion Recognition (SER), leading to significant advancements in recent years. However, effectively recognizing and processing human emotions through computational systems remains a complex and formidable challenge. This review aims to provide a comprehensive summary of the latest accomplishments in SER, encompassing a diverse range of application scenarios, from education and healthcare to criminal investigation. Additionally, it delves into various models and preprocessing techniques such as Convolutional Neural Networks (CNN), Convolutional Recurrent Neural Networks (CRNN), Long Short-Term Memory (LSTM), and datasets like RAVDESS and RECOLA, which encompass a wide array of scenes and languages. While the recent strides in SER have undeniably achieved impressive accuracy rates, a notable gap exists in research that addresses more intricate emotional contexts, including situations involving irony or sarcasm. Consequently, this review focuses on a comprehensive analysis of the limitations inherent in different feature engineering strategies. Moreover, it investigates the challenge of interpretability posed by complex models, the constraint posed by singular and hard-to-gather datasets, and the expansive scope of potential applications SER could serve. Considering these complexities, a potential pathway to further enhance SER's effectiveness and applicability is proposed. This involves exploring the concept of non-binary emotion classification, harnessing rich contextual information, and integrating datasets that incorporate gesture and textual data. By adapting feature extraction techniques to align with the unique demands of specific scenarios, the performance of SER models could be markedly improved. © 2024 SPIE.

Keywords：

Classification (of information); Complex networks; Convolution; Convolutional neural networks; Emotion Recognition; Human computer interaction; Long short-term memory; Speech recognition

Affiliations：

[1] Wang, Jiaxin: Maynooth College, Fuzhou University, Fuzhou 350108, China
[2] Yin, Hao: Sino-British College, University of Shanghai for Science and Technology, Shanghai 200129, China
[3] Zhou, Yiding: Computer College, Chongqing University of Posts and Telecommunications, Chongqing 400000, China
[4] Xi, Wei: Physics College, Nanjing University, Nanjing 210008, China

Version：

Advancements and challenges in speech emotion recognition: A comprehensive review
2024, Proceedings of SPIE - The International Society for Optical Engineering

Related Keywords：

Emotion recognition by deeply learned multi-channel textual and EEG features
2021, Future Generation Computer Systems
Speech Emotion Analysis Based on Vision Transformer
2023, 2022 2nd Conference on High Performance Computing and Communication Engineering, HPCCE 2022
A model for legal judgment prediction based on multi-model fusion
2019, 3rd IEEE International Conference on Electronic Information Technology and Computer Engineering, EITCE 2019
Research on Image Scene Classification Using Lightweight Convolutional Neural Networks Based on Fusion Attention Mechanism
2023, 2023 International Conference on Computer Simulation and Modeling, Information Security, CSMIS 2023

Source ：

ISSN： 0277-786X

Year： 2024

Volume： 13077

Language： English

Cited Count：

SCOPUS Cited Count： 4

ESI Highly Cited Papers on the List： 0

30 Days PV： 0

Affiliated Colleges：

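Maynooth International Engineering College (data not explicitly attributed to a specific unit within this college/department)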
