Abstract:
Blind quality assessment for user-generated content (UGC) or consumer videos is a challenging problem in computer vision. Two open issues remain to be addressed: how to effectively extract high-dimensional spatial-temporal features from consumer videos, and how to appropriately model the relationship between these features and user perceptions within a unified blind video quality assessment (BVQA) framework. To tackle these issues, we propose a novel BVQA model with spatial-temporal perception and fusion. First, we develop two perception modules that extract perceptual-distortion-related features separately from the spatial and temporal domains. In particular, the temporal-domain features are obtained by combining a 3D ConvNet with residual frames, owing to their high efficiency in capturing motion-specific temporal features. Second, we propose a feature fusion module that adaptively combines the spatial and temporal features. Finally, we map the fused features onto perceptual quality scores. Experimental results demonstrate that our model outperforms other state-of-the-art methods in predicting subjective video quality.
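The abstract outlines a three-stage pipeline: residual frames fed to a 3D ConvNet for temporal features, a spatial perception branch, and an adaptive fusion module that maps the combined features to a quality score. The following is a minimal PyTorch sketch of that pipeline under stated assumptions; the layer sizes, the per-frame 2D CNN for the spatial branch, and the sigmoid-gated fusion are illustrative choices, not the authors' exact architecture.

    # Minimal sketch, assuming PyTorch. Module names, layer sizes, and the
    # gating-based fusion are assumptions; the paper does not specify them here.
    import torch
    import torch.nn as nn


    def residual_frames(video: torch.Tensor) -> torch.Tensor:
        """Differences of consecutive frames; video is (B, C, T, H, W)."""
        return video[:, :, 1:] - video[:, :, :-1]


    class TemporalPerception(nn.Module):
        """3D ConvNet over residual frames (hypothetical layer sizes)."""
        def __init__(self, dim: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(3, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv3d(32, dim, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),  # global spatial-temporal pooling
            )

        def forward(self, video):
            return self.net(residual_frames(video)).flatten(1)  # (B, dim)


    class SpatialPerception(nn.Module):
        """2D CNN applied per frame, averaged over time (hypothetical)."""
        def __init__(self, dim: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(32, dim, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
            )

        def forward(self, video):
            b, c, t, h, w = video.shape
            frames = video.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
            feats = self.net(frames).flatten(1).reshape(b, t, -1)
            return feats.mean(dim=1)  # (B, dim)


    class AdaptiveFusionBVQA(nn.Module):
        """Gated fusion of spatial/temporal features, then quality regression."""
        def __init__(self, dim: int = 64):
            super().__init__()
            self.spatial = SpatialPerception(dim)
            self.temporal = TemporalPerception(dim)
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
            self.head = nn.Linear(dim, 1)

        def forward(self, video):
            fs, ft = self.spatial(video), self.temporal(video)
            g = self.gate(torch.cat([fs, ft], dim=1))  # adaptive weights in [0, 1]
            fused = g * fs + (1 - g) * ft
            return self.head(fused).squeeze(1)  # predicted quality score


    if __name__ == "__main__":
        clip = torch.randn(2, 3, 16, 64, 64)  # (batch, RGB, frames, H, W)
        print(AdaptiveFusionBVQA()(clip).shape)  # torch.Size([2])

The element-wise sigmoid gate is one common way to realize "adaptive" fusion: the network learns, per feature dimension, how much to trust the spatial versus the temporal branch, rather than using a fixed concatenation or average.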
Source: MULTIMEDIA TOOLS AND APPLICATIONS
ISSN: 1380-7501
Year: 2023
Issue: 7
Volume: 83
Page: 18969-18986
3.0 (JCR@2023)
3.000 (JCR@2023)
ESI Discipline: COMPUTER SCIENCE
ESI HC Threshold: 32
JCR Journal Grade: 2
CAS Journal Grade: 3