Progressive fusion of local and global image features for cross-modal image aesthetic assessment

Authors:

Niu, Yuzhen ^[1] | Chen, Siling ^[2] | Chen, Shanshan ^[3] | Li, Fusheng ^[4]

Abstract:

Image aesthetic assessment (IAA) has drawn wide attention in recent years. This task aims to predict the aesthetic quality of images by simulating the human aesthetic perception mechanism, thereby assisting users in selecting images with higher aesthetic value. For IAA, both the local information and the various kinds of global semantic information contained in an image, such as composition, theme, and emotion, play a crucial role. Existing CNN-based methods attempt to use multi-branch strategies to extract IAA-related local and global semantic information from images. However, these methods can extract only limited and specific global semantic information, and they require additional labeled datasets. Furthermore, some cross-modal IAA methods have been proposed that use both images and user comments, but they often fail to fully explore the valuable information within each modality and the correlations between cross-modal features, which reduces cross-modal IAA accuracy. To address these limitations, we propose a cross-modal IAA model that progressively fuses local and global image features. The model consists of a progressive local and global image feature fusion branch, a text feature enhancement branch, and a cross-modal feature fusion module. In the image branch, we introduce an inter-layer feature fusion module (IFFM) and progressively interact and fuse the extracted local and global features to obtain more comprehensive image features. In the text branch, we propose a text feature enhancement module (TFEM) to strengthen the extracted text features and thereby mine more effective textual information. Meanwhile, considering the intrinsic correlation between image and text features, we propose a cross-modal feature fusion module (CFFM) to fuse image features with text features for aesthetic assessment. Experimental results on the AVA (Aesthetic Visual Analysis) dataset validate the superiority of our method for the IAA task.
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.

Keywords:

Feature extraction; Image denoising; Image enhancement; Image fusion; Labeled data; Modal analysis; Photointerpretation; Semantics; Text mining

Affiliations:

[1] [Niu, Yuzhen] Fujian Key Laboratory of Network Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fujian, Fuzhou; 350108, China
[2] [Niu, Yuzhen] Engineering Research Center of Big Data Intelligence, Ministry of Education, Fujian, Fuzhou; 350108, China
[3] [Chen, Siling] Fujian Key Laboratory of Network Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fujian, Fuzhou; 350108, China
[4] [Chen, Shanshan] Fujian Key Laboratory of Network Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fujian, Fuzhou; 350108, China
[5] [Li, Fusheng] Fujian Key Laboratory of Network Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fujian, Fuzhou; 350108, China

Reprint Author's Address:

[Li, Fusheng] Fujian Key Laboratory of Network Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fujian, Fuzhou; 350108, China

Email:

lifusheng.chn@gmail.com

Source:

Multimedia Systems

ISSN: 0942-4962

Year: 2025

Issue: 2

Volume: 31

Impact Factor: 3.500 (JCR@2023)

CAS Journal Grade: 4

ESI Highly Cited Papers on the List: 0
