Indexed by:
Abstract:
Image aesthetic assessment (IAA) has drawn wide attention in recent years. The task aims to predict the aesthetic quality of images by simulating the human aesthetic perception mechanism, thereby helping users select images with higher aesthetic value. For IAA, both the local information and the various kinds of global semantic information contained in an image, such as composition, theme, and emotion, play a crucial role. Existing CNN-based methods use multi-branch strategies to extract IAA-related local and global semantic information from images. However, these methods can extract only limited, specific global semantic information, and they require additional labeled datasets. Furthermore, some cross-modal IAA methods have been proposed that use both images and user comments, but they often fail to fully explore the valuable information within each modality and the correlations between cross-modal features, which reduces cross-modal IAA accuracy. To address these limitations, we propose a cross-modal IAA model that progressively fuses local and global image features. The model consists of a progressive local and global image feature fusion branch, a text feature enhancement branch, and a cross-modal feature fusion module. In the image branch, we introduce an inter-layer feature fusion module (IFFM) that progressively interacts and fuses the extracted local and global features to obtain more comprehensive image features. In the text branch, we propose a text feature enhancement module (TFEM) that strengthens the extracted text features in order to mine more effective textual information. Finally, considering the intrinsic correlation between image and text features, we propose a cross-modal feature fusion module (CFFM) that fuses image features with text features for aesthetic assessment. Experimental results on the AVA (Aesthetic Visual Analysis) dataset validate the superiority of our method on the IAA task.
Keyword:
Reprint's Address:
Email:
Version:
Source:
MULTIMEDIA SYSTEMS
ISSN: 0942-4962
Year: 2025
Issue: 2
Volume: 31
3.500 (JCR@2023)
CAS Journal Grade: 4
Affiliated Colleges: