Progressive fusion of local and global image features for cross-modal image aesthetic assessment

Authors:

Niu, Yuzhen ^[1] (Scholars: 牛玉贞) | Chen, Siling ^[2] | Chen, Shanshan ^[3] | Li, Fusheng ^[4]

Indexed by:

EI, Scopus, SCIE

Abstract:

Image aesthetic assessment (IAA) has drawn wide attention in recent years. The task aims to predict the aesthetic quality of images by simulating the human aesthetic perception mechanism, thereby helping users select images with higher aesthetic value. For IAA, both the local information in an image and its various kinds of global semantic information, such as composition, theme, and emotion, play a crucial role. Existing CNN-based methods attempt to use multi-branch strategies to extract IAA-related local and global semantic information from images. However, these methods can extract only limited, specific global semantic information, and they require additional labeled datasets. Furthermore, some cross-modal IAA methods have been proposed that use both images and user comments, but they often fail to fully explore the valuable information within each modality and the correlations between cross-modal features, which reduces cross-modal IAA accuracy. To address these limitations, we propose a cross-modal IAA model that progressively fuses local and global image features. The model consists of a progressive local and global image feature fusion branch, a text feature enhancement branch, and a cross-modal feature fusion module. In the image branch, we introduce an inter-layer feature fusion module (IFFM) and progressively interact and fuse the extracted local and global features to obtain more comprehensive image features. In the text branch, we propose a text feature enhancement module (TFEM) that strengthens the extracted text features in order to mine more effective textual information. Meanwhile, considering the intrinsic correlation between image and text features, we propose a cross-modal feature fusion module (CFFM) that fuses image features with text features for aesthetic assessment. Experimental results on the AVA (Aesthetic Visual Analysis) dataset validate the superiority of our method on the IAA task.
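The abstract does not describe the internals of the CFFM, so the following is only an illustrative sketch of what fusing image features with text features can look like. It swaps in generic scaled dot-product cross-attention without learned projections, which is an assumption and not the authors' module. The function names (`cross_attend`, `mean_pool`, `fuse`) and the toy feature values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Let each query vector attend over keys_values.

    keys_values serves as both keys and values (no learned projections),
    so every output row is a convex combination of keys_values rows.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended = [
            sum(w * v[i] for w, v in zip(weights, keys_values))
            for i in range(dim)
        ]
        out.append(attended)
    return out


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def fuse(image_feats, text_feats):
    """Hypothetical fusion: attend in both directions, pool, concatenate."""
    img_ctx = cross_attend(image_feats, text_feats)  # image queries, text values
    txt_ctx = cross_attend(text_feats, image_feats)  # text queries, image values
    return mean_pool(img_ctx) + mean_pool(txt_ctx)


# Toy inputs: three image-region features and two comment-token features.
image_feats = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
text_feats = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
fused = fuse(image_feats, text_feats)
print(len(fused))
```

In a real model the fused vector would feed a prediction head trained on aesthetic labels; that head, and any learned weights, are omitted here.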

Keywords:

Cross-modality; Feature fusion; Image aesthetic assessment; Local and global features; Textual information

Affiliations:

[1] [Niu, Yuzhen] Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350108, Fujian, Peoples R China
[2] [Chen, Siling] Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350108, Fujian, Peoples R China
[3] [Chen, Shanshan] Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350108, Fujian, Peoples R China
[4] [Li, Fusheng] Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350108, Fujian, Peoples R China
[5] [Niu, Yuzhen] Minist Educ, Engn Res Ctr Big Data Intelligence, Fuzhou 350108, Fujian, Peoples R China