Abstract:
An important challenge that existing work has yet to address is that audio representations carry far less information than the rich content of remote sensing (RS) images, making it easy to overlook details in the images. This information imbalance between modalities makes it difficult to maintain consistent representations. To address this challenge, we propose a novel cross-modal RS image-audio (RSIA) retrieval method called adaptive learning for aligning correlation (ALAC). ALAC integrates region-level learning into image annotation through a region-enhanced learning attention (RELA) module; by collaboratively suppressing features at different region levels, it provides a more comprehensive visual feature representation. In addition, we propose a novel adaptive knowledge transfer (AKT) strategy that guides the learning of the frontend network using aligned feature vectors, allowing the model to adaptively acquire alignment information during training and thereby achieve better alignment between the two modalities. Finally, to make better use of the mutual information between modalities, we introduce a plug-and-play result rerank module that refines the similarity matrix using cross-modal retrieval mutual information as weights, significantly improving retrieval accuracy. Experiments on four RSIA datasets show that ALAC outperforms existing methods, achieving improvements of 1.49%, 2.25%, 4.24%, and 1.33%, respectively, over the state of the art. The code is available at https://github.com/huangjh98/ALAC.
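The rerank idea mentioned in the abstract — reweighting a similarity matrix so that image-audio pairs ranked highly in both retrieval directions are promoted — might be sketched as below. This is a minimal illustrative sketch only: the function name, the softmax-based agreement weight, and the mixing parameter `alpha` are assumptions for exposition, not ALAC's published algorithm.

```python
import numpy as np

def mutual_rerank(sim: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Illustrative rerank of an image-audio similarity matrix.

    sim[i, j] is the similarity between image i and audio j.
    Assumed intuition (not the paper's exact formula): a pair is more
    trustworthy when each item also ranks the other highly in the
    reverse retrieval direction, so the matrix is reweighted by the
    agreement of the two directions' normalized retrieval scores.
    """
    e = np.exp(sim)
    img2aud = e / e.sum(axis=1, keepdims=True)  # image -> audio retrieval
    aud2img = e / e.sum(axis=0, keepdims=True)  # audio -> image retrieval
    # Mutual weight is large only when both directions agree on the pair.
    weight = img2aud * aud2img
    return (1 - alpha) * sim + alpha * weight * sim

sim = np.array([[0.9, 0.1],
                [0.2, 0.8]])
reranked = mutual_rerank(sim)
```

In this toy example the correct matches (the diagonal) remain the top-ranked entries in each row, while off-diagonal scores are suppressed by their low cross-direction agreement.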
Source: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
ISSN: 0196-2892
Year: 2024
Volume: 62
Impact Factor: 7.500 (JCR 2023)