Multi-granularity interaction and feature recombination network for fine-grained visual classification

author：

Ke, Xiao ^[1] (Scholars：柯逍) | Cai, Yuhang ^[2] | Chen, Baitao ^[3] | Liu, Hao ^[4] | Guo, Wenzhong ^[5] (Scholars：郭文忠)

Indexed by：

EI Scopus SCIE

Abstract：

Fine-grained visual classification (FGVC) is a highly challenging task that aims to learn subtle differences between visually similar objects. Most existing FGVC methods rely on deep convolutional neural networks to mine local fine-grained features, neglecting the relationships between global and local semantics. Moreover, the feature encoding stage inevitably constructs complex feature representations, leading to overfitting to specific feature patterns, which is detrimental to fine-grained visual classification. To address these issues, we propose a Transformer-based FGVC model, called the Multi-Granularity Interaction and Feature Recombination Network (MGIFR-Net), which consists of three modules. First, a self-attention guided localization module is designed to locate and amplify discriminative local regions, enabling sufficient learning of local detail information. Second, to enhance the perception of multi-granularity semantic interaction information, we construct a multi-granularity feature interaction learning module to jointly learn local and global feature representations. Finally, a dynamic feature recombination enhancement method is proposed, which explores diverse feature pattern combinations while retaining invariant features, effectively alleviating the overfitting problem caused by complex feature representations. Our method achieves state-of-the-art performance on four benchmark FGVC datasets (CUB-200-2011, Stanford Cars, FGVC-Aircraft, and NABirds), and experimental results demonstrate the superiority of our method on different visual classification benchmarks.

Keyword：

Feature recombination; Fine-grained visual classification; Multi-granularity feature interaction; Vision transformer

Affiliations：

[ 1 ] [Cai, Yuhang]Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350116, Peoples R China
[ 2 ] [Cai, Yuhang]Fuzhou Univ, Engn Res Ctr Big Data Intelligence, Minist Educ, Fuzhou 350116, Peoples R China

Reprint's Address：