Abstract:
Vision Transformers (ViTs) have shown promise in multimodal fusion image classification, yet they face performance challenges in complex remote sensing scenarios. Single fusion frameworks often fail to fully exploit multimodal diversity, and the uneven distribution of image categories makes it difficult for Transformers to construct spatial structures accurately. Additionally, the traditional cross-entropy loss tends to favor majority classes while neglecting minority classes, resulting in suboptimal predictions and reduced overall accuracy (OA). To address these challenges, we propose a novel deep neural network, the bilinear parallel Fourier Transformer (BPFT). Within BPFT, we design a dual-fusion feature interaction (DFFI) module that learns from two distinct types of fused features: the spatial-spectral fusion feature and the global fusion feature. In addition, we introduce a dual-feature interaction (DFI) module to improve the utilization of fused feature information. To enable the Transformer to better establish spatial structural relationships, we replace the self-attention mechanism with the Fourier transform. To direct more attention to minority-class labels, we propose an exponential label smoothing cross-entropy loss function comprising two components: exponential cross-entropy and label smoothing. The exponential cross-entropy component applies a strong penalty to misclassified samples, thereby increasing attention on minority-class labels. To validate the efficacy of our approach, we conduct extensive experiments on two multimodal remote sensing datasets, Augsburg and Berlin, which include hyperspectral imaging (HSI) and synthetic aperture radar (SAR) data. The results confirm the superior performance of the proposed BPFT model over existing state-of-the-art models in multimodal remote sensing image classification tasks.
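The abstract states that the Fourier transform replaces the self-attention mechanism so the Transformer can better establish spatial structural relationships. Below is a minimal PyTorch sketch of that idea in the spirit of FNet-style token mixing; the exact BPFT formulation is not given in the abstract, so the block structure, the dimensions, and the choice to keep only the real part of the FFT are assumptions for illustration.

```python
import torch
import torch.nn as nn


class FourierMixerBlock(nn.Module):
    """Transformer block that mixes tokens with a 2-D FFT instead of
    self-attention (FNet-style sketch; exact BPFT details are assumed)."""

    def __init__(self, d_model: int = 64, d_ff: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_model). FFT over the token and channel axes;
        # keeping the real part yields a parameter-free token mixer.
        mixed = torch.fft.fft2(x, dim=(-2, -1)).real
        x = self.norm1(x + mixed)
        return self.norm2(x + self.ff(x))


if __name__ == "__main__":
    block = FourierMixerBlock(d_model=64)
    tokens = torch.randn(2, 49, 64)  # e.g., 7x7 patch tokens from an HSI/SAR cube
    print(block(tokens).shape)       # torch.Size([2, 49, 64])
```

Unlike self-attention, the FFT-based mixer has no learned mixing weights, which is what allows it to stand in for attention while still propagating information across all token positions.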
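The abstract also describes an exponential label smoothing cross-entropy loss whose exponential component strongly penalizes misclassified samples. The paper's exact formula is not reproduced on this page, so the sketch below is a hypothetical reading: standard label smoothing combined with an exponential penalty `exp(ce) - 1` applied to the per-sample smoothed cross-entropy.

```python
import torch
import torch.nn.functional as F


def exp_label_smoothing_ce(logits: torch.Tensor,
                           targets: torch.Tensor,
                           smoothing: float = 0.1) -> torch.Tensor:
    """Hypothetical sketch: label-smoothed cross-entropy passed through an
    exponential so hard (misclassified) samples are penalized more heavily.
    The actual BPFT loss may differ in form and weighting."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Smoothed target distribution: eps/K spread over all classes,
    # with the remaining (1 - eps) mass placed on the true class.
    smooth = torch.full_like(log_probs, smoothing / num_classes)
    smooth.scatter_(-1, targets.unsqueeze(-1),
                    1.0 - smoothing + smoothing / num_classes)
    ce = -(smooth * log_probs).sum(dim=-1)   # per-sample smoothed CE
    # exp(ce) - 1 ~= ce for well-classified samples, but grows rapidly
    # for misclassified ones, boosting the gradient on minority classes.
    return (torch.exp(ce) - 1.0).mean()
```

For example, `exp_label_smoothing_ce(torch.randn(8, 5), torch.randint(0, 5, (8,)))` returns a scalar loss for a batch of 8 samples over 5 classes.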
Source:
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
ISSN: 0196-2892
Year: 2025
Volume: 63
Impact Factor: 7.500 (JCR 2023)