Abstract:
[Objectives] High-resolution remote sensing image segmentation provides essential data support for urban planning, land use, and land cover analysis by accurately extracting terrain information. However, traditional methods struggle to predict object categories at the pixel level because of the high computational cost of processing high-resolution images. Current approaches often divide a remote sensing image into a series of standard blocks and perform multi-scale local segmentation, which captures semantic information at different granularities. Yet these methods exhibit weak feature interaction between blocks because they do not consider contextual prior knowledge, which ultimately reduces local segmentation performance.

[Methods] To address this issue, this paper proposes a high-resolution remote sensing image segmentation framework named CATrans (Cross-scale Attention Transformer), which combines cross-scale attention with a semantic-based visual Transformer. CATrans first predicts the segmentation results of local blocks and then merges them to produce the final global segmentation, introducing contextual prior knowledge to enhance local feature representation. Specifically, a cross-scale attention mechanism integrates contextual semantic information with multi-level features. The multi-branch parallel structure of the cross-scale attention module sharpens the focus on objects of varying granularities by analyzing shallow-deep and local-global dependencies. This mechanism aggregates cross-spatial information across various dimensions and weights multi-scale kernels to strengthen multi-level feature representations, allowing the model to avoid deep stacking and long sequential processing. In addition, a semantic-based visual Transformer couples multi-level contextual semantic information, with spatial attention reinforcing these semantic representations. The multi-level contextual information is grouped into abstract semantic concepts, which are then fed into the Transformer for sequence modeling. The self-attention mechanism within the Transformer captures dependencies between different positions in the input sequence, strengthening the correlation between contextual semantics and spatial positions. Finally, enhanced contextual semantics are generated through feature mapping.

[Results] Comparative experiments were conducted on the DeepGlobe, Inria Aerial, and LoveDA datasets. The results show that CATrans outperforms existing segmentation methods, including the Discrete Wavelet Smooth Network (WSDNet) and the Integrating Shallow and Deep Network (ISDNet). CATrans achieves a Mean Intersection over Union (mIoU) of 76.2%, 79.2%, and 54.2%, and a Mean F1 Score (mF1) of 86.5%, 87.8%, and 66.8%, with inference speeds of 38.1 FPS, 13.2 FPS, and 95.22 FPS on the respective datasets. Compared to the best-performing existing method, WSDNet, CATrans improves segmentation performance across all classes, with mIoU gains of 2.1%, 4.0%, and 5.3%, and mF1 gains of 1.3%, 1.8%, and 5.6%.

[Conclusions] These findings show that the proposed CATrans framework significantly improves high-resolution remote sensing image segmentation by incorporating contextual prior knowledge to strengthen local feature representation, achieving an effective balance between segmentation performance and computational efficiency.

© 2025 Science Press. All rights reserved.
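The block-wise pipeline the abstract describes (segment local blocks, then merge into a global map) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the `model` callable, the 512-pixel tile size, and the non-overlapping tiling are placeholders rather than values from the paper, and the sketch omits the contextual prior knowledge that CATrans shares across blocks.

```python
# Minimal sketch of tile-then-merge inference for a high-resolution
# image. Tile size and `model` are hypothetical; edge tiles may be
# smaller than `tile`, so the model is assumed to accept any size and
# to preserve spatial dimensions, as segmentation networks typically do.
import torch

@torch.no_grad()
def tiled_segmentation(model, image, tile=512):
    """image: (C, H, W) tensor; returns (num_classes, H, W) logits."""
    _, h, w = image.shape
    out = None
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            block = image[:, top:top + tile, left:left + tile]
            logits = model(block.unsqueeze(0)).squeeze(0)
            if out is None:  # allocate once num_classes is known
                out = image.new_zeros(logits.shape[0], h, w)
            out[:, top:top + block.shape[1],
                left:left + block.shape[2]] = logits
    return out
```

A naive merge like this is exactly where the weak inter-block interaction the abstract criticizes shows up; practical pipelines overlap tiles and blend logits at seams, while CATrans instead injects contextual priors into each block's features.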
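For the cross-scale attention of [Methods], a minimal PyTorch sketch follows, assuming parallel depthwise branches with kernel sizes 3/5/7 and a softmax-weighted fusion driven by global context. The branch count, kernel sizes, and fusion rule are illustrative guesses; the abstract does not specify the exact architecture.

```python
# Hypothetical sketch of a multi-branch cross-scale attention block:
# parallel multi-scale kernels capture local-global dependencies, and
# global context weights the branches, avoiding deep stacking.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleAttention(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Parallel depthwise convolutions attend to objects of
        # different granularities.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # Per-branch scalar weights derived from pooled global context.
        self.weight_fc = nn.Linear(channels, len(kernel_sizes))

    def forward(self, shallow, deep):
        # Align deep (low-resolution, semantic) features with shallow
        # (high-resolution, detailed) ones, then mix them.
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        x = shallow + deep
        # Global average pooling -> softmax weights over branches.
        context = x.mean(dim=(2, 3))                   # (B, C)
        w = self.weight_fc(context).softmax(dim=-1)    # (B, K)
        # Weighted sum of the multi-scale branch outputs.
        return sum(w[:, i, None, None, None] * b(x)
                   for i, b in enumerate(self.branches))

csa = CrossScaleAttention(channels=64)
y = csa(torch.randn(2, 64, 128, 128), torch.randn(2, 64, 32, 32))
```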
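The semantic-based visual Transformer step (grouping multi-level context into abstract semantic concepts via spatial attention, sequence modeling with self-attention, then feature mapping back) might look roughly like the sketch below; the token count, encoder depth, and attention-pooling tokenizer are assumptions rather than the paper's design.

```python
# Hypothetical sketch: spatial attention pools pixel features into a few
# abstract semantic tokens, a Transformer encoder models dependencies
# among them, and the enhanced tokens are mapped back onto pixels.
import torch
import torch.nn as nn

class SemanticTransformer(nn.Module):
    def __init__(self, channels, num_tokens=16, depth=2, heads=4):
        super().__init__()
        # 1x1 conv scores each pixel's affinity to every semantic token.
        self.token_attn = nn.Conv2d(channels, num_tokens, kernel_size=1)
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, feats):
        b, c, h, w = feats.shape
        pixels = feats.flatten(2).transpose(1, 2)             # (B, HW, C)
        # Spatial attention: softmax over pixels, then pool per token.
        attn = self.token_attn(feats).flatten(2).softmax(-1)  # (B, T, HW)
        tokens = attn @ pixels                                # (B, T, C)
        # Self-attention captures dependencies between semantic tokens.
        tokens = self.encoder(tokens)
        # Feature mapping: project enhanced semantics back to pixels.
        out = attn.transpose(1, 2) @ tokens                   # (B, HW, C)
        return out.transpose(1, 2).reshape(b, c, h, w) + feats

st = SemanticTransformer(channels=64)
y = st(torch.randn(2, 64, 32, 32))  # -> (2, 64, 32, 32)
```

Note that `d_model` (here, the channel count) must be divisible by the head count; the residual connection keeps the original multi-level features alongside the enhanced semantics.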
Source: Journal of Geo-Information Science
ISSN: 1560-8999
Year: 2025
Issue: 7
Volume: 27
Page: 1624-1637