Abstract:
Estimating monocular depth for spherical panoramas is of fundamental importance in 3D scene perception. However, spherical distortion severely limits the effectiveness of vanilla convolutions. To improve accuracy, recent approaches use tangent projection (TP) to estimate the depth of 360° images. Yet these methods still suffer from discrepancies and inconsistencies among patch-wise tangent images, as well as from the lack of accurate ground-truth depth maps for supervised training. In this paper, we propose a geometry-aware self-supervised 360° image depth estimation method that exploits the complementary advantages of TP and equirectangular projection (ERP) through an asymmetric dual-domain collaborative learning strategy. Specifically, we first develop a lightweight asymmetric dual-domain depth estimation network, which aggregates depth-related features from a single TP domain and then produces depth distributions for both the TP and ERP domains via collaborative learning. This effectively mitigates stitching artifacts and preserves fine details in depth inference without inflating the number of model parameters. In addition, a frequent-spatial feature concentration module is devised to simultaneously capture non-local Fourier features and local spatial features, thereby facilitating the efficient exploration of monocular depth cues. Moreover, we introduce a geometric structural alignment module to further improve geometric structural consistency among tangent images. Extensive experiments show that the proposed approach outperforms existing self-supervised 360° depth estimation methods on three publicly available benchmark datasets. © 1999-2012 IEEE.
Source:
IEEE Transactions on Multimedia
ISSN: 1520-9210
Year: 2025
8.400
JCR@2023
ESI Highly Cited Papers on the List: 0