Abstract:
With the widespread adoption of pose estimation in computer vision, traditional methods often suffer from insufficient accuracy and poor robustness in complex dynamic scenes and against diverse backgrounds. To overcome these challenges, we propose a novel pose estimation approach that integrates a dynamic graph convolutional network (DGCN), a spatiotemporal interleaved attention mechanism (STIA), and a variable-length Transformer encoder (VLTE). The DGCN module captures the spatial dependencies of human poses and, by dynamically adjusting the graph structure, strengthens the model's representation of complex spatial relationships. The STIA module captures spatiotemporal dependencies by combining temporal and spatial information. The VLTE module improves the model's adaptability to varying time scales by processing variable-length sequences and multi-scale information. Extensive experiments on the MPI-INF-3DHP dataset show that removing any single module significantly degrades performance, underscoring the importance of each component in the overall architecture. An inference-speed analysis further reveals that streamlining the model accelerates inference, but at the cost of accuracy and robustness. Our work offers a pose estimation solution that balances computational efficiency with high accuracy, providing strong support for practical pose estimation tasks.
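The abstract outlines a three-stage pipeline (DGCN → STIA → VLTE). The PyTorch sketch below is a minimal illustration of how such a pipeline could be composed from the abstract's description alone; the module internals, layer sizes, and the joint count of 17 (common in 3D pose benchmarks) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DynamicGraphConv(nn.Module):
    """Graph convolution with a learnable adjacency over J joints --
    a simplified stand-in for the dynamically adjusted graph in DGCN."""
    def __init__(self, in_dim, out_dim, num_joints):
        super().__init__()
        self.adj = nn.Parameter(torch.eye(num_joints))  # learnable graph structure
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                       # x: (B, T, J, C)
        a = torch.softmax(self.adj, dim=-1)     # row-normalized adjacency
        x = torch.einsum('ij,btjc->btic', a, x) # aggregate features over neighbors
        return self.proj(x)

class SpatioTemporalAttention(nn.Module):
    """Interleaved spatial-then-temporal self-attention (STIA stand-in)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (B, T, J, C)
        b, t, j, c = x.shape
        s = x.reshape(b * t, j, c)              # attend across joints per frame
        s, _ = self.spatial(s, s, s)
        x = s.reshape(b, t, j, c)
        u = x.permute(0, 2, 1, 3).reshape(b * j, t, c)  # attend across frames per joint
        u, _ = self.temporal(u, u, u)
        return u.reshape(b, j, t, c).permute(0, 2, 1, 3)

class PoseEstimator(nn.Module):
    """DGCN -> STIA -> Transformer encoder over variable-length sequences,
    regressing 3D joint positions per frame."""
    def __init__(self, num_joints=17, in_dim=2, dim=64):
        super().__init__()
        self.dgcn = DynamicGraphConv(in_dim, dim, num_joints)
        self.stia = SpatioTemporalAttention(dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # handles any T
        self.head = nn.Linear(dim, 3)           # 3D coordinates per joint

    def forward(self, x):                       # x: (B, T, J, 2) 2D keypoints
        b, t, j, _ = x.shape
        h = self.stia(self.dgcn(x))
        h = self.encoder(h.reshape(b, t * j, -1))  # sequence length varies with T
        return self.head(h).reshape(b, t, j, 3)

if __name__ == "__main__":
    model = PoseEstimator()
    for frames in (27, 81):                     # variable-length input sequences
        out = model(torch.randn(2, frames, 17, 2))
        print(out.shape)                        # (2, frames, 17, 3)
```

Because attention and the Transformer encoder place no constraint on sequence length, the same weights process 27-frame and 81-frame clips, which is the property the VLTE module's variable-length handling relies on.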
Source: Image and Vision Computing
ISSN: 0262-8856
Year: 2025
Volume: 162
Impact Factor: 4.200 (JCR@2023)