Abstract:
As a deep residual network, ResNet50 is of considerable practical importance in image classification, target recognition, and image semantic recognition. In this paper, we use an NVIDIA RTX 4090 GPU to carry out detailed performance testing and bottleneck analysis of ResNet50 training and inference, including the computation latency and data processing latency at different batch sizes. To assess the overall acceleration achievable for ResNet50, we apply two optimizations on top of GPU acceleration: the first uses mixed precision to improve GPU training and inference efficiency, and the second uses DALI to optimize data preprocessing and reduce data loading latency. The experimental results show that at a batch size of 256, mixed precision yields an improvement of about 90% over FP32 in GPU computation, yet the overall performance improvement is not obvious. When mixed precision and DALI are used together to optimize both GPU computation and data loading, overall training and inference performance improve by 1.4 and 2.5 times, respectively. These results show that mixed precision alone cannot improve the overall computing efficiency of the system, because data loading time frequently limits end-to-end performance. Therefore, end users obtain a significant speedup only when GPU computation and data loading latency are optimized together. These findings are relevant to evaluating and improving the computational acceleration of GPU-based deep neural networks. © 2024 SPIE.
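The bottleneck argument in the abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not the paper's measurement code. It assumes a simplified two-stage pipeline (data loading and GPU compute), and every timing and gain value in it is a hypothetical placeholder rather than a figure from the paper. The function names `step_time` and `end_to_end_speedup` are likewise introduced here for illustration only.

```python
def step_time(load_ms, compute_ms, overlapped=True):
    """Per-batch wall time for a two-stage pipeline.

    If loading is prefetched in parallel with GPU compute, the slower
    stage sets the pace; otherwise the two stages run back to back.
    """
    return max(load_ms, compute_ms) if overlapped else load_ms + compute_ms


def end_to_end_speedup(load_ms, compute_ms, compute_gain=1.0,
                       load_gain=1.0, overlapped=True):
    """Ratio of per-batch time before and after speeding up each stage."""
    before = step_time(load_ms, compute_ms, overlapped)
    after = step_time(load_ms / load_gain, compute_ms / compute_gain, overlapped)
    return before / after


# Hypothetical per-batch timings (NOT taken from the paper).
LOAD_MS, COMPUTE_MS = 100.0, 80.0

# Assumption: "about 90% faster" compute is modeled as a 1.9x compute gain.
only_mixed_precision = end_to_end_speedup(LOAD_MS, COMPUTE_MS, compute_gain=1.9)

# Assumption: a hypothetical 3x faster data loader on top of the compute gain.
both_optimized = end_to_end_speedup(LOAD_MS, COMPUTE_MS,
                                    compute_gain=1.9, load_gain=3.0)

if __name__ == "__main__":
    print(f"compute-only speedup:      {only_mixed_precision:.2f}x")
    print(f"compute + loading speedup: {both_optimized:.2f}x")
```

With these placeholder numbers, data loading is the slower stage, so accelerating compute alone leaves the per-batch time unchanged, while accelerating both stages shortens it. Passing `overlapped=False` models a loader that does not prefetch, where a compute-only gain helps somewhat but is still capped by the loading time.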
Source:
ISSN: 0277-786X
Year: 2024
Volume: 13184
Language: English
ESI Highly Cited Papers on the List: 0