
Author:

Guo, Cuixia [1] | Xu, Yongtao [2] | Zou, Zhanghuang [3] | Pan, Zhijie [4] | Huang, Feng [5]

Indexed by:

EI

Abstract:

With the continuous development of computer vision research, deep learning has become increasingly widespread, and its introduction into object detection has greatly improved detection performance. However, to further improve detection accuracy, the depth and width of networks are continually increased, producing models with large numbers of parameters and complex structures that are difficult to deploy in practice. To address the high computational cost, high memory consumption, and difficulty of efficient edge deployment of visible-infrared dual-modal fusion detection models, this paper proposes a lightweight pedestrian-vehicle detection model based on visible and infrared modality fusion. The multimodal input greatly improves the stability of the algorithm in all-weather operation, so the detection task can be accomplished reliably under snowy, foggy, or low-illumination conditions. The proposed model replaces the YOLOv7-tiny backbone with the lightweight MobileNetV2 network. MobileNetV2 employs an inverted residual structure and a linear bottleneck layer, both of which effectively enhance its feature representation and learning capability. It also uses depthwise separable convolution, which, unlike conventional convolution, factorizes the operation into a depthwise convolution followed by a pointwise (1×1) convolution. This paper also proposes a differential modal fusion module inspired by the principle of differential amplification circuits. The module takes the difference of the two modal inputs, separates the differential-mode and common-mode information, and amplifies the differential-mode information to fully exploit the complementary advantages of the visible and infrared modalities. An illumination-aware module is introduced because, under low illumination and adverse weather, infrared and visible images affect model performance differently. This module dynamically assigns weights to the visible and infrared features according to illumination conditions, making the most of the feature information of each modality. Experiments are conducted on three public datasets: FLIR ADAS, LLVIP, and KAIST. The proposed model is compared with lightweight single-modal algorithms such as YOLOv5s and YOLOv7-tiny and with dual-modal algorithms such as ICAFusion and CFT. The results show that the bimodal detection models outperform the unimodal detection models. On the FLIR ADAS dataset, the proposed model improves detection accuracy by 11.6% and 15.3% over the unimodal YOLOv5s and YOLOv7-tiny models with RGB input, respectively, and by 3.3% and 4.9% over the same models with infrared input. Compared with the baseline model, accuracy improves by 3.8%; compared with the bimodal models ICAFusion and CFT, accuracy improves by 3.8% and 1.9%, respectively. On the LLVIP dataset, the proposed model improves detection accuracy by 6.9% and 1.1% over YOLOv7-tiny with visible and infrared unimodal input, respectively, and by 5.4% and 2.0% over YOLOv5s with visible and infrared input. Compared with the baseline model, accuracy improves by 1.3%; compared with the bimodal models ICAFusion and SLBAF, accuracy improves by 9.6% and 1.5%, respectively. On the KAIST dataset, the proposed model improves detection accuracy by 26.9% and 7.8% over YOLOv7-tiny with visible and infrared input, respectively, and by 3.4% over the baseline model. Compared with other bimodal models, the proposed model achieves the highest detection accuracy, reaching 76.2%. In terms of inference speed, the proposed model reaches 208 FPS, 103 FPS, and 113 FPS on the FLIR ADAS, LLVIP, and KAIST datasets, respectively. These results show that the proposed model has significant advantages in detection accuracy, speed, and robustness. © 2025, Chinese Optical Society. All rights reserved.
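
To make the depthwise separable convolution mentioned in the abstract concrete, the following is a minimal PyTorch-style sketch of the general technique: a standard K×K convolution is factorized into a per-channel depthwise convolution followed by a 1×1 pointwise convolution. The class name, layer ordering, and choice of ReLU6 are illustrative assumptions and are not taken from the paper's implementation.

import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    # Depthwise stage: one KxK filter per input channel (groups=in_ch),
    # capturing spatial structure without mixing channels.
    # Pointwise stage: a 1x1 convolution that mixes channels and sets the
    # output width, giving far fewer parameters and multiply-adds than a
    # single standard KxK convolution with the same in/out channels.
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding=kernel_size // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)  # ReLU6, as commonly used in MobileNetV2

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))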
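
The differential modal fusion and illumination-aware weighting described in the abstract can likewise be sketched in a few lines. The sketch below is a hypothetical reading of that description, assuming PyTorch: the visible and infrared feature maps are split into a common-mode (sum) and a differential-mode (difference) component, the differential mode is amplified, and a small illumination head predicts per-modality weights from the visible features. The module name, the fixed gain, the pooling-based illumination head, and the final 1×1 projection are assumptions made for illustration, not the paper's actual design.

import torch
import torch.nn as nn

class IlluminationAwareDifferentialFusion(nn.Module):
    def __init__(self, channels, diff_gain=2.0):
        super().__init__()
        self.diff_gain = diff_gain                    # assumed fixed amplification factor
        self.illum_head = nn.Sequential(              # illumination cue from global statistics
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 2, kernel_size=1))    # two logits: visible vs. infrared weight
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_vis, feat_ir):
        common = 0.5 * (feat_vis + feat_ir)           # common-mode information shared by both modalities
        diff = self.diff_gain * (feat_vis - feat_ir)  # amplified differential-mode (complementary) information
        w = torch.softmax(self.illum_head(feat_vis), dim=1)
        w_vis, w_ir = w[:, 0:1], w[:, 1:2]            # illumination-aware modality weights
        weighted = w_vis * feat_vis + w_ir * feat_ir  # reweighted modality features
        return self.project(torch.cat([weighted + common, diff], dim=1))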

Keyword:

Complex networks; Computer vision; Convolution; Fusion reactions; Image enhancement; Light weight vehicles; Modal analysis; Network layers; Object recognition; Signal detection; Vehicle detection

Community:

  • [ 1 ] [Guo, Cuixia] School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350000, China
  • [ 2 ] [Xu, Yongtao] School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350000, China
  • [ 3 ] [Zou, Zhanghuang] School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350000, China
  • [ 4 ] [Pan, Zhijie] School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350000, China
  • [ 5 ] [Huang, Feng] School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350000, China

Reprint's Address:

Email:

Related Keywords:

Related Article:

Source:

Acta Photonica Sinica

ISSN: 1004-4213

Year: 2025

Issue: 6

Volume: 54

0.600

JCR@2023

Cited Count:

WoS CC Cited Count:

SCOPUS Cited Count:

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:

30 Days PV: 1

Affiliated Colleges:
