Abstract:
The aim of Knowledge Distillation (KD) is to train lightweight student models through extra supervision from large teacher models. Most previous KD methods transfer feature information from teacher models to student models via connections of feature maps at the same layers. This paper proposes a novel multi-level knowledge distillation method, referred to as Normalized Feature Fusion Knowledge Distillation (NFFKD). The proposed model learns different levels of knowledge to improve network performance. We propose a hierarchical mixed loss (HML) module to minimize the gap between the intermediate feature layers of the teacher and the student, and we further reduce the teacher-student gap by normalizing the logits. Experimental results demonstrate that the proposed NFFKD outperforms several state-of-the-art KD methods on public datasets under different settings.
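The abstract does not give the exact NFFKD formulation, so the following is only a minimal PyTorch sketch of the general idea of distilling with normalized logits; the L2 normalization, temperature T, and function names are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only: a generic KD loss that normalizes logits before
# computing the soft-target KL divergence. The exact NFFKD/HML formulation
# is not specified in the abstract; T and the normalization are assumptions.
import torch
import torch.nn.functional as F

def normalized_logit_kd_loss(student_logits, teacher_logits, T=4.0):
    # Normalize each logit vector to unit length so the loss depends on the
    # direction of the logits rather than their magnitude.
    s = F.normalize(student_logits, p=2, dim=1)
    t = F.normalize(teacher_logits, p=2, dim=1)
    # Standard temperature-softened KL divergence on the normalized logits.
    return F.kl_div(
        F.log_softmax(s / T, dim=1),
        F.softmax(t / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

# Example usage with random logits: batch of 8 samples, 100 classes.
if __name__ == "__main__":
    student = torch.randn(8, 100)
    teacher = torch.randn(8, 100)
    print(normalized_logit_kd_loss(student, teacher).item())
```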
Source: 2022 IEEE THE 5TH INTERNATIONAL CONFERENCE ON BIG DATA AND ARTIFICIAL INTELLIGENCE (BDAI 2022)
Year: 2022
Page: 111-116