Dual bidirectional mutual distillation based on dense connections


Bibliographic Details
Published in: Journal of Physics: Conference Series
Main Author: Lu, Bin
Format: Article in Journal/Newspaper
Language: unknown
Published: IOP Publishing 2021
Subjects: DML
Online Access:http://dx.doi.org/10.1088/1742-6596/1952/2/022042
https://iopscience.iop.org/article/10.1088/1742-6596/1952/2/022042
https://iopscience.iop.org/article/10.1088/1742-6596/1952/2/022042/pdf
Description
Summary: In the field of deep neural networks, model compression is widely used, and knowledge distillation, as a model compression method, has been studied increasingly widely. Most existing knowledge distillation methods require a large amount of data to train the teacher model in advance, and the interaction between the teacher model and the student model is often weak. Existing methods such as DML use loss constraints to realize soft weight sharing, and there is little work discussing hard weight sharing. Therefore, a new deep interactive online distillation method is proposed in this paper. In this method, the teacher network generates an auxiliary student network and establishes a connection between the two, and the student network likewise generates an auxiliary teacher network and establishes a connection between the two. The two auxiliary networks are then combined and trained from scratch, and the performance of the model is improved through backpropagation of gradients. The effectiveness of this method is verified by experiments on common image classification tasks.
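
To make the bidirectional interaction described in the abstract more concrete, the sketch below shows a minimal DML-style mutual-distillation training step in PyTorch, where two networks teach each other through KL-divergence terms on their softened outputs. The abstract does not specify the paper's densely connected architecture, the construction of the auxiliary networks, or the training hyperparameters, so the network definitions and the values of `temperature` and `alpha` here are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two small illustrative networks standing in for the teacher-side and
# student-side models; the paper's actual densely connected architecture
# is not described in the abstract.
def make_net(num_classes=10):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(16, num_classes),
    )

net_a, net_b = make_net(), make_net()
opt_a = torch.optim.SGD(net_a.parameters(), lr=0.1, momentum=0.9)
opt_b = torch.optim.SGD(net_b.parameters(), lr=0.1, momentum=0.9)

def mutual_step(x, y, temperature=3.0, alpha=0.5):
    """One DML-style update: each network is trained on its task loss plus a
    KL term pulling it toward the other network's softened predictions
    (the peer's outputs are detached so gradients stay within each network)."""
    logits_a, logits_b = net_a(x), net_b(x)

    # KL(student || peer) on temperature-softened distributions, both directions.
    kl_a = F.kl_div(
        F.log_softmax(logits_a / temperature, dim=1),
        F.softmax(logits_b.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    kl_b = F.kl_div(
        F.log_softmax(logits_b / temperature, dim=1),
        F.softmax(logits_a.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    loss_a = F.cross_entropy(logits_a, y) + alpha * kl_a
    loss_b = F.cross_entropy(logits_b, y) + alpha * kl_b

    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
    return loss_a.item(), loss_b.item()

# Illustrative usage with random tensors in place of an image classification set.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
print(mutual_step(x, y))
```

In this sketch the mutual-learning signal flows only through the loss terms (soft weight sharing); the hard weight sharing and dense teacher-student connections discussed in the paper would additionally tie layers of the two networks together, which is not reproduced here.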