Research on the Theory and Application of Deep Interactive Learning

Bibliographic Details
Published in: Journal of Physics: Conference Series
Main Authors: Wang, Ziyuan; Guo, Fan
Format: Article in Journal/Newspaper
Language: English
Published: IOP Publishing 2021
Subjects: DML
Online Access: http://dx.doi.org/10.1088/1742-6596/1982/1/012085
https://iopscience.iop.org/article/10.1088/1742-6596/1982/1/012085
https://iopscience.iop.org/article/10.1088/1742-6596/1982/1/012085/pdf
Description
Summary: Knowledge distillation (KD), in which a small network (the student) is trained to mimic a larger one (the teacher) with high fidelity, has been widely used in various fields. However, the interaction between teacher and student remains weak. This study finds that most existing methods, such as Deep Mutual Learning (DML), construct the loss function mainly from soft output indexes, and few researchers pay attention to sharing the deeper, hidden-layer ones. As an improvement over DML, this research proposes a new online distillation method, Deep Interactive Learning (hereinafter DIL), which interacts more deeply than DML: each model exposes not only its output-layer features but also its hidden-layer features, and these features are transferred to the other models to obtain the corresponding softer distributions or features for distillation. Extensive experiments on various datasets show that our method improves accuracy by almost 3% on CIFAR and 2% on ImageNet, demonstrating its validity.
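
The record above contains no code, so the following is only a minimal sketch of the idea the abstract describes: DML-style mutual distillation on softened logits, extended with an extra term that exchanges hidden-layer features between the two peer models. Every name here (dil_step, soft_kd_loss, the assumption that each model returns a (logits, hidden_features) pair, the weight beta) is an illustrative assumption, not the authors' implementation.

    import torch.nn.functional as F

    def soft_kd_loss(student_logits, teacher_logits, T=4.0):
        # Standard KD term: KL divergence between temperature-softened
        # output distributions, scaled by T^2 to keep gradients stable.
        log_p_s = F.log_softmax(student_logits / T, dim=1)
        p_t = F.softmax(teacher_logits / T, dim=1)
        return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

    def dil_step(model_a, model_b, x, y, T=4.0, beta=0.5):
        # One mutual-learning step between two peer models. Each model
        # is assumed (hypothetically) to return (logits, hidden_features).
        logits_a, feat_a = model_a(x)
        logits_b, feat_b = model_b(x)

        # Supervised cross-entropy on the ground-truth labels.
        ce = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)

        # DML-style mutual distillation on softened output distributions;
        # detach() stops each model from chasing its own gradient signal.
        kd = (soft_kd_loss(logits_a, logits_b.detach(), T)
              + soft_kd_loss(logits_b, logits_a.detach(), T))

        # Hypothetical "deeper interaction": each model also matches the
        # other's hidden-layer features (shapes assumed compatible).
        feat = (F.mse_loss(feat_a, feat_b.detach())
                + F.mse_loss(feat_b, feat_a.detach()))

        return ce + kd + beta * feat

In an actual training loop each peer model would keep its own optimizer, and the combined loss (or its per-model parts) would be backpropagated in turn, as in standard DML training.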