Knowledge Transfer via Dense Cross-Layer Mutual-Distillation

Knowledge Distillation (KD) based methods adopt a one-way Knowledge Transfer (KT) scheme in which the training of a lower-capacity student network is guided by a pre-trained high-capacity teacher network. Recently, Deep Mutual Learning (DML) presented a two-way KT strategy, showing that the student network can also help improve the teacher network. In this paper, we propose Dense Cross-layer Mutual-distillation (DCM), an improved two-way KT method in which the teacher and student networks are trained collaboratively from scratch. To augment knowledge representation learning, well-designed auxiliary classifiers are added to certain hidden layers of both the teacher and student networks. To boost KT performance, we introduce dense bidirectional KD operations between the layers appended with classifiers. After training, all auxiliary classifiers are discarded, so no extra parameters are introduced into the final models. We test our method on a variety of KT tasks, demonstrating its superiority over related methods. Code is available at https://github.com/sundw2014/DCM

Note: Accepted by ECCV 2020. The code at https://github.com/sundw2014/DCM is based on the implementation of our DKS work, https://github.com/sundw2014/DKS
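To make the training scheme described above concrete, here is a minimal PyTorch-style sketch of a dense bidirectional KD loss with auxiliary classifier heads. This is an illustration of the idea in the abstract, not the authors' implementation (see https://github.com/sundw2014/DCM for the reference code); the function and argument names (kd_loss, dcm_style_loss, aux_logits_t, aux_logits_s, temperature T) are hypothetical, and the choice of loss weights and gradient stopping is a simplifying assumption.

```python
# Minimal sketch (assumed PyTorch) of dense bidirectional cross-layer KD:
# every classifier-equipped layer of one network distills to and from every
# classifier-equipped layer of the other, on top of the usual supervised loss.
import torch
import torch.nn.functional as F

def kd_loss(p_logits, q_logits, T=2.0):
    """One-way KD term: KL(softmax(q/T) || softmax(p/T)), scaled by T^2."""
    log_p = F.log_softmax(p_logits / T, dim=1)
    q = F.softmax(q_logits / T, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)

def dcm_style_loss(logits_t, logits_s, aux_logits_t, aux_logits_s, targets, T=2.0):
    """
    logits_t / logits_s: final logits of the teacher and student networks.
    aux_logits_t / aux_logits_s: lists of logits from auxiliary classifiers
        attached to hidden layers of each network (discarded after training).
    """
    # Supervised cross-entropy on every classifier head of both networks.
    loss = F.cross_entropy(logits_t, targets) + F.cross_entropy(logits_s, targets)
    for a in aux_logits_t + aux_logits_s:
        loss = loss + F.cross_entropy(a, targets)

    # Dense bidirectional KD between all classifier-equipped layers.
    heads_t = aux_logits_t + [logits_t]
    heads_s = aux_logits_s + [logits_s]
    for ht in heads_t:
        for hs in heads_s:
            loss = loss + kd_loss(hs, ht.detach(), T)  # teacher head -> student head
            loss = loss + kd_loss(ht, hs.detach(), T)  # student head -> teacher head
    return loss
```

Detaching the target logits in each KD term mimics the DML-style setup where each network is updated against the other's current (fixed) predictions; how DCM weights and schedules these terms is specified in the paper and reference code rather than here.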

Bibliographic Details
Main Authors: Yao, Anbang; Sun, Dawei
Format: Article in Journal/Newspaper
Language: English
Published: arXiv, 2020
Subjects: DML; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); FOS: Computer and information sciences
Online Access: https://dx.doi.org/10.48550/arxiv.2008.07816
https://arxiv.org/abs/2008.07816
Rights: arXiv.org perpetual, non-exclusive license (http://arxiv.org/licenses/nonexclusive-distrib/1.0/)
Source Record: DataCite Metadata Store (German National Library of Science and Technology), via Open Polar