Deep Mutual Learning
Model distillation is an effective and widely used technique for transferring knowledge from a teacher network to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast-execution requirements. In this paper, we present a deep mutual learning (DML) strategy in which, rather than one-way transfer between a static pre-defined teacher and a student, an ensemble of students learns collaboratively and teaches each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on the CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, no prior powerful teacher network is necessary: mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher. (10 pages, 4 figures)
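Since the record only summarizes the method, the snippet below is a minimal sketch of what one DML training step could look like, assuming PyTorch; `net1`, `net2`, `opt1`, and `opt2` are hypothetical student classifiers and their optimizers. Each student is trained on its supervised cross-entropy loss plus a KL-divergence term pulling its predictions toward the peer's predicted class distribution, which is the mimicry objective the paper describes.

```python
import torch
import torch.nn.functional as F

def dml_step(net1, net2, opt1, opt2, x, y):
    """One mutual-learning update for a two-student cohort (illustrative sketch)."""
    # Student 1: cross-entropy on the labels plus KL divergence toward
    # student 2's softmax posterior (held fixed, so gradients only flow into net1).
    logits1 = net1(x)
    with torch.no_grad():
        peer = F.softmax(net2(x), dim=1)
    loss1 = F.cross_entropy(logits1, y) + F.kl_div(
        F.log_softmax(logits1, dim=1), peer, reduction="batchmean"
    )
    opt1.zero_grad()
    loss1.backward()
    opt1.step()

    # Student 2: the same objective with the roles reversed, mimicking
    # student 1's just-updated predictions.
    logits2 = net2(x)
    with torch.no_grad():
        peer = F.softmax(net1(x), dim=1)
    loss2 = F.cross_entropy(logits2, y) + F.kl_div(
        F.log_softmax(logits2, dim=1), peer, reduction="batchmean"
    )
    opt2.zero_grad()
    loss2.backward()
    opt2.step()
    return loss1.item(), loss2.item()
```

In the paper's extension beyond two networks, each student's mimicry term averages the KL divergence to every other peer in the cohort, so the same alternating-update structure carries over.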
Main Authors: | Zhang, Ying; Xiang, Tao; Hospedales, Timothy M.; Lu, Huchuan |
---|---|
Format: | Preprint |
Language: | English |
Published: | arXiv, 2017 |
Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
Rights: | arXiv.org perpetual, non-exclusive license (http://arxiv.org/licenses/nonexclusive-distrib/1.0/) |
Online Access: | https://dx.doi.org/10.48550/arxiv.1706.00384 https://arxiv.org/abs/1706.00384 |