Rényi Divergence Deep Mutual Learning

This paper revisits Deep Mutual Learning (DML), a simple yet effective computing paradigm. We propose replacing the KL divergence with the Rényi divergence, which is more flexible and tunable, to improve vanilla DML. This modification consistently improves performance over vanilla D...
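
As a rough illustration of the idea in the abstract, the sketch below computes the Rényi divergence of order alpha between two discrete predictive distributions and uses it as the mimicry term in a two-peer, DML-style loss. It is not the authors' implementation: the function names, the direction of the divergence, and the example probabilities are assumptions made for illustration, and the second distribution is assumed strictly positive (as a softmax output would be). At alpha = 1 the function falls back to the KL divergence, which is the limit of the Rényi divergence as alpha approaches 1.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length sequences."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) for alpha > 0.

    Assumes every entry of q is strictly positive. At alpha == 1 the
    KL divergence (the limiting case) is returned instead.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if math.isclose(alpha, 1.0):
        return kl_divergence(p, q)
    total = sum(
        (pi ** alpha) * (qi ** (1.0 - alpha)) for pi, qi in zip(p, q) if pi > 0
    )
    return math.log(total) / (alpha - 1.0)


def peer_loss(own_probs, peer_probs, label, alpha):
    """Hypothetical per-example loss for one peer in a two-peer setup.

    Cross-entropy on the true label plus a Renyi mimicry term that pulls
    this peer's prediction toward the other peer's prediction. The
    direction D_alpha(peer || own) is an assumption of this sketch.
    """
    cross_entropy = -math.log(own_probs[label])
    return cross_entropy + renyi_divergence(peer_probs, own_probs, alpha)


if __name__ == "__main__":
    peer_a = [0.7, 0.2, 0.1]  # made-up softmax outputs of peer A
    peer_b = [0.5, 0.3, 0.2]  # made-up softmax outputs of peer B
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, renyi_divergence(peer_a, peer_b, alpha))
    print(peer_loss(peer_b, peer_a, label=0, alpha=2.0))
```

Here alpha is the extra tunable knob: the Rényi divergence is nondecreasing in alpha, so larger values penalize disagreement between the peers more heavily.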

Bibliographic Details
Main Authors: Huang, Weipeng, Tao, Junjie, Deng, Changbo, Fan, Ming, Wan, Wenqiang, Xiong, Qi, Piao, Guangyuan
Format: Text
Language: unknown
Published: 2022
Subjects:
DML
Online Access: http://arxiv.org/abs/2209.05732
id ftarxivpreprints:oai:arXiv.org:2209.05732
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:2209.05732 2023-09-05T13:19:04+02:00 Rényi Divergence Deep Mutual Learning Huang, Weipeng Tao, Junjie Deng, Changbo Fan, Ming Wan, Wenqiang Xiong, Qi Piao, Guangyuan 2022-09-13 http://arxiv.org/abs/2209.05732 unknown http://arxiv.org/abs/2209.05732 Computer Science - Machine Learning Computer Science - Artificial Intelligence text 2022 ftarxivpreprints 2023-08-16T17:16:42Z Text DML ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Machine Learning
Computer Science - Artificial Intelligence
description This paper revisits Deep Mutual Learning (DML), a simple yet effective computing paradigm. We propose replacing the KL divergence with the Rényi divergence, which is more flexible and tunable, to improve vanilla DML. This modification consistently improves performance over vanilla DML with limited additional complexity. The convergence properties of the proposed paradigm are analyzed theoretically, and Stochastic Gradient Descent with a constant learning rate is shown to converge with $\mathcal{O}(1)$-bias in the worst-case scenario for nonconvex optimization tasks. That is, learning will reach nearby local optima but continue searching within a bounded scope, which may help mitigate overfitting. Finally, our extensive empirical results demonstrate the advantage of combining DML and Rényi divergence, leading to further improvement in model generalization.
format Text
author Huang, Weipeng
Tao, Junjie
Deng, Changbo
Fan, Ming
Wan, Wenqiang
Xiong, Qi
Piao, Guangyuan
author_sort Huang, Weipeng
title Rényi Divergence Deep Mutual Learning
publishDate 2022
url http://arxiv.org/abs/2209.05732
genre DML
op_relation http://arxiv.org/abs/2209.05732
_version_ 1776199884611780608