Coding for Large-Scale Distributed Machine Learning

This article aims to give a comprehensive and rigorous review of the principles and recent developments of coding for large-scale distributed machine learning (DML). With increasing data volumes and the pervasive deployment of sensors and computing machines, machine learning has become more distributed, and the number of computing nodes and the data volumes involved in learning tasks have increased significantly. For large-scale distributed learning systems, significant challenges have appeared in terms of delay, errors, and efficiency. To address these problems, various error-control and performance-boosting schemes, such as the duplication of computing nodes, have recently been proposed. More recently, error-control coding has been investigated for DML to improve reliability and efficiency; its benefits include high efficiency and low complexity. Despite these benefits and recent progress, however, there is still no comprehensive survey of this topic, especially for large-scale learning. This paper introduces the theories and algorithms of coding for DML. For primal-based DML schemes, we first discuss gradient coding with optimal code distance and then introduce random coding for gradient-based DML. For primal–dual-based DML, i.e., ADMM (alternating direction method of multipliers), we propose a separate coding method for the two steps of distributed optimization and then discuss coding schemes for the different steps. Finally, a few potential directions for future work are given.
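As a hedged illustration of the gradient coding idea mentioned in the description above, the sketch below implements the classic fractional-repetition construction from the gradient coding literature (each group of s + 1 workers replicates the same s + 1 data partitions, so the master recovers the full gradient from any n - s workers). The function names, the toy per-partition gradients, and the straggler pattern are illustrative assumptions, not code or notation from the article itself.

import numpy as np

def make_assignments(n_workers, s):
    # Fractional repetition: split workers into groups of size s + 1 and give
    # every worker in group g the same block of s + 1 data partitions.
    assert n_workers % (s + 1) == 0, "fractional repetition assumes (s + 1) divides n"
    n_groups = n_workers // (s + 1)
    blocks = np.arange(n_workers).reshape(n_groups, s + 1)
    assignment = {int(w): blocks[g].tolist() for g in range(n_groups) for w in blocks[g]}
    return assignment, blocks

def worker_message(w, assignment, partition_grads):
    # Each worker returns the sum of the gradients of its assigned partitions.
    return sum(partition_grads[p] for p in assignment[w])

def master_decode(received, blocks):
    # Any single survivor per group carries that group's partial sum; adding
    # one message per group reconstructs the full gradient.
    total = 0.0
    for group in blocks:
        survivor = next(int(w) for w in group if int(w) in received)
        total = total + received[survivor]
    return total

# Toy run: n = 6 workers, tolerate s = 2 stragglers, gradient dimension 4.
n, s, d = 6, 2, 4
rng = np.random.default_rng(0)
partition_grads = [rng.normal(size=d) for _ in range(n)]   # one gradient per data partition
assignment, blocks = make_assignments(n, s)

stragglers = {1, 3}                                        # any two workers may be slow or lost
received = {w: worker_message(w, assignment, partition_grads)
            for w in range(n) if w not in stragglers}
recovered = master_decode(received, blocks)
assert np.allclose(recovered, sum(partition_grads))        # exact full-gradient recovery

With n = 6 and s = 2, any two stragglers leave at least one live worker in each replication group, so the final check passes; other constructions reviewed in the article, such as the random codes mentioned in the description, trade this replication overhead against decoding complexity.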

Bibliographic Details

Published in: Entropy, Volume 24, Issue 9, Page 1284
Main Authors: Ming Xiao, Mikael Skoglund
Format: Text
Language: English
Published: Multidisciplinary Digital Publishing Institute, 2022-09-12
Subjects: error-control coding; gradient coding; random codes; ADMM; DML
Rights: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Online Access: https://doi.org/10.3390/e24091284