Reliable and Efficient Distributed Machine Learning

Bibliographic Details
Main Author: Chen, Hao
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: KTH, Teknisk informationsvetenskap 2022
Subjects:
DML
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-310374
Description
Summary: With the ever-increasing penetration and proliferation of smart Internet of Things (IoT) applications, machine learning (ML) is envisioned to be a key technique for big-data-driven modelling and analysis. Since the massive data generated by these IoT devices are commonly collected and stored in a distributed manner, ML over networks, e.g., distributed machine learning (DML), has emerged as a promising paradigm, especially for large-scale model training. In this thesis, we explore the optimization and design of DML algorithms under different network conditions. Our main research on DML can be organized into the following four aspects/papers, detailed below.

In the first part of the thesis, we explore fully decentralized ML based on the alternating direction method of multipliers (ADMM). Specifically, to address the two critical challenges in DML systems, namely the communication bottleneck and stragglers (nodes/devices with slow responses), an error-control-coding-based stochastic incremental ADMM (csI-ADMM) is proposed. Given an appropriate mini-batch size, the proposed csI-ADMM method is proved to achieve an $O(1/\sqrt{k})$ convergence rate and an $O(1/\mu^2)$ communication cost, where $k$ denotes the number of iterations and $\mu$ is the target accuracy. In addition, the tradeoff between the convergence rate and the number of stragglers, as well as the relationship between the mini-batch size and the number of stragglers, are analyzed both theoretically and experimentally.

In the second part of the thesis, we investigate an asynchronous approach to fully decentralized federated learning (FL). Specifically, an asynchronous parallel incremental block-coordinate descent (API-BCD) algorithm is proposed, in which multiple nodes/devices are active in an asynchronous fashion to accelerate convergence. The convergence of API-BCD is theoretically proved, and simulation results demonstrate its superior performance in terms of both running speed and communication cost compared with ...
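To make the first part concrete, below is a minimal numpy sketch of the plain stochastic incremental ADMM idea for decentralized consensus least squares: one node is visited per step along a fixed ring, takes a single mini-batch gradient step on its augmented-Lagrangian term, and updates a running consensus average that would be carried with the token in a real deployment. This is an illustrative sketch only, not the thesis's csI-ADMM; the error-control coding against stragglers is omitted, and the objective, step sizes, and visiting order are all assumptions.

# Sketch: stochastic incremental ADMM for consensus least squares.
# NOT the thesis's csI-ADMM (no straggler coding); all constants are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_samples, dim = 5, 200, 10
batch_size, rho, eta, n_cycles = 20, 1.0, 0.05, 300

# Synthetic local data: each node i holds (A_i, b_i) from a shared model.
x_true = rng.normal(size=dim)
A = [rng.normal(size=(n_samples, dim)) for _ in range(n_nodes)]
b = [Ai @ x_true + 0.01 * rng.normal(size=n_samples) for Ai in A]

x = [np.zeros(dim) for _ in range(n_nodes)]  # local primal variables
u = [np.zeros(dim) for _ in range(n_nodes)]  # scaled dual variables
z = np.zeros(dim)                            # consensus variable
s = np.zeros(dim)  # running sum of x_j + u_j, carried with the token

for _ in range(n_cycles):
    for i in range(n_nodes):  # visit nodes incrementally along a ring
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        grad = A[i][idx].T @ (A[i][idx] @ x[i] - b[i][idx]) / batch_size
        old = x[i] + u[i]
        # Inexact primal update: one stochastic-gradient step on the
        # local augmented-Lagrangian term instead of an exact argmin.
        x[i] = x[i] - eta * (grad + rho * (x[i] - z + u[i]))
        u[i] = u[i] + x[i] - z               # dual ascent step
        s += (x[i] + u[i]) - old             # incremental average update
        z = s / n_nodes

print("consensus error:", np.linalg.norm(z - x_true))

The incremental structure is what keeps communication light: only the visited node touches the token per step, which is the property the thesis's coding layer then protects against slow responders.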
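For the second part, the following sketch simulates the asynchronous block-coordinate idea behind API-BCD: at each event several nodes wake up at once, and each updates only its own coordinate block using a possibly stale view of the model. The delay model, the shared quadratic objective, and the step size are illustrative assumptions, and the event loop stands in for genuinely parallel execution; none of this is claimed to be the thesis's exact algorithm.

# Sketch: asynchronous parallel block-coordinate descent with stale
# reads, in the spirit of API-BCD. Delay model and objective are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, block, eta, max_delay, n_events = 4, 5, 0.1, 3, 400
dim = n_nodes * block

# Shared quadratic objective f(x) = 0.5 * ||M x - c||^2.
M = rng.normal(size=(2 * dim, dim)) / np.sqrt(2 * dim)
c = M @ rng.normal(size=dim)

x = np.zeros(dim)
history = [x.copy()]  # past iterates, used to model staleness

for _ in range(n_events):
    # Several nodes are active simultaneously (asynchronous activation).
    active = rng.choice(n_nodes, size=rng.integers(1, n_nodes + 1),
                        replace=False)
    for i in active:
        # Each node reads a delayed (stale) copy of the model.
        stale = history[-1 - rng.integers(0, min(max_delay, len(history)))]
        sl = slice(i * block, (i + 1) * block)
        grad = (M.T @ (M @ stale - c))[sl]  # block gradient at stale point
        x[sl] -= eta * grad                 # update only the owned block
    history.append(x.copy())

print("residual:", np.linalg.norm(M @ x - c))

With a bounded delay and a small enough step size, the stale block updates still drive the residual down, which is the intuition behind proving convergence for asynchronous schemes of this kind.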