2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning

Bibliographic Details
Published in: IEEE Access
Main Authors: Youhe Jiang, Huaxi Gu, Yunfeng Lu, Xiaoshan Yu
Format: Article in Journal/Newspaper
Language: English
Published: IEEE 2020
Subjects: DML
Online Access: https://doi.org/10.1109/ACCESS.2020.3028367
https://doaj.org/article/2a7d18f741b04137bd1063f720f5f800
Description
Summary: Gradient synchronization, the process of communication among machines in large-scale distributed machine learning (DML), plays a crucial role in improving DML performance. As the scale of distributed clusters continues to expand, state-of-the-art DML synchronization algorithms suffer from high latency when scaled to thousands of GPUs. In this article, we propose 2D-HRA, a two-dimensional hierarchical ring-based all-reduce algorithm for large-scale DML. 2D-HRA combines the ring structure with more latency-optimal hierarchical methods, synchronizing parameters along two dimensions to make full use of the available bandwidth. Simulation results show that 2D-HRA can efficiently alleviate the high latency and accelerate the synchronization process in large-scale clusters. Compared with traditional ring-based algorithms, 2D-HRA achieves up to a 76.9% reduction in gradient synchronization time across clusters of different scales.
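The abstract describes the scheme only at a high level. The minimal Python sketch below (hypothetical, not the authors' code) illustrates the general two-dimensional composition it points to: workers are arranged in a logical rows x cols grid, a ring all-reduce runs within each row, and a second ring all-reduce runs within each column of the row-reduced results. Because each ring then spans only rows (or cols) workers instead of rows*cols, the per-ring latency term shrinks, which is the effect the abstract attributes to 2D-HRA. All function names here are illustrative assumptions.

import numpy as np

def ring_allreduce(chunks):
    # Simulate a ring all-reduce over a list of per-worker gradient vectors.
    # A real ring implementation performs a reduce-scatter followed by an
    # all-gather in 2*(n-1) steps; for this sketch only the outcome matters
    # (every worker ends up holding the sum), so we model just that.
    total = np.sum(chunks, axis=0)
    return [total.copy() for _ in chunks]

def two_d_hierarchical_allreduce(grads, rows, cols):
    # grads: list of rows*cols gradient vectors, one per worker, laid out
    # row-major on a logical rows x cols grid (an assumption of this sketch).
    grid = [[grads[r * cols + c] for c in range(cols)] for r in range(rows)]

    # Dimension 1: ring all-reduce within each row; every worker in row r
    # now holds the sum over that row.
    for r in range(rows):
        grid[r] = ring_allreduce(grid[r])

    # Dimension 2: ring all-reduce within each column of the row-reduced
    # values; summing the row sums down a column yields the global sum.
    for c in range(cols):
        col = ring_allreduce([grid[r][c] for r in range(rows)])
        for r in range(rows):
            grid[r][c] = col[r]

    return [grid[r][c] for r in range(rows) for c in range(cols)]

# Usage: 6 workers in a 2 x 3 grid, each with a 4-element gradient.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(4) for _ in range(6)]
out = two_d_hierarchical_allreduce(grads, rows=2, cols=3)
assert all(np.allclose(g, np.sum(grads, axis=0)) for g in out)

After both passes every worker holds the global sum, while each constituent ring is short; this row-then-column composition is one common way such two-dimensional hierarchical all-reduce schemes are structured, offered here as a sketch rather than as the exact 2D-HRA procedure.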