2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning
Gradient synchronization, a process of communication among machines in large-scale distributed machine learning (DML), plays a crucial role in improving DML performance. Since the scale of distributed clusters is continuously expanding, state-of-the-art DML synchronization algorithms suffer from lat...
Published in: | IEEE Access |
---|---|
Main Authors: | Youhe Jiang, Huaxi Gu, Yunfeng Lu, Xiaoshan Yu |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | IEEE 2020 |
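Subjects: | Distributed machine learning large-scale cluster topology communication overhead all-reduce Electrical engineering. Electronics. Nuclear engineering TK1-9971 |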
Online Access: | https://doi.org/10.1109/ACCESS.2020.3028367 https://doaj.org/article/2a7d18f741b04137bd1063f720f5f800 |
id | ftdoajarticles:oai:doaj.org/article:2a7d18f741b04137bd1063f720f5f800 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | Directory of Open Access Journals: DOAJ Articles |
op_collection_id | ftdoajarticles |
language | English |
topic | Distributed machine learning large-scale cluster topology communication overhead all-reduce Electrical engineering. Electronics. Nuclear engineering TK1-9971 |
topic_facet | Distributed machine learning large-scale cluster topology communication overhead all-reduce Electrical engineering. Electronics. Nuclear engineering TK1-9971 |
description | Gradient synchronization, a process of communication among machines in large-scale distributed machine learning (DML), plays a crucial role in improving DML performance. Since the scale of distributed clusters is continuously expanding, state-of-the-art DML synchronization algorithms suffer from latency for thousands of GPUs. In this article, we propose 2D-HRA, a two-dimensional hierarchical ring-based all-reduce algorithm in large-scale DML. 2D-HRA combines the ring with more latency-optimal hierarchical methods, and synchronizes parameters on two dimensions to make full use of the bandwidth. Simulation results show that 2D-HRA can efficiently alleviate the high latency and accelerate the synchronization process in large-scale clusters. Compared with traditional algorithms (ring based), 2D-HRA achieves up to 76.9% reduction in gradient synchronization time in clusters of different scale. |
format | Article in Journal/Newspaper |
author | Youhe Jiang Huaxi Gu Yunfeng Lu Xiaoshan Yu |
author_facet | Youhe Jiang Huaxi Gu Yunfeng Lu Xiaoshan Yu |
author_sort | Youhe Jiang |
title | 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning |
title_short | 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning |
title_full | 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning |
title_fullStr | 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning |
title_full_unstemmed | 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning |
title_sort | 2d-hra: two-dimensional hierarchical ring-based all-reduce algorithm in large-scale distributed machine learning |
publisher | IEEE |
publishDate | 2020 |
url | https://doi.org/10.1109/ACCESS.2020.3028367 https://doaj.org/article/2a7d18f741b04137bd1063f720f5f800 |
genre | DML |
genre_facet | DML |
op_source | IEEE Access, Vol 8, Pp 183488-183494 (2020) |
op_relation | https://ieeexplore.ieee.org/document/9211480/ https://doaj.org/toc/2169-3536 2169-3536 doi:10.1109/ACCESS.2020.3028367 https://doaj.org/article/2a7d18f741b04137bd1063f720f5f800 |
op_doi | https://doi.org/10.1109/ACCESS.2020.3028367 |
container_title | IEEE Access |
container_volume | 8 |
container_start_page | 183488 |
op_container_end_page | 183494 |
_version_ | 1766397178931052544 |
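
The abstract contrasts a flat ring all-reduce with a scheme that synchronizes on two dimensions. This record does not describe the 2D-HRA algorithm itself, so the following is only a minimal Python sketch of the general idea, not the authors' method. It simulates a textbook ring all-reduce, then a generic row-then-column variant on a `rows x cols` grid, and counts communication steps. The function names `ring_allreduce` and `grid_allreduce` and the grid layout are assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def ring_allreduce(vectors: List[Vector]) -> int:
    """Sum `vectors` (one per worker) in place with a simulated ring all-reduce.

    Returns the number of communication steps, 2 * (n - 1) for n workers.
    """
    n = len(vectors)
    if n == 1:
        return 0
    size = len(vectors[0])
    # Split the vector into n contiguous chunks (chunks may be empty).
    bounds = [(size * i) // n for i in range(n + 1)]
    steps = 0

    # Scatter-reduce: at step s, worker w sends chunk (w - s) % n to its
    # right neighbour, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = []
        for w in range(n):
            c = (w - s) % n
            sends.append((c, vectors[w][bounds[c]:bounds[c + 1]]))
        for w, (c, data) in enumerate(sends):
            dst = (w + 1) % n
            for k, x in enumerate(data):
                vectors[dst][bounds[c] + k] += x
        steps += 1

    # All-gather: worker w now owns the fully reduced chunk (w + 1) % n and
    # forwards completed chunks around the ring, overwriting stale copies.
    for s in range(n - 1):
        sends = []
        for w in range(n):
            c = (w + 1 - s) % n
            sends.append((c, vectors[w][bounds[c]:bounds[c + 1]]))
        for w, (c, data) in enumerate(sends):
            dst = (w + 1) % n
            vectors[dst][bounds[c]:bounds[c + 1]] = data
        steps += 1
    return steps


def grid_allreduce(vectors: List[Vector], rows: int, cols: int) -> int:
    """Generic two-phase all-reduce on a rows x cols grid (illustrative only).

    Phase 1 runs a ring inside every row, phase 2 a ring inside every column.
    Rings within a phase are independent, so each phase is counted once.
    """
    assert rows * cols == len(vectors)
    row_steps = 0
    for r in range(rows):
        row_steps = ring_allreduce([vectors[r * cols + c] for c in range(cols)])
    col_steps = 0
    for c in range(cols):
        col_steps = ring_allreduce([vectors[r * cols + c] for r in range(rows)])
    return row_steps + col_steps
```

For 16 workers, the flat ring takes 2 * (16 - 1) = 30 steps, while the 4 x 4 grid variant takes 2 * 3 + 2 * 3 = 12. In this simple variant each phase moves the full vector, so every worker sends more data in total than in the flat ring; the step count drops but the bandwidth cost rises.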