H-PS: A Heterogeneous-Aware Parameter Server With Distributed Neural Network Training

Full description

Deep neural networks have become one of the most popular techniques used in many research and application areas, including computer vision and natural language processing. As the complexity of neural networks continuously increases, the training process takes much longer and requires more computation resources. To speed up the training process, a centralized distributed training structure named Parameter Server (PS) is widely used to assign training tasks to different workers/nodes. Most existing studies consider all workers to have the same computation resources. However, in a heterogeneous environment, fast workers (i.e., workers with more computation resources) complete their tasks earlier than slow workers, and thus the system does not fully utilize the resources of the fast workers. In this paper, we propose a PS model with heterogeneous types of workers/nodes, called H-PS, which can fully utilize the resources of each worker by dynamically scheduling tasks based on the current status of the workers (e.g., available memory). By doing so, all workers complete their tasks at the same time and stragglers (i.e., workers that fall behind the others) are avoided. In addition, a pipeline scheme is proposed to further improve the effectiveness of workers by fully utilizing their resources while parameters are being transmitted between the PS and the workers. Moreover, a flexible quantization scheme is proposed to reduce the communication overhead between the PS and the workers. Finally, H-PS is implemented using containers, an emerging lightweight technology. The experimental results indicate that the proposed H-PS can reduce the overall training time by 1.4x – 3.5x compared with existing methods.
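
The description mentions dynamically scheduling tasks based on each worker's current status so that heterogeneous workers finish an iteration together. The Python sketch below is only an illustration of that general idea, not the algorithm from the paper: it splits a global batch across workers in proportion to a measured throughput, and the worker names and throughput numbers are hypothetical.

# Illustrative sketch only: proportional task assignment for heterogeneous workers.
# Not the paper's algorithm; worker names and throughput figures are made up.
# The parameter server sizes each worker's share of a global batch by the
# throughput it measured in the previous iteration, so fast and slow workers
# finish at roughly the same time and stragglers are avoided.

def assign_batches(global_batch, throughput):
    """Split `global_batch` samples across workers in proportion to throughput
    (samples processed per second in the previous iteration)."""
    total = sum(throughput.values())
    shares = {w: int(round(global_batch * t / total)) for w, t in throughput.items()}
    # Fix rounding drift so the shares still sum to the global batch size.
    drift = global_batch - sum(shares.values())
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += drift
    return shares

if __name__ == "__main__":
    # Hypothetical measurements: worker_a is about 3x faster than worker_c.
    measured = {"worker_a": 300.0, "worker_b": 200.0, "worker_c": 100.0}
    print(assign_batches(512, measured))
    # e.g. {'worker_a': 256, 'worker_b': 171, 'worker_c': 85}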
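
The description also mentions pipelining communication with computation so workers stay busy while parameters are in flight. The toy sketch below assumes hypothetical helpers (pull_parameters, compute_gradients) that stand in for real PS/worker operations; it only shows the overlap pattern, pulling the next layer's parameters while the current layer is processed.

# Illustrative sketch only: overlapping parameter transfer with computation.
# pull_parameters and compute_gradients are placeholder names, not the paper's API.

from concurrent.futures import ThreadPoolExecutor
import time

def pull_parameters(layer):
    # Stand-in for a network transfer of one layer's parameters from the PS.
    time.sleep(0.1)
    return f"params[{layer}]"

def compute_gradients(params):
    # Stand-in for the forward/backward work on one layer.
    time.sleep(0.1)
    return f"grads_of({params})"

def train_step(layers):
    grads = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        future = comm.submit(pull_parameters, layers[0])
        for nxt in layers[1:] + [None]:
            params = future.result()                        # wait for the layer in flight
            if nxt is not None:
                future = comm.submit(pull_parameters, nxt)  # start pulling the next layer...
            grads.append(compute_gradients(params))         # ...while computing on this one
    return grads

if __name__ == "__main__":
    print(train_step(["conv1", "conv2", "fc"]))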
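
Finally, the description mentions a flexible quantization scheme for reducing communication between the PS and the workers. The sketch below shows only the generic idea, not the paper's specific scheme: gradients are mapped to 8-bit integers plus a scale before transmission and reconstructed approximately on the receiving side.

# Illustrative sketch only: uniform gradient quantization to cut communication cost.
# The paper's scheme is flexible/dynamic; this merely shows the round trip.

import numpy as np

def quantize(grad, bits=8):
    """Map float gradients to signed integers of the given bit width plus a scale."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit signed values
    scale = float(np.max(np.abs(grad))) / qmax
    if scale == 0.0:
        scale = 1.0                        # all-zero gradient: avoid dividing by zero
    q = np.clip(np.round(grad / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float gradients on the receiving side."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    g = np.random.randn(4).astype(np.float32)
    q, s = quantize(g)
    print(g)
    print(dequantize(q, s))   # close to g, but sent as 1 byte per value instead of 4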

Bibliographic Details
Published in: IEEE Access, Vol. 9, pp. 44049-44058 (2021)
Main Authors: Lintao Xian, Bingzhe Li, Jing Liu, Zhongwen Guo, David H. C. Du
Format: Article in Journal/Newspaper
Language: English
Published: IEEE, 2021
ISSN: 2169-3536
Subjects: Distributed machine learning (DML); heterogeneous environments; dynamically scheduling tasks; pipeline communication and computation; dynamic quantization parameter; Electrical engineering. Electronics. Nuclear engineering (TK1-9971)
Collection: Directory of Open Access Journals: DOAJ Articles (Open Polar)
Online Access:
https://doi.org/10.1109/ACCESS.2021.3060154
https://ieeexplore.ieee.org/document/9356607/
https://doaj.org/article/f4fe240ef8134af6bcf1ea9587e06172