Characterizing Distributed Machine Learning and Deep Learning Workloads

This article was published at the Conférence francophone d'informatique en Parallélisme, Architecture et Système (ComPAS'2021), July 2021, Lyon, France. Nowadays, machine learning (ML) is widely used in many application domains to analyze datasets and build decision-making systems. With the rapid growth of data, ML users have switched to distributed machine learning (DML) platforms for faster execution and large-scale training datasets. However, DML platforms introduce complex execution environments that can overwhelm uninitiated users. To guide the tuning of DML platforms and achieve good performance, it is crucial to characterize DML workloads. In this work, we focus on popular DML and distributed deep learning (DDL) workloads that leverage Apache Spark. We characterize the impact on performance of several platform parameters related to distributed execution, such as parallelization, data shuffling, and scheduling. Based on our analysis, we derive key takeaways about DML/DDL workload patterns, as well as unexpected behavior in workloads based on ensemble learning methods.
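The abstract names three classes of Spark platform parameters (parallelization, data shuffle, scheduling) and singles out ensemble learning methods. The sketch below is a minimal, hypothetical illustration of how such parameters might be set around an ensemble (random forest) Spark MLlib workload; the dataset path, column names, and parameter values are assumptions for illustration, not the paper's experimental setup.

    # Minimal PySpark sketch (hypothetical; not the paper's setup): set the
    # parallelization, shuffle, and scheduling parameters the paper studies
    # around an ensemble (random forest) MLlib workload.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = (
        SparkSession.builder
        .appName("dml-workload-sketch")
        # Parallelization: default number of RDD partitions/tasks (assumed value).
        .config("spark.default.parallelism", "64")
        # Data shuffle: partitions used for wide operations (joins, aggregations).
        .config("spark.sql.shuffle.partitions", "64")
        # Scheduling: task scheduling policy, FIFO (default) or FAIR.
        .config("spark.scheduler.mode", "FIFO")
        .getOrCreate()
    )

    # "train.csv" and the "label" column are placeholders for any labeled,
    # numeric tabular dataset.
    df = spark.read.csv("train.csv", header=True, inferSchema=True)
    feature_cols = [c for c in df.columns if c != "label"]
    data = VectorAssembler(inputCols=feature_cols, outputCol="features").transform(df)

    # Random forest: the kind of ensemble method whose distributed behavior
    # the paper reports as unexpected.
    rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
    model = rf.fit(data)

    spark.stop()

Sweeping values such as spark.default.parallelism or spark.sql.shuffle.partitions while timing rf.fit() would approximate the kind of parameter-impact characterization the paper describes.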


Bibliographic Details
Main Authors: Djebrouni, Yasmine, Rocha, Isabelly, Bouchenak, Sara, Chen, Lydia Y., Felber, Pascal, Marangozova-Martin, Vania, Schiavoni, Valerio
Other Authors: Université Grenoble Alpes (UGA), Université de Neuchâtel = University of Neuchatel (UNINE), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA), Delft University of Technology (TU Delft)
Format: Conference Object
Language: English
Published: HAL CCSD 2021
Subjects:
Distributed Machine Learning
Distributed Deep Learning
Workload Characterization
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Online Access (open access): https://hal.science/hal-03344132
https://hal.science/hal-03344132/document
https://hal.science/hal-03344132/file/COMPAS2021_paper_12%20%2810%29.pdf