Characterizing Distributed Machine Learning and Deep Learning Workloads

This article was published at the Conférence francophone d'informatique en Parallélisme, Architecture et Système (ComPAS'2021), July 2021, Lyon, France. Nowadays, machine learning (ML) is widely used in many application domains to analyze datasets and build decision-making systems. With the rapid growth of data, ML users have switched to distributed machine learning (DML) platforms for faster execution and large-scale training datasets. However, DML platforms introduce complex execution environments that can overwhelm uninitiated users. To guide the tuning of DML platforms and achieve good performance, it is crucial to characterize DML workloads. In this work, we focus on popular DML and distributed deep learning (DDL) workloads that leverage Apache Spark. We characterize the impact on performance of several platform parameters related to distributed execution, such as parallelization, data shuffling, and scheduling. Based on our analysis, we derive key takeaways about DML/DDL workload patterns, as well as unexpected behavior in workloads based on ensemble learning methods.
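The abstract names three classes of Spark platform parameters (parallelization, data shuffle, scheduling) and singles out ensemble learning methods. The sketch below is a minimal, hypothetical illustration of how such parameters might be set around an ensemble (random forest) Spark MLlib workload; the dataset path, column names, and parameter values are assumptions for illustration, not the paper's experimental setup.

    # Minimal PySpark sketch (hypothetical; not the paper's setup): set the
    # parallelization, shuffle, and scheduling parameters the paper studies
    # around an ensemble (random forest) MLlib workload.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = (
        SparkSession.builder
        .appName("dml-workload-sketch")
        # Parallelization: default number of RDD partitions/tasks (assumed value).
        .config("spark.default.parallelism", "64")
        # Data shuffle: partitions used for wide operations (joins, aggregations).
        .config("spark.sql.shuffle.partitions", "64")
        # Scheduling: task scheduling policy, FIFO (default) or FAIR.
        .config("spark.scheduler.mode", "FIFO")
        .getOrCreate()
    )

    # "train.csv" and the "label" column are placeholders for any labeled,
    # numeric tabular dataset.
    df = spark.read.csv("train.csv", header=True, inferSchema=True)
    feature_cols = [c for c in df.columns if c != "label"]
    data = VectorAssembler(inputCols=feature_cols, outputCol="features").transform(df)

    # Random forest: the kind of ensemble method whose distributed behavior
    # the paper reports as unexpected.
    rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
    model = rf.fit(data)

    spark.stop()

Sweeping values such as spark.default.parallelism or spark.sql.shuffle.partitions while timing rf.fit() would approximate the kind of parameter-impact characterization the paper describes.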


Bibliographic Details
Main Authors: Djebrouni, Yasmine, Rocha, Isabelly, Bouchenak, Sara, Chen, Lydia Y., Felber, Pascal, Marangozova-Martin, Vania, Schiavoni, Valerio
Other Authors: Université Grenoble Alpes (UGA), Université de Neuchâtel = University of Neuchatel (UNINE), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA), Delft University of Technology (TU Delft)
Format: Conference Object
Language: English
Published: HAL CCSD 2021
Subjects:
Distributed Machine Learning
Distributed Deep Learning
Workload Characterization
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Online Access (open access): https://hal.science/hal-03344132
https://hal.science/hal-03344132/document
https://hal.science/hal-03344132/file/COMPAS2021_paper_12%20%2810%29.pdf