Characterizing Distributed Machine Learning and Deep Learning Workloads
This article was published at the Conférence francophone d'informatique en Parallélisme, Architecture et Système 2021. International audience. Nowadays, machine learning (ML) is widely used in many application domains to analyze datasets and build decision-making systems. With the rapid growth of...
Main Authors: | Djebrouni, Yasmine; Rocha, Isabelly; Bouchenak, Sara; Chen, Lydia Y.; Felber, Pascal; Marangozova-Martin, Vania; Schiavoni, Valerio |
Other Authors: | Université Grenoble Alpes (UGA); Université de Neuchâtel = University of Neuchatel (UNINE); Institut National des Sciences Appliquées de Lyon (INSA Lyon); Université de Lyon-Institut National des Sciences Appliquées (INSA); Delft University of Technology (TU Delft) |
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD, 2021 |
Subjects: | Distributed Machine Learning; Distributed Deep Learning; Workload Characterization; [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-DC] Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC] |
Online Access: | https://hal.science/hal-03344132 https://hal.science/hal-03344132/document https://hal.science/hal-03344132/file/COMPAS2021_paper_12%20%2810%29.pdf |
id |
ftunivlyon:oai:HAL:hal-03344132v1 |
institution |
Open Polar |
collection |
Université de Lyon: HAL |
language |
English |
topic |
Distributed Machine Learning; Distributed Deep Learning; Workload Characterization; [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-DC] Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC] |
description |
This article was published at the Conférence francophone d'informatique en Parallélisme, Architecture et Système 2021. International audience. Nowadays, machine learning (ML) is widely used in many application domains to analyze datasets and build decision-making systems. With the rapid growth of data, ML users have switched to distributed machine learning (DML) platforms for faster executions and large-scale training datasets. However, DML platforms introduce complex execution environments that can overwhelm uninitiated users. To guide the tuning of DML platforms and achieve good performance, it is crucial to characterize DML workloads. In this work, we focus on popular DML and distributed deep learning (DDL) workloads that leverage Apache Spark. We characterize the impact on performance of several platform parameters related to distributed execution, such as parallelization, data shuffling, and scheduling. Based on our analysis, we derive key takeaways on DML/DDL workload patterns, as well as unexpected behavior of workloads based on ensemble learning methods. |
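The record does not list the exact Spark settings the study varied; as a purely illustrative sketch, the three parameter classes named in the abstract (parallelization, data shuffle, and scheduling) map onto standard Apache Spark configuration keys such as the following. All values below are hypothetical examples, not the authors' configuration.

```python
# Illustrative sketch only: the paper's exact configuration is not given in this
# record, so the values below are hypothetical examples of the parameter classes
# named in the abstract (parallelization, data shuffle, scheduling).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")  # stand-in for a real cluster manager
    .appName("dml-workload-characterization-sketch")
    # Parallelization: default number of partitions for distributed RDD operations.
    .config("spark.default.parallelism", "64")
    # Data shuffle: number of partitions used when shuffling DataFrame data.
    .config("spark.sql.shuffle.partitions", "64")
    # Scheduling: FIFO (Spark's default) or FAIR scheduling of jobs within the app.
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()
)

# A DML/DDL workload (e.g., a Spark MLlib training job) would then run on this
# session while settings like the ones above are varied and runtime is measured.
```

Such a snippet only makes the parameter classes concrete; the study itself may have tuned these and other settings differently.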
author2 |
Université Grenoble Alpes (UGA); Université de Neuchâtel = University of Neuchatel (UNINE); Institut National des Sciences Appliquées de Lyon (INSA Lyon); Université de Lyon-Institut National des Sciences Appliquées (INSA); Delft University of Technology (TU Delft) |
format |
Conference Object |
author |
Djebrouni, Yasmine; Rocha, Isabelly; Bouchenak, Sara; Chen, Lydia Y.; Felber, Pascal; Marangozova-Martin, Vania; Schiavoni, Valerio |
title |
Characterizing Distributed Machine Learning and Deep Learning Workloads |
publisher |
HAL CCSD |
publishDate |
2021 |
url |
https://hal.science/hal-03344132 https://hal.science/hal-03344132/document https://hal.science/hal-03344132/file/COMPAS2021_paper_12%20%2810%29.pdf |
op_coverage |
Lyon, France |
genre |
DML |
op_source |
Conférence francophone d'informatique en Parallélisme, Architecture et Système (ComPAS'2021), Jul 2021, Lyon, France. https://hal.science/hal-03344132 |
op_relation |
hal-03344132 https://hal.science/hal-03344132 https://hal.science/hal-03344132/document https://hal.science/hal-03344132/file/COMPAS2021_paper_12%20%2810%29.pdf |
op_rights |
info:eu-repo/semantics/OpenAccess |