High-performance multi-GPU analytic RI-MP2 energy gradients
This article presents a novel algorithm, designed to achieve high performance on multi-GPU clusters, for calculating analytic energy gradients from second-order Møller-Plesset perturbation theory within the Resolution-of-the-Identity approximation (RI-MP2). The algorithm uses GPUs for...
Main Authors: | Stocks, Ryan; Palethorpe, Elise; Barca, Giuseppe Maria Junior |
---|---|
Format: | Other/Unknown Material |
Language: | unknown |
Published: | American Chemical Society (ACS), 2024 |
Online Access: | http://dx.doi.org/10.26434/chemrxiv-2024-hr1hf https://chemrxiv.org/engage/api-gateway/chemrxiv/assets/orp/resource/item/65cebfff66c1381729be393a/original/high-performance-multi-gpu-analytic-ri-mp2-energy-gradients.pdf |
id |
cracsoc:10.26434/chemrxiv-2024-hr1hf |
---|---|
record_format |
openpolar |
spelling |
cracsoc:10.26434/chemrxiv-2024-hr1hf 2024-04-07T07:55:15+00:00 High-performance multi-GPU analytic RI-MP2 energy gradients Stocks, Ryan Palethorpe, Elise Barca, Giuseppe Maria Junior 2024 http://dx.doi.org/10.26434/chemrxiv-2024-hr1hf https://chemrxiv.org/engage/api-gateway/chemrxiv/assets/orp/resource/item/65cebfff66c1381729be393a/original/high-performance-multi-gpu-analytic-ri-mp2-energy-gradients.pdf unknown American Chemical Society (ACS) https://creativecommons.org/licenses/by-nc-nd/4.0/ posted-content 2024 cracsoc https://doi.org/10.26434/chemrxiv-2024-hr1hf 2024-03-08T00:14:51Z This article presents a novel algorithm for the calculation of analytic energy gradients from second order Møller Plesset perturbation theory within the Resolution-of-the-Identity approximation (RI-MP2) which is designed to achieve high performance on multi-GPU clusters. The algorithm uses GPUs for all major steps of the calculation, including integral generation, formation of all required intermediate tensors, solution of the Z-vector equation and gradient accumulation. The implementation in the EXtreme Scale Electronic Structure System (EXESS) software package includes a tailored, highly efficient, multi-stream scheduling system to hide CPU-GPU data transfer latencies and allows nodes with 8 A100 GPUs to operate at over 80% of theoretical peak floating-point performance. Comparative performance analysis shows a significant reduction in computational time relative to traditional multi-core CPU-based methods, with our approach achieving up to a 95-fold speedup over the single-node performance of established software such as Q-Chem and ORCA. Additionally, we demonstrate that pairing our implementation with the molecular fragmentation framework in EXESS can drastically lower the computational scaling of RI-MP2 gradient calculations from quintic to sub-quadratic, enabling further substantial savings in runtime while retaining high numerical accuracy in the resulting gradients. 
Other/Unknown Material Orca ACS Publications |
institution |
Open Polar |
collection |
ACS Publications |
op_collection_id |
cracsoc |
language |
unknown |
description |
This article presents a novel algorithm, designed to achieve high performance on multi-GPU clusters, for calculating analytic energy gradients from second-order Møller-Plesset perturbation theory within the Resolution-of-the-Identity approximation (RI-MP2). The algorithm uses GPUs for all major steps of the calculation, including integral generation, formation of all required intermediate tensors, solution of the Z-vector equation, and gradient accumulation. The implementation in the EXtreme Scale Electronic Structure System (EXESS) software package includes a tailored, highly efficient multi-stream scheduling system to hide CPU-GPU data transfer latencies and allows nodes with 8 A100 GPUs to operate at over 80% of theoretical peak floating-point performance. Comparative performance analysis shows a significant reduction in computational time relative to traditional multi-core CPU-based methods, with our approach achieving up to a 95-fold speedup over the single-node performance of established software such as Q-Chem and ORCA. Additionally, we demonstrate that pairing our implementation with the molecular fragmentation framework in EXESS can drastically lower the computational scaling of RI-MP2 gradient calculations from quintic to sub-quadratic, enabling further substantial savings in runtime while retaining high numerical accuracy in the resulting gradients. |
format |
Other/Unknown Material |
author |
Stocks, Ryan Palethorpe, Elise Barca, Giuseppe Maria Junior |
title |
High-performance multi-GPU analytic RI-MP2 energy gradients |
publisher |
American Chemical Society (ACS) |
publishDate |
2024 |
url |
http://dx.doi.org/10.26434/chemrxiv-2024-hr1hf https://chemrxiv.org/engage/api-gateway/chemrxiv/assets/orp/resource/item/65cebfff66c1381729be393a/original/high-performance-multi-gpu-analytic-ri-mp2-energy-gradients.pdf |
genre |
Orca |
op_rights |
https://creativecommons.org/licenses/by-nc-nd/4.0/ |
op_doi |
https://doi.org/10.26434/chemrxiv-2024-hr1hf |