High-performance multi-GPU analytic RI-MP2 energy gradients

Bibliographic Details
Main Authors: Stocks, Ryan; Palethorpe, Elise; Barca, Giuseppe Maria Junior
Format: Preprint (chemRxiv posted content)
Language: English
Published: American Chemical Society (ACS), 2024
Online Access: http://dx.doi.org/10.26434/chemrxiv-2024-hr1hf
https://chemrxiv.org/engage/api-gateway/chemrxiv/assets/orp/resource/item/65cebfff66c1381729be393a/original/high-performance-multi-gpu-analytic-ri-mp2-energy-gradients.pdf (PDF)

Description: This article presents a novel algorithm for calculating analytic energy gradients from second-order Møller-Plesset perturbation theory within the resolution-of-the-identity approximation (RI-MP2), designed to achieve high performance on multi-GPU clusters. The algorithm uses GPUs for all major steps of the calculation, including integral generation, formation of all required intermediate tensors, solution of the Z-vector equation, and gradient accumulation. The implementation in the EXtreme Scale Electronic Structure System (EXESS) software package includes a tailored, highly efficient multi-stream scheduling system that hides CPU-GPU data transfer latencies and allows nodes with 8 A100 GPUs to operate at over 80% of theoretical peak floating-point performance. Comparative performance analysis shows a significant reduction in computational time relative to traditional multi-core CPU-based methods, with our approach achieving up to a 95-fold speedup over the single-node performance of established software such as Q-Chem and ORCA. Additionally, we demonstrate that pairing our implementation with the molecular fragmentation framework in EXESS can lower the computational scaling of RI-MP2 gradient calculations from quintic to sub-quadratic, enabling further substantial savings in runtime while retaining high numerical accuracy in the resulting gradients.
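
For orientation only, the standard closed-shell RI-MP2 energy expression shows where the quintic scaling mentioned in the description originates. This is the textbook form, not taken from this record and not the authors' gradient formulation. Here i, j label occupied and a, b virtual molecular orbitals, ε denotes orbital energies, P, Q label auxiliary (RI) basis functions, and V is the two-index Coulomb metric:

$$ (ia|jb) \approx \sum_{P} B_{ia}^{P} B_{jb}^{P}, \qquad B_{ia}^{P} = \sum_{Q} (ia|Q)\,[\mathbf{V}^{-1/2}]_{QP}, \qquad V_{PQ} = (P|Q) $$

$$ E^{(2)}_{\text{RI-MP2}} = \sum_{ijab} \frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b} $$

Assembling every (ia|jb) from the three-index factors costs on the order of o²v²N_aux operations (o occupied orbitals, v virtual orbitals, N_aux auxiliary functions), which grows as the fifth power of system size. The analytic gradient adds further steps, including solution of the Z-vector equation, but its leading cost remains fifth order.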

Rights: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/)