High-performance multi-GPU analytic RI-MP2 energy gradients

Bibliographic Details
Main Authors: Stocks, Ryan; Palethorpe, Elise; Barca, Giuseppe Maria Junior
Format: Other/Unknown Material
Language: unknown
Published: American Chemical Society (ACS) 2024
Subjects:
Online Access: http://dx.doi.org/10.26434/chemrxiv-2024-hr1hf
https://chemrxiv.org/engage/api-gateway/chemrxiv/assets/orp/resource/item/65cebfff66c1381729be393a/original/high-performance-multi-gpu-analytic-ri-mp2-energy-gradients.pdf
Description
Summary: This article presents a novel algorithm for calculating analytic energy gradients from second-order Møller-Plesset perturbation theory within the resolution-of-the-identity approximation (RI-MP2), designed to achieve high performance on multi-GPU clusters. The algorithm uses GPUs for all major steps of the calculation: integral generation, formation of all required intermediate tensors, solution of the Z-vector equation, and gradient accumulation. The implementation in the EXtreme Scale Electronic Structure System (EXESS) software package includes a tailored, highly efficient multi-stream scheduling system that hides CPU-GPU data transfer latencies and allows nodes with 8 A100 GPUs to operate at over 80% of theoretical peak floating-point performance. Comparative performance analysis shows a significant reduction in computational time relative to traditional multi-core CPU-based methods, with up to a 95-fold speedup over the single-node performance of established software such as Q-Chem and ORCA. Additionally, pairing the implementation with the molecular fragmentation framework in EXESS is shown to lower the computational scaling of RI-MP2 gradient calculations from quintic to sub-quadratic, enabling further substantial savings in runtime while retaining high numerical accuracy in the resulting gradients.