Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters ...

Bibliographic Details
Main Authors: Palethorpe, Elise, Stocks, Ryan, Barca, Giuseppe M. J.
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2024
Subjects:
Online Access: https://dx.doi.org/10.48550/arxiv.2407.21445
https://arxiv.org/abs/2407.21445
Description
Summary: This Article presents two optimized multi-GPU algorithms for Fock matrix construction, building on the work of Ufimtsev et al. and Barca et al. The novel algorithms, opt-UM and opt-Brc, introduce significant enhancements, including improved integral screening, exploitation of sparsity and symmetry, a linear-scaling exchange matrix assembly algorithm, and extended capabilities for Hartree-Fock calculations up to $f$-type angular momentum functions. Opt-Brc excels for smaller systems and for highly contracted triple-$ζ$ basis sets, while opt-UM is advantageous for large molecular systems. Performance benchmarks on NVIDIA A100 GPUs show that our algorithms in the EXtreme-scale Electronic Structure System (EXESS), when combined, outperform all current GPU and CPU Fock build implementations in TeraChem, QUICK, GPU4PySCF, LibIntX, ORCA, and Q-Chem. The implementations were benchmarked on linear and globular systems and average speedups across three double-$ζ$ basis sets of 1.5$\times$, 5.2$\times$, and 8.5$\times$ ...