Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters ...

This Article presents two optimized multi-GPU algorithms for Fock matrix construction, building on the work of Ufimtsev et al. and Barca et al. The novel algorithms, opt-UM and opt-Brc, introduce significant enhancements, including improved integral screening, exploitation of sparsity and symmetry,...

Bibliographic Details
Main Authors: Palethorpe, Elise, Stocks, Ryan, Barca, Giuseppe M. J.
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2024
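Subjects: Computational Physics (physics.comp-ph); Materials Science (cond-mat.mtrl-sci); Chemical Physics (physics.chem-ph); Quantum Physics (quant-ph); FOS: Physical sciences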
Online Access: https://dx.doi.org/10.48550/arxiv.2407.21445
https://arxiv.org/abs/2407.21445
id ftdatacite:10.48550/arxiv.2407.21445
record_format openpolar
institution Open Polar
collection DataCite
op_collection_id ftdatacite
language unknown
topic Computational Physics physics.comp-ph
Materials Science cond-mat.mtrl-sci
Chemical Physics physics.chem-ph
Quantum Physics quant-ph
FOS: Physical sciences
description This Article presents two optimized multi-GPU algorithms for Fock matrix construction, building on the work of Ufimtsev et al. and Barca et al. The novel algorithms, opt-UM and opt-Brc, introduce significant enhancements, including improved integral screening, exploitation of sparsity and symmetry, a linear scaling exchange matrix assembly algorithm, and extended capabilities for Hartree-Fock calculations up to $f$-type angular momentum functions. Opt-Brc excels for smaller systems and for highly contracted triple-$ζ$ basis sets, while opt-UM is advantageous for large molecular systems. Performance benchmarks on NVIDIA A100 GPUs show that our algorithms in the EXtreme-scale Electronic Structure System (EXESS), when combined, outperform all current GPU and CPU Fock build implementations in TeraChem, QUICK, GPU4PySCF, LibIntX, ORCA, and Q-Chem. The implementations were benchmarked on linear and globular systems and average speedups across three double-$ζ$ basis sets of 1.5$\times$, 5.2$\times$, and 8.5$\times$ ...
format Article in Journal/Newspaper
author Palethorpe, Elise
Stocks, Ryan
Barca, Giuseppe M. J.
title Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters ...
publisher arXiv
publishDate 2024
url https://dx.doi.org/10.48550/arxiv.2407.21445
https://arxiv.org/abs/2407.21445
op_rights Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
cc-by-nc-nd-4.0
op_doi https://doi.org/10.48550/arxiv.2407.21445
_version_ 1811643419539275776