Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters
This Article presents two optimized multi-GPU algorithms for Fock matrix construction, building on the work of Ufimtsev et al. and Barca et al. The novel algorithms, opt-UM and opt-Brc, introduce significant enhancements, including improved integral screening, exploitation of sparsity and symmetry,...
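The abstract names integral screening and permutational symmetry as key ingredients of a fast Fock build. The sketch below is a generic textbook illustration of those two ideas, not the opt-UM or opt-Brc kernels, whose details this record does not describe. It evaluates each symmetry-unique integral $(ij|kl)$ once, skips quartets whose Cauchy-Schwarz bound $Q_{ij}Q_{kl}$ falls below a threshold `tau`, and assembles the closed-shell $F = H + J - \tfrac{1}{2}K$ from a total density $D$. All function names, the threshold, and the integral stand-in `toy_eri` are hypothetical. A production GPU code works on shell quartets of contracted Gaussians, not single basis-function indices.

```python
import math
import random


def pair_index(i, j):
    """Compound index of the unordered pair (i, j); assumes i >= j."""
    return i * (i + 1) // 2 + j


def unique_quartets(n):
    """Yield one representative (i, j, k, l) per 8-fold symmetry class:
    i >= j, k >= l and pair(ij) >= pair(kl)."""
    for i in range(n):
        for j in range(i + 1):
            for k in range(i + 1):
                for l in range(k + 1):
                    if pair_index(i, j) >= pair_index(k, l):
                        yield i, j, k, l


def equivalent_indices(i, j, k, l):
    """Distinct orderings that share the value (ij|kl) for real orbitals."""
    return {(i, j, k, l), (j, i, k, l), (i, j, l, k), (j, i, l, k),
            (k, l, i, j), (l, k, i, j), (k, l, j, i), (l, k, j, i)}


def build_jk(n, eri, density, tau=1e-10):
    """Return (J, K, n_evaluated) using Schwarz screening and 8-fold symmetry.

    eri(i, j, k, l) gives (ij|kl) in chemists' notation; density is a
    symmetric n x n nested list (total closed-shell density)."""
    # Schwarz factors: |(ij|kl)| <= Q[i][j] * Q[k][l]
    schwarz = [[math.sqrt(abs(eri(i, j, i, j))) for j in range(n)]
               for i in range(n)]
    J = [[0.0] * n for _ in range(n)]
    K = [[0.0] * n for _ in range(n)]
    n_evaluated = 0
    for i, j, k, l in unique_quartets(n):
        if schwarz[i][j] * schwarz[k][l] < tau:
            continue  # the bound guarantees |(ij|kl)| < tau, so skip it
        value = eri(i, j, k, l)  # evaluated once per symmetry class
        n_evaluated += 1
        for p, q, r, s in equivalent_indices(i, j, k, l):
            J[p][q] += value * density[r][s]  # J_pq = sum_rs (pq|rs) D_rs
            K[p][r] += value * density[q][s]  # K_pr = sum_qs (pq|rs) D_qs
    return J, K, n_evaluated


def fock_rhf(h, J, K):
    """Closed-shell Fock matrix F = H + J - K/2 (D is the total density)."""
    n = len(h)
    return [[h[i][j] + J[i][j] - 0.5 * K[i][j] for j in range(n)]
            for i in range(n)]


def toy_eri(n, naux=3, seed=0):
    """Hypothetical stand-in for real integrals: (ij|kl) = sum_P B_P[ij] B_P[kl]
    with symmetric B_P, so it has 8-fold symmetry and obeys the Schwarz bound."""
    rng = random.Random(seed)
    factors = []
    for _ in range(naux):
        b = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                b[i][j] = b[j][i] = rng.uniform(-1.0, 1.0)
        factors.append(b)
    return lambda i, j, k, l: sum(b[i][j] * b[k][l] for b in factors)


if __name__ == "__main__":
    n = 4
    eri = toy_eri(n)
    D = [[1.0 if i == j else 0.1 for j in range(n)] for i in range(n)]
    H = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    J, K, n_evaluated = build_jk(n, eri, D)
    F = fock_rhf(H, J, K)
    total = sum(1 for _ in unique_quartets(n))
    print(n_evaluated, "of", total, "unique quartets evaluated")
```

Expanding each unique integral into its equivalent index orderings keeps the symmetry logic easy to check. Real codes instead fold the multiplicity into degeneracy factors, which avoids building the set on every iteration.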
Main Authors: | Palethorpe, Elise; Stocks, Ryan; Barca, Giuseppe M. J. |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | arXiv, 2024 |
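Subjects: | Computational Physics (physics.comp-ph); Materials Science (cond-mat.mtrl-sci); Chemical Physics (physics.chem-ph); Quantum Physics (quant-ph) |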
Online Access: | https://dx.doi.org/10.48550/arxiv.2407.21445 https://arxiv.org/abs/2407.21445 |
id | ftdatacite:10.48550/arxiv.2407.21445 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | DataCite |
op_collection_id | ftdatacite |
language | unknown |
topic | Computational Physics (physics.comp-ph); Materials Science (cond-mat.mtrl-sci); Chemical Physics (physics.chem-ph); Quantum Physics (quant-ph); FOS: Physical sciences |
description | This Article presents two optimized multi-GPU algorithms for Fock matrix construction, building on the work of Ufimtsev et al. and Barca et al. The novel algorithms, opt-UM and opt-Brc, introduce significant enhancements, including improved integral screening, exploitation of sparsity and symmetry, a linear scaling exchange matrix assembly algorithm, and extended capabilities for Hartree-Fock calculations up to $f$-type angular momentum functions. Opt-Brc excels for smaller systems and for highly contracted triple-$\zeta$ basis sets, while opt-UM is advantageous for large molecular systems. Performance benchmarks on NVIDIA A100 GPUs show that our algorithms in the EXtreme-scale Electronic Structure System (EXESS), when combined, outperform all current GPU and CPU Fock build implementations in TeraChem, QUICK, GPU4PySCF, LibIntX, ORCA, and Q-Chem. The implementations were benchmarked on linear and globular systems and average speedups across three double-$\zeta$ basis sets of 1.5$\times$, 5.2$\times$, and 8.5$\times$ ... |
format | Article in Journal/Newspaper |
author | Palethorpe, Elise; Stocks, Ryan; Barca, Giuseppe M. J. |
title | Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters |
publisher | arXiv |
publishDate | 2024 |
url | https://dx.doi.org/10.48550/arxiv.2407.21445 https://arxiv.org/abs/2407.21445 |
op_rights | Creative Commons Attribution Non Commercial No Derivatives 4.0 International (cc-by-nc-nd-4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode |
op_doi | https://doi.org/10.48550/arxiv.2407.21445 |