H2CO Dataset

The deposited data sets were used to compare three state-of-the-art machine learning (ML) approaches to obtain representations of potential energy surfaces (PESs). The comparison is meant to be representative as it examines a purely kernel-based approach (reproducing kernel Hilbert space plus forces...


Bibliographic Details
Main Authors: Käser, Silvan; Koner, Debasish; Christensen, Anders S.; von Lilienfeld, O. Anatole; Meuwly, Markus
Format: Dataset
Language: English
Published: Zenodo 2020
Subjects: Machine Learning; Formaldehyde; Neural Network; Quantum Chemistry; Potential Energy Surface
Online Access: https://dx.doi.org/10.5281/zenodo.3923823
https://zenodo.org/record/3923823
id ftdatacite:10.5281/zenodo.3923823
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language English
topic Machine Learning
Formaldehyde
Neural Network
Quantum Chemistry
Potential Energy Surface
description The deposited data sets were used to compare three state-of-the-art machine learning (ML) approaches for representing potential energy surfaces (PESs). The comparison is meant to be representative: it examines a purely kernel-based approach (reproducing kernel Hilbert space plus forces, RKHS+F) [1], a purely neural-network-based approach (PhysNet) [2], and the FCHL representation [3] within kernel ridge regression.

Formaldehyde, H2CO, is used as a benchmark system. H2CO is a small molecule for which PESs can be calculated at different levels of theory, which makes it suitable for an in-depth theoretical study. In addition, very high-level calculations have already been presented (see e.g. Ref. [4]) and experimental reference data are available for comparison [5]. ML models are trained with each of the ML methods on reference data calculated at three levels of quantum chemical theory (B3LYP/cc-pVDZ, MP2/aug-cc-pVTZ and CCSD(T)-F12/aug-cc-pVTZ-F12). The performance of the models is then examined by considering energy and force learning curves, harmonic frequencies and IR spectra from finite-temperature molecular dynamics (MD) simulations.

The data sets contain different geometries of the H2CO molecule generated using the normal mode sampling approach [6] performed at different temperatures. Four data sets are deposited:
i) "h2co_B3LYP_cc-pVDZ_4001.npz": 4001 geometries of H2CO generated using normal mode sampling and calculated using ORCA [7] (B3LYP/cc-pVDZ).
ii) "h2co_mp2_avtz_4001.npz": 4001 geometries of H2CO generated using normal mode sampling and calculated using MOLPRO 2019 [8] (MP2/aug-cc-pVTZ).
iii) "h2co_ccsdt_avtz_4001.npz": 4001 geometries of H2CO generated using normal mode sampling and calculated using MOLPRO 2019 [8] (CCSD(T)-F12/aug-cc-pVTZ-F12).
iv) "h2co_ccsdt_avtz_2500_extrapol.npz": 2500 geometries of H2CO generated using normal mode sampling and calculated using MOLPRO 2019 [8] (CCSD(T)-F12/aug-cc-pVTZ-F12). This set was sampled at a higher temperature (5000 K compared to 2000 K) to test the extrapolation ability of the ML methods.

For more details, see http://arxiv.org/abs/2006.16752

HOW TO CITE:
When using this dataset, please cite the following paper:
Käser, S. and Koner, D. and Christensen, A. S. and von Lilienfeld, O. A. and Meuwly, M. "ML Models of Vibrating H2CO: Comparing Reproducing Kernels, FCHL and PhysNet" arXiv:2006.16752
and the digital object identifier (DOI):
Käser, S. and Koner, D. and Christensen, A. S. and von Lilienfeld, O. A. and Meuwly, M. (2020). H2CO Dataset. Zenodo. http://doi.org/10.5281/zenodo.3923823

[1] Koner, D.; Meuwly, M. arXiv e-prints 2020, arXiv:2005.04667
[2] Unke, O. T.; Meuwly, M. J. Chem. Theory Comput. 2019, 15, 3678–3693
[3] Faber, F. A.; Christensen, A. S.; Huang, B.; von Lilienfeld, O. A. J. Chem. Phys. 2018, 148, 241717
[4] Zhang, X.; Zou, S.; Harding, L. B.; Bowman, J. M. J. Phys. Chem. A 2004, 108, 8980–8986
[5] Herndon, S. C.; Nelson Jr, D. D.; Li, Y.; Zahniser, M. S. J. Quant. Spectrosc. Radiat. Transf. 2005, 90, 207–216
[6] Smith, J. S.; Isayev, O.; Roitberg, A. E. Sci. Data 2017, 4, 170193
[7] Neese, F. The ORCA program system. WIREs Comput. Mol. Sci. 2012, 2, 73–78
[8] Werner, H.-J.; Knowles, P. J.; Knizia, G.; Manby, F. R.; Schütz, M.; et al. https://www.molpro.net
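
The four deposited files are NumPy ".npz" archives. The record does not state which arrays each archive contains, so the sketch below makes no assumption about key names: it lists whatever arrays are stored, together with their shapes and data types. The helper name summarize_npz is hypothetical and not part of the dataset or of any of the cited codes. It also assumes the archives hold plain numeric arrays, since np.load refuses pickled object arrays by default.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz archive."""
    summary = {}
    # np.load reads members of the archive on demand; the context manager
    # closes the underlying file once the loop is done.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

After downloading a file from the Zenodo record, calling summarize_npz("h2co_B3LYP_cc-pVDZ_4001.npz") and printing the result shows which array holds which quantity; an array whose leading dimension is 4001 (or 2500 for the extrapolation set) presumably stores one entry per geometry.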
format Dataset
author Käser, Silvan
Koner, Debasish
Christensen, Anders S.
von Lilienfeld, O. Anatole
Meuwly, Markus
title H2CO Dataset
publisher Zenodo
publishDate 2020
url https://dx.doi.org/10.5281/zenodo.3923823
https://zenodo.org/record/3923823
op_relation https://dx.doi.org/10.5281/zenodo.3923822
op_rights Open Access
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
CC-BY-4.0
info:eu-repo/semantics/openAccess
op_rightsnorm CC-BY
op_doi https://doi.org/10.5281/zenodo.3923823
https://doi.org/10.5281/zenodo.3923822
_version_ 1766161628970090496