H2CO Dataset
The deposited data sets were used to compare three state-of-the-art machine learning (ML) approaches to obtain representations of potential energy surfaces (PESs). The comparison is meant to be representative as it examines a purely kernel-based approach (reproducing kernel Hilbert space plus forces...
Main Authors: | Käser, Silvan; Koner, Debasish; Christensen, Anders S.; von Lilienfeld, O. Anatole; Meuwly, Markus |
---|---|
Format: | Dataset |
Language: | English |
Published: | Zenodo 2020 |
Subjects: | Machine Learning; Formaldehyde; Neural Network; Quantum Chemistry; Potential Energy Surface |
Online Access: | https://dx.doi.org/10.5281/zenodo.3923823 https://zenodo.org/record/3923823 |
id |
ftdatacite:10.5281/zenodo.3923823 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
English |
topic |
Machine Learning Formaldehyde Neural Network Quantum Chemistry Potential Energy Surface |
description |
The deposited data sets were used to compare three state-of-the-art machine learning (ML) approaches for representing potential energy surfaces (PESs). The comparison is meant to be representative: it examines a purely kernel-based approach (reproducing kernel Hilbert space plus forces, RKHS+F) [1], a purely neural-network-based approach (PhysNet) [2], and the FCHL representation [3] within kernel ridge regression. Formaldehyde, H2CO, is used as the benchmark system. H2CO is a small molecule for which PESs can be calculated at different levels of theory, which makes it suitable for an in-depth theoretical study. In addition, very high-level calculations have already been presented (see e.g. Ref. [4]) and experimental reference data are available for comparison [5]. ML models are trained with each method on reference data calculated at three levels of quantum chemical theory (B3LYP/cc-pVDZ, MP2/aug-cc-pVTZ and CCSD(T)-F12/aug-cc-pVTZ-F12). The performance of the models is then examined by considering energy and force learning curves, harmonic frequencies, and IR spectra from finite-temperature molecular dynamics (MD) simulations.
The data sets contain geometries of the H2CO molecule generated using the normal mode sampling approach [6] at different temperatures. Four data sets are deposited:
i) "h2co_B3LYP_cc-pVDZ_4001.npz": 4001 geometries of H2CO generated using normal mode sampling and calculated using ORCA [7] (B3LYP/cc-pVDZ).
ii) "h2co_mp2_avtz_4001.npz": 4001 geometries of H2CO generated using normal mode sampling and calculated using MOLPRO 2019 [8] (MP2/aug-cc-pVTZ).
iii) "h2co_ccsdt_avtz_4001.npz": 4001 geometries of H2CO generated using normal mode sampling and calculated using MOLPRO 2019 [8] (CCSD(T)-F12/aug-cc-pVTZ-F12).
iv) "h2co_ccsdt_avtz_2500_extrapol.npz": 2500 geometries of H2CO generated using normal mode sampling and calculated using MOLPRO 2019 [8] (CCSD(T)-F12/aug-cc-pVTZ-F12). This sampling was carried out at a higher temperature (5000 K compared to 2000 K) to test the extrapolation ability of the ML methods.
For more details, see http://arxiv.org/abs/2006.16752
HOW TO CITE: When using this dataset, please cite the following paper: Käser, S.; Koner, D.; Christensen, A. S.; von Lilienfeld, O. A.; Meuwly, M. "ML Models of Vibrating H2CO: Comparing Reproducing Kernels, FCHL and PhysNet" arXiv:2006.16752 and the digital object identifier (DOI): Käser, S.; Koner, D.; Christensen, A. S.; von Lilienfeld, O. A.; Meuwly, M. (2020). H2CO Dataset. Zenodo. http://doi.org/10.5281/zenodo.3923823
[1] Koner, D.; Meuwly, M. arXiv e-prints 2020, arXiv:2005.04667
[2] Unke, O. T.; Meuwly, M. J. Chem. Theory Comput. 2019, 15, 3678–3693
[3] Faber, F. A.; Christensen, A. S.; Huang, B.; von Lilienfeld, O. A. J. Chem. Phys. 2018, 148, 241717
[4] Zhang, X.; Zou, S.; Harding, L. B.; Bowman, J. M. J. Phys. Chem. A 2004, 108, 8980–8986
[5] Herndon, S. C.; Nelson Jr, D. D.; Li, Y.; Zahniser, M. S. J. Quant. Spectrosc. Radiat. Transf. 2005, 90, 207–216
[6] Smith, J. S.; Isayev, O.; Roitberg, A. E. Sci. Data 2017, 4, 170193
[7] Neese, F. The ORCA program system. WIREs Comput. Mol. Sci. 2012, 2, 73–78
[8] Werner, H.-J.; Knowles, P. J.; Knizia, G.; Manby, F. R.; Schütz, M.; et al. https://www.molpro.net |
format |
Dataset |
author |
Käser, Silvan Koner, Debasish Christensen, Anders S. von Lilienfeld, O. Anatole Meuwly, Markus |
author_sort |
Käser, Silvan |
title |
H2CO Dataset |
publisher |
Zenodo |
publishDate |
2020 |
url |
https://dx.doi.org/10.5281/zenodo.3923823 https://zenodo.org/record/3923823 |
op_relation |
https://dx.doi.org/10.5281/zenodo.3923822 |
op_rights |
Open Access Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode CC-BY-4.0 info:eu-repo/semantics/openAccess |
op_rightsnorm |
CC-BY |
op_doi |
https://doi.org/10.5281/zenodo.3923823 https://doi.org/10.5281/zenodo.3923822 |
_version_ |
1766161628970090496 |
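
The four deposited files are NumPy `.npz` archives. The record does not say which arrays each archive contains, so the sketch below does not assume any key names: it simply lists every stored array with its shape and dtype. The helper name `summarize_npz` is hypothetical, and the file name in the usage section is taken from the description and assumed to have been downloaded into the working directory.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for each array in an .npz archive."""
    summary = {}
    # np.load on an .npz file returns a lazy NpzFile; arrays are read on access.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


if __name__ == "__main__":
    # Assumption: this file from the deposit sits in the current directory.
    path = "h2co_B3LYP_cc-pVDZ_4001.npz"
    if os.path.exists(path):
        for name, (shape, dtype) in summarize_npz(path).items():
            print(f"{name}: shape={shape}, dtype={dtype}")
    else:
        print(f"{path} not found; download it from the Zenodo record first.")
```

Inspecting the array names and shapes first is a cautious way to find out how geometries, energies and forces are stored before passing them to a training script.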