Wailord: Parsers and Reproducibility for Quantum Chemistry

Bibliographic Details
Main Author: Rohit Goswami
Format: Conference Object
Language: unknown
Published: 2022
Subjects:
Online Access: https://zenodo.org/record/7325038
https://doi.org/10.5281/zenodo.7325038
Description
Summary: Much of the scientific Python ecosystem deals with problems at a level where their structure is already present in memory. However, the generation of input files for driving existing codes, as well as the parsing of their results, is not typically covered in great detail. This presentation bridges the gap between external programs and data structures, demonstrating, via a practical example, the utility of code generation and parsing expression grammar (PEG) parsers for reproducible results in quantum chemistry. More details at: https://rgoswami.me/posts/scipycon-2022-meta
The concept of a crisis of reproducibility in scientific research needs no introduction. Although there are several tooling approaches one can take to reduce the cognitive load of keeping track of the various steps of an analysis pipeline [1], there remains an almost linguistic gap when it comes to interfacing with domain-specific tools. We demonstrate the role of parsers in the reproducibility workflow. By focusing on the generation of input files and the structured extraction of output data, we aim to plug a gap in the generation of reproducible reports, namely, interfacing (via file I/O) with existing software.
The file I/O interface justifiably has many detractors; on an HPC (high-performance computing) cluster in particular, I/O can be a bottleneck. However, when faced with an opaque binary that outputs freeform results, driven by an input file with little to no structure beyond a 1500-page manual of keyword arguments, a domain-specific parser can pay off immensely. In our quest to translate domain intuition into computational input constraints, we will work in a reduced grammar, an intermediate representation (IR). Such an IR can be generated for multiple program specifications, so extension to other software is not difficult either.
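The round trip described above (structured data in, input file out, free-form output back into structured data) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not Wailord's implementation: `string.Template` stands in for a templating tool, a regular expression stands in for a full PEG grammar, and the names `Calculation`, `render_input`, and `parse_energy` are hypothetical. The input layout follows ORCA's basic "simple keyword line plus xyz block" form, and the energy-line pattern is an assumption about the output format that should be checked against real output.

```python
import re
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Calculation:
    """Hypothetical intermediate representation of a single-point job."""
    method: str
    basis: str
    charge: int
    multiplicity: int
    atoms: tuple  # tuple of (symbol, x, y, z) entries, coordinates in angstrom


# Template for a minimal ORCA-style input: keyword line, then an xyz block.
INPUT_TEMPLATE = Template(
    "! $method $basis\n"
    "* xyz $charge $multiplicity\n"
    "$geometry\n"
    "*\n"
)

# Assumed shape of the energy line in the program's text output.
ENERGY_LINE = re.compile(
    r"^\s*FINAL SINGLE POINT ENERGY\s+(-?\d+\.\d+)\s*$", re.MULTILINE
)


def render_input(calc):
    """Generate the input file text from the in-memory representation."""
    geometry = "\n".join(
        f"{sym} {x:.6f} {y:.6f} {z:.6f}" for sym, x, y, z in calc.atoms
    )
    return INPUT_TEMPLATE.substitute(
        method=calc.method,
        basis=calc.basis,
        charge=calc.charge,
        multiplicity=calc.multiplicity,
        geometry=geometry,
    )


def parse_energy(output_text):
    """Extract the last reported energy from free-form output text."""
    matches = ENERGY_LINE.findall(output_text)
    if not matches:
        raise ValueError("no energy line found in output")
    return float(matches[-1])


if __name__ == "__main__":
    h2 = Calculation("HF", "def2-SVP", 0, 1,
                     (("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.74)))
    print(render_input(h2))
    # Hand-written placeholder text, not real program output.
    sample = "...\nFINAL SINGLE POINT ENERGY      -1.500000\n...\n"
    print(parse_energy(sample))
```

Because the job is described once as data, the same `Calculation` could in principle be rendered through a different template for another program, which is the appeal of working through an intermediate representation. A grammar-based parser becomes worthwhile once the output has nested or repeated sections that a single regular expression cannot describe cleanly.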
As a concrete realization of an abstract concept, we will discuss Wailord [2], which uses parsimonious [3] and cookiecutter [4] to interface with ORCA [5], a popular free (but not open source) ...