Extract, transform, load framework for the conversion of health databases to OMOP.

Common data models standardize the structures and semantics of health datasets, enabling reproducibility and large-scale studies that leverage the data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data...

Bibliographic Details
Published in: PLOS ONE
Main Authors: Juan C Quiroz, Tim Chard, Zhisheng Sa, Angus Ritchie, Louisa Jorm, Blanca Gallego
Format: Article in Journal/Newspaper
Language: English
Published: Public Library of Science (PLoS) 2022
Subjects:
R
Q
DML
Online Access: https://doi.org/10.1371/journal.pone.0266911
https://doaj.org/article/147c7a6db7404f3899591db24d3e7e1d
id ftdoajarticles:oai:doaj.org/article:147c7a6db7404f3899591db24d3e7e1d
record_format openpolar
spelling ftdoajarticles:oai:doaj.org/article:147c7a6db7404f3899591db24d3e7e1d 2023-05-15T16:01:39+02:00 Extract, transform, load framework for the conversion of health databases to OMOP. Juan C Quiroz Tim Chard Zhisheng Sa Angus Ritchie Louisa Jorm Blanca Gallego 2022-01-01T00:00:00Z https://doi.org/10.1371/journal.pone.0266911 https://doaj.org/article/147c7a6db7404f3899591db24d3e7e1d EN eng Public Library of Science (PLoS) https://doi.org/10.1371/journal.pone.0266911 https://doaj.org/toc/1932-6203 1932-6203 doi:10.1371/journal.pone.0266911 https://doaj.org/article/147c7a6db7404f3899591db24d3e7e1d PLoS ONE, Vol 17, Iss 4, p e0266911 (2022) Medicine R Science Q article 2022 ftdoajarticles https://doi.org/10.1371/journal.pone.0266911 2022-12-30T21:30:34Z Common data models standardize the structures and semantics of health datasets, enabling reproducibility and large-scale studies that leverage data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data models. While there is a strong incentive to convert datasets to OMOP, the conversion is time- and resource-intensive, leaving the research community in need of tools for mapping data to OMOP. We propose an extract, transform, load (ETL) framework that is metadata-driven and generic across source datasets. The ETL framework uses a new data manipulation language (DML) that organizes SQL snippets in YAML. Our framework includes a compiler that converts YAML files with mapping logic into an ETL script. Access to the ETL framework is available via a web application, allowing users to upload and edit YAML files via a web editor and obtain an ETL SQL script for use in development environments. The structure of the DML maximizes readability, refactoring, and maintainability, while minimizing technical debt and standardizing the writing of ETL operations for mapping to OMOP. Our framework also supports transparency of the mapping process and reuse by different institutions. Article in Journal/Newspaper DML Directory of Open Access Journals: DOAJ Articles PLOS ONE 17 4 e0266911
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Juan C Quiroz
Tim Chard
Zhisheng Sa
Angus Ritchie
Louisa Jorm
Blanca Gallego
Extract, transform, load framework for the conversion of health databases to OMOP.
topic_facet Medicine
R
Science
Q
description Common data models standardize the structures and semantics of health datasets, enabling reproducibility and large-scale studies that leverage data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data models. While there is a strong incentive to convert datasets to OMOP, the conversion is time- and resource-intensive, leaving the research community in need of tools for mapping data to OMOP. We propose an extract, transform, load (ETL) framework that is metadata-driven and generic across source datasets. The ETL framework uses a new data manipulation language (DML) that organizes SQL snippets in YAML. Our framework includes a compiler that converts YAML files with mapping logic into an ETL script. Access to the ETL framework is available via a web application, allowing users to upload and edit YAML files via a web editor and obtain an ETL SQL script for use in development environments. The structure of the DML maximizes readability, refactoring, and maintainability, while minimizing technical debt and standardizing the writing of ETL operations for mapping to OMOP. Our framework also supports transparency of the mapping process and reuse by different institutions.
format Article in Journal/Newspaper
author Juan C Quiroz
Tim Chard
Zhisheng Sa
Angus Ritchie
Louisa Jorm
Blanca Gallego
author_facet Juan C Quiroz
Tim Chard
Zhisheng Sa
Angus Ritchie
Louisa Jorm
Blanca Gallego
author_sort Juan C Quiroz
title Extract, transform, load framework for the conversion of health databases to OMOP.
title_short Extract, transform, load framework for the conversion of health databases to OMOP.
title_full Extract, transform, load framework for the conversion of health databases to OMOP.
title_fullStr Extract, transform, load framework for the conversion of health databases to OMOP.
title_full_unstemmed Extract, transform, load framework for the conversion of health databases to OMOP.
title_sort extract, transform, load framework for the conversion of health databases to omop.
publisher Public Library of Science (PLoS)
publishDate 2022
url https://doi.org/10.1371/journal.pone.0266911
https://doaj.org/article/147c7a6db7404f3899591db24d3e7e1d
genre DML
genre_facet DML
op_source PLoS ONE, Vol 17, Iss 4, p e0266911 (2022)
op_relation https://doi.org/10.1371/journal.pone.0266911
https://doaj.org/toc/1932-6203
1932-6203
doi:10.1371/journal.pone.0266911
https://doaj.org/article/147c7a6db7404f3899591db24d3e7e1d
op_doi https://doi.org/10.1371/journal.pone.0266911
container_title PLOS ONE
container_volume 17
container_issue 4
container_start_page e0266911
_version_ 1766397427195052032
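
Illustrative note: the description field above says the framework keeps SQL snippets in YAML and compiles them into an ETL SQL script. The sketch below shows one way such a compile step might look. It is not the authors' DML or compiler. The mapping keys (target, source, columns, where), the source table src_patients, and the function names are all hypothetical. The mapping is written as the Python dict a YAML parser would produce, so the example runs without a YAML library; person_id, year_of_birth, and gender_source_value are columns of the OMOP CDM person table.

```python
# Hypothetical mapping entry, shown as the dict a YAML parser would return.
# The key names are assumptions for illustration, not the paper's DML.
MAPPING = {
    "target": "person",            # OMOP CDM table to populate
    "source": "src_patients",      # hypothetical source table
    "columns": {                   # target column -> SQL snippet
        "person_id": "patient_id",
        "year_of_birth": "EXTRACT(YEAR FROM date_of_birth)",
        "gender_source_value": "sex",
    },
    "where": "date_of_birth IS NOT NULL",  # optional row filter
}


def compile_mapping(mapping: dict) -> str:
    """Render one mapping entry as an INSERT ... SELECT statement."""
    targets = list(mapping["columns"])
    exprs = [f"{expr} AS {name}" for name, expr in mapping["columns"].items()]
    lines = [
        f"INSERT INTO {mapping['target']} ({', '.join(targets)})",
        "SELECT " + ",\n       ".join(exprs),
        f"FROM {mapping['source']}",
    ]
    if mapping.get("where"):
        lines.append(f"WHERE {mapping['where']}")
    return "\n".join(lines) + ";"


def compile_script(mappings: list) -> str:
    """Join the statements for all mapping entries into one ETL script."""
    return "\n\n".join(compile_mapping(m) for m in mappings)


if __name__ == "__main__":
    print(compile_script([MAPPING]))
```

Keeping each SQL snippet next to the target column it fills is what makes a mapping of this kind readable and reusable across institutions: only the snippets and the source table name change, while the compile step stays the same.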