Extract, transform, load framework for the conversion of health databases to OMOP
Published in: | PLOS ONE |
---|---|
Main Authors: | Quiroz, Juan C.; Chard, Tim; Sa, Zhisheng; Ritchie, Angus; Jorm, Louisa; Gallego, Blanca |
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | |
Online Access: | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9000122/ http://www.ncbi.nlm.nih.gov/pubmed/35404974 https://doi.org/10.1371/journal.pone.0266911 |
id | ftpubmed:oai:pubmedcentral.nih.gov:9000122 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | PubMed Central (PMC) |
op_collection_id | ftpubmed |
language | English |
topic | Research Article |
description | Common data models standardize the structures and semantics of health datasets, enabling reproducibility and large-scale studies that leverage data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data models. While there is a strong incentive to convert datasets to OMOP, the conversion is time- and resource-intensive, leaving the research community in need of tools for mapping data to OMOP. We propose an extract, transform, load (ETL) framework that is metadata-driven and generic across source datasets. The ETL framework uses a new data manipulation language (DML) that organizes SQL snippets in YAML. Our framework includes a compiler that converts YAML files with mapping logic into an ETL script. The ETL framework is accessible through a web application, where users can upload and edit YAML files in a web editor and obtain an ETL SQL script for use in development environments. The structure of the DML maximizes readability, refactoring, and maintainability, while minimizing technical debt and standardizing the writing of ETL operations for mapping to OMOP. Our framework also supports transparency of the mapping process and reuse by different institutions. |
format | Text |
author | Quiroz, Juan C.; Chard, Tim; Sa, Zhisheng; Ritchie, Angus; Jorm, Louisa; Gallego, Blanca |
author_sort | Quiroz, Juan C. |
title | Extract, transform, load framework for the conversion of health databases to OMOP |
publisher | Public Library of Science |
publishDate | 2022 |
url | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9000122/ http://www.ncbi.nlm.nih.gov/pubmed/35404974 https://doi.org/10.1371/journal.pone.0266911 |
genre | DML |
op_source | PLoS One |
op_rights | © 2022 Quiroz et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
op_rightsnorm | CC-BY |
op_doi | https://doi.org/10.1371/journal.pone.0266911 |
container_title | PLOS ONE |
container_volume | 17 |
container_issue | 4 |
container_start_page | e0266911 |
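
The description says the framework's DML organizes SQL snippets in YAML and that a compiler turns YAML mapping files into an ETL SQL script. The Python sketch below is only a rough illustration of that idea under stated assumptions: one mapping is represented as the dictionary a YAML loader such as PyYAML's `yaml.safe_load` might return, and the keys `target`, `source`, `columns`, and `where`, the functions `compile_mapping` and `compile_script`, and the source table `src.admissions` are all hypothetical. They are not the paper's actual DML schema or compiler.

```python
"""Hypothetical sketch: compile a YAML-style mapping into ETL SQL.

Assumption: MAPPING stands in for what yaml.safe_load() might return for a
mapping file. The keys (target, source, columns, where) are illustrative and
are NOT the DML schema used by the paper's framework.
"""
from __future__ import annotations

from typing import Any

# Hypothetical mapping: a source admissions table -> OMOP visit_occurrence.
# Each value in "columns" is a SQL snippet evaluated against the source table.
MAPPING: dict[str, Any] = {
    "target": "visit_occurrence",
    "source": "src.admissions AS a",
    "columns": {
        "visit_occurrence_id": "a.admission_id",
        "person_id": "a.patient_id",
        "visit_concept_id": "9201",  # OMOP standard concept for an inpatient visit
        "visit_start_date": "CAST(a.admit_time AS DATE)",
        "visit_end_date": "CAST(a.discharge_time AS DATE)",
    },
    "where": ["a.patient_id IS NOT NULL"],
}


def compile_mapping(mapping: dict[str, Any]) -> str:
    """Render one mapping as a single INSERT ... SELECT statement."""
    columns = mapping["columns"]
    lines = [
        f"INSERT INTO {mapping['target']} ({', '.join(columns)})",
        "SELECT",
        # One "<snippet> AS <target column>" line per mapped column.
        ",\n".join(f"    {expr} AS {col}" for col, expr in columns.items()),
        f"FROM {mapping['source']}",
    ]
    conditions = mapping.get("where", [])
    if conditions:
        lines.append("WHERE " + "\n  AND ".join(conditions))
    return "\n".join(lines) + ";"


def compile_script(mappings: list[dict[str, Any]]) -> str:
    """Concatenate the statements for several mappings into one ETL script."""
    return "\n\n".join(compile_mapping(m) for m in mappings) + "\n"


if __name__ == "__main__":
    print(compile_script([MAPPING]))
```

Because Python dictionaries keep insertion order, the column list after `INSERT INTO` and the expressions in `SELECT` stay aligned. A real mapping to OMOP would also need joins, vocabulary lookups, and validation, which this sketch leaves out.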