Extract, transform, load framework for the conversion of health databases to OMOP
Common data models standardize the structures and semantics of health datasets, enabling reproducibility and large-scale studies that leverage the data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data...
Main Authors: | Juan C. Quiroz, Tim Chard, Zhisheng Sa, Angus Ritchie, Louisa Jorm, Blanca Gallego |
---|---|
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Public Library of Science (PLoS) 2022 |
Subjects: | Medicine (R); Science (Q) |
Online Access: | https://doaj.org/article/e3925ac16e2e4574a3b5f761c068a5d0 |
id |
ftdoajarticles:oai:doaj.org/article:e3925ac16e2e4574a3b5f761c068a5d0 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Directory of Open Access Journals: DOAJ Articles |
op_collection_id |
ftdoajarticles |
language |
English |
topic |
Medicine R Science Q |
topic_facet |
Medicine R Science Q |
description |
Common data models standardize the structures and semantics of health datasets, enabling reproducibility and large-scale studies that leverage data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data models. While there is a strong incentive to convert datasets to OMOP, the conversion is time- and resource-intensive, leaving the research community in need of tools for mapping data to OMOP. We propose an extract, transform, load (ETL) framework that is metadata-driven and generic across source datasets. The ETL framework uses a new data manipulation language (DML) that organizes SQL snippets in YAML. Our framework includes a compiler that converts YAML files with mapping logic into an ETL script. Access to the ETL framework is available via a web application, allowing users to upload and edit YAML files via a web editor and obtain an ETL SQL script for use in development environments. The structure of the DML maximizes readability, refactoring, and maintainability, while minimizing technical debt and standardizing the writing of ETL operations for mapping to OMOP. Our framework also supports transparency of the mapping process and reuse by different institutions. |
format |
Article in Journal/Newspaper |
author |
Juan C. Quiroz Tim Chard Zhisheng Sa Angus Ritchie Louisa Jorm Blanca Gallego |
author_facet |
Juan C. Quiroz Tim Chard Zhisheng Sa Angus Ritchie Louisa Jorm Blanca Gallego |
author_sort |
Juan C. Quiroz |
title |
Extract, transform, load framework for the conversion of health databases to OMOP |
title_sort |
extract, transform, load framework for the conversion of health databases to omop |
publisher |
Public Library of Science (PLoS) |
publishDate |
2022 |
url |
https://doaj.org/article/e3925ac16e2e4574a3b5f761c068a5d0 |
genre |
DML |
genre_facet |
DML |
op_source |
PLoS ONE, Vol 17, Iss 4 (2022) |
op_relation |
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9000122/?tool=EBI https://doaj.org/toc/1932-6203 1932-6203 https://doaj.org/article/e3925ac16e2e4574a3b5f761c068a5d0 |
_version_ |
1766397425781571584 |
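
The description above says the framework keeps SQL snippets in YAML and compiles them into an ETL script. The following is a minimal sketch of that general idea, not the DML or compiler from the article. The mapping is written as the Python structure a YAML parser would return, and the key names (`target`, `source`, `columns`, `where`), the source table `src_patients`, and its columns are all hypothetical. Only the OMOP `person` table, its column names, and the standard gender concept IDs (8507 male, 8532 female) come from the OMOP CDM itself.

```python
from typing import Dict, List

# Hypothetical mapping, shown as the structure a YAML parser would return.
# The keys and the source table/columns are invented for this sketch and are
# NOT the DML defined in the article.
MAPPING: List[Dict] = [
    {
        "target": "person",          # OMOP CDM table to load
        "source": "src_patients",    # assumed source table
        "columns": {                 # target column -> SQL snippet
            "person_id": "patient_id",
            "year_of_birth": "EXTRACT(YEAR FROM date_of_birth)",
            "gender_concept_id": (
                "CASE sex WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END"
            ),
        },
        "where": "date_of_birth IS NOT NULL",
    },
]


def compile_mapping(mapping: List[Dict]) -> str:
    """Turn each mapping entry into one INSERT ... SELECT statement."""
    statements = []
    for entry in mapping:
        columns = entry["columns"]
        target_cols = ", ".join(columns.keys())
        select_list = ",\n    ".join(
            f"{expr} AS {col}" for col, expr in columns.items()
        )
        sql = (
            f"INSERT INTO {entry['target']} ({target_cols})\n"
            f"SELECT\n    {select_list}\n"
            f"FROM {entry['source']}"
        )
        # The filter is optional in this sketch.
        if entry.get("where"):
            sql += f"\nWHERE {entry['where']}"
        statements.append(sql + ";")
    return "\n\n".join(statements)


if __name__ == "__main__":
    print(compile_mapping(MAPPING))
```

Keeping each SQL expression next to the target column it fills is one way to get the readability and reuse the description mentions: a reviewer can check a single column mapping without reading the whole generated script.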