Integrating interdisciplinary data: The EMERGE Database and its broader lessons for data management best practices

In environmental research, cross-disciplinary analyses enable the discovery of novel insights that may not otherwise be evident. Doing these analyses efficiently requires integration of heterogeneous data into a common data structure; however, this type of data integration represents a major challen...


Bibliographic Details
Main Authors: Hodgkins, Suzanne; Bolduc, Benjamin; Miller, Dustin; Rich, Virginia; EMERGE Biology Integration Institute
Format: Other/Unknown Material
Language: unknown
Published: Authorea, Inc. 2024
Subjects:
Online Access: http://dx.doi.org/10.22541/essoar.171352167.73281727/v1
id crwinnower:10.22541/essoar.171352167.73281727/v1
record_format openpolar
spelling crwinnower:10.22541/essoar.171352167.73281727/v1 2024-06-02T08:12:13+00:00 Integrating interdisciplinary data: The EMERGE Database and its broader lessons for data management best practices Hodgkins, Suzanne Bolduc, Benjamin Miller, Dustin Rich, Virginia Institute, EMERGE Biology Integration 2024 http://dx.doi.org/10.22541/essoar.171352167.73281727/v1 unknown Authorea, Inc. posted-content 2024 crwinnower https://doi.org/10.22541/essoar.171352167.73281727/v1 2024-05-07T14:19:16Z In environmental research, cross-disciplinary analyses enable the discovery of novel insights that may not otherwise be evident. Doing these analyses efficiently requires integration of heterogeneous data into a common data structure; however, this type of data integration represents a major challenge, especially for large, multi-institutional projects. Not only should the sharing of individual datasets follow FAIR principles (Findable, Accessible, Interoperable, Reusable), but the ideal data management system should also include a central multidisciplinary data organization framework. The EMERGE Database (EMERGE-DB; https://emerge-db.asc.ohio-state.edu/ ) is the central data hub of the EMERGE Biology Integration Institute (NSF award # 2022070), which investigates the changing dynamics of a thawing permafrost ecosystem in Stordalen Mire, northern Sweden. The EMERGE-DB accomplishes the essential tasks of data management (i.e., data storage and sharing), while also offering more advanced functionality to facilitate interdisciplinary collaboration. Data and standardized metadata—including both sample and file metadata—are integrated within a Neo4j graph database, which allows combined datasets from different source files to be obtained via efficient custom queries. 
A front-end web portal provides access to this data for both the public and for EMERGE project members (who can access non-public data via login), with different pages providing different “views” of the database for different common use cases. Although data are still deposited to external community repositories (e.g. Zenodo, NCBI databases) to ensure cost-effective long-term accessibility, these depositions are tracked within the EMERGE-DB’s standardized metadata system, with all internally- and externally-stored datasets displayed within a centralized page on the web portal. Although this data integration and sharing framework is customized for the EMERGE project’s needs, many of its guiding principles—such as the centralized web access point for all ... Other/Unknown Material Northern Sweden permafrost The Winnower Access Point ENVELOPE(-63.783,-63.783,-64.833,-64.833) Stordalen ENVELOPE(7.337,7.337,62.510,62.510)
institution Open Polar
collection The Winnower
op_collection_id crwinnower
language unknown
description In environmental research, cross-disciplinary analyses enable the discovery of novel insights that may not otherwise be evident. Doing these analyses efficiently requires integration of heterogeneous data into a common data structure; however, this type of data integration represents a major challenge, especially for large, multi-institutional projects. Not only should the sharing of individual datasets follow FAIR principles (Findable, Accessible, Interoperable, Reusable), but the ideal data management system should also include a central multidisciplinary data organization framework. The EMERGE Database (EMERGE-DB; https://emerge-db.asc.ohio-state.edu/ ) is the central data hub of the EMERGE Biology Integration Institute (NSF award # 2022070), which investigates the changing dynamics of a thawing permafrost ecosystem in Stordalen Mire, northern Sweden. The EMERGE-DB accomplishes the essential tasks of data management (i.e., data storage and sharing), while also offering more advanced functionality to facilitate interdisciplinary collaboration. Data and standardized metadata—including both sample and file metadata—are integrated within a Neo4j graph database, which allows combined datasets from different source files to be obtained via efficient custom queries. A front-end web portal provides access to this data for both the public and for EMERGE project members (who can access non-public data via login), with different pages providing different “views” of the database for different common use cases. Although data are still deposited to external community repositories (e.g. Zenodo, NCBI databases) to ensure cost-effective long-term accessibility, these depositions are tracked within the EMERGE-DB’s standardized metadata system, with all internally- and externally-stored datasets displayed within a centralized page on the web portal. 
Although this data integration and sharing framework is customized for the EMERGE project’s needs, many of its guiding principles—such as the centralized web access point for all ...
format Other/Unknown Material
author Hodgkins, Suzanne
Bolduc, Benjamin
Miller, Dustin
Rich, Virginia
EMERGE Biology Integration Institute
author_sort Hodgkins, Suzanne
title Integrating interdisciplinary data: The EMERGE Database and its broader lessons for data management best practices
publisher Authorea, Inc.
publishDate 2024
url http://dx.doi.org/10.22541/essoar.171352167.73281727/v1
long_lat ENVELOPE(-63.783,-63.783,-64.833,-64.833)
ENVELOPE(7.337,7.337,62.510,62.510)
geographic Access Point
Stordalen
genre Northern Sweden
permafrost
op_doi https://doi.org/10.22541/essoar.171352167.73281727/v1
_version_ 1800758579503824896