The SAMI galaxy survey: a prototype data archive for Big Science exploration

Bibliographic Details
Published in: Astronomy and Computing
Main Authors: Konstantopoulos, I. S., Green, A. W., Foster, C., Scott, N., Allen, J. T., Fogarty, L. M. R., Lorente, N. P. F., Sweet, S. M., Hopkins, A. M., Bland-Hawthorn, J., Bryant, J. J., Croom, S. M., Goodwin, M., Lawrence, J. S., Owers, M. S., Richards, S. N.
Other Authors: Swinburne University of Technology
Format: Article in Journal/Newspaper
Language: unknown
Published: 2015
Online Access: http://hdl.handle.net/1959.3/436054
https://doi.org/10.1016/j.ascom.2015.08.002
Description
Summary: We describe the data archive and database for the SAMI Galaxy Survey, an ongoing observational program that will cover ≈3400 galaxies with integral-field (spatially resolved) spectroscopy. Amounting to some three million spectra, this is the largest sample of its kind to date. The data archive and built-in query engine use the versatile Hierarchical Data Format (HDF5), which removes the need for external metadata tables and the setup and maintenance overhead they carry. The code produces simple outputs that can easily be translated to plots and tables, and the combination of these tools makes for a light system that can handle heavy data. This article acts as a contextual companion to the SAMI Survey Database source code repository, samiDB, which is freely available online and written entirely in Python. We also discuss the decisions related to the selection of tools and the creation of data visualisation modules. It is our aim that the work presented in this article (descriptions, rationale, and source code) will be of use to scientists looking to set up a maintenance-light data archive for a Big Science data load.
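The summary's point that HDF5 makes external metadata tables unnecessary can be illustrated with a minimal sketch. It assumes the third-party h5py and NumPy packages; the group names, the "redshift" attribute, the toy values, and the helper functions are all hypothetical and are not taken from samiDB. The idea shown is only that metadata can be stored as attributes on the same HDF5 groups that hold the data, so a query can walk the file itself.

```python
import h5py
import numpy as np


def build_archive(h5file):
    """Write two toy 'galaxies' with metadata stored as HDF5 attributes."""
    rng = np.random.default_rng(0)
    # Hypothetical catalogue values, invented for illustration only.
    toy_targets = {"galaxy_0001": 0.031, "galaxy_0002": 0.074}
    for name, redshift in toy_targets.items():
        group = h5file.create_group(name)
        # A small stand-in for a data cube with axes (wavelength, y, x).
        group.create_dataset("cube", data=rng.random((16, 5, 5)))
        # Metadata lives on the group itself, not in an external table.
        group.attrs["redshift"] = redshift


def query_by_redshift(h5file, z_max):
    """Return sorted names of groups whose 'redshift' attribute is below z_max."""
    return sorted(
        name for name, group in h5file.items()
        if group.attrs["redshift"] < z_max
    )


def run_demo():
    # driver="core" with backing_store=False keeps the file in memory only,
    # so nothing is written to disk.
    with h5py.File("toy_archive.h5", "w", driver="core", backing_store=False) as f:
        build_archive(f)
        return query_by_redshift(f, z_max=0.05)


if __name__ == "__main__":
    print(run_demo())
```

Because the attributes travel with the data, copying the single HDF5 file copies the archive and its metadata together. A real survey archive would need a richer layout than this toy one.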