Summary: | The Ocean Observatories Initiative (OOI), through a network of sensors, supports critical research in ocean science and marine life. Orcasound is a community-driven project that leverages hydrophone sensors deployed at three locations in the state of Washington (San Juan Island, Point Bush, and Port Townsend) to study orca whales in the Pacific Northwest region. Over the course of the project, code to process and analyze the hydrophone data has been developed, and machine learning models have been trained to automatically identify orca whistles. All of the code is publicly available on GitHub, and the hydrophone data are free to access, stored in an AWS bucket. In this paper, we develop an Orcasound pipeline using Pegasus. This version of the pipeline is based on the GitHub Actions Orcasound workflow and incorporates the inference components of the OrcaHello AI notification system. The Orcasound Pegasus workflow processes the hydrophone data of one or more sensors in batches for each timestamp and converts them to WAV format. From the WAV output, it creates spectrogram images that are stored in the final output location. Furthermore, using the pre-trained Orcasound model, the workflow scans the WAV files to identify potential sounds produced by the orcas. These predictions are merged into a JSON file for each sensor, and if data from more than one sensor are being processed, the workflow creates a final merged JSON output covering all sensors. In our experiments, we used data from a single hydrophone sensor over the span of a day. The workflow consumed 8641 recordings with a total size of 1.5 GB and a median size of 181 KB per recording.
|
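The two-level merge described in the summary (one JSON file per sensor, then one combined file when several sensors are processed) could look roughly like the sketch below. It is only an illustration under stated assumptions: the record fields (`sensor`, `timestamp`, `detections`), the sensor names, and both function names are hypothetical and are not taken from the Orcasound or OrcaHello code.

```python
import json
from collections import defaultdict


def merge_per_sensor(batch_predictions):
    """Group per-batch prediction records by sensor, ordered by timestamp.

    Each record is assumed (hypothetically) to look like:
    {"sensor": "sensor_a", "timestamp": 10, "detections": [...]}
    """
    per_sensor = defaultdict(list)
    for record in batch_predictions:
        per_sensor[record["sensor"]].append(record)
    # Keep each sensor's records in chronological order.
    for records in per_sensor.values():
        records.sort(key=lambda r: r["timestamp"])
    return dict(per_sensor)


def merge_all_sensors(per_sensor):
    """Build the final combined output, only when more than one sensor is present."""
    if len(per_sensor) <= 1:
        return None  # a single sensor needs no combined file
    return {"sensors": sorted(per_sensor), "predictions": per_sensor}


if __name__ == "__main__":
    # Made-up batches from two hypothetical sensors.
    batches = [
        {"sensor": "sensor_a", "timestamp": 20, "detections": []},
        {"sensor": "sensor_a", "timestamp": 10,
         "detections": [{"start_s": 1.0, "confidence": 0.9}]},
        {"sensor": "sensor_b", "timestamp": 15, "detections": []},
    ]
    merged = merge_per_sensor(batches)
    final = merge_all_sensors(merged)
    print(json.dumps(final, indent=2))
```

In a workflow setting, each of these functions would typically correspond to its own job, so the per-sensor merges can run independently before the final merge.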