Learning to detect odontocete whistles from generative samples:


Bibliographic Details
Other Authors: Shah, Saumil Mehulbhai (author), Roch, Marie A. (Advisor), Liu, Xiaobai (Committee Member), Bailey, Barbara A. (Committee Member), Computer Science
Format: Thesis
Language: English
Published: 2020
Subjects:
Online Access: https://hdl.handle.net/20.500.11929/sdsu:89727
Description
Summary: We aim to detect Odontoceti (toothed whale) whistles from synthetic samples by learning the underlying distribution of the delphinid sounds and noise environment present in the real spectrograms. We present an unsupervised / self-supervised learning method that generates synthetic data to augment existing data for extracting whistle contours in time-frequency spectrograms. Our approach is novel compared to existing synthesis-based techniques because it relies on deep neural networks to synthesize data that resemble the original data. The proposed architecture employs a combination of generative adversarial networks (WGANs + CycleGAN) to generate synthetic whistle and noise spectrogram patches with their corresponding ground truth (GT) labels. These GANs produce synthetic spectrogram patches that look like patches of the real spectrograms created from the underwater hydrophone recordings, and their synthetic GT tonals mimic the actual human analyst annotations. We propose this as an alternative data-generation method for creating an augmented dataset used for training current CNN-based models (e.g., WGT). Such a CNN-based model produces confidence maps of whistle presence in the spectrogram, which serve as the input to an existing whistle extraction system (e.g., Silbido). Our best synthesis method (100% original data + 100% synthetic data) showed a ~10-28% improvement in the F1 score of peak whistle energy performance compared to existing synthesis-based algorithms such as EdgeGT and EdgeCanny, which were trained on the entire dataset. When trained on only a fraction of the data (6.25% original data + 1000% synthetic data), our method showed a 10% decrease in F1 score compared to existing synthesis methods trained on similar amounts of data, such as μWGT and μWGT-RG.
Also, the F1 score of the proposed method was 0.172 greater than that of our baseline whistle extraction algorithm (~27% improvement), and the precision was 0.332 higher than that of the baseline method (~52% improvement) when applied to the whistles of long-beaked common ...