Learning stage-wise GANs for whistle extraction in time-frequency spectrograms

Bibliographic Details
Published in: IEEE Transactions on Multimedia
Main Authors: Li, Pu; Roch, Marie A.; Klinck, Holger; Fleishman, Erica; Gillespie, Douglas; Nosal, Eva-Marie; Shiu, Yu; Liu, Xiaobai
Format: Article in Journal/Newspaper
Language: English
Published: 2023
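Subjects: Electrical and electronic engineering; Computer Science applications; Media technology; Signal processing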
Online Access: https://research-portal.st-andrews.ac.uk/en/researchoutput/learning-stagewise-gans-for-whistle-extraction-in-timefrequency-spectrograms(19b861ca-09c0-41a4-a51a-36b27838cd48).html
https://doi.org/10.1109/tmm.2023.3251109
id ftunstandrewcris:oai:research-portal.st-andrews.ac.uk:publications/19b861ca-09c0-41a4-a51a-36b27838cd48
record_format openpolar
date 2023-03-31
institution Open Polar
collection University of St Andrews: Research Portal
op_collection_id ftunstandrewcris
language English
topic Electrical and electronic engineering
Computer Science applications
Media technology
Signal processing
description Whistle contour extraction aims to derive animal whistles from time-frequency spectrograms as polylines. For toothed whales, whistle extraction results can serve as the basis for analyzing animal abundance, species identity, and social activities. During the last few decades, as long-term recording systems have become affordable, automated whistle extraction algorithms have been proposed to process large volumes of recording data. Recently, a deep learning-based method demonstrated superior performance in extracting whistles under varying noise conditions. However, training such networks requires a large amount of labor-intensive annotation, which is not available for many species. To overcome this limitation, we present a framework of stage-wise generative adversarial networks (GANs), which compile new whistle data suitable for deep model training via three stages: generation of background noise in the spectrogram, generation of whistle contours, and generation of whistle signals. By separating the generation of different components in the samples, our framework composes visually promising whistle data and labels even when little expert-annotated data is available. Regardless of the amount of human-annotated data, the proposed data augmentation framework leads to a consistent improvement in the performance of the whistle extraction model, with a maximum increase of 1.69 in the whistle extraction mean F1-score. Our stage-wise GAN also outperforms a single GAN in improving whistle extraction models with augmented data. The data and code will be available at https://github.com/Paul-LiPu/CompositeGAN_WhistleAugment.
format Article in Journal/Newspaper
author Li, Pu
Roch, Marie A.
Klinck, Holger
Fleishman, Erica
Gillespie, Douglas
Nosal, Eva-Marie
Shiu, Yu
Liu, Xiaobai
title Learning stage-wise GANs for whistle extraction in time-frequency spectrograms
publishDate 2023
url https://research-portal.st-andrews.ac.uk/en/researchoutput/learning-stagewise-gans-for-whistle-extraction-in-timefrequency-spectrograms(19b861ca-09c0-41a4-a51a-36b27838cd48).html
https://doi.org/10.1109/tmm.2023.3251109
genre toothed whales
op_source Li, P., Roch, M. A., Klinck, H., Fleishman, E., Gillespie, D., Nosal, E.-M., Shiu, Y. & Liu, X. 2023, 'Learning stage-wise GANs for whistle extraction in time-frequency spectrograms', IEEE Transactions on Multimedia, vol. 25, pp. 9302-9314. https://doi.org/10.1109/tmm.2023.3251109
op_rights info:eu-repo/semantics/embargoedAccess
op_doi https://doi.org/10.1109/tmm.2023.3251109
container_title IEEE Transactions on Multimedia
container_volume 25
container_start_page 9302
op_container_end_page 9314
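The description field outlines a three-stage composition (background noise, whistle contour, whistle signal) in which the label is a by-product of generating the contour separately. The Python sketch that follows illustrates only that composition idea under stated assumptions. It is not the authors' implementation, which the record says will be published in their repository. Each GAN stage is replaced by a hypothetical, hand-written random stand-in (`generate_background`, `generate_contour`, `render_whistle`), and the sinusoidal contour shape, additive mixing, and binary-mask label format are illustrative assumptions.

```python
import math
import random

# Hypothetical stand-ins for the three learned GAN stages described in the
# record. Each is a simple random procedure, used only to show how separately
# generated components combine into a (spectrogram, label) training pair.


def generate_background(n_freq, n_time, rng):
    """Stage 1 stand-in: background-noise spectrogram, values in [0, 0.5)."""
    return [[rng.random() * 0.5 for _ in range(n_time)] for _ in range(n_freq)]


def generate_contour(n_freq, n_time, rng):
    """Stage 2 stand-in: one frequency bin per time frame.

    Assumption: a slow sinusoidal sweep, not the paper's contour model.
    """
    start = rng.randrange(n_freq // 4, 3 * n_freq // 4)
    depth = rng.uniform(1, n_freq / 8)
    period = rng.uniform(n_time / 2, 2 * n_time)
    contour = []
    for t in range(n_time):
        f = start + depth * math.sin(2 * math.pi * t / period)
        contour.append(min(n_freq - 1, max(0, round(f))))  # clamp to valid bins
    return contour


def render_whistle(contour, n_freq, n_time, amplitude=1.0):
    """Stage 3 stand-in: place whistle energy on the contour's bins."""
    signal = [[0.0] * n_time for _ in range(n_freq)]
    for t, f in enumerate(contour):
        signal[f][t] = amplitude
    return signal


def compose_sample(n_freq=64, n_time=100, seed=0):
    """Compose one synthetic sample; the label comes from the contour stage."""
    rng = random.Random(seed)
    background = generate_background(n_freq, n_time, rng)
    contour = generate_contour(n_freq, n_time, rng)
    whistle = render_whistle(contour, n_freq, n_time)
    # Model input: background plus whistle energy (additive mixing is assumed).
    spectrogram = [[background[f][t] + whistle[f][t] for t in range(n_time)]
                   for f in range(n_freq)]
    # Binary label mask: 1 where the generated contour passes, else 0.
    label = [[1 if contour[t] == f else 0 for t in range(n_time)]
             for f in range(n_freq)]
    return spectrogram, label, contour


spec, label, contour = compose_sample()
print(len(spec), len(spec[0]), sum(map(sum, label)))  # prints "64 100 100"
```

Because the contour is produced before the whistle is rendered, every synthetic spectrogram in this sketch arrives with an exact label at no annotation cost. That separation is the property the record credits with producing usable whistle data and labels when expert annotation is scarce.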