From citizen science to AI models: Advancing cetacean vocalization automatic detection through multi-annotator campaigns

Bibliographic Details
Published in: Ecological Informatics
Main Authors: Dubus, Gabriel; Cazau, Dorian; Torterotot, Maëlle; Gros-Martial, Anatole; Nguyen Hong Duc, Paul; Adam, Olivier
Other Authors: Institut Jean Le Rond d'Alembert (DALEMBERT), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne), Equipe Observations Signal & Environnement (Lab-STICC_OSE), Laboratoire des sciences et techniques de l'information, de la communication et de la connaissance (Lab-STICC), École Nationale d'Ingénieurs de Brest (ENIB)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Institut Mines-Télécom Paris (IMT)-Centre National de la Recherche Scientifique (CNRS)-Université Bretagne Loire (UBL)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom Paris (IMT)-École Nationale d'Ingénieurs de Brest (ENIB)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Institut Mines-Télécom Paris (IMT)-Centre National de la Recherche Scientifique (CNRS)-Université Bretagne Loire (UBL)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom Paris (IMT), Equipe Marine Mapping & Metrology (Lab-STICC_M3), Centre d'Études Biologiques de Chizé - UMR 7372 (CEBC), La Rochelle Université (ULR)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Centre for Marine Science and Technology Curtin University (CMST), Curtin University
Format: Article in Journal/Newspaper
Language: English
Published: HAL CCSD 2024
Online Access: https://ensta-bretagne.hal.science/hal-04614603
https://doi.org/10.1016/j.ecoinf.2024.102642
Description
Summary: Continuous underwater Passive Acoustic Monitoring (PAM) has emerged as a powerful tool for cetacean research. Handling the vast volume of collected data requires automated detection and classification methods. Deep learning approaches, which involve model training and testing, require large amounts of labeled data. These labels come from the manual annotation of audio files, which often relies on human experts. Based on an annotation campaign focusing on blue whale calls in the Indian Ocean and involving 19 novice annotators and one expert in bioacoustics, this study explores the integration of novice annotators into marine bioacoustics research through citizen science programs, which could drastically increase the size of labeled datasets and enhance the performance of detection and classification models. The analysis reveals distinctive annotation profiles influenced by the complexity of the vocalizations and by the annotators' strategies, which range from conservative to permissive. To address the challenge of annotation discrepancies, Convolutional Neural Networks (CNNs) are trained on annotations from both the novices and the expert. The results show that model performance varies across annotators. Our work highlights the importance of annotation guidelines that encourage a more conservative approach to improve overall annotation quality. To make the most of multi-annotation and mitigate noisy labels, two annotation aggregation methods (majority voting and soft labeling) are proposed and tested. The results demonstrate that both methods, particularly when a sufficient number of annotators is involved, significantly improve model performance and reduce variability: with 13 aggregated annotators, the standard deviations of the areas under the PR and ROC curves fall below 0.02 for both vocalizations, compared with 0.17 and 0.21 for the Blue Whale Dcalls and 0.05 and 0.04 for the SEIO PBW vocalizations when each annotator is used separately. Moreover, these ...
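
The two aggregation methods named in the summary can be illustrated with a minimal Python sketch. This record does not give the article's exact definitions, so the following are assumptions: each annotator gives a binary label per audio segment, a soft label is the fraction of annotators who marked the segment as positive, and majority voting keeps a segment when strictly more than half of the annotators marked it. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def soft_labels(annotations: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of annotators who marked each segment as positive.

    `annotations` holds one row per annotator and one 0/1 entry per segment
    (an assumed layout, not taken from the article).
    """
    n_annotators = len(annotations)
    return [sum(column) / n_annotators for column in zip(*annotations)]


def majority_vote(annotations: Sequence[Sequence[int]],
                  threshold: float = 0.5) -> List[int]:
    """Hard label: 1 when strictly more than `threshold` of annotators agree.

    Sending ties to 0 is an assumption made for this sketch.
    """
    return [int(p > threshold) for p in soft_labels(annotations)]


# Hypothetical toy campaign: 3 annotators, 4 audio segments.
votes = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]

soft = soft_labels(votes)    # per-segment agreement rate, usable as a soft training target
hard = majority_vote(votes)  # per-segment binary target
```

Under these assumptions, soft labels keep the level of disagreement between annotators as a training target, whereas majority voting reduces it to a single binary decision per segment.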