Deep Learning and Domain Transfer for Orca Vocalization Detection

Bibliographic Details
Main Authors: Best, Paul, Ferrari, Maxence, Poupard, Marion, Paris, Sébastien, Marxer, Ricard, Symonds, Helena, Spong, Paul, Glotin, Hervé
Other Authors: DYNamiques de l’Information (DYNI), Laboratoire d'Informatique et des Systèmes (LIS) (Marseille, Toulon), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Amiénois de Mathématique Fondamentale et Appliquée - UMR CNRS 7352 UPJV (LAMFA), Université de Picardie Jules Verne (UPJV)-Centre National de la Recherche Scientifique (CNRS), ANR-18-CE40-0014, SMILES, Modélisation et Inférence Statistique pour l'Apprentissage non-supervisé à partir de Données Massives (2018), ANR-20-CHIA-0014, ADSIL, Écoute intelligente sous-marine avancée (2020)
Format: Conference Object
Language: English
Published: HAL CCSD 2020
Subjects:
Online Access: https://hal.science/hal-02865300
https://hal.science/hal-02865300/document
https://hal.science/hal-02865300/file/IJCNN_ORCALAB.pdf
Description
Summary: International audience. In this paper, we study the difficulties of domain transfer when training deep learning models on a specific task: orca vocalization detection. Deep learning appears to be an answer to many sound recognition tasks in human speech analysis as well as in bioacoustics. It can learn from large amounts of data and find the best-scoring way to discriminate between classes (e.g. orca vocalizations and other sounds). However, to learn the perfect data representation and discrimination boundaries, all possible data configurations need to be processed. This causes problems when those configurations are ever-changing (e.g. in our experiment, a change in the recording system considerably disturbed our previously well-performing model). We thus explore approaches to compensate for the difficulties of domain transfer, using two convolutional neural network (CNN) architectures: one that works in the time-frequency domain and one that works directly in the time domain.
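
Note on the two input representations: the summary contrasts a network fed with time-frequency data and one fed with the raw time-domain signal. The sketch below only illustrates what those two inputs might look like; it is not the authors' preprocessing, which this record does not describe. The function name `magnitude_spectrogram`, the frame length, the hop size, and the test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Naive short-time Fourier transform: one magnitude spectrum per frame."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window to reduce spectral leakage at the frame edges.
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        # Keep only the non-negative frequency bins (0 .. frame_len // 2).
        spectrum = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(windowed))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


# Hypothetical test tone (not orca data): 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
waveform = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]

# Time-domain input: a 1-D sequence of samples.
# Time-frequency input: a 2-D grid of frames by frequency bins.
spec = magnitude_spectrogram(waveform)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(waveform), len(spec), len(spec[0]), peak_bin)
```

A time-domain model would consume `waveform` directly, while a time-frequency model would consume `spec`; each frequency bin here spans `sample_rate / frame_len` Hz.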