End to end raw audio deep learning of transients, application to bioacoustics
Main Authors: | , , |
---|---|
Other Authors: | , , , , |
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD 2020 |
Subjects: | |
Online Access: | https://hal.archives-ouvertes.fr/hal-03230842 https://hal.archives-ouvertes.fr/hal-03230842/document https://hal.archives-ouvertes.fr/hal-03230842/file/001096.pdf https://doi.org/10.48465/fa.2020.1096 |
Summary: | In this paper, we propose raw-audio deep learning of clicks, building specific high-dimensional convolution filters to form a complex time-frequency (TF) representation. The CNN has 12 layers, takes several thousand audio bins as input, and outputs about a dozen classes. We test this model on the international DCLDE challenge, which comprises 3 TB of clicks (http://sabiod.org/DCLDE). The challenge opened in 2018, but no team had submitted results before this work. To our knowledge, our model is the first raw-audio click classifier, reaching nearly 70% accuracy on a dozen classes. We discuss the model's class confusions and possible improvements using data augmentation and regularization. |
---|
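
The summary gives only the overall shape of the network: 12 layers, several thousand input bins, and about a dozen output classes. As a rough illustration of how such a stack can reduce a raw waveform to one prediction per class, the sketch below traces the time dimension through 12 strided 1-D convolutions. Every concrete number in it (4096 input samples, kernel size 3, stride 2, padding 1, 12 classes) and every name (`Conv1dSpec`, `trace_lengths`, `receptive_field`) is an assumption chosen for the example, not the authors' configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Conv1dSpec:
    """Hyperparameters of one 1-D convolution layer (no dilation)."""
    kernel: int
    stride: int
    padding: int = 0


def conv1d_out_len(n_in: int, spec: Conv1dSpec) -> int:
    """Number of output time steps produced by one 1-D convolution."""
    n_out = (n_in + 2 * spec.padding - spec.kernel) // spec.stride + 1
    if n_out < 1:
        raise ValueError(f"input of length {n_in} is too short for {spec}")
    return n_out


def trace_lengths(n_in: int, specs: List[Conv1dSpec]) -> List[int]:
    """Time-axis length at the input and after each layer."""
    lengths = [n_in]
    for spec in specs:
        lengths.append(conv1d_out_len(lengths[-1], spec))
    return lengths


def receptive_field(specs: List[Conv1dSpec]) -> int:
    """Input samples seen by one unit of the last layer (padding included)."""
    field, jump = 1, 1
    for spec in specs:
        field += (spec.kernel - 1) * jump  # each layer widens the view
        jump *= spec.stride                # and coarsens the step size
    return field


# Assumed values: the source says only "several thousands" of input bins,
# "12 layers", and "a dozen" output classes.
N_SAMPLES = 4096
N_CLASSES = 12
CONV_STACK = [Conv1dSpec(kernel=3, stride=2, padding=1)] * 12

lengths = trace_lengths(N_SAMPLES, CONV_STACK)
for layer, length in enumerate(lengths):
    print(f"after layer {layer:2d}: {length} time steps")

# If the last layer has N_CLASSES channels, its single remaining time step
# can be read directly as one score per class.
print("receptive field of the last layer:", receptive_field(CONV_STACK))
```

With these assumed settings each layer halves the time axis, so the 12th layer is left with a single time step whose receptive field covers the whole input click window. A real model would choose kernel sizes, strides, and channel widths to suit the sampling rate and click duration.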