Fast Chirplet Transform to Enhance CNN Machine Listening - Validation on Animal calls and Speech

The scattering framework offers an optimal hierarchical convolutional decomposition according to its kernels. A Convolutional Neural Net (CNN) can be seen as an optimal kernel decomposition; nevertheless, it requires a large amount of training data to learn its kernels. We propose a trade-off between the...

Bibliographic Details
Main Authors: Glotin, Herve, Ricard, Julien, Balestriero, Randall
Format: Text
Language: unknown
Published: 2016
Subjects: Computer Science - Sound
Online Access: http://arxiv.org/abs/1611.08749
id ftarxivpreprints:oai:arXiv.org:1611.08749
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:1611.08749 2023-09-05T13:22:22+02:00 Fast Chirplet Transform to Enhance CNN Machine Listening - Validation on Animal calls and Speech Glotin, Herve Ricard, Julien Balestriero, Randall 2016-11-26 http://arxiv.org/abs/1611.08749 unknown http://arxiv.org/abs/1611.08749 Computer Science - Sound text 2016 ftarxivpreprints 2023-08-16T14:12:26Z The scattering framework offers an optimal hierarchical convolutional decomposition according to its kernels. A Convolutional Neural Net (CNN) can be seen as an optimal kernel decomposition; nevertheless, it requires a large amount of training data to learn its kernels. We propose a trade-off between these two approaches: a Chirplet kernel as an efficient constant-Q bioacoustic representation to pretrain a CNN. First, we motivate the Chirplet bioinspired auditory representation. Second, we give the first algorithm (and code) of a Fast Chirplet Transform (FCT). Third, we demonstrate the computational efficiency of FCT on large environmental databases: months of Orca recordings, and 1000 bird species from the LifeClef challenge. Fourth, we validate FCT on the vowels subset of the TIMIT speech dataset. The results show that FCT accelerates CNN training when it pretrains the low-level layers: it reduces training duration by 28% for bird classification and by 26% for vowel classification. Scores are also enhanced by FCT pretraining, with a relative gain of +7.8% in Mean Average Precision on birds and +2.3% in vowel accuracy against a raw-audio CNN. We conclude with perspectives on tonotopic FCT deep machine listening and on inter-species bioacoustic transfer learning to generalise the representation of animal communication systems. Text Orca ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Sound
spellingShingle Computer Science - Sound
Glotin, Herve
Ricard, Julien
Balestriero, Randall
Fast Chirplet Transform to Enhance CNN Machine Listening - Validation on Animal calls and Speech
topic_facet Computer Science - Sound
description The scattering framework offers an optimal hierarchical convolutional decomposition according to its kernels. A Convolutional Neural Net (CNN) can be seen as an optimal kernel decomposition; nevertheless, it requires a large amount of training data to learn its kernels. We propose a trade-off between these two approaches: a Chirplet kernel as an efficient constant-Q bioacoustic representation to pretrain a CNN. First, we motivate the Chirplet bioinspired auditory representation. Second, we give the first algorithm (and code) of a Fast Chirplet Transform (FCT). Third, we demonstrate the computational efficiency of FCT on large environmental databases: months of Orca recordings, and 1000 bird species from the LifeClef challenge. Fourth, we validate FCT on the vowels subset of the TIMIT speech dataset. The results show that FCT accelerates CNN training when it pretrains the low-level layers: it reduces training duration by 28% for bird classification and by 26% for vowel classification. Scores are also enhanced by FCT pretraining, with a relative gain of +7.8% in Mean Average Precision on birds and +2.3% in vowel accuracy against a raw-audio CNN. We conclude with perspectives on tonotopic FCT deep machine listening and on inter-species bioacoustic transfer learning to generalise the representation of animal communication systems.
format Text
author Glotin, Herve
Ricard, Julien
Balestriero, Randall
author_facet Glotin, Herve
Ricard, Julien
Balestriero, Randall
author_sort Glotin, Herve
title Fast Chirplet Transform to Enhance CNN Machine Listening - Validation on Animal calls and Speech
title_short Fast Chirplet Transform to Enhance CNN Machine Listening - Validation on Animal calls and Speech
title_full Fast Chirplet Transform to Enhance CNN Machine Listening - Validation on Animal calls and Speech
title_fullStr Fast Chirplet Transform to Enhance CNN Machine Listening - Validation on Animal calls and Speech
title_full_unstemmed Fast Chirplet Transform to Enhance CNN Machine Listening - Validation on Animal calls and Speech
title_sort fast chirplet transform to enhance cnn machine listening - validation on animal calls and speech
publishDate 2016
url http://arxiv.org/abs/1611.08749
genre Orca
genre_facet Orca
op_relation http://arxiv.org/abs/1611.08749
_version_ 1776202887944208384