Performance metrics for marine mammal signal detection and classification.

Automatic algorithms for the detection and classification of sound are essential to the analysis of acoustic datasets with long duration. Metrics are needed to assess the performance characteristics of these algorithms. Four metrics for performance evaluation are discussed here: receiver-operating-c...

Full description

Bibliographic Details
Main Authors: Hildebrand, John A, Frasier, Kaitlin E, Helble, Tyler A, Roch, Marie A
Format: Article in Journal/Newspaper
Language:unknown
Published: eScholarship, University of California 2022
Subjects:
Online Access:https://escholarship.org/uc/item/6vc7j5sn
Description
Summary:Automatic algorithms for the detection and classification of sound are essential to the analysis of acoustic datasets with long duration. Metrics are needed to assess the performance characteristics of these algorithms. Four metrics for performance evaluation are discussed here: receiver-operating-characteristic (ROC) curves, detection-error-trade-off (DET) curves, precision-recall (PR) curves, and cost curves. These metrics were applied to the generalized power law detector for blue whale D calls [Helble, Ierley, D'Spain, Roch, and Hildebrand (2012). J. Acoust. Soc. Am. 131(4), 2682-2699] and the click-clustering neural-net algorithm for Cuvier's beaked whale echolocation click detection [Frasier, Roch, Soldevilla, Wiggins, Garrison, and Hildebrand (2017). PLoS Comp. Biol. 13(12), e1005823] using data prepared for the 2015 Detection, Classification, Localization and Density Estimation Workshop. Detection class imbalance, particularly the situation of rare occurrence, is common for long-term passive acoustic monitoring datasets and is a factor in the performance of ROC and DET curves with regard to the impact of false positive detections. PR curves overcome this shortcoming when calculated for individual detections and do not rely on the reporting of true negatives. Cost curves provide additional insight on the effective operating range for the detector based on the a priori probability of occurrence. Use of more than a single metric is helpful in understanding the performance of a detection algorithm.