Human Annotation of ASR Error Regions: is "gravity" a Sharable Concept for Human Annotators?

Bibliographic Details
Main Authors: Luzzati, Daniel; Grouin, Cyril; Vasilescu, Ioana; Adda-Decker, Martine; Bilinski, Eric; Camelin, Nathalie; Kahn, Juliette; Lailler, Carole; Lamel, Lori; Rosset, Sophie
Other Authors: Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM), Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris-Sud - Paris 11 (UP11)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Université Paris Saclay (COmUE), LPP - Laboratoire de Phonétique et Phonologie - UMR 7018 (LPP), Université Sorbonne Nouvelle - Paris 3-Centre National de la Recherche Scientifique (CNRS), LNE, European Language Resources Association (ELRA), ANR-12-BS02-0006,VERA,Analyse d'erreurs avancée pour la reconnaissance de la parole(2012)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Online Access: https://hal.science/hal-01134802
Description
Summary: International audience. This paper is concerned with human assessments of the severity of errors in ASR outputs. We did not design any guidelines, so that each annotator involved in the study could judge the "seriousness" of an ASR error using their own scientific background. Eight human annotators took part in an annotation task on three distinct corpora; one of the corpora was annotated twice, without the annotators being informed of the duplication. None of the computed results (inter-annotator agreement, edit distance, majority annotation) shows any strong correlation between the considered criteria and the level of seriousness, which underlines how difficult it is for a human to determine whether an ASR error is serious or not.