Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling

Long audio alignment systems for Spanish and English are presented within an automatic subtitling application. Language-specific phone decoders automatically recognize the audio content at the phoneme level. At the same time, language-dependent grapheme-to-phoneme modules perform a...


Bibliographic Details
Main Authors: Ruiz, Pablo; Álvarez, Aitor; Arzelus, Haritz
Other Authors: Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Département Littératures et langage (LILA), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Cité (USPC)-Université Sorbonne Nouvelle - Paris 3, VicomTech
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Online Access: https://hal.archives-ouvertes.fr/hal-01099239
https://hal.archives-ouvertes.fr/hal-01099239/document
https://hal.archives-ouvertes.fr/hal-01099239/file/387_Paper.pdf
id ftccsdartic:oai:HAL:hal-01099239v1
record_format openpolar
institution Open Polar
collection Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
op_collection_id ftccsdartic
language English
topic automatic subtitling
long audio alignment
phoneme similarity matrices
[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing
[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
[SHS.LANGUE]Humanities and Social Sciences/Linguistics
description International audience. Long audio alignment systems for Spanish and English are presented within an automatic subtitling application. Language-specific phone decoders automatically recognize the audio content at the phoneme level. At the same time, language-dependent grapheme-to-phoneme modules transcribe the script for the audio. A dynamic programming algorithm (Hirschberg's algorithm) finds matches between the phonemes recognized by the phone decoder and the phonemes in the script's transcription. Alignment accuracy is evaluated when alignment operations are scored with a baseline binary matrix and when they are scored with several continuous-score matrices based on phoneme similarity, assessed by comparing multivalued phonological features. Alignment accuracy results are reported at the phoneme, word and subtitle levels. Alignment accuracy with the continuous scoring matrices based on phonological similarity was clearly higher than with the baseline binary matrix.
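The alignment step summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the quadratic-space Needleman-Wunsch recurrence in place of Hirschberg's algorithm (which reaches the same optimal alignment score in linear space), and the feature table, gap penalty and function names are illustrative assumptions, not values from the paper.

```python
# Toy multivalued phonological features. These values are hypothetical
# and only serve to illustrate the idea; they are NOT taken from the paper.
FEATURES = {
    "p": {"voice": 0, "place": 0, "manner": 0},
    "b": {"voice": 1, "place": 0, "manner": 0},
    "t": {"voice": 0, "place": 1, "manner": 0},
    "d": {"voice": 1, "place": 1, "manner": 0},
    "s": {"voice": 0, "place": 1, "manner": 1},
    "a": {"voice": 1, "place": 3, "manner": 3},
}
GAP = -1.0  # assumed penalty for an insertion or deletion


def binary_score(x, y):
    """Baseline binary matrix: +1 for identical phonemes, -1 otherwise."""
    return 1.0 if x == y else -1.0


def feature_score(x, y):
    """Continuous score in [-1, 1]: share of features the phonemes agree on."""
    fx, fy = FEATURES[x], FEATURES[y]
    shared = sum(fx[k] == fy[k] for k in fx)
    return 2.0 * shared / len(fx) - 1.0


def align(recognized, reference, score):
    """Globally align two phoneme lists; return (total score, aligned pairs)."""
    n, m = len(recognized), len(reference)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(recognized[i - 1], reference[j - 1]),
                dp[i - 1][j] + GAP,   # recognized phoneme left unmatched
                dp[i][j - 1] + GAP,   # reference phoneme left unmatched
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dp[i][j]
                == dp[i - 1][j - 1] + score(recognized[i - 1], reference[j - 1])):
            pairs.append((recognized[i - 1], reference[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + GAP):
            pairs.append((recognized[i - 1], None))
            i -= 1
        else:
            pairs.append((None, reference[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


if __name__ == "__main__":
    recognized = ["b", "a", "d"]  # hypothetical phone-decoder output
    reference = ["p", "a", "t"]   # hypothetical grapheme-to-phoneme output
    for name, fn in (("binary", binary_score), ("feature", feature_score)):
        total, pairs = align(recognized, reference, fn)
        print(name, round(total, 3), pairs)
```

In this toy case both matrices return the same pairing, but the binary matrix penalizes b/p and d/t as heavily as any other mismatch, whereas the feature-based score rewards them as near matches. That difference is what allows a continuous matrix to prefer phonologically plausible substitutions when the decoder output is noisy.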
author2 Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice)
Département Littératures et langage (LILA)
École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Cité (USPC)-Université Sorbonne Nouvelle - Paris 3
VicomTech
format Conference Object
author Ruiz, Pablo
Álvarez, Aitor
Arzelus, Haritz
title Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling
publisher HAL CCSD
publishDate 2014
url https://hal.archives-ouvertes.fr/hal-01099239
https://hal.archives-ouvertes.fr/hal-01099239/document
https://hal.archives-ouvertes.fr/hal-01099239/file/387_Paper.pdf
op_coverage Reykjavik, Iceland
genre Iceland
op_source LREC, Ninth International Conference on Language Resources and Evaluation
https://hal.archives-ouvertes.fr/hal-01099239
LREC, Ninth International Conference on Language Resources and Evaluation, May 2014, Reykjavik, Iceland
http://lrec2014.lrec-conf.org/en/
op_relation hal-01099239
https://hal.archives-ouvertes.fr/hal-01099239
https://hal.archives-ouvertes.fr/hal-01099239/document
https://hal.archives-ouvertes.fr/hal-01099239/file/387_Paper.pdf
op_rights info:eu-repo/semantics/OpenAccess
_version_ 1766040297078259712