Visual speech recognition by recurrent neural networks
Thesis (M.Sc.)--Memorial University of Newfoundland, 1997. Computer Science. Bibliography: leaves 115-121. One of the major drawbacks of current acoustically-based speech recognizers is that their performance deteriorates drastically with noise. The focus of this thesis is to develop a computer system...
Main Author: | Rabi, Gihad, 1969- |
---|---|
Other Authors: | Memorial University of Newfoundland. Dept. of Computer Science |
Format: | Thesis |
Language: | English |
Published: | 1997 |
Subjects: | Speech processing systems; Neural networks (Computer science) |
Online Access: | http://collections.mun.ca/cdm/ref/collection/theses3/id/5663 |
id |
ftmemorialunivdc:oai:collections.mun.ca:theses3/5663 |
---|---|
record_format |
openpolar |
spelling |
ftmemorialunivdc:oai:collections.mun.ca:theses3/5663 2023-05-15T17:23:32+02:00 Visual speech recognition by recurrent neural networks Rabi, Gihad, 1969- Memorial University of Newfoundland. Dept. of Computer Science 1997 xi, 121 leaves : ill. Image/jpeg; Application/pdf http://collections.mun.ca/cdm/ref/collection/theses3/id/5663 eng eng Electronic Theses and Dissertations (29.83 MB) -- http://collections.mun.ca/PDFs/theses/Rabi_Gihad.pdf a1211948 http://collections.mun.ca/cdm/ref/collection/theses3/id/5663 The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. Paper copy kept in the Centre for Newfoundland Studies, Memorial University Libraries Speech processing systems Neural networks (Computer science) Text Electronic thesis or dissertation 1997 ftmemorialunivdc 2015-08-06T19:17:37Z Thesis (M.Sc.)--Memorial University of Newfoundland, 1997. Computer Science Bibliography: leaves 115-121 One of the major drawbacks of current acoustically-based speech recognizers is that their performance deteriorates drastically with noise. The focus of this thesis is to develop a computer system that performs speech recognition based on visual information of the speaker. The system automatically extracts visual speech features through image processing techniques that operate on facial images taken in a normally-illuminated environment. To cope with the dynamic nature of change in speech patterns with respect to time as well as the spatial variations in the individual patterns, the recognition scheme proposed in this work uses a recurrent neural network architecture. By specifying a certain behavior when the network is presented with exemplar sequences, the recurrent network is trained with no more than feed-forward complexity. The network's desired behavior is based on characterizing a given word by well-defined segments.
Adaptive segmentation is employed to segment the training sequences of a given class. This technique iterates the execution of two steps. First, the sequences are segmented individually. Then, a generalized version of dynamic time warping is used to align the segments of all sequences. At each iteration, the weights of the distance functions used in the two steps are updated in a way that minimizes a segmentation error. The system has been implemented and tested on a few words and the results are satisfactory. In particular, the system has been able to distinguish between words with common segments. Moreover, it tolerates, to a great extent, variable-duration words of the same class. Thesis Newfoundland studies University of Newfoundland Memorial University of Newfoundland: Digital Archives Initiative (DAI) |
institution |
Open Polar |
collection |
Memorial University of Newfoundland: Digital Archives Initiative (DAI) |
op_collection_id |
ftmemorialunivdc |
language |
English |
topic |
Speech processing systems Neural networks (Computer science) |
description |
Thesis (M.Sc.)--Memorial University of Newfoundland, 1997. Computer Science. Bibliography: leaves 115-121. One of the major drawbacks of current acoustically-based speech recognizers is that their performance deteriorates drastically with noise. The focus of this thesis is to develop a computer system that performs speech recognition based on visual information of the speaker. The system automatically extracts visual speech features through image processing techniques that operate on facial images taken in a normally-illuminated environment. To cope with the dynamic nature of change in speech patterns with respect to time as well as the spatial variations in the individual patterns, the recognition scheme proposed in this work uses a recurrent neural network architecture. By specifying a certain behavior when the network is presented with exemplar sequences, the recurrent network is trained with no more than feed-forward complexity. The network's desired behavior is based on characterizing a given word by well-defined segments. Adaptive segmentation is employed to segment the training sequences of a given class. This technique iterates the execution of two steps. First, the sequences are segmented individually. Then, a generalized version of dynamic time warping is used to align the segments of all sequences. At each iteration, the weights of the distance functions used in the two steps are updated in a way that minimizes a segmentation error. The system has been implemented and tested on a few words and the results are satisfactory. In particular, the system has been able to distinguish between words with common segments. Moreover, it tolerates, to a great extent, variable-duration words of the same class. |
author2 |
Memorial University of Newfoundland. Dept. of Computer Science |
format |
Thesis |
author |
Rabi, Gihad, 1969- |
title |
Visual speech recognition by recurrent neural networks |
publishDate |
1997 |
url |
http://collections.mun.ca/cdm/ref/collection/theses3/id/5663 |
genre |
Newfoundland studies University of Newfoundland |
op_source |
Paper copy kept in the Centre for Newfoundland Studies, Memorial University Libraries |
op_relation |
Electronic Theses and Dissertations (29.83 MB) -- http://collections.mun.ca/PDFs/theses/Rabi_Gihad.pdf a1211948 http://collections.mun.ca/cdm/ref/collection/theses3/id/5663 |
op_rights |
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. |
_version_ |
1766113036767068160 |