Visual speech recognition by recurrent neural networks

Thesis (M.Sc.)--Memorial University of Newfoundland, 1997. Computer Science. Bibliography: leaves 115-121. One of the major drawbacks of current acoustically-based speech recognizers is that their performance deteriorates drastically with noise. The focus of this thesis is to develop a computer system...


Bibliographic Details
Main Author:	Rabi, Gihad, 1969-
Other Authors:	Memorial University of Newfoundland. Dept. of Computer Science
Format:	Thesis
Language:	English
Published:	1997
Subjects:	Speech processing systems; Neural networks (Computer science); Newfoundland studies; University of Newfoundland
Online Access:	http://collections.mun.ca/cdm/ref/collection/theses3/id/5663

id	ftmemorialunivdc:oai:collections.mun.ca:theses3/5663
record_format	openpolar
spelling	ftmemorialunivdc:oai:collections.mun.ca:theses3/5663 2023-05-15T17:23:32+02:00 Visual speech recognition by recurrent neural networks Rabi, Gihad, 1969- Memorial University of Newfoundland. Dept. of Computer Science 1997 xi, 121 leaves : ill. Image/jpeg; Application/pdf http://collections.mun.ca/cdm/ref/collection/theses3/id/5663 eng eng Electronic Theses and Dissertations (29.83 MB) -- http://collections.mun.ca/PDFs/theses/Rabi_Gihad.pdf a1211948 http://collections.mun.ca/cdm/ref/collection/theses3/id/5663 The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. Paper copy kept in the Centre for Newfoundland Studies, Memorial University Libraries Speech processing systems Neural networks (Computer science) Text Electronic thesis or dissertation 1997 ftmemorialunivdc 2015-08-06T19:17:37Z Thesis (M.Sc.)--Memorial University of Newfoundland, 1997. Computer Science Bibliography: leaves 115-121 One of the major drawbacks of current acoustically-based speech recognizers is that their performance deteriorates drastically with noise. The focus of this thesis is to develop a computer system that performs speech recognition based on visual information of the speaker. The system automatically extracts visual speech features through image processing techniques that operate on facial images taken in a normally illuminated environment. To cope with the dynamic nature of change in speech patterns with respect to time as well as the spatial variations in the individual patterns, the recognition scheme proposed in this work uses a recurrent neural network architecture. By specifying a certain behavior when the network is presented with exemplar sequences, the recurrent network is trained with no more than feed-forward complexity. The network's desired behavior is based on characterizing a given word by well-defined segments. Adaptive segmentation is employed to segment the training sequences of a given class. This technique iterates the execution of two steps. First, the sequences are segmented individually. Then, a generalized version of dynamic time warping is used to align the segments of all sequences. At each iteration, the weights of the distance functions used in the two steps are updated in a way that minimizes a segmentation error. The system has been implemented and tested on a few words and the results are satisfactory. In particular, the system has been able to distinguish between words with common segments. Moreover, it tolerates, to a great extent, variable-duration words of the same class. Thesis Newfoundland studies University of Newfoundland Memorial University of Newfoundland: Digital Archives Initiative (DAI)
institution	Open Polar
collection	Memorial University of Newfoundland: Digital Archives Initiative (DAI)
op_collection_id	ftmemorialunivdc
language	English
topic	Speech processing systems Neural networks (Computer science)
spellingShingle	Speech processing systems Neural networks (Computer science) Rabi, Gihad, 1969- Visual speech recognition by recurrent neural networks
topic_facet	Speech processing systems Neural networks (Computer science)
description	Thesis (M.Sc.)--Memorial University of Newfoundland, 1997. Computer Science. Bibliography: leaves 115-121. One of the major drawbacks of current acoustically-based speech recognizers is that their performance deteriorates drastically with noise. The focus of this thesis is to develop a computer system that performs speech recognition based on visual information of the speaker. The system automatically extracts visual speech features through image processing techniques that operate on facial images taken in a normally illuminated environment. To cope with the dynamic nature of change in speech patterns with respect to time as well as the spatial variations in the individual patterns, the recognition scheme proposed in this work uses a recurrent neural network architecture. By specifying a certain behavior when the network is presented with exemplar sequences, the recurrent network is trained with no more than feed-forward complexity. The network's desired behavior is based on characterizing a given word by well-defined segments. Adaptive segmentation is employed to segment the training sequences of a given class. This technique iterates the execution of two steps. First, the sequences are segmented individually. Then, a generalized version of dynamic time warping is used to align the segments of all sequences. At each iteration, the weights of the distance functions used in the two steps are updated in a way that minimizes a segmentation error. The system has been implemented and tested on a few words and the results are satisfactory. In particular, the system has been able to distinguish between words with common segments. Moreover, it tolerates, to a great extent, variable-duration words of the same class.
author2	Memorial University of Newfoundland. Dept. of Computer Science
format	Thesis
author	Rabi, Gihad, 1969-
author_facet	Rabi, Gihad, 1969-
author_sort	Rabi, Gihad, 1969-
title	Visual speech recognition by recurrent neural networks
title_short	Visual speech recognition by recurrent neural networks
title_full	Visual speech recognition by recurrent neural networks
title_fullStr	Visual speech recognition by recurrent neural networks
title_full_unstemmed	Visual speech recognition by recurrent neural networks
title_sort	visual speech recognition by recurrent neural networks
publishDate	1997
url	http://collections.mun.ca/cdm/ref/collection/theses3/id/5663
genre	Newfoundland studies University of Newfoundland
genre_facet	Newfoundland studies University of Newfoundland
op_source	Paper copy kept in the Centre for Newfoundland Studies, Memorial University Libraries
op_relation	Electronic Theses and Dissertations (29.83 MB) -- http://collections.mun.ca/PDFs/theses/Rabi_Gihad.pdf a1211948 http://collections.mun.ca/cdm/ref/collection/theses3/id/5663
op_rights	The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
_version_	1766113036767068160


Similar Items