FPGA Implementation of a Speech Recognition System Built from HMM-Based Syllable Models (以FPGA實現基於HMM之音節模型組成之語音辨識系統)

Bibliographic Details
Main Authors: 廖韋翔, Liao, Wei-Xiang
Other Authors: Department of Electrical Engineering (電機工程學系), 王駿發, Wang, Jhing-Fa
Format: Thesis
Language: English
Published: 2016
Subjects:
Online Access: http://ir.lib.ncku.edu.tw/handle/987654321/169910
http://ir.lib.ncku.edu.tw/bitstream/987654321/169910/1/index.html
Description
Summary: The hidden Markov model (HMM) is one of the most popular methods in modern speech recognition. This thesis proposes an HMM-based Automatic Speech-Speaker Recognition (ASSR) system and implements the whole system on an FPGA platform. The system has four parts: 1) pre-processing, 2) feature extraction, 3) speech and speaker recognition, and 4) Out-of-Vocabulary (OOV) and Out-of-Speaker (OOS) detection.

The feature extraction module uses Mel-frequency cepstral coefficients (MFCCs) as the recognition features. For speech recognition, an HMM acoustic model is built for each phoneme. Training and evaluation use two public corpora: THCHS-30 (the Tsinghua Chinese 30-hour database, from Tsinghua University in Beijing) for Mandarin and the CMU ARCTIC Databases (from Carnegie Mellon University) for English.

For speaker recognition, the binary halved clustering (BHC) method uses binary-halved splitting to generate speaker models with low complexity. The last part of ASSR uses a grammar to detect OOV and an OOS detection algorithm to detect OOS.

The experiments were conducted on two platforms, a PC and a Xilinx Spartan-6 FPGA. On the FPGA, the Mandarin speech recognition rate is 90.8% and the OOV detection rate is 88.7%. On the PC, the English speech recognition rate is 86.6% and the OOV detection rate is 84.9%. The speaker recognition rate reaches 81.3% and the OOS detection rate reaches 80.8%.
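
As a rough illustration of how an HMM acoustic model can score a sequence of feature frames, the sketch below runs log-domain Viterbi decoding over a small left-to-right model. This record does not say which decoding algorithm, model topology, or emission densities the thesis uses, so Viterbi, the two-state toy model, and the names `viterbi_log_score` and `log_or_neg_inf` are assumptions made only for this example. The per-frame emission scores are taken as already computed (for instance from densities over MFCC vectors).

```python
import math

NEG_INF = float("-inf")


def log_or_neg_inf(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_log_score(log_init, log_trans, log_emit):
    """Return (best_log_score, best_state_path) for one frame sequence.

    log_init[i]     : log P(start in state i)
    log_trans[i][j] : log P(state j at t+1 | state i at t)
    log_emit[t][j]  : log P(frame t | state j), assumed precomputed
    """
    n_states = len(log_init)
    # delta[j]: best log score of any path ending in state j at the current frame
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptrs = []

    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j at frame t
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
            ptr.append(best_i)
        delta = new_delta
        backptrs.append(ptr)

    # Trace back from the best final state to recover the state path
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Hypothetical two-state left-to-right model and three frames of emission scores.
toy_init = [log_or_neg_inf(p) for p in (1.0, 0.0)]
toy_trans = [[log_or_neg_inf(p) for p in row] for row in ((0.5, 0.5), (0.0, 1.0))]
toy_emit = [[log_or_neg_inf(p) for p in row]
            for row in ((0.9, 0.1), (0.5, 0.5), (0.1, 0.9))]

toy_score, toy_path = viterbi_log_score(toy_init, toy_trans, toy_emit)
print(toy_score, toy_path)
```

In a recognizer of this kind, each candidate word or syllable model would be scored the same way and the highest-scoring model chosen. Working in the log domain replaces multiplications with additions, which is one reason this formulation is often favored on fixed-point hardware.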