Tandem MLNs based Phonetic Feature Extraction for Phoneme Recognition
Keywords: multilayer neural network, hidden Markov model, automatic speech recognition, mel frequency cepstral coefficients, distinctive phonetic features, out-of-vocabulary
Abstract
This paper presents a method for automatic phoneme recognition of Japanese using tandem multilayer neural networks (MLNs). An accurate phoneme recognizer, or phonetic typewriter, plays an important role in current hidden Markov model (HMM)-based automatic speech recognition (ASR) systems: it extracts out-of-vocabulary (OOV) words and thereby addresses the OOV problem, which arises when a new word is not present in the word lexicon. The proposed method comprises three stages: (i) a first MLN converts acoustic features, mel frequency cepstral coefficients (MFCCs), into distinctive phonetic features (DPFs); (ii) a second MLN takes the DPFs combined with the acoustic features as input and outputs a 45-dimensional DPF vector with reduced context effect; and (iii) the 45-dimensional feature vector generated by the second MLN is fed into an HMM-based classifier to obtain more accurate phoneme strings from the input speech. Experiments on the Japanese Newspaper Article Sentences (JNAS) corpus in a clean acoustic environment show that the proposed method provides a higher phoneme correct rate and substantially improves phoneme accuracy over a method based on a single MLN. Moreover, it requires fewer mixture components in the HMMs and, consequently, less computation time for the HMMs.
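The data flow of the first two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the MFCC dimension (39), the hidden layer size (16), the output size of the first MLN (45), the sigmoid activations, and the random untrained weights are all assumptions made for illustration, and the helper names (`init_mln`, `mln_forward`, `tandem_features`) are hypothetical. Only the 45-dimensional output of the second MLN and the combination of DPFs with acoustic features come from the abstract. The HMM classifier of stage (iii) is not shown.

```python
import math
import random

MFCC_DIM = 39   # assumption: the abstract does not state the MFCC dimension
DPF_DIM = 45    # from the abstract: the second MLN outputs a 45-dimensional DPF vector
HIDDEN = 16     # assumption: hidden layer size chosen only for illustration


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(in_dim, out_dim, rng):
    # One fully connected layer with small random weights (untrained).
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def layer_forward(layer, x):
    weights, biases = layer
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def init_mln(in_dim, hidden_dim, out_dim, rng):
    # A hypothetical MLN with a single hidden layer.
    return [init_layer(in_dim, hidden_dim, rng), init_layer(hidden_dim, out_dim, rng)]


def mln_forward(mln, x):
    for layer in mln:
        x = layer_forward(layer, x)
    return x


def tandem_features(mfcc_frame, mln1, mln2):
    # Stage (i): MFCCs -> DPFs.
    dpf_first = mln_forward(mln1, mfcc_frame)
    # Stage (ii): DPFs combined with the acoustic features -> 45-dim DPF vector.
    return mln_forward(mln2, dpf_first + mfcc_frame)


rng = random.Random(0)
mln1 = init_mln(MFCC_DIM, HIDDEN, DPF_DIM, rng)
mln2 = init_mln(DPF_DIM + MFCC_DIM, HIDDEN, DPF_DIM, rng)

# Ten synthetic frames standing in for real MFCCs.
frames = [[rng.gauss(0.0, 1.0) for _ in range(MFCC_DIM)] for _ in range(10)]
features = [tandem_features(frame, mln1, mln2) for frame in frames]
print(len(features), len(features[0]))
```

In a full system, the per-frame vectors in `features` would serve as the observation sequence for the HMM-based classifier of stage (iii).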
License
Copyright (c) 2023 International Journal of Computer Information Systems and Industrial Management Applications
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.