A New Approach to Extract Formant Instantaneous Characteristics for Speaker Identification
Keywords:
speaker identification, formant, HHT, instantaneous frequency, MFCC
Abstract
This article presents a new approach to extracting formant instantaneous characteristics (FIC) parameters for speaker identification (SI). On the one hand, FIC can be derived from the time-frequency description of the speech signal given by the Hilbert-Huang Transform (HHT). HHT is a powerful tool for analyzing non-stationary signals and consists of the sifting procedure of empirical mode decomposition (EMD) followed by the Hilbert Transform (HT). The sifting procedure decomposes the signal into intrinsic mode functions (IMFs), which is essential for determining the instantaneous information of nonlinear or non-stationary signals such as speech. Once the IMFs are available, this information can be obtained directly through the HT. On the other hand, formants carry a great deal of information that reflects not only the speech content but also the speaker's individual features, so finer formant properties need to be extracted. Compared with traditional methods, FIC extracted by HHT describe formant instantaneous information in fine detail. These FIC parameters reflect the speaker's individual features arising from both the glottal wave and the vocal tract. Finally, different kinds of FIC parameters were combined with MFCC to form several experimental parameter sets for SI based on a Gaussian mixture model (GMM). The results show that FIC parameters complement MFCC in SI, with a relative improvement of up to 11.96%. The experimental utterances are Mandarin Chinese recorded under clean background conditions.
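As a rough illustration of the HT stage only, the sketch below computes the instantaneous amplitude and frequency of a single narrow-band component from its analytic signal. It makes several assumptions: a synthetic 500 Hz tone stands in for one IMF, the EMD sifting step is not implemented, and the function names `analytic_signal` and `instantaneous_frequency` are hypothetical, not taken from the article.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence using the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins doubled,
    which is the standard discrete construction of x + j * HT(x).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0      # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0    # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(x, fs):
    """Estimate instantaneous frequency in Hz from the unwrapped phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    # Discrete derivative of the phase, converted from rad/sample to Hz.
    return np.diff(phase) * fs / (2.0 * np.pi)


# Assumed test signal: a 500 Hz cosine sampled at 8 kHz (stand-in for one IMF).
fs = 8000.0
t = np.arange(800) / fs
tone = np.cos(2.0 * np.pi * 500.0 * t)

inst_amp = np.abs(analytic_signal(tone))
inst_freq = instantaneous_frequency(tone, fs)
```

For real speech, each IMF produced by EMD would take the place of `tone`. Applying the same computation to the undecomposed signal would generally not give a physically meaningful frequency, because the signal contains several components at once.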