To reduce the degradation in speech recognition caused by the varied characteristics of different speakers, a speaker normalization method using perceptual frequency warping based on subglottal resonances is proposed. The warping factor is extracted from the second subglottal resonance, exploiting the acoustic coupling between the subglottal system and the vocal tract. Because the second subglottal resonance is independent of speech content, it reflects speaker characteristics better than the third formant. The normalization is applied to perceptual minimum variance distortionless response (PMVDR) coefficients, which are more robust to noise than MFCC. The normalized coefficients are then used in speech model training and speech recognition. Experiments show that, compared with MFCC and with spectrum warping by the third formant, the word error rate decreases by 4% and 3% respectively in clean speech recognition, and by 9% and 5% respectively in a noisy environment. The results indicate that the proposed method can improve word recognition accuracy in a speaker-independent recognition system.
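The abstract does not state how the warping factor is computed from the second subglottal resonance (Sg2) or which warping function is used. As a rough illustration only, the sketch below assumes the factor is the ratio of a reference Sg2 to the speaker's Sg2, and substitutes a piecewise-linear warp of the kind commonly used in vocal tract length normalization for the paper's perceptual warp inside PMVDR. The function names, the 1400 Hz reference value, the 8 kHz Nyquist frequency, and the `knee` parameter are all hypothetical choices made for this example, not values taken from the paper.

```python
def warping_factor(sg2_speaker_hz, sg2_reference_hz=1400.0):
    """Assumed definition: reference Sg2 divided by the speaker's Sg2.

    The 1400 Hz default is an illustrative placeholder, not a value
    reported in the paper.
    """
    if sg2_speaker_hz <= 0 or sg2_reference_hz <= 0:
        raise ValueError("resonance frequencies must be positive")
    return sg2_reference_hz / sg2_speaker_hz


def warp_frequency(freq_hz, alpha, nyquist_hz=8000.0, knee=0.85):
    """Piecewise-linear warp (a stand-in for the paper's perceptual warp).

    Frequencies below a breakpoint are scaled by alpha; above it, the
    mapping is a straight line chosen so that nyquist_hz maps to itself.
    """
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("freq_hz must lie in [0, nyquist_hz]")
    # Breakpoint chosen so that alpha * breakpoint never exceeds knee * nyquist.
    breakpoint_hz = knee * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= breakpoint_hz:
        return alpha * freq_hz
    # Line from (breakpoint, alpha * breakpoint) to (nyquist, nyquist).
    slope = (nyquist_hz - alpha * breakpoint_hz) / (nyquist_hz - breakpoint_hz)
    return alpha * breakpoint_hz + slope * (freq_hz - breakpoint_hz)


if __name__ == "__main__":
    alpha = warping_factor(sg2_speaker_hz=1500.0)  # hypothetical speaker
    for f in (500.0, 2000.0, 7500.0):
        print(f, "->", round(warp_frequency(f, alpha), 1))
```

Under these assumptions, a speaker whose Sg2 lies above the reference gets a factor below 1, so the spectrum is compressed toward lower frequencies before the features are computed, while the band edge at the Nyquist frequency stays fixed.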