Visual feature extraction is a central research problem in audio-visual speech recognition. This paper introduces a robust dynamic lip-shape feature based on Visemic LDA. The feature explicitly accounts for how the lip contour changes during articulation and for the grouping of speech sounds into visual visemes. The paper also proposes a method that automatically labels the LDA training data using speech recognition results. This method eliminates laborious manual labeling and avoids labeling errors. Experiments show that incorporating Visemic LDA visual features into audio-visual speech recognition substantially improves the recognition rate under noisy conditions. When these visual features are combined with a multi-stream HMM, the recognition rate still exceeds 80% under strong noise at a signal-to-noise ratio (SNR) of 10 dB.
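The abstract does not spell out the algorithm, so what follows is only a minimal sketch, under stated assumptions, of how the pieces it names could fit together. Per-frame viseme labels are derived from a recognizer's phone alignment, standing in for the automatic labeling step. Neighbouring frames are stacked to capture lip-contour dynamics, a Fisher LDA projection is trained on those labels, and audio and visual scores are combined with the usual stream-weighted log-likelihood of a two-stream HMM. The phone-to-viseme table, all function names, the context width and the stream weight are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical phone-to-viseme map, for illustration only; the paper's
# actual viseme inventory is not given in the abstract.
PHONE_TO_VISEME = {
    "b": "bilabial", "p": "bilabial", "m": "bilabial",
    "f": "labiodental",
    "a": "open",
    "o": "rounded", "u": "rounded",
    "i": "spread", "e": "spread",
    "sil": "closed",
}


def viseme_labels_from_alignment(alignment, n_frames):
    """Convert a recognizer's phone alignment into per-frame viseme labels.

    `alignment` is a list of (phone, start_frame, end_frame) tuples with
    end_frame exclusive. Uncovered frames, and phones missing from the map,
    get the label None and are dropped from LDA training.
    """
    labels = [None] * n_frames
    for phone, start, end in alignment:
        viseme = PHONE_TO_VISEME.get(phone)
        for t in range(max(start, 0), min(end, n_frames)):
            labels[t] = viseme
    return labels


def stack_frames(feats, context):
    """Concatenate each frame with its +/- `context` neighbours (edges padded
    by repetition) so LDA can see how the lip contour changes over time."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    n = feats.shape[0]
    return np.hstack([padded[k:k + n] for k in range(2 * context + 1)])


def fit_lda(X, labels, n_components):
    """Fisher LDA: directions maximising between-class over within-class scatter."""
    keep = np.array([lab is not None for lab in labels])
    X = X[keep]
    y = np.array([lab for lab in labels if lab is not None])
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for c in np.unique(y):
        xc = X[y == c]
        mc = xc.mean(axis=0)
        s_w += (xc - mc).T @ (xc - mc)
        diff = (mc - mean_all)[:, None]
        s_b += xc.shape[0] * (diff @ diff.T)
    # A small ridge term keeps S_w invertible when stacked frames are correlated.
    m = np.linalg.solve(s_w + 1e-6 * np.eye(d), s_b)
    evals, evecs = np.linalg.eig(m)
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real  # shape (d, n_components)


def multistream_loglik(logp_audio, logp_video, audio_weight):
    """Stream-weighted state log-likelihood; the two weights sum to 1."""
    return audio_weight * logp_audio + (1.0 - audio_weight) * logp_video


# Usage on synthetic data: 30 frames of 3 hypothetical lip-contour parameters.
rng = np.random.default_rng(0)
lip = rng.normal(size=(30, 3))
align = [("sil", 0, 10), ("b", 10, 20), ("a", 20, 30)]
labels = viseme_labels_from_alignment(align, 30)
X = stack_frames(lip, context=1)          # shape (30, 9)
W = fit_lda(X, labels, n_components=2)    # projection matrix, shape (9, 2)
Z = X @ W                                 # projected visual features, shape (30, 2)
```

With C viseme classes, the between-class scatter has rank at most C - 1, so at most C - 1 LDA directions carry discriminative information. In a multi-stream HMM the audio weight would typically be lowered as the SNR drops, shifting trust toward the visual stream; how the paper sets its weights is not stated in the abstract.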