In Chinese speech synthesis, the durations of the unvoiced and voiced segments within a syllable are an important factor in naturalness, and they are among the personalized features that depend strongly on the speaker. This paper proposes an unvoiced/voiced duration optimization algorithm for speaker adaptation in hidden Markov model (HMM) based Chinese speech synthesis. The relative duration of the unvoiced segment within each syllable of the original speaker's training corpus is clustered with a decision tree according to contextual features, and an adaptation algorithm then adapts the feature values in the decision tree to the target speaker's relative unvoiced durations. At synthesis time, a reference value for the target speaker's relative unvoiced duration is obtained from this decision tree, and the unvoiced and voiced durations of the synthesized speech are adjusted according to that reference value. Experiments show that the algorithm improves the duration prediction accuracy of speaker adaptation in HMM-based Chinese speech synthesis and effectively improves both the speaker similarity of the adapted voice and the naturalness of the synthesized speech.
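The final adjustment step can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `adjust_voicing_durations`, the frame-based units, and the choice to keep the total syllable length fixed while redistributing frames are all assumptions made for illustration. The reference ratio stands in for the value that would be read from the adapted decision tree.

```python
def adjust_voicing_durations(unvoiced_frames, voiced_frames, reference_ratio):
    """Rescale a syllable's unvoiced/voiced split to match a reference ratio.

    unvoiced_frames, voiced_frames: durations predicted by the synthesizer.
    reference_ratio: target share of the syllable that is unvoiced (0..1),
        hypothetically looked up from the adapted decision tree.
    Assumption: the total syllable duration is preserved.
    """
    if not 0.0 <= reference_ratio <= 1.0:
        raise ValueError("reference_ratio must lie in [0, 1]")
    if unvoiced_frames < 0 or voiced_frames < 0:
        raise ValueError("durations must be non-negative")

    total = unvoiced_frames + voiced_frames
    # Assign the unvoiced segment its reference share of the syllable,
    # and give the remaining frames to the voiced segment.
    new_unvoiced = round(total * reference_ratio)
    new_voiced = total - new_unvoiced
    return new_unvoiced, new_voiced
```

For example, a syllable predicted as 10 unvoiced and 30 voiced frames with a reference ratio of 0.5 would be redistributed to 20 and 20 frames, while a ratio of 0.25 would leave it unchanged.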