This paper investigates the use of Deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks (DBLSTM-RNNs) for voice conversion. Temporal correlations across speech frames are not directly modeled in frame-based methods using conventional Deep Neural Networks (DNNs), which limits the quality of the converted speech. To improve the naturalness and continuity of the speech output in voice conversion, we propose a sequence-based conversion method using DBLSTM-RNNs to model not only the frame-wise relationship between the source and the target voice, but also the long-range context dependencies in the acoustic trajectory.
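To make the sequence-based idea concrete, the following is a minimal NumPy sketch of a stacked (deep) bidirectional LSTM applied to a sequence of per-frame acoustic features. All dimensions, weight initializations, and the two-layer depth are hypothetical illustrations, not the paper's actual configuration; a real voice-conversion system would train such a network with a deep-learning framework rather than hand-rolled forward passes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(x, W, U, b, reverse=False):
    """Run one unidirectional LSTM over a (T, d_in) frame sequence."""
    T = x.shape[0]
    h_dim = U.shape[1]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    outputs = []
    order = range(T - 1, -1, -1) if reverse else range(T)
    for t in order:
        z = W @ x[t] + U @ h + b                  # all four gate pre-activations at once
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)                # cell state carries long-range context
        h = o * np.tanh(c)
        outputs.append(h)
    if reverse:
        outputs.reverse()                         # re-align with forward time order
    return np.stack(outputs)

def bidirectional_lstm(x, params_fwd, params_bwd):
    """Concatenate forward and backward hidden states frame by frame."""
    h_fwd = lstm_pass(x, *params_fwd)
    h_bwd = lstm_pass(x, *params_bwd, reverse=True)
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
T, d_in, h_dim = 50, 24, 16                       # hypothetical frame count / feature sizes

def init(d):
    # Random untrained weights, for shape illustration only.
    return (rng.normal(0, 0.1, (4 * h_dim, d)),
            rng.normal(0, 0.1, (4 * h_dim, h_dim)),
            np.zeros(4 * h_dim))

x = rng.normal(size=(T, d_in))                    # e.g. spectral features per frame
layer1 = bidirectional_lstm(x, init(d_in), init(d_in))
layer2 = bidirectional_lstm(layer1, init(2 * h_dim), init(2 * h_dim))  # "deep": stacked layers
print(layer2.shape)  # (50, 32)
```

Because each frame's output depends on the hidden states propagated from both earlier and later frames, the mapping at every time step sees the whole acoustic trajectory, unlike a frame-by-frame DNN that maps each source frame to a target frame in isolation.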