Although floating-value (soft) masks yield better speech separation than binary masks, the ideal soft mask is difficult to estimate directly. Existing speech separation systems therefore usually take estimation of the ideal binary mask as their computational goal. We propose an algorithm that generalizes a binary mask to a soft mask. Because the key to soft-mask estimation is tracking the noise energy, we first use an exponential distribution to model the conditional distribution of the noise energy given the observed mixture energy and binary mask. Second, a Gaussian Markov conditional random field models the correlation of the noise estimates across consecutive frames. Finally, Markov chain Monte Carlo is used to compute the minimum mean square error (MMSE) estimate of the noise energy, from which the soft mask is then computed. Experiments show that, compared with conventional algorithms based on binary-mask estimation, the proposed algorithm significantly improves both signal-to-noise ratio gain and objective perceptual quality.
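The abstract names the pipeline but not its equations. The sketch below is a minimal illustration of the same three stages under assumptions that are ours, not the paper's. It uses a single frequency channel and additive energies y = s + n. The binary mask follows a 0 dB local-SNR rule, so b = 1 implies n < y/2. The likelihood is exponential: censored on speech-dominated units, and treating y as the noise draw on noise-dominated units. A first-order Gaussian Markov chain prior on the log noise energy stands in as a simple substitute for the Gaussian Markov conditional random field. Single-site Metropolis-within-Gibbs sampling serves as the MCMC method. The mask is the ratio 1 − n̂/y. All function names (`local_log_post`, `mmse_noise_energy`, `soft_mask`, `make_toy_channel`) and parameters (`kappa`, `tau2`, `step`) are hypothetical.

```python
import math

import numpy as np


def local_log_post(x_t, t, x, y, b, kappa, mu0, tau2):
    """Unnormalised log posterior of x_t = log(lambda_t), the log mean noise
    energy in frame t, holding all other frames fixed (single-site update)."""
    # Weak Gaussian anchor keeps the posterior proper if a run of frames has b=1.
    lp = -0.5 * (x_t - mu0) ** 2 / tau2
    # Assumed first-order Gaussian Markov chain prior linking consecutive frames.
    if t > 0:
        lp -= 0.5 * kappa * (x_t - x[t - 1]) ** 2
    if t < len(x) - 1:
        lp -= 0.5 * kappa * (x[t + 1] - x_t) ** 2
    lam = math.exp(x_t)
    if b[t]:
        # Speech-dominated unit: only n < y/2 is known, so use the censored
        # exponential likelihood P(n < y/2 | lambda) = 1 - exp(-y / (2 lambda)).
        lp += math.log(-math.expm1(-y[t] / (2.0 * lam)))
    else:
        # Noise-dominated unit: approximate y as an exponential draw, mean lambda.
        lp += -x_t - y[t] / lam
    return lp


def mmse_noise_energy(y, b, kappa=20.0, n_sweeps=1500, burn_in=500, step=0.3, seed=0):
    """Posterior-mean (MMSE) estimate of lambda_t for every frame, by
    Metropolis-within-Gibbs sampling over x_t = log(lambda_t)."""
    y = np.asarray(y, dtype=float)
    b = np.asarray(b, dtype=bool)
    T = len(y)
    mu0, tau2 = math.log(y.mean()), 100.0
    rng = np.random.default_rng(seed)
    x = np.log(y)  # start each frame at its own mixture energy
    total = np.zeros(T)
    kept = 0
    for sweep in range(n_sweeps):
        for t in range(T):
            prop = x[t] + step * rng.standard_normal()
            log_ratio = (local_log_post(prop, t, x, y, b, kappa, mu0, tau2)
                         - local_log_post(x[t], t, x, y, b, kappa, mu0, tau2))
            # Metropolis acceptance for a symmetric random-walk proposal.
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                x[t] = prop
        if sweep >= burn_in:
            total += np.exp(x)  # accumulate samples of lambda_t after burn-in
            kept += 1
    return total / kept


def soft_mask(y, noise_hat):
    """Ratio mask 1 - n_hat / y clipped to [0, 1]; equal to a Wiener gain
    xi / (1 + xi) with the maximum-likelihood a priori SNR xi = max(y/n_hat - 1, 0)."""
    return np.clip(1.0 - noise_hat / np.asarray(y, dtype=float), 0.0, 1.0)


def make_toy_channel(T=60, noise_mean=1.0, speech_mean=8.0, seed=1):
    """Hypothetical toy channel: exponential noise energy in every frame and
    exponential speech energy only in frames 20..39, energies assumed additive."""
    rng = np.random.default_rng(seed)
    speech = np.zeros(T, dtype=bool)
    speech[20:40] = True
    s = np.where(speech, rng.exponential(speech_mean, T), 0.0)
    n = rng.exponential(noise_mean, T)
    y = s + n
    b = s > n  # ideal binary mask under the 0 dB local-SNR criterion
    return y, b, speech


if __name__ == "__main__":
    y, b, speech = make_toy_channel()
    noise_hat = mmse_noise_energy(y, b)
    mask = soft_mask(y, noise_hat)
    print(f"mean mask, speech frames: {mask[speech].mean():.2f}")
    print(f"mean mask, noise frames:  {mask[~speech].mean():.2f}")
```

In this sketch, raising `kappa` smooths the noise track more strongly across frames. Because the censored likelihood only bounds the noise from above on speech-dominated units, those frames rely on their noise-dominated neighbours through the prior, which is the role the abstract assigns to modelling inter-frame correlation.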