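Latent Dirichlet Decomposition for Single-Channel Speaker Separation
Abstract: We present an algorithm that separates multiple speakers from mixed single-channel recordings by latent variable decomposition of the spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the output of a discrete random process that generates frequency bin indices. The distribution of this process is modeled as a mixture of multinomial distributions whose mixture weights vary from one analysis window to the next. The component multinomials are assumed to be speaker-specific and are learned from training signals for each speaker. We model the prior distribution of each speaker's mixture weights as a Dirichlet distribution. The distributions representing the magnitude spectral vectors of the mixed signal are decomposed into mixtures of the multinomials of all individual speakers. From this decomposition, the frequency distribution, i.e., the speech spectrum, of each speaker is reconstructed.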
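1. Introduction
The problem of single-channel speaker separation, that is, separating each speaker's signal from a monaural recording in which several people speak at once, has historically been approached from the angle of frequency selection. To separate the signal of any one speaker, the time-frequency components of the mixed signal that are dominated by that speaker are selected, and the signal is reconstructed from the resulting incomplete time-frequency representation. In practice, the selection of a speaker's time-frequency components may be based on perceptual principles (e.g., reference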
Latent Dirichlet Decomposition for Single
Channel Speaker Separation
Bhiksha Raj, Madhusudana V.S. Shashanka, Paris Smaragdis
TR2006-064 May 2006
Abstract
We present an algorithm for the separation of multiple speakers from mixed single-channel
recordings by latent variable decomposition of the speech spectrogram. We model each
magnitude spectral vector in the short-time Fourier transform of a speech signal as the
outcome of a discrete random process that generates frequency bin indices. The distribution
of the process is modeled as a mixture of multinomial distributions, such that the mixture
weights of the component multinomials vary from analysis window to analysis window. The
component multinomials are assumed to be speaker specific and are learned from training
signals for each speaker. We model the prior distribution of the mixture weights for each
speaker as a Dirichlet distribution. The distributions representing magnitude spectral
vectors for the mixed signal are decomposed into mixtures of the multinomials for all
component speakers. The frequency distribution, i.e., the spectrum, for each speaker is
reconstructed from this decomposition.
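
As a rough illustration of the decomposition described in the abstract, the Python sketch below treats each spectral frame as a histogram over frequency bins and fits P_t(f) = sum_z P_t(z) P(f|z) by EM. It is a simplified maximum-likelihood variant that omits the Dirichlet prior on the mixture weights, so it should not be read as the authors' algorithm. The names `em_decompose` and `separate`, the list-of-lists spectrogram layout `V[t][f]`, and the ratio-based reconstruction of each speaker's spectrum are all assumptions made for this example.

```python
import random

EPS = 1e-12  # guards against division by zero


def normalize(p):
    """Scale a list of non-negative numbers so that it sums to (almost) one."""
    s = sum(p) + EPS
    return [x / s for x in p]


def em_decompose(V, n_components=None, bases=None, n_iter=100, seed=0):
    """Fit P_t(f) = sum_z P_t(z) P(f|z) to a magnitude spectrogram by EM.

    V[t][f] is the non-negative magnitude of frequency bin f in frame t.
    If `bases` (a list of multinomials P(f|z)) is given, it is kept fixed
    and only the per-frame mixture weights P_t(z) are estimated.
    Returns (bases, weights) with weights[t][z] = P_t(z).
    """
    rng = random.Random(seed)
    T, F = len(V), len(V[0])
    learn_bases = bases is None
    if learn_bases:
        bases = [normalize([rng.random() + 0.1 for _ in range(F)])
                 for _ in range(n_components)]
    Z = len(bases)
    weights = [normalize([rng.random() + 0.1 for _ in range(Z)])
               for _ in range(T)]
    for _ in range(n_iter):
        new_bases = [[0.0] * F for _ in range(Z)]
        new_weights = [[0.0] * Z for _ in range(T)]
        for t in range(T):
            for f in range(F):
                joint = [weights[t][z] * bases[z][f] for z in range(Z)]
                total = sum(joint) + EPS
                for z in range(Z):
                    # E-step: posterior P_t(z|f), weighted by the magnitude
                    c = V[t][f] * joint[z] / total
                    new_bases[z][f] += c
                    new_weights[t][z] += c
        # M-step: renormalize the accumulated expected counts
        weights = [normalize(w) for w in new_weights]
        if learn_bases:
            bases = [normalize(b) for b in new_bases]
    return bases, weights


def separate(V, bases_per_speaker, n_iter=100):
    """Split V into one spectrogram per speaker using fixed speaker bases.

    Each bin's magnitude is shared among speakers in proportion to each
    speaker's contribution to the fitted mixture (an assumed choice).
    """
    all_bases = [b for spk in bases_per_speaker for b in spk]
    _, weights = em_decompose(V, bases=all_bases, n_iter=n_iter)
    T, F = len(V), len(V[0])
    out = [[[0.0] * F for _ in range(T)] for _ in bases_per_speaker]
    for t in range(T):
        for f in range(F):
            parts, z = [], 0
            for spk in bases_per_speaker:
                parts.append(sum(weights[t][z + k] * b[f]
                                 for k, b in enumerate(spk)))
                z += len(spk)
            total = sum(parts) + EPS
            for s, p in enumerate(parts):
                out[s][t][f] = V[t][f] * p / total
    return out
```

In this sketch, speaker-specific bases would first be learned by calling `em_decompose` on each speaker's training spectrogram with a chosen `n_components`, and the resulting lists would then be passed to `separate` together with the spectrogram of the mixture.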