Abstract: In speech enhancement based on nonnegative matrix factorization (NMF), a noise basis matrix matched to the actual noise must be trained in advance, which is difficult to guarantee in practice. This paper proposes an NMF-based speech enhancement method in which the noise basis matrix is updated online. First, a non-speech frame decision module identifies the non-speech regions of the noisy signal. Then, a fixed-length sliding window covers the most recent frames classified as non-speech, and the magnitude spectra of these frames are used to update the noise basis matrix online. Finally, the updated noise basis matrix and the pre-trained speech basis matrix are used together to enhance the speech. Because the noise basis matrix is obtained online from the noisy signal itself, the method effectively addresses the mismatch of the noise basis matrix. Test results show that, in most conditions, the noise basis matrix trained online by the proposed method performs better than one trained on a matched dataset.
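The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the energy-threshold non-speech decision, the Wiener-style mask, the KL-divergence multiplicative updates, and all names and parameters (`nmf_kl`, `enhance`, `window_len`, `energy_threshold`, the iteration counts) are assumptions chosen for illustration only.

```python
from collections import deque

import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, W, n_iter=30, update_W=True, seed=0):
    """Multiplicative-update NMF (generalized KL divergence), V ~ W @ H.

    W is the initial (or fixed) basis matrix; H is initialized randomly.
    """
    rng = np.random.default_rng(seed)
    W = W.copy()
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if update_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W, H


def enhance(noisy_mag, W_speech, W_noise, window_len=20, energy_threshold=1.0):
    """Frame-by-frame enhancement with an online-updated noise basis.

    noisy_mag: (n_bins, n_frames) magnitude spectrogram of the noisy signal.
    The non-speech decision here is a plain energy threshold, a hypothetical
    stand-in for the decision module mentioned in the abstract.
    """
    window = deque(maxlen=window_len)  # fixed-length sliding window
    k = W_speech.shape[1]
    enhanced = np.empty_like(noisy_mag)
    for t in range(noisy_mag.shape[1]):
        frame = noisy_mag[:, t]
        if np.sum(frame ** 2) < energy_threshold:  # assumed non-speech test
            window.append(frame)
            V_noise = np.stack(list(window), axis=1)
            # Refit the noise basis on the recent non-speech frames,
            # warm-started from the current basis.
            W_noise, _ = nmf_kl(V_noise, W_noise, n_iter=10)
        # Decompose the frame on the fixed, concatenated speech+noise basis.
        W = np.hstack([W_speech, W_noise])
        _, H = nmf_kl(frame[:, None], W, update_W=False)
        speech_est = W_speech @ H[:k]
        noise_est = W_noise @ H[k:]
        mask = speech_est / (speech_est + noise_est + EPS)  # Wiener-style gain
        enhanced[:, t] = mask[:, 0] * frame
    return enhanced, W_noise


# Synthetic usage: six quiet "noise-only" frames followed by six loud frames.
rng = np.random.default_rng(1)
W_s = rng.random((16, 3))    # stands in for a pre-trained speech basis
W_n0 = rng.random((16, 2))   # arbitrary initial noise basis
noisy = 0.1 * rng.random((16, 12))
noisy[:, 6:] += 5.0 * rng.random((16, 6))
enh, W_n = enhance(noisy, W_s, W_n0, window_len=4, energy_threshold=1.0)
```

In this sketch the speech basis stays fixed throughout, and only the noise basis is refit whenever a frame is judged to be non-speech, which mirrors the division of roles described above. A real system would use a more robust non-speech decision than a fixed energy threshold.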
Bai Zhigang, Bao Changchun. Online Update of Noise Basis Matrix for the NMF-based Speech Enhancement [J]. Journal of Signal Processing, 2020, 36(6): 831-838.