Abstract: Traditional speech endpoint detection methods rely on the difference between speech and noise in a single parameter to locate the start and end points of speech in a signal. However, the performance of any single parameter is unstable across different noise environments at low signal-to-noise ratio (SNR), so robustness is poor. To overcome this problem, this paper proposes a speech endpoint detection method based on the fusion of four parameters: sub-band spectral variance, energy-entropy ratio, MFCC cepstral distance, and likelihood ratio. The method adapts the threshold of each parameter, then selects the voting mechanism according to the energy-entropy ratio of the noise segment, detected in real time, and uses the resulting vote to determine the speech endpoints. Experimental results show that the proposed method achieves higher detection accuracy and robustness than conventional endpoint detection methods at low SNR. The proposed method can serve as a reference for subsequent speech signal processing.
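The fusion idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only two of the four parameters, uses commonly cited definitions of the energy-entropy ratio and sub-band spectral variance that may differ from those in the paper, and assumes that the first `n_noise` frames contain only noise. The function names, the `mean + k * std` threshold rule, and the `votes_needed` parameter are all hypothetical; in particular, the paper chooses its voting rule from the noise-segment energy-entropy ratio, whereas here the number of required votes is simply passed in.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def power_spectrum(frames):
    """Power spectrum of each Hann-windowed frame."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1)) ** 2


def energy_entropy_ratio(frames, eps=1e-12):
    """Assumed definition: sqrt(1 + |log-energy / spectral entropy|) per frame."""
    spec = power_spectrum(frames)
    energy = np.log10(1.0 + spec.sum(axis=1))
    prob = spec / (spec.sum(axis=1, keepdims=True) + eps)
    entropy = -(prob * np.log(prob + eps)).sum(axis=1)
    return np.sqrt(1.0 + np.abs(energy / (entropy + eps)))


def subband_spectral_variance(frames, n_bands=8):
    """Assumed definition: variance of summed magnitudes across uniform sub-bands."""
    mag = np.sqrt(power_spectrum(frames))
    n_bins = (mag.shape[1] // n_bands) * n_bands  # drop leftover bins
    bands = mag[:, :n_bins].reshape(mag.shape[0], n_bands, -1).sum(axis=2)
    return bands.var(axis=1, ddof=1)


def adaptive_threshold(feature, n_noise, k=3.0):
    """Hypothetical rule: mean + k * std over the leading noise-only frames."""
    noise = feature[:n_noise]
    return noise.mean() + k * noise.std()


def fuse_by_voting(features, n_noise, votes_needed):
    """Flag a frame as speech when enough features exceed their own thresholds."""
    votes = np.zeros(features[0].shape[0], dtype=int)
    for feature in features:
        votes += (feature > adaptive_threshold(feature, n_noise)).astype(int)
    return votes >= votes_needed
```

Calling `fuse_by_voting([energy_entropy_ratio(frames), subband_spectral_variance(frames)], n_noise=10, votes_needed=2)` on the output of `frame_signal` returns a boolean mask with one entry per frame; the endpoints would then be read off where the mask switches between `False` and `True`. Because each threshold is derived from the signal's own leading frames, it follows the noise level of the recording rather than being fixed in advance, which is the sense of "adaptive" assumed in this sketch.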
Lei Jing, He Peiyu, Xu Zili. Adaptive Speech Endpoint Detection Based on Multi-parameter Fusion in Low SNR Situation[J]. Journal of Signal Processing, 2020, 36(8): 1205-1211.