[poster] Improving Design of Input Condition Invariant Speech Enhancement

Building a single universal speech enhancement (SE) system that can handle arbitrary input is an in-demand but underexplored research topic. Toward this ultimate goal, one direction is to build a single model that handles diverse audio durations, sampling frequencies, and microphone configurations in noisy and reverberant scenarios, which we define here as "input condition invariant SE". Such a model was recently proposed and showed promising performance; however, its multi-channel performance degraded severely in real conditions. In this paper, we propose novel architectures that improve the input condition invariant SE model so that it remains competitive in simulated conditions while substantially mitigating the degradation in real conditions. To this end, we redesign the key components of such a system. First, we find that the channel-modeling module can generalize poorly to unseen scenarios, and we redesign this module. We further introduce a two-stage training strategy to improve training efficiency. Second, we propose two novel dual-path time-frequency blocks that outperform the existing design with fewer parameters and lower computational cost. With all proposals combined, experiments on various public datasets validate the efficacy of the proposed model, which shows significantly improved performance in real conditions. Recipes with full model details will be released for reproducibility.
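The abstract does not describe how the redesigned channel-modeling module works. For orientation only, here is a minimal sketch of one widely used way to make a layer independent of the number and order of microphones: a transform-average-concatenate (TAC) style layer, originally proposed for multi-channel speech separation. Every channel passes through the same weights, and channels exchange information only through a mean. This is not the authors' module; the class name `TACSketch`, the layer sizes, the random weights, and the ReLU activations are all illustrative assumptions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class TACSketch:
    """Hypothetical TAC-style layer (illustrative only, not the paper's module)."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1  # small random weights; a real model would learn these
        self.w_in = rng.standard_normal((feat_dim, hidden_dim)) * scale
        self.w_avg = rng.standard_normal((hidden_dim, hidden_dim)) * scale
        self.w_out = rng.standard_normal((2 * hidden_dim, feat_dim)) * scale

    def __call__(self, x):
        # x: (channels, frames, feat_dim); the channel count may differ per call
        h = relu(x @ self.w_in)                # same transform applied to every channel
        g = relu(h.mean(axis=0) @ self.w_avg)  # average over channels, then transform
        g = np.broadcast_to(g, h.shape)        # give every channel the pooled summary
        y = relu(np.concatenate([h, g], axis=-1) @ self.w_out)
        return x + y                           # residual keeps the input shape


layer = TACSketch(feat_dim=8, hidden_dim=16)
rng = np.random.default_rng(1)
for n_mics in (1, 2, 6):
    x = rng.standard_normal((n_mics, 10, 8))
    # each output keeps the input shape (n_mics, 10, 8)
    print(n_mics, layer(x).shape)
```

Because the channel mean does not depend on channel order or count, permuting the input channels only permutes the output, and the same weights serve one, two, or six microphones. Whether a pooling-based design like this generalizes well to real recordings is exactly the kind of question the abstract raises about the channel-modeling module.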
