Sorry, you need to enable JavaScript to visit this website.

PHASE RECONSTRUCTION IN SINGLE CHANNEL SPEECH ENHANCEMENT BASED ON PHASE GRADIENTS AND ESTIMATED CLEAN-SPEECH AMPLITUDES

DOI:
10.60864/a9qp-qv41
Citation Author(s):
Yanjue Song, Nilesh Madhu
Submitted by:
Yanjue Song
Last updated:
6 June 2024 - 10:27am
Document Type:
Poster
Document Year:
2024
Event:
Presenters:
Yanjue Song
Paper Code:
AASP-P17.1
 

Phase gradients can help enforce phase consistency across time and frequency, further improving the output of speech enhancement approaches. Recently, neural networks were used to estimate the phase gradients from the short-term amplitude spectra of clean speech. These were then used to synthesise phase to reconstruct a plausible time-domain signal. However, using purely synthetic phase in speech enhancement yields unnatural-sounding output. Therefore we derive a closed-form phase estimate that combines the synthetic phase with that of the enhanced speech, yielding more natural output. Secondly, we empirically evaluate the benefit of (re-)training the phase gradient estimation networks on the amplitude spectra of the estimated clean-speech signal. Lastly we apply our proposed phase enhancement to the output of a phase-aware speech enhancement DNN, verifying if an independent phase estimator brings additional advantage. Results show that, compared to the baseline, the proposed approach further improves the DNSMOS scores by ≈0.1 on average, and significantly in the first quartile on broadband, quasi-stationary noises, where phase enhancement is expected to have maximum benefit. Training phase gradient estimators on estimated speech spectra is additionally beneficial here. Our method even improves the performance of the phase-aware approach, indicating its feasibility as a generic post-processor for speech enhancement.

up
0 users have voted: