Optimize for my Voice with Speaker Identification
- Citation Author(s):
- Submitted by: Marcin Ciolek
- Last updated: 31 May 2023 - 6:51am
- Document Type: Demo
- Document Year: 2023
- Event:
- Presenters: Marcin Ciolek
- Paper Code: ST-L1.04
- Categories:
The proposed system enhances speech in videoconferencing applications. We aim to improve speech quality and communication clarity in various daily-life scenarios. Our demo will appeal to the ICASSP audience because it is related to the 5th DNS (Deep Noise Suppression) Challenge. The demo aims to enhance the audio signal so that the primary talker is preserved while neighboring talkers, noise, and reverberation are suppressed. In addition, the system automatically controls the level of the primary talker and does not boost return echoes or noise misdetected as speech.

The novelty of the proposed system lies in adaptive primary talker detection and tracking that preserves fast and accurate far-field talker attenuation. We want to draw the ICASSP audience's attention to a new challenge called word chopping, which is caused by misdetecting the primary talker as an interfering talker. The demo shows that we can adaptively track a primary talker who moves around without headphones and fully preserve their speech up to 2 meters from the laptop or mobile device. The increased robustness to word chopping is achieved by incorporating a speaker identification network into the system. The objective and subjective results are superior to those obtained with the existing product feature "Optimize for my Voice", available in the Cisco Webex Teams videoconferencing application for desktop and mobile devices.

The demo fits the theme of Signal Processing in the AI Era and can be demonstrated live to an audience using our demo software. The interaction with the ICASSP audience will be based on a live scene in which the primary talker and interfering (far-field) talkers participate. The demo software will process the noisy audio in near real time. Afterwards, everyone will be able to listen to their enhanced voice and compare it with the noisy input. Our demo will be supported by a poster and a videocast. We want to emphasize that the demo prepared by our R&D team is not a commercial product.
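
The abstract does not describe how the speaker identification network is wired into the system, so the following is only an illustrative sketch, not the authors' method. It assumes that each audio frame has already been mapped to a speaker embedding by some network, and that an enrolled embedding of the primary talker is available. The function names, the similarity threshold, the hangover length, and the attenuation value are all hypothetical. The sketch shows one common way a similarity score could gate attenuation while a short hangover keeps recently matched frames, which is one simple guard against word chopping.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "no match"
    return dot / (norm_a * norm_b)


def primary_talker_gains(frame_embeddings, enrolled_embedding,
                         threshold=0.7, hangover_frames=3, attenuation=0.1):
    """Return one gain per frame (hypothetical gating rule, not from the source).

    A frame gets gain 1.0 when its embedding is similar enough to the
    enrolled primary talker, or when a match occurred within the last
    `hangover_frames` frames. All other frames get the `attenuation` gain.
    """
    gains = []
    frames_since_match = hangover_frames + 1  # start in the suppressed state
    for embedding in frame_embeddings:
        if cosine_similarity(embedding, enrolled_embedding) >= threshold:
            frames_since_match = 0
        else:
            frames_since_match += 1
        keep = frames_since_match <= hangover_frames
        gains.append(1.0 if keep else attenuation)
    return gains
```

In this sketch, a longer hangover reduces the risk of cutting off word endings at the cost of letting through a little more interfering speech after the primary talker stops. A real system would have to tune that trade-off on recorded data.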