IEEE ICASSP 2024, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

This paper introduces an innovative deep learning framework for parallel voice conversion that mitigates the risks inherent in such systems. Our approach centers on an invertible model capable of countering potential spoofing threats. Specifically, we present a conversion model that allows the source voice to be recovered from the converted speech, thereby enabling identification of the source speaker. The framework is built from a series of invertible modules composed of affine coupling layers, which ensure that the conversion process is reversible.
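
The abstract attributes reversibility to affine coupling layers. The toy, pure-Python sketch below illustrates why such layers are invertible by construction; it is not the paper's model. The sub-networks s(·) and t(·), the reversal permutation between layers, the toy feature values, and the names `AffineCoupling`, `InvertibleStack` and `build_toy_model` are all hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Net = Callable[[Vector], Vector]


class AffineCoupling:
    """Affine coupling layer: x1 passes through unchanged and conditions an
    elementwise scale-and-shift of x2, so the layer is exactly invertible."""

    def __init__(self, log_scale_net: Net, shift_net: Net) -> None:
        self.log_scale_net = log_scale_net  # s(x1): any function, need not be invertible
        self.shift_net = shift_net          # t(x1): any function, need not be invertible

    def forward(self, x: Vector) -> Vector:
        half = len(x) // 2
        x1, x2 = x[:half], x[half:]
        s, t = self.log_scale_net(x1), self.shift_net(x1)
        # y1 = x1,  y2 = x2 * exp(s(x1)) + t(x1)
        y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
        return x1 + y2

    def inverse(self, y: Vector) -> Vector:
        half = len(y) // 2
        y1, y2 = y[:half], y[half:]
        # y1 == x1, so s and t can be recomputed exactly from the output
        s, t = self.log_scale_net(y1), self.shift_net(y1)
        x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]
        return y1 + x2


class InvertibleStack:
    """Coupling layers separated by a reversal permutation so both halves get transformed."""

    def __init__(self, layers: List[AffineCoupling]) -> None:
        self.layers = layers

    def forward(self, x: Vector) -> Vector:
        for layer in self.layers:
            x = layer.forward(x)[::-1]  # reversal is its own inverse
        return x

    def inverse(self, y: Vector) -> Vector:
        for layer in reversed(self.layers):
            y = layer.inverse(y[::-1])
        return y


def build_toy_model(num_layers: int) -> InvertibleStack:
    # Hypothetical stand-ins for the learned sub-networks s(.) and t(.)
    def log_scale(v: Vector) -> Vector:
        return [0.5 * math.tanh(a) for a in v]

    def shift(v: Vector) -> Vector:
        return [a * a - 1.0 for a in v]

    return InvertibleStack([AffineCoupling(log_scale, shift) for _ in range(num_layers)])


source_features = [0.3, -1.2, 0.8, 2.0]     # toy stand-in for one frame of source-speaker features
model = build_toy_model(num_layers=3)
converted = model.forward(source_features)  # "conversion"
recovered = model.inverse(converted)        # retrieval of the source frame
```

Because the inverse only needs y1, which each layer passes through untouched, s(·) and t(·) can be arbitrary functions (for example, deep networks) without compromising exact recovery of the input.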

To address wideband direction-of-arrival (DOA) estimation, this paper proposes a gridless, covariance-free joint multi-band (JMB) DOA estimation method based on low-rank matrix recovery. Unlike subspace methods and sparse-array-based methods, the proposed method establishes a unified frequency grid based on the concept of the greatest common divisor (GCD), which resolves the nonlinearity of steering matrices across multiple frequencies. On this unified grid, a low-rank master matrix is formed by combining truncated Hankel matrices from the different subbands and snapshots.
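
The abstract does not spell out how the GCD grid is built. One natural reading is sketched below for a uniform linear array (ULA), under assumptions that are ours rather than the paper's. If every subband frequency f_k is an integer multiple n_k of the base frequency f0 = gcd(f_k), then the phase at sensor m and frequency f_k equals the phase at a virtual sensor n_k·m and frequency f0, so all subbands share one single-frequency array model. The integer-Hz frequencies, the 343 m/s propagation speed, the use of `None` for grid slots that no subband observes, and all function names are illustrative assumptions. The sketch does not show how the paper combines the Hankel blocks into its master matrix or how it performs the low-rank recovery.

```python
import cmath
import math
from functools import reduce
from typing import Dict, List, Optional

SPEED_OF_SOUND = 343.0  # m/s; assumes an acoustic array in air


def base_frequency(freqs_hz: List[int]) -> int:
    """Greatest common divisor f0 of integer-valued subband frequencies."""
    return reduce(math.gcd, freqs_hz)


def virtual_positions(freqs_hz: List[int], num_sensors: int) -> Dict[int, List[int]]:
    """Sensor m at f_k = n_k * f0 behaves like sensor n_k * m at the base frequency f0."""
    f0 = base_frequency(freqs_hz)
    return {f: [(f // f0) * m for m in range(num_sensors)] for f in freqs_hz}


def steering_element(freq_hz: float, position_m: float, theta_rad: float) -> complex:
    """Far-field plane-wave phase term exp(-j 2 pi f x sin(theta) / c)."""
    return cmath.exp(-2j * math.pi * freq_hz * position_m * math.sin(theta_rad) / SPEED_OF_SOUND)


def unified_grid_vector(freqs_hz: List[int], num_sensors: int, spacing_m: float,
                        theta_rad: float) -> List[Optional[complex]]:
    """Place each subband's noiseless ULA steering vector on the common f0 grid.
    Grid slots that no subband observes are left as None (missing entries)."""
    positions = virtual_positions(freqs_hz, num_sensors)
    grid: List[Optional[complex]] = [None] * (max(max(p) for p in positions.values()) + 1)
    for f, idxs in positions.items():
        for m, g in enumerate(idxs):
            # Subbands landing on the same slot g give the same phase, since f * m == f0 * g
            grid[g] = steering_element(f, m * spacing_m, theta_rad)
    return grid


def truncated_hankel(v: List[Optional[complex]], rows: int) -> List[List[Optional[complex]]]:
    """Hankel matrix H[i][j] = v[i + j] with `rows` rows."""
    cols = len(v) - rows + 1
    return [[v[i + j] for j in range(cols)] for i in range(rows)]


freqs = [1000, 1500, 2500]  # Hz, toy subband frequencies (f0 = 500 Hz, n_k = 2, 3, 5)
grid = unified_grid_vector(freqs, num_sensors=4, spacing_m=0.05, theta_rad=0.3)
hankel = truncated_hankel(grid, rows=8)
```

The `None` slots make explicit why a recovery step is needed: the unified grid is only partially observed, and the low-rank structure of the Hankel matrices is what would allow the missing entries to be inferred.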

The problem of audio-to-audio (A2A) style transfer involves replacing the style features of the source audio with those of the target audio while preserving the content-related attributes of the source audio. In this paper, we propose an efficient approach, termed Zero-shot Emotion Style Transfer (ZEST), that replaces the emotional content of the source audio with the emotion embedded in the target audio while retaining the speaker identity and speech content of the source.
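
The following is a minimal sketch of the A2A problem statement only, not of ZEST's architecture, which the abstract does not detail. It models an utterance as a hypothetical disentangled triple of content, speaker, and emotion factors. Emotion style transfer then keeps the source's content and speaker and takes the target's emotion. The `AudioFactors` type, the factor encodings, and `transfer_emotion` are assumptions. The encoders that would extract these factors and the decoder that would synthesize audio from them are not shown.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AudioFactors:
    """Hypothetical disentangled representation of one utterance."""
    content: Tuple[int, ...]    # e.g. discrete speech-unit tokens (what is said)
    speaker: Tuple[float, ...]  # speaker embedding (who says it)
    emotion: Tuple[float, ...]  # emotion/style embedding (how it is said)


def transfer_emotion(source: AudioFactors, target: AudioFactors) -> AudioFactors:
    """Keep the source's content and speaker; take only the emotion from the target."""
    return replace(source, emotion=target.emotion)


src = AudioFactors(content=(12, 7, 7, 40), speaker=(0.1, 0.9), emotion=(1.0, 0.0))
tgt = AudioFactors(content=(3, 3, 8), speaker=(0.6, 0.2), emotion=(0.0, 1.0))
out = transfer_emotion(src, tgt)  # would be passed to a (not shown) decoder/vocoder
```

In a zero-shot setting the target's emotion factor would come from an encoder applied to unseen target audio rather than from a fixed label set; the recombination step itself stays the same.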
