
IEEE ICASSP 2023, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2023 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Low-frequency personal sound zones can be created by controlling the sound pressure in separate, spatially confined regions. The performance of a sound zone system that uses wireless communication may be degraded by packet losses. In this paper, we propose robust FIR filters for a low-frequency sound zone system, designed by incorporating information about the expected packet losses.
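As a rough illustration of the idea (not the paper's exact formulation), an independent Bernoulli loss model can be folded into an ordinary least-squares filter design: if each coefficient's contribution is dropped with probability p, the expected squared error has a closed form whose minimiser is a loss-aware filter. The plant matrix `G` and target response `d` below are random placeholders, not real acoustics.

```python
import numpy as np

def robust_filter(G, d, p_loss, lam=1e-3):
    """Least-squares FIR design under an independent Bernoulli loss
    model: each coefficient's contribution survives with probability
    q = 1 - p_loss.  Minimises E||G (m * w) - d||^2 over the random
    0/1 mask m, using the closed-form expectation below."""
    q = 1.0 - p_loss
    GtG = G.T @ G
    # E[M G^T G M] = q^2 * GtG + q * (1 - q) * diag(GtG), with M = diag(m)
    R = q**2 * GtG + q * (1.0 - q) * np.diag(np.diag(GtG))
    return np.linalg.solve(R + lam * np.eye(G.shape[1]), q * G.T @ d)

# toy plant matrix and target response (placeholders)
rng = np.random.default_rng(0)
G = rng.standard_normal((64, 16))
d = rng.standard_normal(64)
w = robust_filter(G, d, p_loss=0.2)
```

With `p_loss=0` the design reduces to standard regularized least squares; increasing `p_loss` shrinks the filter toward a solution that degrades gracefully when coefficients are lost.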


Lens-distortion-correction post-processing, which is increasingly complex and increasingly common, is seriously hampering state-of-the-art camera attribution techniques. In this paper, we show that


Phonation modes play a vital role in voice quality evaluation and vocal health diagnosis. Existing studies on phonation modes cover feature analysis and classification of vowels, which do not apply to real-life scenarios. In this paper, we define the phonation mode detection (PMD) problem, which entails predicting phonation mode labels together with their onset and offset timestamps.
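Since PMD asks for onset and offset timestamps rather than a single per-utterance label, a detector's frame-level predictions must at some point be collapsed into timed segments. A minimal sketch of that post-processing step (the 10 ms hop and the mode names are purely illustrative):

```python
def frames_to_segments(labels, hop=0.01):
    """Collapse a sequence of per-frame phonation-mode labels into
    (mode, onset_s, offset_s) tuples; None marks unvoiced frames.
    hop is the frame hop in seconds."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        # close the current run at a label change or at the end
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None:
                segments.append((labels[start], start * hop, i * hop))
            start = i
    return segments

frames = ["breathy"] * 5 + [None] * 3 + ["pressed"] * 4
segments = frames_to_segments(frames)
```

This yields one `("breathy", …)` segment and one `("pressed", …)` segment, with the unvoiced gap between them excluded.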


BCI motor imagery datasets are usually small and use different electrode setups. When training a deep neural network, one may want to capitalize on all of these datasets to increase the amount of available data and thereby obtain good generalization.


Synthetic human speech signals have become very easy to generate with modern text-to-speech methods. When these signals are shared on social media, they are often compressed using the Advanced Audio Coding (AAC) standard. Our goal is to study whether a small set of coding metadata contained in the AAC compressed bit stream is sufficient to detect synthetic speech, which would avoid decompressing the speech signals before analysis. We call our proposed method AAC Synthetic Speech Detection (ASSD).
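The ASSD pipeline itself is not reproduced here, but the general idea of classifying from coding metadata alone can be sketched: summarise per-frame bit-stream statistics into a feature vector and feed it to a lightweight classifier. Everything below is an assumption for illustration — per-frame payload sizes as the metadata, nearest-centroid as the classifier, and made-up numbers for the toy data.

```python
import numpy as np

def metadata_features(frame_sizes):
    """Summary statistics of per-frame AAC payload sizes -- one
    plausible kind of coding metadata; the actual ASSD feature
    set may differ."""
    s = np.asarray(frame_sizes, dtype=float)
    return np.array([s.mean(), s.std(), np.diff(s).std(), s.min(), s.max()])

def predict(x, centroid_real, centroid_synth):
    """Toy nearest-centroid decision over metadata features."""
    d_real = np.linalg.norm(x - centroid_real)
    d_synth = np.linalg.norm(x - centroid_synth)
    return "synthetic" if d_synth < d_real else "real"

# toy training data: natural speech with bursty frame sizes,
# synthetic speech with suspiciously uniform ones (made-up numbers)
rng = np.random.default_rng(1)
real = np.stack([metadata_features(200 + 30 * rng.standard_normal(100))
                 for _ in range(20)])
synth = np.stack([metadata_features(180 + 5 * rng.standard_normal(100))
                  for _ in range(20)])
c_real, c_synth = real.mean(axis=0), synth.mean(axis=0)
```

The point of the sketch is only that the features never touch decoded audio; any classifier could replace the centroid rule.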


Accurate pitch estimation in speech signals plays a vital role in several applications. Robust pitch estimation in telephone speech is still a challenge due to the narrow bandwidth of the signal. The electroglottograph (EGG) signal is a reliable means for pitch estimation; however, it is not practically possible to


The proposed system enhances speech in video-conferencing applications. We aim to improve speech quality and communication clarity in a variety of daily-life scenarios. Our demo will appeal to the ICASSP audience because it is related to the 5th DNS Challenge. The demo enhances the audio signal to preserve the primary talker while suppressing neighboring talkers, noise, and reverberation. In addition, the system automatically controls the level of the primary talker and does not boost return echoes or noise misdetected as speech.


An interpolation method for region-to-region acoustic transfer functions (ATFs) based on kernel ridge regression with an adaptive kernel is proposed. Most current ATF interpolation methods do not incorporate the acoustic properties of the environment in which the measurements are performed. Our proposed method separately adapts directional weighting functions to the direct and residual reverberant components, and these weighting functions are then used to adapt the kernel functions. Thus, the proposed method can not only impose constraints based on fundamental acoustic properties, but can also adapt to the acoustic environment.
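Kernel ridge regression itself has a simple closed form; the paper's contribution is the adaptive, directionally weighted kernel, which is not reproduced here. A minimal sketch with a fixed Gaussian kernel, interpolating scalar measurements (a 1-D stand-in for one frequency bin of an ATF):

```python
import numpy as np

def krr_interpolate(X, h, X_query, ell=0.2, lam=1e-6):
    """Kernel ridge regression with a fixed Gaussian kernel:
    interpolate measured values h at positions X to X_query.
    (Stands in for the paper's adaptive directional kernel.)"""
    def K(A, B):
        # squared Euclidean distances between all position pairs
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * ell**2))
    # closed-form ridge solution: (K + lam I) alpha = h
    alpha = np.linalg.solve(K(X, X) + lam * np.eye(len(X)), h)
    return K(X_query, X) @ alpha

# toy measurements of a smooth field on [0, 1]
X = np.linspace(0.0, 1.0, 20)[:, None]
h = np.sin(2.0 * np.pi * X[:, 0])
est = krr_interpolate(X, h, np.array([[0.25], [0.75]]))
```

Replacing the fixed Gaussian `K` with a kernel whose directional weighting is fitted to the measured sound field is, in essence, where the proposed adaptation enters.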