ICASSP 2021

ICASSP 2021, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

PROGRESSIVE MULTI-STAGE FEATURE MIX FOR PERSON RE-IDENTIFICATION

poster_0420.pdf (353)

slides.pdf (367)

Categories:: Image, Video, and Multidimensional Signal Processing

25 Views

Streaming Multi-Speaker ASR with RNN-T

Recent research shows that end-to-end ASR systems can recognize overlapped speech from multiple speakers. However, all published works have assumed no latency constraints during inference, which does not hold for most voice assistant interactions. This work focuses on multi-speaker speech recognition based on a recurrent neural network transducer (RNN-T), which has been shown to provide high recognition accuracy in a low-latency online recognition regime.

icassp_presentation_final.pdf

ICASSP 2021 presentation slides (340)

poster_20210412_final.pdf

ICASSP 2021 presentation poster (960)

Categories:: Robust Speech Recognition (SPE-ROBU)

36 Views

ON LOSS FUNCTIONS FOR DEEP-LEARNING BASED T60 ESTIMATION

Reverberation time, T60, directly influences the amount of reverberation in a signal, and its direct estimation may help with dereverberation. Traditionally, T60 estimation has been done using signal processing or probabilistic approaches; only recently have deep-learning approaches been developed. Unfortunately, the appropriate loss function for training the network has not been adequately determined. In this paper, we propose a composite classification- and regression-based

ICASSP2021 Poster_Grace.V2.pdf (332)

Categories:: Room Acoustics and Acoustic System Modeling

26 Views

OAS-Net: Occlusion Aware Sampling Network for Accurate Optical Flow

OASNet_poster.pdf

Poster (320)

Categories:: Image/Video Processing

16 Views

Looking through Walls: Inferring Scenes from Video-Surveillance Encrypted Traffic

Nowadays, living environments are characterized by networks of inter-connected sensing devices that accomplish different tasks, e.g., video-surveillance of an environment by a network of CCTV cameras. A malicious user could gather sensitive details on people’s activities by eavesdropping on the exchanged data packets. To overcome this problem, video streams are protected by encryption systems, but even secured channels may still leak some information.

ICASSP2021_poster.pdf

Poster (658)

ICASSP2021_presentation.pdf

Presentation (574)

Categories:: Information Forensics and Security

29 Views

Parameter Identifiability of Spatial-Smoothing-Based Bistatic MIMO Radar

Diversity smoothing has been widely developed for angle estimation with bistatic multiple-input multiple-output (MIMO) radar in the presence of coherent targets, and its parameter identifiability is an important issue. In this paper, we establish more accurate identifiability conditions by studying the positive definiteness of the smoothed target covariance matrix. The required numbers of transmit and receive antennas are derived as functions of the target number and target structure. We show that the new results improve upon previous ones and recover them in special cases.

Parameter Identifiability of SS-Based Bistatic MIMO Radar(poster).pdf (378)

Categories:: Sensor Array Processing

22 Views

Generalized Thinned Coprime Array for DOA Estimation

We propose a generalized thinned coprime array by introducing flexible inter-element spacings, of which the conventional thinned coprime array is a special case. We derive a closed-form expression for the range of consecutive lags as a function of the antenna numbers and inter-element spacings. We show that, after optimization, the proposed array can achieve more consecutive lags than other coprime arrays. In particular, the optimized results also provide the minimum number of antenna pairs with small separation.
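
As a rough illustration of what "consecutive lags" means in this abstract, the Python sketch below builds the sensor positions of a conventional extended coprime array and counts the contiguous lags in its difference coarray. It does not implement the generalized thinned array proposed in the paper, whose construction the abstract does not give. The function names and the example values M = 3, N = 5 are assumptions made for this illustration.

```python
def coprime_positions(m, n):
    """Sensor positions, in units of the base spacing d, of a conventional
    extended coprime array (assumed layout: n sensors spaced m apart plus
    2m sensors spaced n apart, sharing the sensor at position 0)."""
    sub1 = {m * i for i in range(n)}
    sub2 = {n * j for j in range(2 * m)}
    return sorted(sub1 | sub2)


def difference_lags(positions):
    """All pairwise position differences (the difference coarray)."""
    return {a - b for a in positions for b in positions}


def consecutive_lag_extent(positions):
    """Largest L such that every lag in 0..L is present.
    The coarray is symmetric, so lags -L..L are then all available."""
    lags = difference_lags(positions)
    extent = 0
    while extent + 1 in lags:
        extent += 1
    return extent


if __name__ == "__main__":
    pos = coprime_positions(3, 5)
    print("positions:", pos)
    print("consecutive lags up to:", consecutive_lag_extent(pos))
```

A larger consecutive-lag extent generally allows coarray-based DOA methods to resolve more sources, which is why the abstract treats it as the quantity to optimize.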

Generalized Thinned Coprime Array for DOA Estimation(poster).pdf (313)

Categories:: Sensor Array Processing

15 Views

Fusing Information Streams in End-to-End Audio-Visual Speech Recognition

End-to-end acoustic speech recognition has quickly gained widespread popularity and shows promising results in many studies. Specifically, the joint transformer/CTC model provides very good performance in many tasks. However, under noisy and distorted conditions, the performance still degrades notably. While audio-visual speech recognition can significantly improve the recognition rate of end-to-end models in such poor conditions, it is not obvious how to best utilize any available information on acoustic and visual signal quality and reliability in these models.

ICASSP21_Wentao.zip

E2E AVSR with decision fusion network (383)

Categories:: Audio for Multimedia

13 Views

Communication Over Block Fading Channels - An Algorithmic Perspective on Optimal Transmission Schemes

Wireless channels are considered that change over time but remain constant for a certain (coherence) period. This behavior is perfectly captured by block fading channels and affects the performance of the corresponding wireless communication systems. Desired closed-form characterizations of optimal transmission schemes remain unknown in many cases. This paper approaches this issue from a fundamental, algorithmic point of view by studying whether or not it is in principle possible to construct or find such optimal transmission

icassp21_fading_talk.pdf

Presentation Slides (323)

icassp21_fading_poster.pdf

Poster (355)

Categories:: Communication Systems and Applications

33 Views

ADVANCING RNN TRANSDUCER TECHNOLOGY FOR SPEECH RECOGNITION

We investigate a set of techniques for RNN Transducers (RNN-Ts) that were instrumental in lowering the word error rate on three different tasks (Switchboard 300 hours, conversational Spanish 780 hours and conversational Italian 900 hours). The techniques pertain to architectural changes, speaker adaptation, language model fusion, model combination and the general training recipe. First, we introduce a novel multiplicative integration of the encoder and prediction network vectors in the joint network (as opposed to additive).
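
To make the additive versus multiplicative distinction concrete, here is a minimal Python sketch of an RNN-T style joint network that fuses projected encoder and prediction vectors either by element-wise sum or by element-wise product. The abstract does not specify the exact formulation, so the projection matrices (`w_enc`, `w_pred`, `w_out`), the `tanh` nonlinearity, the softmax output, and all function names are assumptions for illustration only, not the authors' implementation.

```python
import math


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint(h_enc, h_pred, w_enc, w_pred, w_out, mode="mult"):
    """Fuse encoder and prediction vectors, then score output symbols.
    mode="add":  tanh(W_enc h_enc + W_pred h_pred)  (common additive form)
    mode="mult": tanh(W_enc h_enc * W_pred h_pred)  (element-wise product)
    """
    a = matvec(w_enc, h_enc)
    b = matvec(w_pred, h_pred)
    if mode == "mult":
        fused = [ai * bi for ai, bi in zip(a, b)]
    else:
        fused = [ai + bi for ai, bi in zip(a, b)]
    hidden = [math.tanh(v) for v in fused]
    return softmax(matvec(w_out, hidden))


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]  # toy identity projections
    print(joint([1.0, 2.0], [0.5, -0.5], eye, eye, eye, mode="add"))
    print(joint([1.0, 2.0], [0.5, -0.5], eye, eye, eye, mode="mult"))
```

In the multiplicative form each stream gates the other: if one projected vector is zero, the fused vector is zero regardless of the other stream, whereas the additive form simply passes the remaining stream through.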

icassp2021-slides.pdf (347)

Categories:: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

13 Views
