ICASSP 2021

ICASSP 2021, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

BLEND-RES^2NET: Blended Representation Space by Transformation of Residual Mapping with Restrained Learning For Time Series Classification

Common problems in time series classification, such as insufficient training instances, demand novel deep neural architectures to ensure consistent and accurate performance. A Deep Residual Network (ResNet) learns through H(x)=F(x)+x, where F(x) is a nonlinear function. We propose Blend-Res2Net, which blends two different representation spaces, H^1(x)=F(x)+Trans(x) and H^2(x)=F(Trans(x))+x, with the intention of learning a richer representation by capturing the temporal as well as the spectral signatures (Trans(∙) represents the transformation function).
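The three mappings named in the abstract can be illustrated with a small NumPy sketch. Everything beyond the formulas themselves is an assumption made for illustration: F is a fixed random tanh layer rather than a trained network, Trans is taken to be the FFT magnitude (the abstract does not say which transformation is used), and the "blend" is shown as a plain concatenation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_f(dim, rng):
    """Toy nonlinear mapping F(x) = tanh(W x); W is random, not learned."""
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return lambda x: np.tanh(w @ x)

def trans(x):
    """Hypothetical Trans(.): FFT magnitude, same length as the input."""
    return np.abs(np.fft.fft(x))

def blended_representation(x, f):
    h_res = f(x) + x                  # plain ResNet: H(x) = F(x) + x
    h1 = f(x) + trans(x)              # H^1(x) = F(x) + Trans(x)
    h2 = f(trans(x)) + x              # H^2(x) = F(Trans(x)) + x
    blend = np.concatenate([h1, h2])  # assumption: blend by concatenation
    return h_res, h1, h2, blend

# A 32-sample sine wave stands in for one time series instance.
x = np.sin(2 * np.pi * 4 * np.arange(32) / 32)
f = make_f(32, rng)
h_res, h1, h2, blend = blended_representation(x, f)
print(blend.shape)  # (64,)
```

H^1 swaps the identity shortcut for a transformed shortcut, while H^2 keeps the identity shortcut and transforms the input of F instead, so the two branches see the temporal and the transformed views in complementary positions.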

ICASSP 2021_presentation_Arijit_Ukil.pdf

https://ieeexplore.ieee.org/document/9414647 (241)

Categories:: Neural network learning (MLR-NNLR)

18 Views

Phoneme based Neural Transducer for Large Vocabulary Speech Recognition


To combine the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and word-end-based phoneme label augmentation is proposed to improve performance. Utilizing the local dependency of phonemes, we adopt a simplified neural network structure and a straightforward integration with the external word-level language model to preserve the consistency of seq-to-seq modeling.

poster_Zhou_phoneme-transducer.pdf

poster_Zhou_phoneme-transducer.pdf (223)

Categories:: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

16 Views

PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit


Low-precision formats have proven to be an efficient way to reduce not only the memory footprint but also the hardware resources and power consumption of deep learning computations. Under this premise, the posit numerical format appears to be a highly viable substitute for IEEE floating-point, but its application to neural network training still requires further research. Some preliminary results have shown that 8-bit (and even smaller) posits may be used for inference and 16-bit posits for training, while maintaining model accuracy.

presentation.pdf

Presentation Slides (328)

poster.pdf

Poster (320)

Categories:: Algorithm and architecture co-optimization

2 Views

Slides_ICASSP2021


Slides_ICASSP.pdf

Slides ICASSP2021 (267)

Categories:: Signal Processing Theory and Methods

15 Views

Exploiting Non-negative Matrix Factorization for Binaural Sound Localization in the Presence of Directional Interference

This study presents a novel solution to the problem of binaural localization of a speaker in the presence of interfering directional noise and reverberation. Using a state-of-the-art binaural localization algorithm based on a deep neural network (DNN), we propose adding a source separation stage based on non-negative matrix factorization (NMF) to improve the localization performance in conditions with interfering sources.
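As background for the NMF stage mentioned above, the following is a minimal sketch of non-negative matrix factorization with the standard multiplicative updates for the Frobenius-norm objective. It is not the separation stage of the paper: the function name `nmf`, the rank, the iteration count and the random toy matrix standing in for a magnitude spectrogram are all assumptions chosen for illustration.

```python
import numpy as np

def nmf(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix v (freq x frames) as w @ h.

    Uses multiplicative updates for the squared Frobenius error;
    eps only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = v.shape
    w = rng.random((n_freq, rank)) + eps    # spectral basis vectors
    h = rng.random((rank, n_frames)) + eps  # time-varying activations
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Toy "spectrogram": an exactly rank-3 non-negative matrix.
data_rng = np.random.default_rng(1)
v = data_rng.random((16, 3)) @ data_rng.random((3, 40))

w, h = nmf(v, rank=3)
rel_err = np.linalg.norm(v - w @ h) / np.linalg.norm(v)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a separation setting, subsets of the columns of `w` (with the matching rows of `h`) would be attributed to the target speaker or to the interferer, and the corresponding partial reconstructions used to isolate each source before localization.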

slides.pdf

Presentation slides (215)

poster.pdf

Conference poster (190)

Categories:: Source Separation and Signal Enhancement

9 Views

Poster_ICASSP2021


Poster_ICASSP.pdf

Poster_ICASSP.pdf (413)

Categories:: Signal Processing Theory and Methods

8 Views

Autoregressive Fast Multichannel Nonnegative Matrix Factorization For Joint Blind Source Separation And Dereverberation

This paper describes a joint blind source separation and dereverberation method that works adaptively and efficiently in a reverberant noisy environment. The modern approach to blind source separation (BSS) is to formulate a probabilistic model of multichannel mixture signals that consists of a source model representing the time-frequency structures of source spectrograms and a spatial model representing the inter-channel covariance structures of source images.

ICASSP2021_Poster.pdf

ICASSP2021_Poster.pdf (245)

ICASSP2021_Slide.pdf

ICASSP2021_Slide.pdf (284)

Categories:: Source Separation and Signal Enhancement

9 Views

End-to-End Audio-Visual Speech Recognition with Conformers


In this work, we present a hybrid CTC/Attention model based on a modified ResNet-18 and Convolution-augmented transformer (Conformer) that can be trained in an end-to-end manner. In particular, the visual and audio encoders learn to extract features directly from raw pixels and audio waveforms, respectively. These features are then fed to conformers, and fusion takes place via a Multi-Layer Perceptron (MLP). The model learns to recognise characters using a combination of CTC and an attention mechanism.

conformers_poster.pdf

conformers_poster.pdf (271)

Categories:: Multimodal signal processing

34 Views

Slides for SPTM-14.6: WIENER FILTER ON MEET/JOIN LATTICES


Recent work introduced a framework for signal processing (SP) on meet/join lattices. Such a lattice is partially ordered and supports a meet (or join) operation that returns the greatest lower bound and the smallest upper bound of two elements, respectively. Lattices appear in various domains and can be used, for example, to express rankings in social choice theory or multisets in combinatorial auctions. Discrete lattice SP (DLSP) uses the meet operation as shift and derives associated notions of convolution and Fourier transform for signals indexed by lattices.
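
To make the meet and join operations concrete, here is a small sketch on two familiar lattices: subsets ordered by inclusion (meet is intersection, join is union) and positive integers ordered by divisibility (meet is the greatest common divisor, join is the least common multiple). The function names are hypothetical, and the sketch covers only the lattice operations, not the DLSP shift, convolution or Fourier transform.

```python
from math import gcd

def meet_sets(a, b):
    """Greatest lower bound under inclusion: the intersection."""
    return a & b

def join_sets(a, b):
    """Smallest upper bound under inclusion: the union."""
    return a | b

def meet_div(a, b):
    """Greatest lower bound under divisibility: the gcd."""
    return gcd(a, b)

def join_div(a, b):
    """Smallest upper bound under divisibility: the lcm."""
    return a * b // gcd(a, b)

a, b = frozenset({1, 2}), frozenset({2, 3})
print(sorted(meet_sets(a, b)))              # [2]
print(sorted(join_sets(a, b)))              # [1, 2, 3]
print(meet_div(12, 18), join_div(12, 18))   # 6 36
```

Both pairs of operations satisfy the absorption laws, for example `meet(a, join(a, b)) == a`, which is what makes them lattice operations rather than arbitrary binary functions.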