ICASSP 2021

ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2021 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

FULL-DUPLEX MULTIFUNCTION TRANSCEIVER WITH JOINT CONSTANT ENVELOPE TRANSMISSION AND WIDEBAND RECEPTION // ICASSP 2021 poster

ICASSP21_poster_5258.pdf (382)

Categories: Signal and System Modeling, Representation and Estimation

29 Views

SPARSE-CODED DYNAMIC MODE DECOMPOSITION ON GRAPH FOR PREDICTION OF RIVER WATER LEVEL DISTRIBUTION

This work proposes a method for estimating dynamics on a graph by using dynamic mode decomposition (DMD) and sparse approximation with graph filter banks (GFBs). The motivation for introducing DMD on graphs is to predict multi-point river water levels in order to forecast river floods and issue appropriate evacuation warnings. The proposed method represents the spatio-temporal variation of physical quantities on a graph as a time-evolution equation. Specifically, water level observation data available on the Internet is collected by web scraping.
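As a rough illustration of the DMD idea named in this abstract, the sketch below fits a linear time-evolution operator to consecutive snapshots by least squares and reads off its eigenvalues. It is not the authors' method: the sparse approximation with graph filter banks is omitted, the system is restricted to two nodes so that plain Python suffices, and the operator `true_a`, the snapshot values, and all function names are hypothetical.

```python
import cmath


def fit_linear_operator(snapshots):
    """Least-squares fit of A in x[t+1] ~ A x[t] for two-node snapshots."""
    gram = [[0.0, 0.0], [0.0, 0.0]]   # sum over t of x[t] x[t]^T
    cross = [[0.0, 0.0], [0.0, 0.0]]  # sum over t of x[t+1] x[t]^T
    for x, y in zip(snapshots[:-1], snapshots[1:]):
        for i in range(2):
            for j in range(2):
                gram[i][j] += x[i] * x[j]
                cross[i][j] += y[i] * x[j]
    det = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]
    if abs(det) < 1e-12:
        raise ValueError("snapshots do not excite both nodes independently")
    gram_inv = [[gram[1][1] / det, -gram[0][1] / det],
                [-gram[1][0] / det, gram[0][0] / det]]
    # A = cross * gram^{-1}
    return [[sum(cross[i][k] * gram_inv[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


def eigenvalues_2x2(a):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def step(a, x):
    """Advance the state by one time step: returns A x."""
    return [a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1]]


# Hypothetical levels at two gauges, generated from a known operator.
true_a = [[0.9, 0.1], [0.0, 0.5]]
levels = [[1.0, 1.0]]
for _ in range(5):
    levels.append(step(true_a, levels[-1]))

a_hat = fit_linear_operator(levels)   # estimated time-evolution operator
modes = eigenvalues_2x2(a_hat)        # growth/decay rate of each mode
forecast = step(a_hat, levels[-1])    # one-step-ahead prediction
```

Eigenvalues with magnitude below one correspond to decaying modes; a real system with many gauges would use an SVD-based DMD from a numerical library rather than this two-node closed form.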

Poster_SPARSE-CODED DYNAMIC MODE DECOMPOSITION ON GRAPH FOR PREDICTION OF RIVER WATER LEVEL DISTRIBUTION.pdf (365)

Categories: Signal and System Modeling, Representation and Estimation

134 Views

G-ARRAYS: GEOMETRIC ARRAYS FOR EFFICIENT POINT CLOUD PROCESSING

Slides.pdf (922)

Categories: Image/Video Processing

28 Views

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read due to disfluencies, filler words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and the ASR system alike will be propagated to the next task in the pipeline.

slides.pdf

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model (360)

Categories: Other

58 Views

END-TO-END MULTILINGUAL AUTOMATIC SPEECH RECOGNITION FOR LESS-RESOURCED LANGUAGES: THE CASE OF FOUR ETHIOPIAN LANGUAGES

The End-to-End (E2E) approach to Automatic Speech Recognition (ASR), which maps a sequence of input features into a sequence of graphemes or words, is an active research topic. It is attractive for less-resourced languages because it avoids the use of a pronunciation dictionary, which is one of the major components of traditional ASR systems. However, like any deep neural network (DNN) approach, E2E is data-hungry. This makes the application of E2E to less-resourced languages questionable.

MarthaSolomonTanja_Poster.pdf (356)

Categories: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

53 Views

SCALABLE MULTILEVEL QUANTIZATION FOR DISTRIBUTED DETECTION

A scalable algorithm is derived for multilevel quantization of sensor observations in distributed sensor networks, which consist of a number of sensors transmitting summary information about their observations to a fusion center for a final decision. The proposed algorithm directly minimizes the overall error probability of the network without resorting to minimizing pseudo-objective functions such as distances between probability distributions.

icassp2021_poster.pdf

Poster (368)

Categories: Statistical Signal Processing

31 Views

Fast and Robust ADMM for Blind Super-resolution

Although the blind super-resolution problem is nonconvex in nature, recent advances show the feasibility of a convex formulation that guarantees unique recovery. However, the convexification procedure comes with a huge computational cost, so it is of great interest to investigate fast algorithms. To do so, we adapt an operator splitting approach, ADMM, and combine it with a novel preconditioning scheme. Numerical results show that the convergence rate is improved by around two orders of magnitude compared to CVX, currently the most widely adopted solver.
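To make the operator splitting idea concrete, here is a minimal sketch of the generic ADMM iteration on a toy problem: a scalar lasso, minimizing 0.5*(x - a)^2 + lam*|z| subject to x = z. This is not the solver from the poster: the blind super-resolution objective and the preconditioning scheme are not reproduced, and the function names, the penalty `rho`, and the test values are assumptions chosen for illustration.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_scalar_lasso(a, lam, rho=1.0, iterations=100):
    """Minimize 0.5*(x - a)**2 + lam*|z| subject to x = z using ADMM."""
    x = z = u = 0.0  # u is the scaled dual variable
    for _ in range(iterations):
        # x-update: closed-form minimizer of the smooth quadratic term
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: proximal step on the nonsmooth l1 term
        z = soft_threshold(x + u, lam / rho)
        # dual update: accumulate the constraint violation x - z
        u += x - z
    return z


solution = admm_scalar_lasso(a=3.0, lam=1.0)
shrunk_to_zero = admm_scalar_lasso(a=0.5, lam=1.0)
```

The toy problem has the closed-form answer `soft_threshold(a, lam)`, which makes it easy to check that the split iteration converges to the right point.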

Poster_2607_ICASSP2021.pdf

Poster (328)

Categories: Signal and System Modeling, Representation and Estimation

44 Views

AUDITORY FILTERBANKS BENEFIT UNIVERSAL SOUND SOURCE SEPARATION

Han LI_poster.pdf (330)

Categories: Source Separation and Signal Enhancement

24 Views

Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Most existing audio fingerprinting systems have limitations when used for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint with a fast maximum inner-product search. To this end, we present a contrastive learning framework that derives from the segment-level search objective. Each update in training uses a batch consisting of a set of pseudo labels, randomly selected original samples, and their augmented replicas.
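The retrieval step described in this abstract can be sketched as a maximum inner-product search over L2-normalized fingerprints. The version below is a brute-force stand-in for the fast search the abstract refers to, and it does not cover the contrastive training; the segment IDs, the three-dimensional fingerprint values, and the function names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so inner products act as cosine scores."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


def inner_product_search(query, database):
    """Return (segment_id, score) of the fingerprint with the largest inner product."""
    q = l2_normalize(query)
    best_id, best_score = None, -math.inf
    for segment_id, fingerprint in database.items():
        f = l2_normalize(fingerprint)
        score = sum(a * b for a, b in zip(q, f))
        if score > best_score:
            best_id, best_score = segment_id, score
    return best_id, best_score


# Hypothetical fingerprints for three reference segments.
database = {
    "seg_a": [1.0, 0.0, 0.0],
    "seg_b": [0.0, 1.0, 0.0],
    "seg_c": [0.6, 0.8, 0.0],
}
# A query standing in for a distorted copy of seg_c.
best_id, best_score = inner_product_search([0.5, 0.9, 0.1], database)
```

At scale, the linear scan would be replaced by an approximate nearest-neighbor index; the scoring rule itself stays the same.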

icassp 2021 poster.pdf (478)

Categories: Music Signal Processing; Content-Based Audio Processing; Audio Analysis and Synthesis

50 Views

ARTIFICIALLY SYNTHESISING DATA FOR AUDIO CLASSIFICATION AND SEGMENTATION TO IMPROVE SPEECH AND MUSIC DETECTION IN RADIO BROADCAST

Segmenting audio into homogeneous sections such as music and speech helps us understand the content of audio. It is useful as a pre-processing step to index, store, and modify audio recordings, radio broadcasts and TV programmes. Deep learning models for segmentation are generally trained on copyrighted material, which cannot be shared. Annotating these datasets is time-consuming and expensive, and it therefore significantly slows down research progress. In this study, we present a novel procedure that artificially synthesises data that resembles radio signals.

Venkatesh Slides and poster.pdf (363)

Categories: Content-Based Audio Processing; Audio Processing Systems

22 Views
