Content-Based Audio Processing

UNIT-DSR: DYSARTHRIC SPEECH RECONSTRUCTION SYSTEM USING SPEECH UNIT NORMALIZATION

Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech. The technology eases communication with speakers affected by the neuromotor disorder and enhances their social inclusion.

unitDSR-icassp2024-wyj-final.pptx

ICASSP24-UNIT_DSR_PPT (352)

Categories: Content-Based Audio Processing

88 Views

Adaptive speech emotion representation learning based on dynamic graph

Graph representation learning has become a hot research topic due to its powerful nonlinear fitting capability in extracting representative node embeddings. However, for sequential data such as speech signals, most traditional methods merely focus on the static graph created within a sequence, and largely overlook the intrinsic evolving patterns of these data. This may reduce the efficiency of graph representation learning for sequential data.

2024-ICASSP-Poster-高迎雪.pptx

2024-ICASSP-Poster-高迎雪.pptx (236)

Categories: Content-Based Audio Processing

26 Views

Multi-Level Graph Learning For Audio Event Classification And Human-Perceived Annoyance Rating Prediction

The WHO's report on environmental noise estimates that 22 million people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-related monitoring, this paper proposes a graph-based model to identify AEs in a soundscape and to explore relations between diverse AEs and the human-perceived annoyance rating (AR).

4320_hou_poster.pdf

4320_hou_poster.pdf (330)

Categories: Content-Based Audio Processing, Multimedia Signal Processing

23 Views

GCT: GATED CONTEXTUAL TRANSFORMER FOR SEQUENTIAL AUDIO TAGGING

2027_hou_poster.pdf

2027_hou_poster.pdf (263)

Categories: Content-Based Audio Processing

15 Views

Self-supervised learning of audio representations using angular contrastive loss

icassp2023_shanshan_wang_poster.pdf

Poster (268)

Categories: Content-Based Audio Processing

23 Views

ICASSP 2022 L3DAS22 CHALLENGE: ENSEMBLE OF RESNET-CONFORMERS WITH AMBISONICS DATA AUGMENTATION FOR SOUND EVENT LOCALIZATION AND DETECTION

poster_paper9336_A0.pdf

poster_paper9336_A0.pdf (355)

Categories: Spatial and Multichannel Audio, Content-Based Audio Processing

38 Views

Unsupervised Audio-Caption Aligning Learns Correspondences between Individual Sound Events and Textual Phrases

We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip. We align originally unaligned and unannotated audio clips and their captions by scoring the similarities between audio frames and words, as encoded by modality-specific encoders and using a ranking-loss criterion to optimize the model.
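The scoring and ranking-loss idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy embeddings, the function names, the choice of aggregating frame-word similarities by averaging each word's best-matching frame, and the hinge form of the ranking loss. It is not the authors' implementation.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def clip_caption_score(frames, words):
    """Clip-level score from frame-word similarities.

    Assumed aggregation: for each word take its best-matching frame,
    then average over the words of the caption.
    """
    return sum(max(dot(f, w) for f in frames) for w in words) / len(words)


def ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge ranking loss: zero once the matched pair outscores the
    mismatched pair by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


# Hypothetical 2-dimensional embeddings for two audio frames and two captions.
frames = [[1.0, 0.0], [0.0, 1.0]]
matched_caption = [[1.0, 0.0], [0.0, 1.0]]
mismatched_caption = [[-1.0, 0.0], [0.0, -1.0]]

pos = clip_caption_score(frames, matched_caption)
neg = clip_caption_score(frames, mismatched_caption)
loss = ranking_loss(pos, neg, margin=1.0)
```

Because the loss only needs clip-level pairs, no frame-level or word-level annotation is required; any frame-word correspondences would emerge from the similarity matrix itself.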

ICASSP2022_3634_final.pdf

Presentation slides (301)

Categories: Content-Based Audio Processing

29 Views

Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data

Area under the ROC curve (AUC) optimisation techniques developed for neural networks have recently demonstrated their capabilities in different audio and speech related tasks. However, due to its intrinsic nature, AUC optimisation has focused only on binary tasks so far. In this paper, we introduce an extension to the AUC optimisation framework so that it can be easily applied to an arbitrary number of classes, aiming to overcome the issues derived from training data limitations in deep learning solutions.
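For context, the sketch below shows one common way to define a multiclass AUC: averaging binary AUCs over all class pairs (one-vs-one). It computes an evaluation quantity only and is not the training objective proposed in the paper; the function names and toy scores are hypothetical.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def multiclass_auc(scores, labels, num_classes):
    """Average of pairwise AUCs over all unordered class pairs.

    scores[k][c] is the model score of sample k for class c.
    Assumes every class has at least one sample.
    """
    pair_aucs = []
    for i, j in combinations(range(num_classes), 2):
        # Use the class-i score to separate class i from class j ...
        a_ij = binary_auc(
            [s[i] for s, y in zip(scores, labels) if y == i],
            [s[i] for s, y in zip(scores, labels) if y == j],
        )
        # ... and the class-j score to separate class j from class i.
        a_ji = binary_auc(
            [s[j] for s, y in zip(scores, labels) if y == j],
            [s[j] for s, y in zip(scores, labels) if y == i],
        )
        pair_aucs.append((a_ij + a_ji) / 2.0)
    return sum(pair_aucs) / len(pair_aucs)


# Hypothetical scores for five samples and three classes.
scores = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
]
labels = [0, 0, 1, 2, 2]
auc = multiclass_auc(scores, labels, num_classes=3)
```

The pairwise count in `binary_auc` is not differentiable, which is why AUC optimisation for neural networks typically relies on a smooth surrogate of it.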

poster_icassp22_final.pdf

poster_icassp22_final.pdf (693)

Categories: Content-Based Audio Processing

13 Views

Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Most existing audio fingerprinting systems are of limited use for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint with a fast maximum inner-product search. To this end, we present a contrastive learning framework that derives from the segment-level search objective. Each update in training uses a batch consisting of a set of pseudo labels, randomly selected original samples, and their augmented replicas.
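The retrieval step, maximum inner-product search over segment fingerprints, can be sketched as follows. This is a brute-force illustration under stated assumptions: the 3-dimensional vectors stand in for learned fingerprints, the function names are hypothetical, and a system at scale would use an approximate index rather than the exhaustive scan shown here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length, so that the inner product of two
    normalized vectors equals their cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query, database, top_k=1):
    """Exhaustive maximum inner-product search.

    Returns up to top_k (index, score) pairs, highest score first.
    """
    scored = [(i, inner_product(query, fp)) for i, fp in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical fingerprints of three reference segments.
database = [
    l2_normalize(v)
    for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0])
]

# Hypothetical fingerprint of a distorted query segment.
query = l2_normalize([0.5, 0.9, 0.1])

best_index, best_score = search(query, database)[0]
```

With unit-length fingerprints, the segment whose direction is closest to the query wins, which is the property a contrastive objective over original and augmented segments is meant to encourage.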

icassp 2021 poster.pdf

icassp 2021 poster.pdf (584)

Categories: Music Signal Processing, Content-Based Audio Processing, Audio Analysis and Synthesis

56 Views

ARTIFICIALLY SYNTHESISING DATA FOR AUDIO CLASSIFICATION AND SEGMENTATION TO IMPROVE SPEECH AND MUSIC DETECTION IN RADIO BROADCAST

Segmenting audio into homogeneous sections such as music and speech helps us understand the content of audio. It is useful as a pre-processing step to index, store, and modify audio recordings, radio broadcasts and TV programmes. Deep learning models for segmentation are generally trained on copyrighted material, which cannot be shared. Annotating these datasets is time-consuming and expensive, and it therefore significantly slows down research progress. In this study, we present a novel procedure that artificially synthesises data that resembles radio signals.

Venkatesh Slides and poster.pdf

Venkatesh Slides and poster.pdf (474)

Categories: Content-Based Audio Processing, Audio Processing Systems

22 Views
