Audio and Acoustic Signal Processing

Recurrent Neural Networks with Stochastic Layers for Acoustic Novelty Detection

In this paper, we adapt Recurrent Neural Networks with Stochastic Layers, which are the state of the art for generating text, music and speech, to the problem of acoustic novelty detection. By integrating uncertainty into the hidden states, this type of network can learn the distribution of complex sequences. Because the learned distribution gives an explicit probability for each observation, we can evaluate how likely an observation is and then detect low-probability events as novel.
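
The detection step described above reduces to scoring each sequence by its likelihood under a model trained only on normal audio, then flagging scores that fall below a threshold. The Python sketch below illustrates only that scoring and thresholding logic, under stated assumptions: a diagonal Gaussian over feature frames stands in for the stochastic-layer RNN, and the function names and the quantile-based threshold are hypothetical choices, not the authors' implementation.

```python
import math
from statistics import mean, pvariance

# Stand-in density model (assumption): an independent Gaussian per feature
# dimension, fit on "normal" frames only. The paper uses an RNN with
# stochastic layers instead; only the scoring/thresholding logic carries over.

def fit_diag_gaussian(frames):
    """Return per-dimension means and variances of a list of feature frames."""
    dims = len(frames[0])
    mus = [mean(f[d] for f in frames) for d in range(dims)]
    # Floor the variance to avoid division by zero on constant features.
    variances = [max(pvariance([f[d] for f in frames]), 1e-6) for d in range(dims)]
    return mus, variances

def frame_log_likelihood(frame, model):
    """Log-density of one frame under the diagonal Gaussian."""
    mus, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mus, variances)
    )

def sequence_score(sequence, model):
    """Average per-frame log-likelihood, so sequences of different length compare fairly."""
    return mean(frame_log_likelihood(f, model) for f in sequence)

def pick_threshold(train_sequences, model, quantile=0.05):
    """Take a low quantile of the scores of normal training sequences."""
    scores = sorted(sequence_score(s, model) for s in train_sequences)
    return scores[int(quantile * (len(scores) - 1))]

def is_novel(sequence, model, threshold):
    """A sequence is novel when it is less likely than the threshold allows."""
    return sequence_score(sequence, model) < threshold
```

Replacing the Gaussian with a sequence model would only change `frame_log_likelihood` and `sequence_score`. The threshold is set from normal data alone, which fits the novelty-detection setting where examples of novel events are not available for training.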

ICASSP2019_3073.pdf

Categories: Audio and Acoustic Signal Processing
Neural network learning (MLR-NNLR)
Pattern recognition and classification (MLR-PATT)

Audio Caption: Listen and Tell

ICASSP POSTER.pdf

Categories: Audio and Acoustic Signal Processing

POLYPHONIC SOUND EVENT DETECTION USING CONVOLUTIONAL BIDIRECTIONAL LSTM AND SYNTHETIC DATA-BASED TRANSFER LEARNING

ICASSP_Poster_인쇄본.pdf

ICASSP_Poster_인쇄본.pdf (328)

Categories: Audio and Acoustic Signal Processing

SPEAKER AGNOSTIC FOREGROUND SPEECH DETECTION FROM AUDIO RECORDINGS IN WORKPLACE SETTINGS FROM WEARABLE RECORDERS

Audio-signal acquisition as part of wearable sensing adds an important dimension for applications such as understanding human behavior. As part of a large study on workplace behaviors, we collected audio data from individual hospital staff using custom wearable recorders. To preserve the privacy of interactions in the hospital, only a limited set of audio features was collected. A first step in processing this audio is to identify the foreground speech, that is, the speech of the person wearing the audio badge.

ICASSP 2019 poster 34*26in new.pdf

Categories: Audio and Acoustic Signal Processing
Speech Processing

EMBEDDING PHYSICAL AUGMENTATION AND WAVELET SCATTERING TRANSFORM TO GENERATIVE ADVERSARIAL NETWORKS FOR AUDIO CLASSIFICATION WITH LIMITED TRAINING RESOURCES

This paper addresses audio classification with limited training resources. We first investigate different types of data augmentation, including physical modeling, the wavelet scattering transform and Generative Adversarial Networks (GANs). We then propose a novel GAN that embeds physical augmentation and the wavelet scattering transform in its processing. Experimental results on the Google Speech Commands dataset show significant improvements for the proposed method when training with limited resources.
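
The abstract does not detail its physical augmentation. As one common, physically motivated example (an assumption, not necessarily the paper's method), the sketch below mixes background noise into a clean utterance at a chosen signal-to-noise ratio. `mix_at_snr` and `rms` are hypothetical helper names, and signals are plain lists of samples.

```python
import math

def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # SNR_dB = 20 * log10(rms(clean) / rms(scaled_noise)); solve for the noise gain.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [c + gain * n for c, n in zip(clean, noise)]
```

Drawing `snr_db` from a range for each training example produces several noisy variants of every utterance, which is the usual way such augmentation enlarges a small training set.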

WST_GAN(revised_d).pdf

Audio Classification, Limited Training, Augmentation, Generative Adversarial Networks

Categories: Audio and Acoustic Signal Processing

MULTI-BAND PIT AND MODEL INTEGRATION FOR IMPROVED MULTI-CHANNEL SPEECH SEPARATION

Poster_for_multiband_PIT.pdf

Categories: Audio and Acoustic Signal Processing
Speech Processing

Point Cloud Segmentation using Hierarchical Tree for Architectural Models

Over the past few years, gathering massive volumes of 3D data has become straightforward due to the proliferation of laser scanners and other acquisition devices. Segmenting such large data into meaningful parts, however, remains a challenge, because raw scans usually have missing data and varying density. In this work, we present a simple yet effective method to semantically decompose and reconstruct 3D models from point clouds. Using a hierarchical tree approach, we segment and reconstruct both planar and non-planar scenes in an outdoor environment.
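
Planar parts are a natural first target when decomposing architectural scans. The sketch below extracts a single dominant plane with RANSAC, a standard robust-fitting technique swapped in here for illustration; it is not the hierarchical tree method of the paper, and `ransac_plane`, the distance `threshold` and the iteration count are hypothetical choices.

```python
import math
import random

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Fit the plane n.p + d = 0 (unit normal n) supported by the most points."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        # Three random points define a candidate plane.
        p1, p2, p3 = rng.sample(points, 3)
        n = _cross(_sub(p2, p1), _sub(p3, p1))
        length = math.sqrt(_dot(n, n))
        if length < 1e-12:  # collinear sample: no unique plane, skip it
            continue
        n = tuple(c / length for c in n)
        d = -_dot(n, p1)
        # Inliers are points whose perpendicular distance to the plane is small.
        inliers = [p for p in points if abs(_dot(n, p) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = (n, d), inliers
    return best_plane, best_inliers
```

Running it repeatedly on the points left after removing each plane's inliers yields several planar segments; non-planar structure would need a different model.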

ICASSP_POSTER.pdf

Categories: Audio and Acoustic Signal Processing

Dialogue State Tracking with Convolutional Semantic Taggers

In this paper, we present a novel approach to the end-to-end goal-oriented dialogue track of the 6th Dialogue State Tracking Challenge (DSTC6), in which the goal is to select the best system response from a list of candidates in a restaurant booking conversation. Our model uses one convolutional neural network (CNN) to semantically tag each utterance in the dialogue history and update the dialogue state, and another CNN to predict the best system action template.

icassp19.pdf

Categories: Audio and Acoustic Signal Processing

INCREMENTAL TRANSFER LEARNING IN TWO-PASS INFORMATION BOTTLENECK BASED SPEAKER DIARIZATION SYSTEM FOR MEETINGS

The two-pass information bottleneck (TPIB) based speaker diarization system operates independently on each conversational recording: it does not reuse speaker-discriminative information learned from previous conversations when diarizing a new one. As a result, the real-time factor (RTF) of the TPIB system is high, owing to the training time required for its artificial neural network (ANN).
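
The RTF is the processing time divided by the duration of the audio processed, so any per-conversation ANN training time adds directly to the numerator. The sketch below computes it; `real_time_factor` is a hypothetical helper, and the meeting length and timings in the usage note are hypothetical numbers chosen only to show the arithmetic.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; RTF < 1 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

For a hypothetical 30-minute (1800 s) meeting with 180 s of diarization and 540 s of ANN training, `real_time_factor(180, 1800)` is 0.1, while `real_time_factor(180 + 540, 1800)` is 0.4. In this illustration training dominates the cost, which shows why reusing previously learned information could lower the RTF.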