
Audio-signal acquisition as part of wearable sensing adds an important dimension for applications such as understanding human behaviors. As part of a large study on workplace behaviors, we collected audio data from individual hospital staff using custom wearable recorders. The audio features collected were limited to preserve the privacy of interactions in the hospital. A first step towards audio processing is to identify the foreground speech of the person wearing the audio badge.


This paper addresses audio classification with limited training resources. We first investigate different types of data augmentation, including physical modeling, the wavelet scattering transform, and Generative Adversarial Networks (GANs). We then propose a novel GAN that allows physical augmentation and the wavelet scattering transform to be embedded in processing. Experimental results on the Google Speech Commands dataset show significant improvements for the proposed method when training with limited resources.


Over the past few years, gathering massive volumes of 3D data has become straightforward due to the proliferation of laser scanners and acquisition devices. Segmentation of such large data into meaningful segments, however, remains a challenge. Raw scans usually have missing data and varying density. In this work, we present a simple yet effective method to semantically decompose and reconstruct 3D models from point clouds. Using a hierarchical tree approach, we segment and reconstruct planar as well as non-planar scenes in an outdoor environment.


In this paper, we present our novel approach to the 6th Dialogue State Tracking Challenge (DSTC6) track for end-to-end goal-oriented dialogue, in which the goal is to select the best system response from among a list of candidates in a restaurant booking conversation. Our model uses a convolutional neural network (CNN) for semantic tagging of each utterance in the dialogue history to update the dialogue state, and another CNN for predicting the best system action template.


The two-pass information bottleneck (TPIB) based speaker diarization system operates independently on different conversational recordings. The TPIB system does not consider previously learned speaker-discriminative information while diarizing new conversations. Hence, the real time factor (RTF) of the TPIB system is high owing to the training time required for the artificial neural network (ANN).
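For readers unfamiliar with the metric, the real time factor is conventionally the ratio of processing time to audio duration. The sketch below illustrates that general definition only; the function name and the example timings are hypothetical and are not taken from the paper.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Conventional RTF: processing time divided by audio duration.

    Values below 1.0 mean the system runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timings: per-recording ANN training adds to the processing time,
# which is why retraining for every conversation raises the RTF.
rtf_with_training = real_time_factor(processing_seconds=90.0 + 30.0, audio_seconds=600.0)
rtf_without_training = real_time_factor(processing_seconds=30.0, audio_seconds=600.0)
```

Under these made-up numbers, dropping the per-recording training step lowers the ratio, which is the effect the abstract attributes to reusing previously learned speaker information.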


The paper introduces a hierarchy-aware loss function in a deep neural network for an audio event detection task that has a bi-level, tree-structured label space. The goal is not only to improve audio event detection performance at all levels in the label hierarchy, but also to produce better audio embeddings. We exploit the label tree structure to preserve that information in the hierarchy-aware loss function. Two different loss functions are separately employed. First, a triplet loss with probabilistic multi-level batch mining is introduced.
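As background for the triplet loss mentioned above, here is a minimal sketch of the standard margin-based triplet loss on plain Python lists. It does not implement the paper's probabilistic multi-level batch mining; the function names, the Euclidean distance, and the margin value are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# A well-separated negative gives zero loss; a close negative is penalized.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
```

In a bi-level label hierarchy, one plausible use is to choose positives and negatives according to whether two clips share a fine label, only a coarse label, or neither, but how the paper does this is not stated in the abstract.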


The ageing population has caused a marked increase in the number of people with cognitive decline linked with dementia. Thus, current diagnostic services are overstretched, and there is an urgent need for automating parts of the assessment process. In previous work, we demonstrated how a stratification tool built around an Intelligent Virtual Agent (IVA), which elicits a conversation by asking memory-probing questions, was able to accurately distinguish between people with a neuro-degenerative disorder (ND) and a functional memory disorder (FMD).


Processing of speech and audio signals with time-frequency representations requires windowing methods which allow perfect reconstruction of the original signal and where processing artifacts have a predictable behavior. The most common approach for this purpose is overlap-add windowing, where signal segments are windowed before and after processing. Commonly used windows include the half-sine and a Kaiser-Bessel derived window. The latter is an approximation of the discrete prolate spherical sequence, and thus a maximum energy concentration window, adapted for overlap-add.
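The overlap-add scheme described above can be sketched in a few lines. This is an illustrative example, not the paper's method: it assumes a half-sine window, 50% overlap, an even frame length, and an identity processing step by default; the function names are hypothetical. Because the window is applied both before and after processing, the effective weights are squared sines, and at 50% overlap adjacent squared windows sum to one (sin² + cos² = 1), which yields perfect reconstruction wherever two frames overlap.

```python
import math


def half_sine_window(n):
    """Half-sine window of length n: w[k] = sin(pi * (k + 0.5) / n)."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def overlap_add(signal, frame_len, process=lambda frame: frame):
    """Window each frame, process it, window again, and overlap-add (50% overlap)."""
    hop = frame_len // 2
    w = half_sine_window(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Analysis window applied before processing.
        frame = [signal[start + k] * w[k] for k in range(frame_len)]
        frame = process(frame)
        # Synthesis window applied after processing, then accumulated.
        for k in range(frame_len):
            out[start + k] += frame[k] * w[k]
    return out


signal = [math.sin(0.3 * i) for i in range(64)]
reconstructed = overlap_add(signal, frame_len=16)
# Samples covered by two frames (indices 8..55 here) match the input;
# the first and last half-frames are covered by only one frame and are attenuated.
max_error = max(abs(reconstructed[i] - signal[i]) for i in range(8, 56))
```

Replacing the identity `process` with a real modification (for example, attenuating the frame) shows the other benefit of windowing twice: any discontinuity introduced by processing is tapered to zero at the frame edges.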
