Audio and Acoustic Signal Processing

Hierarchy-aware Loss Function on a Tree Structured Label Space for Audio Event Detection

The paper introduces a hierarchy-aware loss function for a deep neural network applied to an audio event detection task with a bi-level, tree-structured label space. The goal is not only to improve audio event detection performance at all levels of the label hierarchy, but also to produce better audio embeddings. We exploit the label tree structure by encoding it in the hierarchy-aware loss function. Two different loss functions are employed separately. First, a triplet loss with probabilistic multi-level batch mining is introduced.
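To make the triplet idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each clip carries a (coarse, fine) label pair, uses Euclidean distance, and draws the positive from the same fine class with a hypothetical probability `p_fine`, otherwise from the same coarse class but a different fine class. The negative always comes from a different coarse class. The names `triplet_loss`, `sample_triplet`, `p_fine`, and `margin`, as well as the toy labels, are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss on Euclidean distances."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def sample_triplet(labels, anchor_idx, rng, p_fine=0.5):
    """Pick (positive, negative) indices for one anchor in a batch.

    labels: list of (coarse, fine) tuples, one per example.
    p_fine: assumed probability of mining the positive at the fine level.
    Returns None when the batch cannot supply a valid triplet.
    """
    coarse, fine = labels[anchor_idx]
    same_fine = [i for i, (c, f) in enumerate(labels)
                 if i != anchor_idx and c == coarse and f == fine]
    same_coarse_only = [i for i, (c, f) in enumerate(labels)
                        if c == coarse and f != fine]
    other_coarse = [i for i, (c, f) in enumerate(labels) if c != coarse]
    if not other_coarse or not (same_fine or same_coarse_only):
        return None
    # Choose the hierarchy level for the positive at random.
    use_fine = same_fine and (not same_coarse_only or rng.random() < p_fine)
    pool = same_fine if use_fine else same_coarse_only
    return int(rng.choice(pool)), int(rng.choice(other_coarse))

# Toy batch with made-up labels and random embeddings.
rng = np.random.default_rng(0)
labels = [("vehicle", "car"), ("vehicle", "car"),
          ("vehicle", "bus"), ("animal", "dog")]
emb = rng.normal(size=(4, 8))
pos, neg = sample_triplet(labels, 0, rng)
loss = triplet_loss(emb[0], emb[pos], emb[neg])
```

Mining positives at both levels is one plausible way to let the embedding reflect the label tree; the paper's actual mining probabilities and distance measure may differ.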

icassp_arindam_hierarchy.pptx

Categories: Audio and Acoustic Signal Processing
Neural network learning (MLR-NNLR)

62 Views

COMPUTATIONAL COGNITIVE ASSESSMENT: INVESTIGATING THE USE OF AN INTELLIGENT VIRTUAL AGENT FOR THE DETECTION OF EARLY SIGNS OF DEMENTIA

The ageing population has caused a marked increase in the number of people with cognitive decline linked with dementia. As a result, current diagnostic services are overstretched, and there is an urgent need to automate parts of the assessment process. In previous work, we demonstrated how a stratification tool built around an Intelligent Virtual Agent (IVA), which elicits a conversation by asking memory-probing questions, was able to accurately distinguish between people with a neuro-degenerative disorder (ND) and those with a functional memory disorder (FMD).

BM_Icassp2019_Poster.pdf

Categories: Audio and Acoustic Signal Processing

35 Views

Overlap-Add Windows with Maximum Energy Concentration for Speech and Audio Processing

Processing speech and audio signals with time-frequency representations requires windowing methods that allow perfect reconstruction of the original signal and whose processing artifacts behave predictably. The most common approach for this purpose is overlap-add windowing, where signal segments are windowed before and after processing. Commonly used windows include the half-sine window and the Kaiser-Bessel-derived window. The latter is an approximation of the discrete prolate spherical sequence, and thus a maximum energy concentration window, adapted for overlap-add.
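The overlap-add scheme described above can be sketched as follows. This is an illustrative example under stated assumptions, not code from the paper: it assumes 50% overlap, an even window length, and the same window for analysis and synthesis, so perfect reconstruction requires the squared window and its half-length shift to sum to one. The function names and the `beta` value are hypothetical; `beta` is passed straight to `numpy.kaiser`.

```python
import numpy as np

def half_sine_window(length):
    """Half-sine window; w[n]^2 + w[n + length/2]^2 = 1."""
    n = np.arange(length)
    return np.sin(np.pi * (n + 0.5) / length)

def kbd_window(length, beta):
    """Kaiser-Bessel-derived window built from a cumulative Kaiser sum."""
    kaiser = np.kaiser(length // 2 + 1, beta)
    cumulative = np.cumsum(kaiser)
    half = np.sqrt(cumulative[:-1] / cumulative[-1])
    return np.concatenate([half, half[::-1]])

def overlap_add(signal, window, process=lambda frame: frame):
    """Window, process, window again, and overlap-add at 50% overlap."""
    length = len(window)
    hop = length // 2
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - length + 1, hop):
        frame = signal[start:start + length] * window  # analysis window
        frame = process(frame)                         # identity by default
        out[start:start + length] += frame * window    # synthesis window
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=1024)
# Only samples covered by two overlapping frames are reconstructed exactly.
interior = slice(128, 1024 - 128)
errors = {}
for name, w in [("half-sine", half_sine_window(256)),
                ("kbd", kbd_window(256, beta=4.0))]:
    y = overlap_add(x, w)
    errors[name] = np.max(np.abs(y[interior] - x[interior]))
```

With the identity `process`, both windows reproduce the interior of the signal up to floating-point error; the first and last half-frames are covered by a single frame and are therefore attenuated.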

ola_presentation_resize.pdf

Categories: Audio and Acoustic Signal Processing

10 Views

FINE-TUNING APPROACH TO NIR FACE RECOGNITION

Despite extensive research on face recognition (FR), it is still difficult to apply deep CNN models to near-infrared (NIR) FR because of a lack of training data. In this study, we propose a fine-tuning approach that allows deep CNN models to be applied to NIR FR with small training datasets. In the proposed approach, the parameters of deep CNN models trained for RGB FR are used as initial parameters for training deep CNN models for NIR FR. The proposed approach has two main advantages: 1) High NIR FR performance can be achieved with very small public training datasets.
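The initialization step of such a fine-tuning approach might look like the sketch below. It is a framework-free illustration and not the authors' code: parameters are plain NumPy arrays in a dictionary, layers whose name and shape match the RGB model are copied, and the remaining layers (here the classifier, because the number of identities differs) are re-initialized at random. All names, shapes, and the assumption that NIR images are fed as three-channel inputs are hypothetical.

```python
import numpy as np

def init_from_pretrained(pretrained, target_shapes, rng):
    """Copy pretrained arrays whose name and shape match; randomly initialize the rest."""
    params, copied = {}, []
    for name, shape in target_shapes.items():
        source = pretrained.get(name)
        if source is not None and source.shape == tuple(shape):
            params[name] = source.copy()   # reuse RGB FR weights as the starting point
            copied.append(name)
        else:
            params[name] = rng.normal(scale=0.01, size=shape)  # fresh layer
    return params, copied

rng = np.random.default_rng(0)
# Stand-in for a trained RGB FR model (random values, for illustration only).
rgb_params = {"conv1": rng.normal(size=(16, 3, 3, 3)),
              "fc": rng.normal(size=(100, 64))}
# Hypothetical NIR model: same convolution, different number of identities.
nir_shapes = {"conv1": (16, 3, 3, 3), "fc": (20, 64)}
nir_params, copied = init_from_pretrained(rgb_params, nir_shapes, rng)
```

Training on the NIR dataset would then start from `nir_params` rather than from a random initialization.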

Fine-tuning approach to NIR face recognition_ICASSP_2019_jykim_pdf_ver.pdf

Fine-tuning approach to NIR face recognition_ICASSP_2019_jykim_horizontal_pdf_ver.pdf

Categories: Audio and Acoustic Signal Processing

44 Views

DIFFERENTIALLY PRIVATE GREEDY DECISION FOREST

2019-ICASSP-XinBangzhou-Paper#2940-poster.pdf

Categories: Audio and Acoustic Signal Processing

6 Views

COMPACT CONVOLUTIONAL RECURRENT NEURAL NETWORKS VIA BINARIZATION FOR SPEECH EMOTION RECOGNITION

Despite great advances, most recently developed automatic speech recognition systems operate in a server-client manner and thus often incur high resource costs, such as storage size and memory accesses. This, however, does not satisfy the increasing demand for a compact model that can run smoothly on embedded devices such as smartphones.

ICASSP19005.pdf

Categories: Audio and Acoustic Signal Processing

27 Views

ICASSP 2019 Poster (TRANSMISSION LINE COCHLEAR MODEL BASED AM-FM FEATURES FOR REPLAY ATTACK DETECTION)

ICASSP2019_Poster_TharshiniGunendradasan.pptx

Categories: Audio and Acoustic Signal Processing
Design and Implementation of Signal Processing Systems

11 Views

DISTRIBUTED TRACKING OF MANEUVERING TARGET: A FINITE-TIME ALGORITHM

George_ICASSPpaper3664_.pdf

Categories: Audio and Acoustic Signal Processing

14 Views

Inter- and Intra- Patient ECG Heartbeat Classification For Arrhythmia Detection: a Sequence to Sequence Deep Learning Approach

The electrocardiogram (ECG) signal is a common and powerful tool for studying heart function and diagnosing various arrhythmias. While there have been remarkable improvements in cardiac arrhythmia classification methods, they still cannot offer acceptable performance in detecting different heart conditions, especially when dealing with imbalanced datasets. In this paper, we propose a solution to address this limitation of current classification approaches by developing an automatic heartbeat classification method using deep convolutional neural networks and sequence-to-sequence models.