ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit website.
- Read more about Multi-Conditioning & Data Augmentation using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions
- Log in to post comments
Degradation due to additive noise is a significant road block in the real-life deployment of Speech Emotion Recognition (SER) systems. Most of the previous work in this field dealt with the noise degradation either at the signal or at the feature level. In this paper, to address the robustness aspect of the SER in additive noise scenarios, we propose multi-conditioning and data augmentation using an utterance level parametric generative noise model. The generative noise model is designed to generate noise types which can span the entire noise space in the mel-filterbank energy domain.
- Categories:
- Read more about FULLY-NEURAL APPROACH TO HEAVY VEHICLE DETECTION ON BRIDGES USING A SINGLE STRAIN SENSOR
- Log in to post comments
Bridge weigh-in-motion (BWIM) is a technique for detecting heavy vehicles that may cause serious damage to real bridges. BWIM is realized by analyzing the strain signals observed at places on the bridge in terms of bridge-component responses to the axle loads. In current practice, a BWIM system requires multiple strain sensors to collect vehicle properties including speed and axle positions for accurate load estimation, which may limit the system’s life-span.
icassp20s.pdf
- Categories:
- Read more about Conditional Density Driven Grid Design in Point-Mass Filter
- 1 comment
- Log in to post comments
- Categories:
- Read more about A REAL-TIME DEEP NETWORK FOR CROWD COUNTING
- Log in to post comments
Automatic analysis of highly crowded people has attracted extensive attention from computer vision research. Previous approaches for crowd counting have already achieved promising performance across various benchmarks. However, to deal with the real situation, we hope the model run as fast as possible while keeping accuracy. In this paper, we propose a compact convolutional neural network for crowd counting which learns a more efficient model with a small number of parameters.
- Categories:
- Read more about View-angle Invariant Object Monitoring Without Image Registration
- Log in to post comments
Object monitoring can be performed by change detection algorithms. However, for the image pair with a large perspective difference, the change detection performance is usually impacted by inaccurate image registration. To address the above difficulties, a novel object-specific change detection approach is proposed for object monitoring in this paper. In contrast to traditional approaches, the proposed approach is robust to view angle variation and does not require explicit image registration. Experiments demonstrate the effectiveness and advantages of the proposed approach.
- Categories:
- Read more about IMPROVING THE PERFORMANCE OF TRANSFORMER BASED LOW RESOURCE SPEECH RECOGNITION FOR INDIAN LANGUAGES
- Log in to post comments
The recent success of the Transformer based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers on low resource Indian languages in a multilingual framework. We explore various methods to incorporate language information into a multilingual Transformer, i.e.,(i) at the decoder, (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors.
shetty.pdf
- Categories:
- Categories:
- Read more about DEEP NEURAL NETWORKS BASED AUTOMATIC SPEECH RECOGNITION FOR FOUR ETHIOPIAN LANGUAGES
- Log in to post comments
In this work, we present speech recognition systems for four Ethiopian languages: Amharic, Tigrigna, Oromo and Wolaytta. We have used comparable training corpora of about 20 to 29 hours speech and evaluation speech of about 1 hour for each of the languages. For Amharic and Tigrigna, lexical and language models of different vocabulary size have been developed. For Oromo and Wolaytta, the training lexicons have been used for decoding.
- Categories:
- Read more about Effects of Spectral Tilt on Listeners' Preferences And Intelligibility
- Log in to post comments
- Categories: