Neural network learning (MLR-NNLR)

Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks

The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) have recently been proposed and have improved the naturalness of synthesized singing voices. As singing voices represent a rich form of expression, a powerful technique to model them accurately is required. In the proposed technique, long-term dependencies of singing voices are modeled by CNNs.

ICASSP2020_slide_20200417b.pdf


Categories:: Speech Synthesis and Generation, including TTS (SPE-SYNT)
Neural network learning (MLR-NNLR)


DEEP NEURAL NETWORKS BASED AUTOMATIC SPEECH RECOGNITION FOR FOUR ETHIOPIAN LANGUAGES


In this work, we present speech recognition systems for four Ethiopian languages: Amharic, Tigrigna, Oromo and Wolaytta. We have used comparable training corpora of about 20 to 29 hours of speech and evaluation speech of about 1 hour for each of the languages. For Amharic and Tigrigna, lexical and language models of different vocabulary sizes have been developed. For Oromo and Wolaytta, the training lexicons have been used for decoding.

MarthaSolomonTanja.pdf


Categories:: Human Spoken Language Acquisition, Development and Learning (SLP-LADL)
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)
Neural network learning (MLR-NNLR)


Expression Guided EEG Representation Learning for Emotion Recognition


Learning a joint and coordinated representation between different modalities can improve multimodal emotion recognition. In this paper, we propose a deep representation learning approach for emotion recognition from electroencephalogram (EEG) signals guided by facial electromyogram (EMG) and electrooculogram (EOG) signals. We recorded EEG, EMG and EOG signals from 60 participants who watched 40 short videos and self-reported their emotions.

rayatdoost_ICASSP_17_04.pdf


Categories:: Neural network learning (MLR-NNLR)


A NOVEL RANK SELECTION SCHEME IN TENSOR RING DECOMPOSITION BASED ON REINFORCEMENT LEARNING FOR DEEP NEURAL NETWORKS

Tensor decomposition has proven effective for solving many problems in signal processing and machine learning. Recently, it has also shown advantages for compressing deep neural networks. In many applications of deep neural networks, it is critical to reduce the number of parameters and the computational workload in order to accelerate inference when the network is deployed. A modern deep neural network consists of multiple layers with multi-array weights, for which tensor decomposition is a natural way to perform compression.
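To make the link between rank choice and compression concrete, here is a minimal sketch that counts the parameters of a tensor ring representation. It assumes the standard tensor ring format, in which a d-way tensor with mode sizes n_1, ..., n_d is stored as d cores of shape r_k x n_k x r_(k+1), with the last rank wrapping around to the first. The function names and the example sizes are illustrative and are not taken from the paper; the reinforcement learning rank selection scheme itself is not reproduced here.

```python
from math import prod


def tensor_ring_param_count(dims, ranks):
    """Number of parameters in a tensor ring representation.

    dims:  mode sizes (n_1, ..., n_d) of the full tensor.
    ranks: ring ranks (r_1, ..., r_d); core k has shape r_k x n_k x r_{k+1},
           and the rank after r_d wraps around to r_1.
    """
    if len(dims) != len(ranks):
        raise ValueError("dims and ranks must have the same length")
    d = len(dims)
    return sum(ranks[k] * dims[k] * ranks[(k + 1) % d] for k in range(d))


def compression_ratio(dims, ranks):
    """Full tensor size divided by the tensor ring parameter count."""
    return prod(dims) / tensor_ring_param_count(dims, ranks)


# Hypothetical example: a 4-way weight tensor with 4 * 8 * 8 * 4 = 1024 entries.
dims = (4, 8, 8, 4)
for ranks in [(2, 2, 2, 2), (3, 2, 3, 2), (4, 4, 4, 4)]:
    count = tensor_ring_param_count(dims, ranks)
    print(ranks, count, round(compression_ratio(dims, ranks), 2))
```

Larger ranks give a more expressive representation but a lower compression ratio, which is why the choice of ranks matters for deployment.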

ICASSP_slides_Cheng.pdf


Categories:: Neural network learning (MLR-NNLR)


DEEP LEARNING FOR ROBUST POWER CONTROL FOR WIRELESS NETWORKS


ICASSP_Slides_Wei_Cui.pdf


Categories:: Neural network learning (MLR-NNLR)


TEMPORAL CODING IN SPIKING NEURAL NETWORKS WITH ALPHA SYNAPTIC FUNCTION


We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes and performs classification using the first output neuron to spike. This temporal coding scheme allows the supervised training of the network with backpropagation, using locally exact derivatives of the postsynaptic spike times with respect to the presynaptic spike times. The network uses a biologically-inspired alpha synaptic transfer function and trainable synchronisation pulses as temporal references. We successfully train the network on the MNIST dataset encoded in time.
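The first-to-spike decision rule described in this abstract can be illustrated with a small forward simulation. The sketch below assumes a generic alpha-shaped kernel, t * exp(-t / tau), and finds threshold crossings on a fixed time grid. The kernel parameterisation, the grid search, and all names and values are assumptions made for illustration; the synchronisation pulses and the backpropagation training described by the authors are not modelled.

```python
import math


def alpha_kernel(t, tau=1.0):
    """Alpha-shaped response t * exp(-t / tau) for t > 0, and 0 otherwise."""
    return t * math.exp(-t / tau) if t > 0 else 0.0


def membrane_potential(t, spike_times, weights, tau=1.0):
    """Weighted sum of alpha responses to all presynaptic spikes at time t."""
    return sum(w * alpha_kernel(t - s, tau) for s, w in zip(spike_times, weights))


def first_spike_time(spike_times, weights, threshold, tau=1.0, t_max=10.0, dt=1e-3):
    """First grid time at which the potential reaches the threshold, or None."""
    for i in range(int(t_max / dt) + 1):
        t = i * dt
        if membrane_potential(t, spike_times, weights, tau) >= threshold:
            return t
    return None


def classify(input_spikes, weight_rows, threshold, **kwargs):
    """Index of the output neuron that spikes first, or None if none spikes."""
    times = [first_spike_time(input_spikes, w, threshold, **kwargs) for w in weight_rows]
    fired = [(t, k) for k, t in enumerate(times) if t is not None]
    return min(fired)[1] if fired else None


# Hypothetical example: two input spikes, two output neurons.
input_spikes = [0.0, 0.5]
weight_rows = [[1.0, 1.0], [3.0, 3.0]]  # neuron 1 has stronger synapses
print(classify(input_spikes, weight_rows, threshold=0.5))
```

In this toy setting the neuron with the stronger synapses reaches the threshold earlier, so its index is returned as the predicted class.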

ICASSP2020_IuliaComsa.pdf

Presentation slides (612)

Categories:: Neural network learning (MLR-NNLR)



An Ensemble Based Approach for Generalized Detection of Spoofing Attacks to Automatic Speaker Recognizers

As automatic speaker recognizer systems become mainstream, voice spoofing attacks are on the rise. Common attack strategies include replay, the use of text-to-speech synthesis, and voice conversion systems. While previously proposed end-to-end detection frameworks have been shown to be effective in spotting attacks for one particular spoofing strategy, they have relied on different models, architectures, and speech representations, depending on the spoofing strategy.

ICASSP_Spoofing.pdf


Categories:: Neural network learning (MLR-NNLR)
Pattern recognition and classification (MLR-PATT)
Speaker Recognition and Characterization (SPE-SPKR)


Motion Dynamics Improve Speaker-Independent Lipreading


We present a novel lipreading system that improves on the task of speaker-independent word recognition by decoupling motion and content dynamics. We achieve this by implementing a deep learning architecture that uses two distinct pipelines to process motion and content and subsequently merges them, implementing an end-to-end trainable system that performs fusion of independently learned representations. We obtain an average relative word accuracy improvement of ≈6.8% on unseen speakers and of ≈3.3% on known speakers, with respect to a baseline which uses a standard architecture.

presentation.pdf

Presentation PDF slides (581)

Categories:: Resource constrained speech recognition (SPE-RCSR)
General Topics in Speech Recognition (SPE-GASR)
Neural network learning (MLR-NNLR)


Deep Clustering of Compressed Variational Embeddings


Motivated by the ever-increasing demands for limited communication bandwidth and low power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension by Variational Autoencoders (VAEs) and group data representations by Bernoulli mixture models (BMMs).
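As a rough illustration of the grouping step, the sketch below assigns a binary code to a cluster under a Bernoulli mixture model by computing posterior responsibilities. It assumes the codes are already available as 0/1 vectors and that the mixture parameters are given; the parameter values and function names are hypothetical, and neither the VAE encoder nor the joint training of VAB is shown.

```python
import math


def bernoulli_log_likelihood(x, p):
    """Log-probability of binary vector x under independent Bernoulli means p.

    Each entry of p is assumed to lie strictly between 0 and 1.
    """
    return sum(math.log(pj) if xj else math.log(1.0 - pj) for xj, pj in zip(x, p))


def responsibilities(x, mixing, means):
    """Posterior probability of each mixture component given x."""
    logs = [math.log(pi) + bernoulli_log_likelihood(x, p) for pi, p in zip(mixing, means)]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def assign_cluster(x, mixing, means):
    """Index of the component with the highest responsibility."""
    r = responsibilities(x, mixing, means)
    return max(range(len(r)), key=r.__getitem__)


# Hypothetical two-component mixture over 3-bit codes.
mixing = [0.5, 0.5]
means = [[0.9, 0.9, 0.1], [0.1, 0.1, 0.9]]
for code in ([1, 1, 0], [0, 0, 1]):
    print(code, assign_cluster(code, mixing, means))
```

Working directly on binary codes keeps the clustering step cheap, which matches the bandwidth and power motivation stated in the abstract.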

DCC_Deep_Clustering_of_Compressed_Variational_Embeddings_poster.pdf


Categories:: Neural network learning (MLR-NNLR)

