Audio Analysis and Synthesis

Investigating the Effect of Sound-Event Loudness on Crowdsourced Audio Annotations

Audio annotation is an important step in developing machine-listening systems. It is also a time-consuming process, which has motivated investigators to crowdsource audio annotations. However, many factors affect annotations, and many of them have not been adequately investigated. In previous work, we investigated the effects of visualization aids and sound scene complexity on the quality of crowdsourced sound-event annotations.

cartwright_icassp_2018_poster.pdf (629)

Categories:: Audio Analysis and Synthesis

11 Views

SAMPLERNN-BASED NEURAL VOCODER FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS

This paper presents a SampleRNN-based neural vocoder for statistical parametric speech synthesis. The method uses a conditional SampleRNN model, composed of a hierarchical structure of GRU layers and feed-forward layers, to capture long-span dependencies between acoustic features and waveform sequences. Compared with conventional vocoders based on the source-filter model, our proposed vocoder is trained without assumptions derived from prior knowledge of speech production and is able to better model and recover phase information.

ICASSP2018_poster_aiyang.pdf (632)

Categories:: Audio Analysis and Synthesis

22 Views

SAMPLERNN-BASED NEURAL VOCODER FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS

ICASSP2018_poster_aiyang.pdf

Poster (687)

Categories:: Audio Analysis and Synthesis

36 Views

REVISITING THE PROBLEM OF AUDIO-BASED HIT SONG PREDICTION USING CONVOLUTIONAL NEURAL NETWORKS

Being able to predict whether a song can be a hit has important applications in the music industry. Although the popularity of a song can be greatly affected by external factors such as social and commercial influences, the degree to which audio features computed from musical signals (which we regard as internal factors) can predict song popularity is an interesting research question in its own right.

icassp2017.pdf (546)

Categories:: Audio Analysis and Synthesis

42 Views

Global Variance in Speech Synthesis with Linear Dynamical Models

GV_LDM.pdf (598)

Categories:: Audio Analysis and Synthesis

8 Views

poster_STEGANALYSIS OF AAC USING CALIBRATED MARKOV MODEL OF ADJACENT CODEBOOK

poster_STEGANALYSIS OF AAC USINGCALIBRATED MARKOV MODEL OF ADJACENT CODEBOOK.pdf (100)

Categories:: Audio Analysis and Synthesis

7 Views

poster_STEGANALYSIS OF AAC USING CALIBRATED MARKOV MODEL OF ADJACENT CODEBOOK

poster_STEGANALYSIS OF AAC USINGCALIBRATED MARKOV MODEL OF ADJACENT CODEBOOK.pdf (99)

Categories:: Audio Analysis and Synthesis

3 Views

Lecture ICASSP 2016 Pierre Laffitte

This presentation introduces a deep learning model that classifies audio scenes in the subway environment. The final goal is to detect screams and shouts for surveillance purposes. The model combines a Deep Belief Network (DBN) and a Deep Neural Network (DNN): it is generatively pre-trained within the DBN framework and discriminatively fine-tuned within the DNN framework. It is trained on a novel database of pseudo-real signals collected in the Paris metro.

ICASSP Lecture.pdf (89)

Categories:: Audio Analysis and Synthesis
Pattern recognition and classification (MLR-PATT)
Neural network learning (MLR-NNLR)

13 Views

A Time Regularization Technique for Discrete Spectral Envelopes Through Frequency Derivative

In most applications of sinusoidal models for speech signals, an amplitude spectral envelope is necessary. This envelope is not only assumed to fit the vocal tract filter response as accurately as possible, but should also exhibit slowly varying shapes across time. Indeed, time irregularities can generate artifacts in signal manipulations or improperly increase the variance of the features used in statistical models. In this letter, a simple technique is suggested to improve this time regularity.

ICASSPMFAPoster.pdf (810)

Categories:: Audio Analysis and Synthesis

11 Views

Guided Signal Reconstruction with Application to Image Magnification

We propose signal reconstruction algorithms which utilize a guiding subspace that represents desired properties of reconstructed signals. Optimal reconstructed signals are shown to belong to a convex bounded set, called the "reconstruction" set. Iterative reconstruction algorithms, based on conjugate gradient methods, are developed to approximate optimal reconstructions with low memory and computational costs. Effectiveness of the proposed method is demonstrated with an application to image magnification.
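
For readers unfamiliar with conjugate gradient methods, the following is a minimal sketch of the textbook conjugate gradient iteration for a symmetric positive-definite linear system A x = b. It is not the authors' reconstruction algorithm: the function name `conjugate_gradient`, the `matvec` callback, and the small 2x2 system are assumptions made purely for illustration. It only shows why such methods have low memory cost, since each iteration needs one matrix-vector product and a handful of vectors.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (illustrative sketch).

    matvec: hypothetical callback returning A @ v for a list v.
    Only the vectors x, r, p and A @ p are stored at any time.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x, with the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy system (assumed for illustration): A = [[4, 1], [1, 3]], b = [1, 2].
A = [[4.0, 1.0], [1.0, 3.0]]
solution = conjugate_gradient(
    lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A],
    [1.0, 2.0],
)
```

Because the matrix is only accessed through `matvec`, the same loop works when A is an implicit operator (for example, a filtering or resampling step) that is never stored explicitly.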

globalsip-15-slides-v2.pdf (976)

Categories:: Sampling and Reconstruction
Image/Video Processing
Speech Enhancement (SPE-ENHA)
Audio Analysis and Synthesis
Emerging: Big Data

12 Views
