ICASSP 2018

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit website.

Novel realizations of speech-driven head movements with generative adversarial networks

Head movement is an integral part of face-to-face communications. It is important to investigate methodologies to generate naturalistic movements for conversational agents (CAs). The predominant method for head movement generation is using rules based on the meaning of the message. However, the variations of head movements by these methods are bounded by the predefined dictionary of gestures. Speech-driven methods offer an alternative approach, learning the relationship between speech and head movements from real recordings.

Sadoughi_2018-poster.pdf

Sadoughi_2018-poster.pdf (493)

Categories:: Spoken Language Understanding (SLP-UNDE)

32 Views

Study of dense network approaches for speech emotion recognition

Read more about Study of dense network approaches for speech emotion recognition
Log in to post comments

Deep neural networks have been proven to be very effective in various classification problems and show great promise for emotion recognition from speech. Studies have proposed various architectures that further improve the performance of emotion recognition systems. However, there are still various open questions regarding the best approach to building a speech emotion recognition system. Would the system’s performance improve if we have more labeled data? How much do we benefit from data augmentation? What activation and regularization schemes are more beneficial?

Abdelwahab_2018-poster.pdf

Abdelwahab_2018-poster.pdf (494)

Categories:: Speech Analysis (SPE-ANLS)

21 Views

Reduced Dimension Minimum BER PSK Precoding for Constrained Transmit Signals in Massive MIMO

icassp 2018 compact.pdf

icassp 2018 compact.pdf (749)

Categories:: MIMO Communications and Signal Processing

56 Views

Sufficiency quantification for seamless text-independent speaker enrollment

Read more about Sufficiency quantification for seamless text-independent speaker enrollment
Log in to post comments

Text-independent speaker recognition (TI-SR) requires a lengthy enrollment process that involves asking dedicated time from the user to create a reliable model of their voice. Seamless enrollment is a highly attractive feature which refers to the enrollment process that happens in the background and asks for no dedicated time from the user. One of the key problems in a fully automated seamless enrollment process is to determine the sufficiency of a given utterance collection for the purpose of TI-SR. No known metric exists in the literature to quantify sufficiency.

ICASSP2018_poster_Cilingir.pdf

Poster presented at ICASSP 2018 (760)

SufficiencyMetric_ICASP2018_Cilingir.pdf

Paper for ICASSP 2018 (791)

Categories:: Audio and Acoustic Signal Processing
Speaker Recognition and Characterization (SPE-SPKR)

20 Views

HIGH-QUALITY NONPARALLEL VOICE CONVERSION BASED ON CYCLE-CONSISTENT ADVERSARIAL NETWORK

poster.pdf

poster.pdf (793)

Categories:: Speech Synthesis and Generation, including TTS (SPE-SYNT)

32 Views

Self-paced mixture of t distribution model

Read more about Self-paced mixture of t distribution model
Log in to post comments

Gaussian mixture model (GMM) is a powerful probabilistic model for representing the probability distribution of observations in the population. However, the fitness of Gaussian mixture model can be significantly degraded when the data contain a certain amount of outliers. Although there are certain variants of GMM (e.g., mixture of Laplace, mixture of t distribution) attempting to handle outliers, none of them can sufficiently mitigate the effect of outliers if the outliers are far from the centroids.

icassp-landscape.pdf

icassp-landscape.pdf (678)

Categories:: Other applications of machine learning (MLR-APPL)

45 Views

Unsupervised Learning of Semantic Audio Representations

Read more about Unsupervised Learning of Semantic Audio Representations
Log in to post comments

ICASSP18_ Unsupervised Learning of Semantic Audio Representations.pdf

ICASSP18_ Unsupervised Learning of Semantic Audio Representations.pdf (843)

Categories:: Audio and Acoustic Signal Processing

27 Views

Low Complexity Joint RDO of Prediction Units Couples for HEVC Intra Coding

Read more about Low Complexity Joint RDO of Prediction Units Couples for HEVC Intra Coding
Log in to post comments

HEVC is the latest block-based video compression standard, outperforming H.264/AVC by 50% bitrate savings for the same perceptual quality. An HEVC encoder provides Rate-Distortion optimization coding tools for block-wise compression. Because of complexity limitations, Rate-Distortion Optimization (RDO) is usually performed independently for each block, assuming coding efficiency losses to be negligible.

mbichon_ICASSP18_poster_portrait.pdf

mbichon_ICASSP18_poster_portrait.pdf (917)

Categories:: Image/Video Coding

23 Views

SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION SELECTION

Read more about SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION SELECTION
Log in to post comments

In this paper, we present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW) which selects the target source masked by a noise source using the Angle of Arrival (AoA) information calculated using the phase difference information. The RMS-PDCW algorithm selects masks to apply using the information about the localized sound source and the onset detection of speech.

icassp_4465_poster.pdf

icassp_4465_poster.pdf (699)

Categories:: Robust Speech Recognition (SPE-ROBU)
Source Separation and Signal Enhancement

18 Views

SPECTRAL DISTORTION MODEL FOR TRAINING PHASE-SENSITIVE DEEP-NEURAL NETWORKS FOR FAR-FIELD SPEECH RECOGNITION

In this paper, we present an algorithm which introduces phase-perturbation to the training database when training phase-sensitive deep neural-network models. Traditional features such as log-mel or cepstral features do not have have any phase-relevant information.However features such as raw-waveform or complex spectra features contain phase-relevant information. Phase-sensitive features have the advantage of being able to detect differences in time of

icassp_4404_poster.pdf

icassp_4404_poster.pdf (711)

Categories:: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)
Robust Speech Recognition (SPE-ROBU)

24 Views

Pages