ICASSP 2020

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION

The paper presents a Multi-Head Attention deep learning network for Speech Emotion Recognition (SER) using Log mel-Filter Bank Energies (LFBE) spectral features as the input. The multi-head attention along with the position embedding jointly attends to information from different representations of the same LFBE input sequence. The position embedding helps in attending to the dominant emotion features by identifying positions of the features in the sequence. In addition to Multi-Head Attention and position embedding, we apply multi-task learning with gender recognition as an auxiliary task.
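As a rough illustration of the mechanism described in this abstract, the sketch below adds a position embedding to a sequence of LFBE-like frames and applies multi-head self-attention. It is not the authors' network: the sinusoidal embedding, the random projection matrices, the omission of an output projection, and all function names are assumptions made for the example.

```python
import math
import random


def sinusoidal_positions(seq_len, dim):
    """Fixed sinusoidal position embedding (an assumed choice, not taken from the paper)."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def multi_head_self_attention(frames, num_heads, seed=0):
    """Return (outputs, attention weights per head) for a sequence of feature frames."""
    seq_len, dim = len(frames), len(frames[0])
    assert dim % num_heads == 0, "feature size must be divisible by the number of heads"
    head_dim = dim // num_heads
    rng = random.Random(seed)

    def rand_matrix():
        # Hypothetical untrained projection; a real model would learn these weights.
        return [[rng.gauss(0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]

    # Add position information so attention can tell frames apart by location.
    pos = sinusoidal_positions(seq_len, dim)
    x = [[f + p for f, p in zip(fr, pr)] for fr, pr in zip(frames, pos)]
    q, k, v = (matmul(x, rand_matrix()) for _ in range(3))

    outputs = [[] for _ in range(seq_len)]
    weights_per_head = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        head_weights = []
        for t in range(seq_len):
            # Scaled dot-product scores of frame t against every frame, within this head.
            scores = [
                sum(q[t][j] * k[s][j] for j in range(lo, hi)) / math.sqrt(head_dim)
                for s in range(seq_len)
            ]
            w = softmax(scores)
            head_weights.append(w)
            # Concatenate this head's weighted values onto the output for frame t.
            for j in range(lo, hi):
                outputs[t].append(sum(w[s] * v[s][j] for s in range(seq_len)))
        weights_per_head.append(head_weights)
    return outputs, weights_per_head
```

Each head works on its own slice of the projected features, so different heads can weight the same input frames differently. That is the sense in which the heads attend to different representations of one sequence.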

ICASSP.pdf (624)

Categories: Speech Processing

145 Views

Combining cGAN and MIL for Hotspot Segmentation in Bone Scintigraphy

Paper 2748 ICASSP 2020.pdf (399)

Categories: Medical imaging

24 Views

Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning

In this paper, we propose a communication-efficient decentralized machine learning (ML) algorithm, coined quantized group ADMM (Q-GADMM). Every worker in Q-GADMM communicates with only two neighbors, and updates its model via the group alternating direction method of multipliers (GADMM), thereby ensuring fast convergence while reducing the number of communication rounds. Furthermore, each worker quantizes its model updates before transmission, thereby decreasing the communication payload size.
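The quantization step mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming an unbiased stochastic uniform quantizer applied to the change from the previously transmitted model; the abstract does not give the exact quantizer, and the function name and parameters are hypothetical.

```python
import math
import random


def quantize_update(current, previous_quantized, bits, rng):
    """Stochastically quantize the change in a model vector to `bits` bits per entry.

    Returns the model as a receiver would reconstruct it. This is an assumed
    scheme for illustration, not the published Q-GADMM quantizer.
    """
    diff = [c - p for c, p in zip(current, previous_quantized)]
    radius = max(abs(d) for d in diff)
    if radius == 0.0:
        return list(previous_quantized)  # nothing changed, nothing to send

    levels = 2 ** bits - 1              # number of quantization intervals
    step = 2 * radius / levels          # width of one interval over [-radius, radius]
    result = []
    for d, p in zip(diff, previous_quantized):
        position = (d + radius) / step  # real-valued level index in [0, levels]
        lower = math.floor(position)
        # Round up with probability equal to the fractional part, which keeps
        # the quantizer unbiased in expectation.
        level = lower + (1 if rng.random() < position - lower else 0)
        level = min(level, levels)      # guard against floating-point overshoot
        result.append(p + level * step - radius)
    return result
```

Under this assumed scheme a worker would transmit only the integer levels and the scalar `radius`, so the payload per entry drops from a full floating-point value to `bits` bits.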

icassp2020_final.pdf (434)

Categories: Communications and Networking

23 Views

Optimizing Bayesian HMM Based x-vector Clustering for the Second DIHARD Speech Diarization Challenge

ICASSP2020_DIHARD_BHMM_Slides.pdf (357)

Categories: Speaker Recognition and Characterization (SPE-SPKR)

11 Views

Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking

The purpose of this study is to detect the mismatch between text script and voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a script. We found that the phoneme recognition probabilities of exaggerated voice-overs decrease compared to ordinary utterances, but their rankings do not demonstrate any significant change.
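The observation that rankings stay stable even when probabilities drop suggests scoring a voice-over by the rank of each expected phoneme rather than by its probability. The sketch below illustrates that idea only. It assumes frame-level phoneme posteriors that are already aligned to the script, and the function names, the mean-rank score, and the threshold are hypothetical rather than taken from the paper.

```python
def phoneme_rank(posteriors, expected):
    """1-based rank of the expected phoneme within one frame's posteriors."""
    target = posteriors[expected]
    return 1 + sum(1 for p in posteriors.values() if p > target)


def rank_score(frames, expected_sequence):
    """Mean rank of the expected phonemes; 1.0 means each one was the top candidate."""
    ranks = [phoneme_rank(f, e) for f, e in zip(frames, expected_sequence)]
    return sum(ranks) / len(ranks)


def matches_script(frames, expected_sequence, max_mean_rank=1.5):
    """Accept the voice-over when the mean rank stays below an assumed threshold."""
    return rank_score(frames, expected_sequence) <= max_mean_rank


# Toy posteriors: the exaggerated utterance has flatter probabilities,
# but the expected phonemes are still ranked first in every frame.
ordinary = [{"a": 0.90, "b": 0.05, "k": 0.05}, {"a": 0.10, "b": 0.80, "k": 0.10}]
exaggerated = [{"a": 0.45, "b": 0.30, "k": 0.25}, {"a": 0.30, "b": 0.40, "k": 0.30}]
script = ["a", "b"]
```

With these toy values both utterances receive the same rank score for the matching script, whereas a probability-based score would penalize the exaggerated one.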

ICASSP2020_YJEONG_SLIDES.pdf (341)

Categories: General Topics in Speech Recognition (SPE-GASR)

24 Views

Detection of Malicious VBScript Using Static and Dynamic Analysis with Recurrent Deep Learning

VbsNetStokesIcassp.pdf (371)

Categories: Applications

35 Views

Privacy-Preserving Phishing Web Page Classification Via Fully Homomorphic Encryption

PrivacyPreserving_ICASSP.pdf (1355)

Categories: Applications

74 Views

TEXCEPTION: A Character/Word-Level Deep Learning Model for Phishing URL Detection

texception_icassp_presentation.pdf (888)

Categories: Applications

179 Views

A Generalized Framework for Domain Adaptation of PLDA in Speaker Recognition

This paper proposes a generalized framework for domain adaptation of Probabilistic Linear Discriminant Analysis (PLDA) in speaker recognition. It not only includes several existing supervised and unsupervised domain adaptation methods but also allows more flexible use of the data available in different domains. In particular, we introduce two new techniques: (1) correlation-alignment-based interpolation and (2) covariance regularization.

ICASSP2020_presentation_pdf.pdf

Presentation material (615)

Categories: Speaker Recognition and Characterization (SPE-SPKR)

43 Views

Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks

The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing voices. As singing voices represent a rich form of expression, a powerful technique to model them accurately is required. In the proposed technique, long-term dependencies of singing voices are modeled by CNNs.

ICASSP2020_slide_20200417b.pdf (394)

Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT); Neural network learning (MLR-NNLR)

118 Views
