Generative Adversarial Network and its Applications to Speech Signal and Natural Language Processing

Abstract: 

A generative adversarial network (GAN) is a new approach to training models in which a generator and a discriminator compete against each other to improve generation quality. GAN has recently shown impressive results in image generation, and a wide variety of new ideas, techniques, and applications have been developed based on it. Although there are only a few successful cases so far, GAN has great potential for text and speech generation, where it may overcome limitations of conventional methods.
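
For readers new to the idea, the competition between the two networks is usually written as the minimax objective from the original GAN formulation (Goodfellow et al., 2014). The symbols below are not taken from this abstract: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution, and $p_z$ the noise prior.

```latex
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

The discriminator is trained to raise $V$ by telling real samples from generated ones, while the generator is trained to lower it by producing samples the discriminator accepts as real.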

The tutorial has two parts. The first part provides a thorough review of GAN. We first introduce GAN to newcomers and describe why it is powerful at generating objects with sophisticated structure, such as images, sentences, and speech. We then introduce approaches that improve the training procedure, as well as GAN variants that go beyond simply generating random objects. The second part focuses on applications of GAN to speech and natural language. Although most GAN techniques today are developed for image generation, GAN can also generate speech. Speech signals, however, are temporal sequences whose nature is very different from that of images. We describe how to apply GAN to speech signal processing, including text-to-speech synthesis, voice conversion, speech enhancement, and domain adversarial training on speech-related tasks. The major challenge in applying GAN to natural language is the discrete nature of text (words are usually represented by one-hot encodings), which makes the original GAN fail. We review a series of approaches that deal with this problem, and finally demonstrate applications of GAN to chatbots, abstractive summarization, and text style transformation.
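
The difficulty with one-hot word outputs can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not taken from the tutorial: a three-word vocabulary, a hypothetical linear discriminator, and finite differences standing in for backpropagation. It compares a generator that emits a hard one-hot word with one that emits a softmax distribution, which is one commonly used continuous relaxation and not necessarily among the approaches the tutorial reviews.

```python
import math


def softmax(logits):
    """Continuous relaxation: a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot_argmax(logits):
    """Discrete output: a one-hot vector for the highest-scoring word."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def discriminator_score(token_vector, weights):
    """Hypothetical linear discriminator (an assumption for this sketch)."""
    return sum(t * w for t, w in zip(token_vector, weights))


def finite_difference(output_fn, logits, weights, index, eps=1e-4):
    """Estimate d(score)/d(logit[index]) by nudging one generator logit."""
    bumped = list(logits)
    bumped[index] += eps
    before = discriminator_score(output_fn(logits), weights)
    after = discriminator_score(output_fn(bumped), weights)
    return (after - before) / eps


# Toy generator logits over a 3-word vocabulary and toy discriminator weights.
logits = [2.0, 0.5, -1.0]
weights = [0.1, 0.9, -0.3]

# A small nudge never changes the argmax, so the discriminator's score
# does not move and the generator receives no learning signal.
grad_discrete = [finite_difference(one_hot_argmax, logits, weights, i) for i in range(3)]

# With the softmax output, every logit influences the score.
grad_relaxed = [finite_difference(softmax, logits, weights, i) for i in range(3)]

print("one-hot gradient estimate:", grad_discrete)
print("softmax gradient estimate:", grad_relaxed)
```

With the one-hot output the estimated gradient is zero for every logit, which is the sense in which the original GAN training signal breaks down on discrete text; the softmax output yields a usable, non-zero signal.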


Paper Details

Authors: Hung-yi Lee, Yu Tsao
Submitted On: 16 April 2018 - 2:25am
Short Link:
Type: Tutorial
Event:
Presenter's Name: Hung-yi Lee, Yu Tsao
Paper Code: T-7
Document Year: 2018

Document Files

all ICASSP 2018 (v3).pdf (971 downloads)
all ICASSP 2018 (v3).pptx (385 downloads)


[1] Hung-yi Lee, Yu Tsao, "Generative Adversarial Network and its Applications to Speech Signal and Natural Language Processing", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2863. Accessed: Jul. 17, 2018.