Generative Adversarial Network and its Applications to Speech Signal and Natural Language Processing
- Submitted by: Hung-yi Lee
- Last updated: 16 April 2018 - 2:25am
- Document Type: Tutorial
- Document Year: 2018
- Presenters: Hung-yi Lee, Yu Tsao
- Paper Code: T-7
A generative adversarial network (GAN) is a new idea for training models, in which a generator and a discriminator compete against each other to improve generation quality. GANs have recently shown impressive results in image generation, and a wide variety of new ideas, techniques, and applications have been developed based on them. Although there are only a few successful cases so far, GANs have great potential to be applied to text and speech generation to overcome the limitations of conventional methods.
The tutorial consists of two parts. The first part provides a thorough review of GAN. We will first introduce GAN to newcomers and describe why it is powerful at generating objects with sophisticated structure, such as images, sentences, and speech. We will then introduce approaches that aim to improve the training procedure, as well as variants of GAN that go beyond simply generating random objects. The second part of the tutorial focuses on the applications of GAN to speech and natural language. Although most GAN techniques to date have been developed for image generation, GAN can also generate speech. However, speech signals are temporal sequences, which are very different in nature from images. We will describe how to apply GAN to speech signal processing, including text-to-speech synthesis, voice conversion, speech enhancement, and domain adversarial training for speech-related tasks. The major challenge in applying GAN to natural language is the discrete nature of text (words are usually represented by one-hot encodings), which makes the original GAN fail. We will review a series of approaches that deal with this problem, and finally demonstrate applications of GAN to chatbots, abstractive summarization, and text style transformation.
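The discreteness problem mentioned above can be illustrated with a small numerical sketch. This is not taken from the tutorial itself; the function names (`softmax`, `one_hot_argmax`, `finite_difference`) and the example logits are assumptions chosen for illustration. It shows that a small change in a generator's output scores leaves a one-hot (argmax) word choice unchanged, so no gradient signal would flow back from a discriminator, whereas a continuous softmax output does respond to the change.

```python
import numpy as np

def softmax(logits):
    # Continuous relaxation: a probability vector over the vocabulary.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def one_hot_argmax(logits):
    # Discrete word choice: one-hot vector for the highest-scoring word.
    out = np.zeros_like(logits)
    out[np.argmax(logits)] = 1.0
    return out

def finite_difference(fn, logits, index, eps=1e-4):
    # Approximate the derivative of fn's output w.r.t. logits[index].
    bumped = logits.copy()
    bumped[index] += eps
    return (fn(bumped) - fn(logits)) / eps

# Hypothetical scores over a three-word vocabulary.
logits = np.array([2.0, 1.0, 0.5])

grad_discrete = finite_difference(one_hot_argmax, logits, index=1)
grad_soft = finite_difference(softmax, logits, index=1)

print("one-hot output sensitivity:", grad_discrete)
print("softmax output sensitivity:", grad_soft)
```

In this sketch the one-hot output has zero sensitivity to the small perturbation, which is one way to see why gradient-based generator updates break down for discrete tokens, while the softmax output changes smoothly.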