Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

Citation Author(s):: Baolin Peng
Yun-Nung Chen

Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Yun-Nung Chen, Kam-Fai Wong
Submitted by:: Yun-Nung Chen
Last updated:: 22 April 2018 - 12:00pm
Document Type:: Poster
Document Year:: 2018
Event:: ICASSP 2018
Presenters:: Yun-Nung Chen
Paper Code:: HLT-P2.4

Categories:: Spoken and Multimodal Dialog Systems and Applications (SLP-SMMD)

This paper presents a new method --- adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the discriminator as another critic into the advantage actor-critic (A2C) framework, to encourage the dialogue agent to explore state-action within the regions where the agent takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C can accelerate policy exploration efficiently.

poster_icassp2018.pdf

Poster for Advantage A2C Dialogue Policy Learning (857)

Thumbs Up

CITE

Documents

Poster

Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

poster_icassp2018.pdf

QUESTIONS?