
Iterative Autoregressive Generation for Abstractive Summarization

Citation Author(s):
Jiaxin Duan, Fengyu Lu, Junfei Liu
Submitted by:
Jiaxin Duan
Last updated:
10 April 2024 - 12:17pm
Document Type:
Presentation Slides
Document Year:
2024
Presenters:
Jiaxin Duan
Paper Code:
5396

Abstractive summarization suffers from exposure bias caused by teacher-forced maximum likelihood estimation (MLE) learning: during training, an autoregressive language model predicts the next-token distribution conditioned on the exact ground-truth pre-context, whereas at inference it conditions on its own predictions. Previous resolutions to this problem straightforwardly augment the pure token-level MLE with summary-level objectives. Although this approach effectively exposes a model to its own prediction errors during summary-level learning, such errors accumulate in the unidirectional autoregressive generation and further limit learning efficiency. To address this problem, we imitate the human behavior of revising a manuscript multiple times after writing it and introduce a novel iterative autoregressive summarization (IARSum) paradigm, which iteratively rewrites a generated summary to approximate an error-free version. Concretely, IARSum performs iterative revisions after summarization, where the output of the previous revision is taken as the input to the next, and a minimum-risk training strategy ensures that the original summary is effectively polished in every revision round. We conduct extensive experiments on two widely used datasets and show that IARSum achieves new or matched state-of-the-art performance.
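To make the described paradigm concrete, the sketch below illustrates the two ideas from the abstract: an iterative revision loop in which each round's output becomes the next round's input, and a generic minimum-risk (expected-risk) objective over sampled candidate summaries. This is a minimal illustration, not the paper's implementation; the `summarizer` and `reviser` objects, the `num_rounds` parameter, and the shapes assumed by `minimum_risk_loss` are hypothetical.

```python
import torch


def iterative_summarize(document, summarizer, reviser, num_rounds=3):
    """Generate a first-pass summary, then revise it round by round,
    feeding each revision back in as the input to the next round.
    `summarizer` and `reviser` are assumed seq2seq models with a
    `generate` method (hypothetical interface, not the paper's API)."""
    summary = summarizer.generate(document)
    for _ in range(num_rounds):
        # The reviser conditions on the source document and the current
        # draft, producing a polished version of that draft.
        summary = reviser.generate(document, draft=summary)
    return summary


def minimum_risk_loss(log_probs, risks):
    """Generic minimum-risk training objective: the expected risk of
    sampled candidate summaries under the model.

    log_probs: [batch, num_candidates] sequence log-probabilities of the
        sampled candidates under the model.
    risks: [batch, num_candidates] summary-level costs, e.g. 1 - ROUGE
        against the reference.
    """
    # Renormalize probabilities over the candidate set, then take the
    # probability-weighted average risk.
    probs = torch.softmax(log_probs, dim=-1)
    return (probs * risks).sum(dim=-1).mean()
```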


Comments

This paper introduces a revision-based multi-iteration method for training summarization models. Reviewers are generally positive about the idea of the paper, and the results seem promising. There are some concerns regarding the paper's suitability for ICASSP, since it is an NLP paper. I see no issue with this, as the paper was submitted to the "Speech and Language Processing" track.