AUGSUMM: TOWARDS GENERALIZABLE SPEECH SUMMARIZATION USING SYNTHETIC LABELS FROM LARGE LANGUAGE MODELS
- DOI: 10.60864/pesc-t089
- Submitted by: Jee-weon Jung
- Last updated: 6 June 2024 - 10:22am
- Document Type: Poster
- Document Year: 2024
Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech. Given variations in the information captured and in phrasing, a recording can be summarized in multiple ways; it is therefore more reasonable to consider a probabilistic distribution over all potential summaries rather than a single summary. However, conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT), human-annotated, deterministic summary for every recording. Generating multiple human references would better represent the distribution statistically, but is impractical because annotation is expensive. We tackle this challenge by proposing AugSumm, a method that leverages large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation. First, we explore prompting strategies to generate synthetic summaries from ChatGPT. We validate the quality of the synthetic summaries using multiple metrics, including human evaluation, where we find that summaries generated using AugSumm are perceived as more valid by humans. Second, we develop methods to utilize synthetic summaries in training and evaluation. Experiments on How2 demonstrate that pre-training on synthetic summaries and fine-tuning on GT summaries improves ROUGE-L by 1 point on both GT and AugSumm-based test sets. AugSumm summaries are available at https://github.com/Jungjee/AugSumm.
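To make the first step concrete, below is a minimal sketch of producing one AugSumm-style synthetic summary by prompting ChatGPT to paraphrase a ground-truth summary. The prompt wording, model name, sampling temperature, and the `augment_summary` helper are illustrative assumptions, not the paper's exact prompting strategy:

```python
# Minimal sketch, assuming the OpenAI Python SDK (>= 1.0) with an API key
# available in the environment. The prompt is illustrative only; the paper
# explores several prompting strategies that may differ from this one.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def augment_summary(gt_summary: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the LLM to paraphrase a ground-truth summary, yielding one
    synthetic (augmented) reference summary."""
    prompt = (
        "Paraphrase the following video summary while preserving its "
        f"meaning:\n\n{gt_summary}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,  # some diversity across sampled summaries
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    gt = "A chef demonstrates how to fold and pan-fry vegetable dumplings."
    print(augment_summary(gt))
```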
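The reported 1-point gain is measured with ROUGE-L. The sketch below scores one system output against a GT reference and an AugSumm reference separately, mirroring the two test conditions in the abstract; the `rouge-score` package and the example strings are assumptions, as the abstract does not name a scoring toolkit:

```python
# Minimal sketch, assuming Google's `rouge-score` package
# (pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)


def rouge_l_f1(reference: str, hypothesis: str) -> float:
    """ROUGE-L F1 of one hypothesis against one reference summary."""
    return scorer.score(reference, hypothesis)["rougeL"].fmeasure


hyp = "the chef shows how to fold and fry vegetable dumplings"
gt_ref = "a chef demonstrates how to fold and pan-fry vegetable dumplings"
aug_ref = "this video shows a chef folding and pan-frying veggie dumplings"

# Score the same system output on a GT reference and on an AugSumm
# reference, one score per test condition.
print(f"GT test set:      {rouge_l_f1(gt_ref, hyp):.3f}")
print(f"AugSumm test set: {rouge_l_f1(aug_ref, hyp):.3f}")
```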