Are Soft Prompts Good Zero-Shot Learners for Speech Recognition?

DOI:
10.60864/enm4-a246
Citation Author(s):
Dianwen Ng, Chong Zhang, Ruixi Zhang, Yukun Ma, Fabian Ritter-Gutierrez, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma
Submitted by:
Shengkui Zhao
Last updated:
6 June 2024 - 10:27am
Document Type:
Presentation Slides
Document Year:
2024
Event:
Presenters:
Shengkui Zhao
Paper Code:
SLP-L14.1
 

Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple, parameter-efficient alternative: it guides the model with a small set of soft prompts, which improves portability while maintaining competitive performance. However, how and why this works remains poorly understood. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners in improving ASR performance, while also revealing that they are vulnerable to malicious modification. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, the latter of which improves robustness against background noise. Additionally, we propose an effective modification of noise prompts and show that they are capable of zero-shot adaptation to out-of-distribution noise environments.
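To make the general idea of soft prompt tuning concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the encoder is a toy stand-in for a frozen pre-trained speech model, and every name (`make_soft_prompts`, `prepend_prompts`, `frozen_encoder`, `encode_with_prompts`) and every size is a hypothetical choice made for illustration. It only shows the mechanics: a few prompt vectors are placed in front of the frame-level features, the encoder itself is left untouched, and the prompt vectors are the only values that would be trained.

```python
import random


def make_soft_prompts(num_prompts, dim, seed=0):
    """Create the soft prompt vectors, the only values that would be trained."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_prompts(prompts, frames):
    """Place the prompt vectors in front of the frame-level features."""
    return prompts + frames


def frozen_encoder(sequence):
    """Toy stand-in for a frozen encoder (an assumption, not a real model).

    Each output vector is the input vector plus the mean over the whole
    sequence, so the prompt vectors influence every frame's output.
    """
    dim = len(sequence[0])
    mean = [sum(vec[d] for vec in sequence) / len(sequence) for d in range(dim)]
    return [[vec[d] + mean[d] for d in range(dim)] for vec in sequence]


def encode_with_prompts(prompts, frames):
    """Run the frozen encoder on prompts + frames, then drop prompt positions."""
    encoded = frozen_encoder(prepend_prompts(prompts, frames))
    return encoded[len(prompts):]


# Hypothetical sizes: 4 prompt vectors, 8-dimensional features, 10 frames.
NUM_PROMPTS, DIM, NUM_FRAMES = 4, 8, 10
prompts = make_soft_prompts(NUM_PROMPTS, DIM)
frames = [[float(t)] * DIM for t in range(NUM_FRAMES)]

with_prompts = encode_with_prompts(prompts, frames)
without_prompts = frozen_encoder(frames)  # the encoder also runs with no prompts
trainable_values = NUM_PROMPTS * DIM
```

In this sketch the output keeps one vector per input frame whether or not prompts are used, and only `NUM_PROMPTS * DIM` values would need to be stored per task. How the prompts are trained and where they are injected in a real speech model is outside the scope of the sketch.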
