
FOLLOWING THE EMBEDDING: IDENTIFYING TRANSITION PHENOMENA IN WAV2VEC 2.0 REPRESENTATIONS OF SPEECH AUDIO

DOI:
10.60864/d1tq-4h20
Citation Author(s):
Patrick Cormac English, Erfan A. Shams, John D. Kelleher, Julie Carson-Berndsen
Submitted by:
Erfan Amirzadeh...
Last updated:
4 August 2024 - 8:31am
Document Type:
Poster
Document Year:
2024
Event:
Presenters:
Patrick Cormac English, Erfan A. Shams
Paper Code:
MLSP-P12.3

Although transformer-based models have improved the state of the art in speech recognition, it is still not well understood what information from the speech signal these models encode in their latent representations. This study investigates the potential of using labelled data (TIMIT) to probe wav2vec 2.0 embeddings for insights into the encoding and visualisation of speech signal information at phone boundaries. Our experiment involves training probing models to detect phone-specific articulatory features in the hidden layers based on IPA classifications. Furthermore, we propose an analysis framework for visualising the probabilities of the detected articulatory features in every layer and frame vector. Our primary focus is to probe and better understand the structure of speech signal information in the embeddings learned by unsupervised transformers, with a view to contributing to more explainable speech processing systems.
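The abstract describes two steps. The first trains probes that detect an articulatory feature from each hidden layer's frame vectors. The second visualises the resulting probabilities per layer and per frame. The Python sketch below illustrates that idea under stated assumptions: it uses one binary logistic-regression probe per layer for a single feature, and it replaces real wav2vec 2.0 hidden states with synthetic random embeddings. The authors' actual probe architecture, feature set, and training setup are not given here. The functions `train_probe`, `probe_all_layers`, and `probability_map`, along with the synthetic data, are hypothetical illustrations and not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_probe(X, y, lr=0.1, epochs=300, l2=1e-3):
    """Hypothetical binary logistic-regression probe, trained by batch gradient descent.
    X: (n_frames, dim) frame vectors from ONE layer; y: (n_frames,) 0/1 labels
    for one articulatory feature (e.g. an assumed 'voiced' flag)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y          # gradient of mean log-loss w.r.t. the logits
        w -= lr * (X.T @ err / n + l2 * w)    # L2-regularised weight update
        b -= lr * err.mean()
    return w, b

def probe_all_layers(hidden, y, **kw):
    """hidden: (n_layers, n_frames, dim). Trains one independent probe per layer."""
    return [train_probe(h, y, **kw) for h in hidden]

def probability_map(hidden, probes):
    """Applies each layer's probe to that layer's frames and returns an
    (n_layers, n_frames) matrix: one row per layer, one column per frame vector."""
    return np.stack([sigmoid(h @ w + b) for h, (w, b) in zip(hidden, probes)])

# --- Synthetic stand-in for per-layer frame embeddings (assumption, not real data) ---
rng = np.random.default_rng(0)
n_layers, n_frames, dim = 4, 400, 16
labels = (rng.random(n_frames) > 0.5).astype(float)
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
# Deeper synthetic layers carry a stronger feature signal; layer 0 carries none.
hidden = np.stack([
    rng.normal(size=(n_frames, dim)) + l * np.outer(2 * labels - 1, direction)
    for l in range(n_layers)
])

split = 300  # train on the first 300 frames, evaluate on the remaining 100
probes = probe_all_layers(hidden[:, :split], labels[:split])
probs = probability_map(hidden, probes)  # (n_layers, n_frames)
held_out_acc = ((probs[:, split:] > 0.5) == labels[split:].astype(bool)).mean(axis=1)
print(probs.shape, np.round(held_out_acc, 2))
```

In a real setting, `hidden` might hold the per-layer frame embeddings of a TIMIT utterance, and `labels` might hold frame-level feature labels derived from the phone alignments. Rendering `probs` as a heatmap, with layers on one axis and frames on the other, could then show how a feature's probability changes across phone boundaries and across depth. A linear probe is chosen here only because it is the simplest probe to read. A feature that a linear probe detects at a layer is linearly recoverable there, but that alone does not show how the model uses it.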
