FOLLOWING THE EMBEDDING: IDENTIFYING TRANSITION PHENOMENA IN WAV2VEC 2.0 REPRESENTATIONS OF SPEECH AUDIO
- DOI: 10.60864/d1tq-4h20
- Submitted by: Erfan Amirzadeh...
- Last updated: 4 August 2024 - 8:31am
- Document Type: Poster
- Document Year: 2024
- Presenters: Patrick Cormack English, Erfan A. Shams
- Paper Code: MLSP-P12.3
Although transformer-based models have advanced the state of the art in speech recognition, it is still not well understood what information from the speech signal these models encode in their latent representations. This study investigates the potential of using labelled data (TIMIT) to probe wav2vec 2.0 embeddings for insights into how speech signal information is encoded, and how it can be visualised, at phone boundaries. Our experiment involves training probing models to detect phone-specific articulatory features, based on IPA classifications, in the hidden layers. Furthermore, we propose an analysis framework for visualising the probabilities of the detected articulatory features in every layer and frame vector. Our primary focus is to probe and better understand the structure of speech signal information in the embeddings learned by unsupervised transformers, with a view to contributing to more explainable speech processing systems.
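The layer-wise probing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the probe architecture, so a logistic-regression probe is assumed here, and the frame vectors are synthetic stand-ins for wav2vec 2.0 hidden states and TIMIT labels. The single binary feature (thought of as something like "voiced"), the `LAYER_STRENGTHS` values, and all function names are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_utterance(rng, strengths, num_frames, boundary, dim=4):
    """Build synthetic per-layer frame vectors for one utterance.

    ASSUMPTION: real experiments would use wav2vec 2.0 hidden states and
    TIMIT phone labels. Here the binary feature switches from 0 to 1 at
    `boundary`, and each layer encodes it with a different strength.
    """
    labels = [0 if t < boundary else 1 for t in range(num_frames)]
    layers = []
    for s in strengths:
        frames = [[rng.gauss(0.0, 1.0) + s * (2 * y - 1) for _ in range(dim)]
                  for y in labels]
        layers.append(frames)
    return layers, labels


def train_probe(frames, labels, epochs=200, lr=0.2):
    """Fit a logistic-regression probe with batch gradient descent."""
    dim = len(frames[0])
    w, b, n = [0.0] * dim, 0.0, len(frames)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(frames, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_probabilities(frames, w, b):
    """Return the probe's feature probability for every frame."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in frames]


def accuracy(probs, labels):
    """Fraction of frames whose thresholded probability matches the label."""
    hits = sum(1 for p, y in zip(probs, labels) if (p >= 0.5) == (y == 1))
    return hits / len(labels)


rng = random.Random(0)
LAYER_STRENGTHS = [0.0, 0.5, 1.5]  # hypothetical: deeper layers encode more

train_layers, train_labels = make_utterance(rng, LAYER_STRENGTHS, 200, 100)
test_layers, test_labels = make_utterance(rng, LAYER_STRENGTHS, 40, 20)

# One probe per layer; rows are layers, columns are frames.
prob_matrix = []
for train_frames, test_frames in zip(train_layers, test_layers):
    w, b = train_probe(train_frames, train_labels)
    prob_matrix.append(probe_probabilities(test_frames, w, b))

for layer_index, row in enumerate(prob_matrix):
    print(f"layer {layer_index}: accuracy={accuracy(row, test_labels):.2f}")
```

The resulting `prob_matrix` has one row per layer and one column per frame, which is the shape a layer-by-frame heat map of feature probabilities would need. In this synthetic setup the probe is expected to become more reliable as the injected signal strength grows, but that reflects how the toy data were constructed and says nothing about real wav2vec 2.0 layers.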