OUT-OF-VOCABULARY WORD RECOVERY USING FST-BASED SUBWORD UNIT CLUSTERING IN A HYBRID ASR SYSTEM - poster for ICASSP 2018
- Submitted by: Ekaterina Egorova
- Last updated: 24 April 2018 - 10:23am
- Document Type: Poster
- Document Year: 2018
- Presenters: Ekaterina Egorova
- Paper Code: 4076
The paper presents a new approach to extracting useful information from out-of-vocabulary (OOV) speech regions in ASR system output. The system uses a hybrid decoding network containing both words and sub-word units. In the decoded lattices, candidates for OOV regions are identified as sub-graphs of sub-word units. To facilitate OOV word recovery, we search for recurring OOVs by clustering the detected candidate OOVs. The clustering metric is based on a comparison of the sub-graphs corresponding to the OOV candidates. The proposed method discovers repeating out-of-vocabulary words and finds their graphemic representation more robustly than conventional techniques that take into account only the one-best sub-word string hypothesis.
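The clustering idea can be illustrated with a minimal sketch. It is not the FST-based metric from the paper: as a simplifying assumption, each OOV candidate is represented by a short list of alternative sub-word paths (standing in for its lattice sub-graph), two candidates are compared by the smallest length-normalised edit distance over all path pairs, and candidates are grouped by single-link clustering. The function names, the threshold of 0.3, and the toy phoneme strings are all hypothetical.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between two sequences of sub-word units."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]


def candidate_distance(cand_a, cand_b):
    """Smallest length-normalised edit distance over all pairs of paths.

    Assumes every path is non-empty. Using several paths per candidate,
    rather than only the one-best string, is what lets two noisy
    detections of the same word still come out as close.
    """
    return min(edit_distance(p, q) / max(len(p), len(q))
               for p, q in product(cand_a, cand_b))


def cluster(candidates, threshold=0.3):
    """Single-link clustering: link candidates whose distance <= threshold."""
    parent = list(range(len(candidates)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if candidate_distance(candidates[i], candidates[j]) <= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(candidates)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data (hypothetical): four detected OOV regions, each with its
# alternative sub-word paths.
candidates = [
    [("k", "o", "v", "i", "d"), ("k", "o", "b", "i", "d")],
    [("g", "o", "v", "i", "d"), ("k", "o", "v", "i", "d")],
    [("b", "r", "e", "k", "s", "i", "t")],
    [("b", "r", "e", "g", "z", "i", "t"), ("b", "r", "e", "k", "s", "i", "d")],
]
clusters = cluster(candidates)
```

In this toy data the one-best paths of the first two candidates differ, but their alternative paths overlap, so comparing the full path sets places them in the same cluster. A real system would compare the lattice sub-graphs directly, including their weights, rather than unweighted path lists.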