
THE USTC-NERCSLIP SYSTEMS FOR THE ICMC-ASR CHALLENGE

DOI:
10.60864/c72c-zq13
Citation Author(s):
Submitted by:
YICHI WANG
Last updated:
6 June 2024 - 10:28am
Document Type:
Presentation Slides
 

This report describes our system submitted to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which addresses the ASR task with multi-speaker overlapping speech and Mandarin accent dynamics in the ICMC setting. For the front end, we implement speaker diarization using multi-speaker embeddings based on self-supervised learning representations, and beamforming using the speaker position. For ASR, we employ an iterative pseudo-label generation method based on a fusion model to obtain text labels for unlabeled data. To mitigate the impact of accent, we propose an Accent-ASR framework, which captures pronunciation-related accent features at a fine-grained level and linguistic information at a coarse-grained level. On the ICMC-ASR eval set, the proposed system achieves a character error rate (CER) of 13.16% on track 1 and a concatenated minimum-permutation CER (cpCER) of 21.48% on track 2, which significantly outperforms the official baseline system and ranks first on both tracks.
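The two metrics reported above can be illustrated with a minimal Python sketch. It assumes the commonly used definitions: CER is the character-level edit distance divided by the reference length, and cpCER concatenates each speaker's utterances and scores the speaker assignment with the fewest errors. The function names `edit_distance`, `cer`, and `cp_cer` are hypothetical, text normalization is omitted, and this is not the challenge's official scoring script.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of one hypothesis against one reference."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def cp_cer(refs, hyps):
    """Concatenated minimum-permutation CER (sketch).

    refs and hyps are lists of per-speaker transcripts, each already
    concatenated over that speaker's utterances. Assumption: both lists
    have the same number of speakers.
    """
    if len(refs) != len(hyps):
        raise ValueError("this sketch assumes equal speaker counts")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("references must be non-empty")
    # Try every speaker assignment and keep the one with the fewest errors.
    best = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / total_ref
```

The exhaustive search over permutations grows factorially with the number of speakers, which is acceptable for the handful of speakers in a car but would call for an assignment solver in larger meetings.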
