Investigation of the Effects of Automatic Scoring Technology on Human Raters' Performances in L2 Speech Proficiency Assessment

Citation Author(s):
Dean Luo, Wentao Gu, Ruxin Luo, Lixin Wang
Submitted by:
Dean Luo
Last updated:
14 October 2016 - 12:37pm
Document Type:
Presentation Slides
Document Year:
Paper Code:


This study investigates how automatic scoring based on speech technology affects human raters' judgment of students' oral language proficiency in L2 speaking tests. ASR-based automatic scoring is widely used in non-critical speaking tests and practice, and relatively high correlations between machine scores and human scores have been reported. In high-stakes speaking tests, however, many teachers remain skeptical about the fairness of scores assigned by machines, even with the most advanced scoring methods. In this paper, we first investigate ASR-based scoring on students' recordings from real tests. We then propose a radar-chart-based scoring method to assist human raters and analyze the effects of automatic scores on human raters' performance. Instead of providing an overall machine score for each utterance or speaker, we provide 10 scores, presented as a radar chart, that represent different aspects of phonemic- and prosodic-level proficiency, and leave the final judgment to human raters. Experimental results show that automatic scores can significantly affect human raters' judgment. With sufficient training samples, the scores given by non-experts can be comparable to experts' ratings in reliability.
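The radar chart described above places each of the 10 sub-scores on its own equally spaced axis and connects them into a polygon, so a rater can see the proficiency profile at a glance rather than a single collapsed number. A minimal sketch of that geometry is below; the aspect names and scores are hypothetical placeholders, since the abstract does not list the paper's actual 10 measures.

```python
import math

# Hypothetical sub-scores (0-100), one per phonemic/prosodic aspect.
# The actual 10 aspects used in the paper are not specified in this abstract.
aspects = {
    "vowel quality": 72, "consonant quality": 65, "lexical stress": 80,
    "rhythm": 58, "intonation": 70, "pausing": 64, "speech rate": 75,
    "fluency": 68, "segmental accuracy": 77, "completeness": 71,
}

def radar_points(scores):
    """Map each score to the (x, y) vertex of a radar-chart polygon.

    Axis i is placed at angle 2*pi*i/n; the score is the distance
    from the chart's center along that axis.
    """
    n = len(scores)
    pts = []
    for i, s in enumerate(scores):
        theta = 2 * math.pi * i / n  # equally spaced axes
        pts.append((s * math.cos(theta), s * math.sin(theta)))
    return pts

vertices = radar_points(list(aspects.values()))
# One vertex per aspect; drawing the closed polygon through these points
# (with any plotting library) yields the radar chart shown to raters.
```

The key design point is that the polygon's shape, not its area alone, carries information: a speaker weak in prosody but strong in segmentals produces a visibly lopsided chart, which the human rater can weigh when making the final judgment.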
