Crowdsourced Pairwise-Comparison for Source Separation Evaluation

Automated objective methods of audio source separation evaluation are fast, cheap, and require little effort by the investigator. However, their output often correlates poorly with human quality assessments and typically require ground-truth (perfectly separated) signals to evaluate algorithm performance. Subjective multi-stimulus human ratings (e.g. MUSHRA) of audio quality are the gold standard for many tasks, but they are slow and require a great deal of effort to recruit participants and run listening tests. Recent work has shown that a crowdsourced multi-stimulus listening test can have results comparable to lab-based multi-stimulus tests. While these results are encouraging, MUSHRA multi-stimulus tests are limited to evaluating 12 or fewer stimuli, and they require ground-truth stimuli for reference. In this work, we evaluate a web-based pairwise-comparison listening approach that promises to speed and facilitate conducting listening tests, while also addressing some of the shortcomings of multi-stimulus tests. Using audio source separation quality as our evaluation task, we compare our web-based pairwise-comparison listening test to both web-based and lab-based multi-stimulus tests. We find that pairwise-comparison listening tests perform comparably to multi-stimulus tests, but without many of their shortcomings.

cartwright_caqe_icassp_2018_poster.pdf

cartwright_caqe_icassp_2018_poster.pdf (490)

Thumbs Up

CITE

Documents

Poster

Crowdsourced Pairwise-Comparison for Source Separation Evaluation

cartwright_caqe_icassp_2018_poster.pdf

QUESTIONS?