Segment-oriented evaluation of speaker diarisation performance

Rosanna Milner, Thomas Hain
23 March 2016 - 4:37am
High performance diarisation is a necessity for a variety of applications, and the task has been
studied extensively in the context of broadcast news and meeting processing. Upon introduction of
the task in NIST-led evaluations, diarisation error rate (DER) was established as the standard metric
for evaluation, and it has been used consistently to compare systems ever since. DER is a frame-based
metric that does not penalise the production of many short segments. However, practical systems
that require diarisation input are typically not able to cope well with such artefacts.
In this paper we
illustrate the need for an alternative metric focussing on segments, instead of duration or boundaries
only. We propose a segment-based F-measure, which specifically addresses issues such as reference
errors, matching start and end boundaries, and speaker pairing. The performance of the metric is
analysed in the context of state-of-the-art systems and compared with other existing metrics. It is
shown to give deeper insight into segmentation quality than the standard metrics, and thus to be of
greater value for understanding the impact on follow-on tasks such as ASR.

