Automatic identification of speakers from head gestures in a narration

Citation Author(s):
Sanjeev Kadagathur Vadiraj, Achuth Rao M V, Prasanta Kumar Ghosh
Last updated:
16 May 2020 - 10:52am
Document Type:
Presentation Slides
Presenter's Name:
Sanjeev Kadagathur Vadiraj



In this work, we focus on quantifying the speaker identity information encoded in the head gestures of speakers while they narrate a story. We hypothesize that head gestures over a long duration carry speaker-specific patterns. To establish this, we consider a classification problem of identifying speakers from their head gestures. We represent every head orientation as a triplet of Euler angles and a sequence of head orientations as a head gesture. We use a database of recordings from 24 speakers, in which head movements are recorded with a motion capture device and each subject narrates ten stories. We obtain the best speaker identification accuracy of 0.836 using head gestures over a duration of 40 seconds. Further, the accuracy increases by combining decisions from multiple 40-second windows when a recording longer than the window length is available. We achieve an average accuracy of 0.9875 on our database when the entire recording is used. Analysis of the speaker identification performance over 40-second windows across a recording reveals that speaker-identity information is more prevalent in some parts of a story than in others.
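A minimal sketch of the windowing and decision-combination scheme described above. The frame rate, the classifier, and all function names here are assumptions for illustration; the abstract does not specify the model or the feature pipeline:

```python
import numpy as np

FPS = 100            # assumed motion-capture frame rate (not stated in the abstract)
WINDOW_SEC = 40      # window length reported in the abstract
WINDOW = FPS * WINDOW_SEC

def split_windows(gestures):
    """Split a (T, 3) sequence of Euler-angle triplets (one head
    orientation per frame) into non-overlapping 40-second windows."""
    n = gestures.shape[0] // WINDOW
    return [gestures[i * WINDOW:(i + 1) * WINDOW] for i in range(n)]

def combine_decisions(per_window_labels):
    """Combine per-window speaker decisions over a full recording
    by majority vote (one plausible combination rule)."""
    labels, counts = np.unique(per_window_labels, return_counts=True)
    return int(labels[np.argmax(counts)])

# Hypothetical stand-in for the paper's per-window speaker classifier.
def toy_classifier(window):
    return int(np.mean(window[:, 0]) > 0)

# Example: a synthetic 90-second recording yields two full 40 s windows,
# whose decisions are then fused into one speaker label.
recording = np.random.default_rng(0).normal(size=(90 * FPS, 3))
decisions = [toy_classifier(w) for w in split_windows(recording)]
speaker = combine_decisions(decisions)
```

The majority vote is only one way to fuse window-level decisions; averaging per-window class posteriors before taking the argmax is a common alternative.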

