

A DEEP NEURAL NETWORK BASED END TO END MODEL FOR JOINT HEIGHT AND AGE ESTIMATION FROM SHORT DURATION SPEECH

Abstract: 

Automatic prediction of a speaker's height and age has a wide variety of applications in speaker profiling, forensics, and related areas. Often in such applications, only a few seconds of speech data are available from which to estimate the speaker parameters reliably. Traditionally, age and height have been predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture that predicts both the height and the age of a speaker from short durations of speech. A novel initialization scheme for the deep neural architecture is introduced that avoids the requirement for a large training dataset. We evaluate the system on the TIMIT dataset, where the mean duration of speech segments is around 2.5 s. The DNN system improves the age RMSE by at least 0.6 years compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE of 6.85 cm and 6.29 cm for male and female height prediction, respectively. For age estimation, the RMSEs are 7.60 and 8.63 years for male and female speakers, respectively. Analysis of shorter speech segments reveals that even with 1 second of speech input, the performance degradation is at most 3% compared to the full-duration speech files.
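
The abstract reports results as RMSE and as a relative degradation between full-duration and 1-second inputs. The following is a minimal sketch of how those two quantities are commonly computed; the function names `rmse` and `relative_degradation` and the toy values are assumptions made for illustration and are not taken from the paper or its data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_degradation(rmse_short, rmse_full):
    """Percentage increase in RMSE when going from full to short segments."""
    return 100.0 * (rmse_short - rmse_full) / rmse_full


# Hypothetical toy values (not from TIMIT): heights in cm for three speakers.
true_height = [175.0, 160.0, 182.0]
pred_full = [172.0, 164.0, 182.0]   # predictions from full-duration speech
pred_short = [171.0, 165.0, 183.0]  # predictions from 1-second segments

full_error = rmse(pred_full, true_height)
short_error = rmse(pred_short, true_height)
print(f"full RMSE: {full_error:.2f} cm, short RMSE: {short_error:.2f} cm")
print(f"degradation: {relative_degradation(short_error, full_error):.1f}%")
```

In a joint setup, the same `rmse` function would be applied separately to the height outputs and the age outputs, and separately per gender, which matches how the abstract reports its numbers.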


Paper Details

Authors: Shareef Babu Kalluri, Deepu Vijayasenan, Sriram Ganapathy
Submitted On: 8 May 2019 - 1:55am
Type: Poster
Paper Code: 3201
Document Year: 2019

Document Files

ICASSP poster



[1] Shareef Babu Kalluri, Deepu Vijayasenan, Sriram Ganapathy, "A DEEP NEURAL NETWORK BASED END TO END MODEL FOR JOINT HEIGHT AND AGE ESTIMATION FROM SHORT DURATION SPEECH", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4014. Accessed: Oct. 14, 2019.