
On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition

Abstract: 

DNNs play a major role in state-of-the-art ASR systems. They can be used to extract features and to build probabilistic models for acoustic and language modelling. Despite their huge practical success, theoretical understanding of them remains shallow. This paper investigates DNNs from a statistical standpoint. In particular, the effect of activation functions on the distributions of the pre-activations and activations is investigated and discussed from both analytic and empirical viewpoints. Among other findings, the study shows that the pre-activation density in the bottleneck layer can be fitted well by a diagonal GMM with a few Gaussians, and explains how and why the ReLU activation function promotes sparsity. Motivated by the statistical properties of the pre-activations, the usefulness of statistical normalisation of bottleneck features was also investigated. To this end, methods such as mean(-variance) normalisation, Gaussianisation, and histogram equalisation (HEQ) were employed, achieving up to a 2% absolute WER reduction on the Aurora-4 task.
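
To make the normalisation methods named in the abstract concrete, here is a minimal sketch of two of them applied to a single feature dimension. It is not the authors' implementation: the abstract does not specify their procedures, so the function names, the per-dimension treatment, and the rank-based form of Gaussianisation (mapping the empirical CDF through the inverse standard normal CDF) are all assumptions made for illustration.

```python
from statistics import NormalDist, fmean, pstdev


def mean_variance_normalise(values):
    """Shift one feature dimension to zero mean and scale it to unit variance."""
    mu = fmean(values)
    sigma = pstdev(values, mu)  # population standard deviation
    if sigma == 0.0:
        # A constant dimension carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def gaussianise(values):
    """Rank-based Gaussianisation (illustrative assumption, not the paper's recipe).

    Each value is replaced by the standard normal quantile of its
    empirical CDF position, so the output is approximately N(0, 1)
    regardless of the shape of the input distribution.
    """
    n = len(values)
    # Indices sorted by value; ties are broken by original position.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the CDF estimate strictly inside (0, 1).
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


if __name__ == "__main__":
    # Hypothetical skewed values standing in for one bottleneck dimension.
    feature = [0.1, 0.2, 0.15, 3.0, 0.05, 0.3]
    print(mean_variance_normalise(feature))
    print(gaussianise(feature))
```

Mean-variance normalisation only matches the first two moments, whereas the rank-based mapping reshapes the whole distribution. Histogram equalisation follows the same CDF-matching idea with a chosen reference distribution in place of the standard normal.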


Paper Details

Authors:
Erfan Loweimi, Peter Bell, Steve Renals
Submitted On:
7 May 2019 - 1:08pm
Short Link:
Type:
Poster
Event:
Presenter's Name:
Erfan Loweimi
Paper Code:
2446
Document Year:
2019

Document Files

Poster Presentation


[1] Erfan Loweimi, Peter Bell, Steve Renals, "On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3926. Accessed: Jun. 19, 2019.