Robust Speech Recognition (SPE-ROBU)

Learning Discriminative Features in Sequence Training without Requiring Framewise Labelled Data

Paper Details

Authors:
Jun Wang, Dan Su, Jie Chen, Shulin Feng, Dongpeng Ma, Na Li, Dong Yu
Submitted On:
15 May 2019 - 3:11am

Document Files

Slides.pdf

Cite

[1] Jun Wang, Dan Su, Jie Chen, Shulin Feng, Dongpeng Ma, Na Li, Dong Yu, "Learning Discriminative Features in Sequence Training without Requiring Framewise Labelled Data", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4513. Accessed: Sep. 20, 2019.

Conditional Teacher-Student Learning


Teacher-student (T/S) learning has been shown to be effective for a variety of problems, such as domain adaptation and model compression. One shortcoming of T/S learning is that the teacher model, which is not always perfect, sporadically produces wrong guidance in the form of posterior probabilities, misleading the student model towards suboptimal performance.
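
As a rough sketch of how such unreliable guidance can be handled, the snippet below (PyTorch, my own reading of the idea rather than the authors' code) switches per frame between the teacher's soft posteriors and the one-hot ground truth, depending on whether the teacher classifies that frame correctly; the function name and tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def conditional_ts_loss(student_logits, teacher_logits, labels):
    """student_logits, teacher_logits: (frames, classes); labels: (frames,) int64."""
    log_p_student = F.log_softmax(student_logits, dim=-1)
    p_teacher = F.softmax(teacher_logits, dim=-1)
    teacher_correct = p_teacher.argmax(dim=-1).eq(labels)       # (frames,) bool
    # soft loss against the teacher's posteriors where the teacher is right
    soft = -(p_teacher * log_p_student).sum(dim=-1)
    # hard cross-entropy against the ground-truth labels elsewhere
    hard = F.nll_loss(log_p_student, labels, reduction="none")
    return torch.where(teacher_correct, soft, hard).mean()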

Paper Details

Authors:
Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong
Submitted On:
12 May 2019 - 9:23pm

Document Files

cts_poster.pptx

Cite

[1] Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong, "Conditional Teacher-Student Learning", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4472. Accessed: Sep. 20, 2019.

On reducing the effect of speaker overlap for CHiME-5


The CHiME-5 speech separation and recognition challenge was recently shown to pose a difficult task for current automatic speech recognition systems. Speaker overlap was one of the main difficulties of the challenge. The presence of noise, reverberation and moving speakers has made traditional source separation methods ineffective at improving recognition accuracy. In this paper we explore several enhancement strategies aimed at reducing the effect of speaker overlap for CHiME-5 without performing source separation.

Paper Details

Authors:
Catalin Zorila, Rama Doddipatla
Submitted On:
11 May 2019 - 8:58pm

Document Files

poster presentation

Cite

[1] Catalin Zorila, Rama Doddipatla, "On reducing the effect of speaker overlap for CHiME-5", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4454. Accessed: Sep. 20, 2019.

Analyzing Uncertainties in Speech Recognition Using Dropout


The performance of Automatic Speech Recognition (ASR) systems is often measured using the Word Error Rate (WER), which requires time-consuming and expensive manually transcribed data. In this paper, we use state-of-the-art ASR systems based on Deep Neural Networks (DNNs) and propose a novel framework which uses "Dropout" at test time to model uncertainty in the prediction hypotheses. We systematically exploit this uncertainty to estimate WER without the need for explicit transcriptions.
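
The snippet below (PyTorch, not the authors' implementation) illustrates the core mechanism of test-time dropout: keep only the dropout layers stochastic, run several forward passes, and use the spread of the frame posteriors as an uncertainty score that can then be related to WER. The function name, shapes and the variance-based score are illustrative assumptions.

import torch

def mc_dropout_uncertainty(model, features, n_passes=20):
    """features: (frames, feat_dim); returns mean posteriors and an uncertainty score."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()                         # keep only the dropout layers stochastic
    with torch.no_grad():
        runs = torch.stack([torch.softmax(model(features), dim=-1)
                            for _ in range(n_passes)])      # (passes, frames, classes)
    mean_posteriors = runs.mean(dim=0)
    uncertainty = runs.var(dim=0).mean().item()             # spread across the passes
    return mean_posteriors, uncertainty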

Paper Details

Authors:
Hervé Bourlard
Submitted On:
11 May 2019 - 8:55am

Document Files

Poster_avyas_ICASSP_2019.pdf

Cite

[1] Hervé Bourlard, "Analyzing Uncertainties in Speech Recognition Using Dropout", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4441. Accessed: Sep. 20, 2019.

MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION


The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between speech enhancement and ASR optimization objectives.

Paper Details

Authors:
Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
Submitted On:
10 May 2019 - 6:38pm

Document Files

poster file

manuscript file

Cite

[1] Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister, "MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4420. Accessed: Sep. 20, 2019.

FREQUENCY DOMAIN MULTI-CHANNEL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION


Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques do not always yield ASR accuracy improvement because the optimization criterion for speech enhancement is not directly relevant to the ASR objective. In this work, we develop new acoustic modeling techniques that optimize spatial filtering and long short-term memory (LSTM) layers from multi-channel (MC) input based on an ASR criterion directly.
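
A minimal sketch of this kind of architecture is given below (PyTorch, my own simplification with a single look direction and a real/imaginary weight parameterisation, not the paper's model): a learnable per-frequency complex combination of the microphone channels feeds log-magnitude features into an LSTM. Training would back-propagate an ASR loss such as CTC through the output layer, the LSTM and the spatial-filter weights, so the filter is optimised for recognition rather than for enhancement.

import torch
import torch.nn as nn

class MultiChannelAcousticModel(nn.Module):
    def __init__(self, n_channels, n_freq, n_targets, hidden=512):
        super().__init__()
        # learnable complex spatial-filter weights, one per channel and frequency bin
        self.w_re = nn.Parameter(0.01 * torch.randn(n_channels, n_freq))
        self.w_im = nn.Parameter(0.01 * torch.randn(n_channels, n_freq))
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_targets)

    def forward(self, stft_re, stft_im):
        # stft_re, stft_im: (batch, time, channels, freq) real/imaginary parts of the STFT
        y_re = (stft_re * self.w_re - stft_im * self.w_im).sum(dim=2)
        y_im = (stft_re * self.w_im + stft_im * self.w_re).sum(dim=2)
        logmag = torch.log(y_re ** 2 + y_im ** 2 + 1e-6)    # (batch, time, freq)
        h, _ = self.lstm(logmag)
        return self.out(h).log_softmax(dim=-1)              # frame-level log posteriors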

Paper Details

Authors:
Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
Submitted On:
10 May 2019 - 6:36pm

Document Files

poster file

manuscript file

Cite

[1] Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister, "FREQUENCY DOMAIN MULTI-CHANNEL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4419. Accessed: Sep. 20, 2019.

REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION

Paper Details

Authors:
Yih-Liang Shen, Chao-Yuan Huang, Syu-Siang Wang, Yu Tsao, Hsin-Min Wang, Tai-Shih Chi
Submitted On:
9 May 2019 - 12:29pm

Document Files

REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION.pdf

Cite

[1] Yih-Liang Shen, Chao-Yuan Huang, Syu-Siang Wang, Yu Tsao, Hsin-Min Wang, Tai-Shih Chi, "REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4221. Accessed: Sep. 20, 2019.

Subband Temporal Envelope Features and Data Augmentation for End-to-end Recognition of Distant Conversational Speech


This paper investigates the use of subband temporal envelope (STE) features and speed-perturbation-based data augmentation in end-to-end recognition of distant conversational speech in everyday home environments. STE features track energy peaks in perceptual frequency bands, which reflect the resonant properties of the vocal tract. Data augmentation is performed by adding training data obtained by modifying the speed of the original training data.
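
As a concrete example of the augmentation step, the snippet below (my own sketch using SciPy, not necessarily the authors' toolchain) applies speed perturbation by polyphase resampling while keeping the nominal sample rate, the usual way of tripling a training set with factors 0.9, 1.0 and 1.1.

from fractions import Fraction
import numpy as np
from scipy.signal import resample_poly

def speed_perturb(wave, factor):
    """Return `wave` played `factor` times faster (e.g. factor 0.9, 1.0 or 1.1)."""
    frac = Fraction(factor).limit_denominator(100)   # e.g. 1.1 -> 11/10
    # playing `factor` times faster == resampling by 1/factor and keeping the old rate
    return resample_poly(wave, up=frac.denominator, down=frac.numerator)

x = np.random.randn(16000)                           # stand-in for a 1-second waveform
augmented = [speed_perturb(x, f) for f in (0.9, 1.0, 1.1)]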

Paper Details

Authors:
Submitted On:
9 May 2019 - 9:29am

Document Files

Poster_icassp2019_CTDO.pdf

Cite

[1] , "Subband Temporal Envelope Features and Data Augmentation for End-to-end Recognition of Distant Conversational Speech", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4187. Accessed: Sep. 20, 2019.

On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition


DNNs play a major role in state-of-the-art ASR systems. They can be used to extract features and to build probabilistic models for acoustic and language modelling. Despite their huge practical success, theoretical understanding of them has remained shallow. This paper investigates DNNs from a statistical standpoint. In particular, the effect of activation functions on the distribution of the pre-activations and activations is investigated and discussed from both analytic and empirical viewpoints.
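
To make the idea of statistical normalisation concrete, the snippet below (my own illustration, not the paper's recipe) Gaussianises each bottleneck dimension with a rank-based inverse normal transform on top of the usual mean/variance normalisation; the helper names and the toy data are hypothetical.

import numpy as np
from scipy.stats import norm, rankdata

def mvn(feats):
    """Per-dimension mean/variance normalisation."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

def gaussianise(feats):
    """Map each dimension's empirical CDF onto a standard normal."""
    n = feats.shape[0]
    ranks = np.apply_along_axis(rankdata, 0, feats)   # ranks in [1, n] per column
    return norm.ppf((ranks - 0.5) / n)                # stay clear of the 0/1 endpoints

bottleneck = np.random.randn(1000, 40) ** 3           # stand-in for skewed bottleneck features
normalised = gaussianise(mvn(bottleneck))             # roughly N(0, 1) in every dimension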

Paper Details

Authors:
Erfan Loweimi, Peter Bell, Steve Renals
Submitted On:
7 May 2019 - 1:08pm

Document Files

Poster Presentation

Cite

[1] Erfan Loweimi, Peter Bell, Steve Renals, "On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3926. Accessed: Sep. 20, 2019.

SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION


In this paper, we present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW), which selects the target source masked by a noise source using angle-of-arrival (AoA) information calculated from the inter-microphone phase difference. The RMS-PDCW algorithm selects which masks to apply using information about the localized sound source and the onset detection of speech.
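
The snippet below (NumPy/SciPy, illustrative only; it omits the onset-detection-based reliable-mask-selection step) shows the phase-difference part of such a scheme for a two-microphone array: time-frequency bins whose inter-channel phase difference is consistent with the target angle of arrival are retained. All parameter values are assumptions.

import numpy as np
from scipy.signal import stft

def phase_difference_mask(x_left, x_right, fs=16000, mic_dist=0.04,
                          target_aoa_deg=0.0, threshold=1.0, c=343.0):
    f, _, X_l = stft(x_left, fs=fs, nperseg=512)
    _, _, X_r = stft(x_right, fs=fs, nperseg=512)
    # expected inter-channel phase difference for a source at the target angle of arrival
    tau = mic_dist * np.sin(np.deg2rad(target_aoa_deg)) / c
    expected = 2 * np.pi * f[:, None] * tau                      # (freq, 1)
    observed = np.angle(X_l * np.conj(X_r))                      # (freq, time)
    deviation = np.angle(np.exp(1j * (observed - expected)))     # wrap to [-pi, pi]
    mask = (np.abs(deviation) < threshold).astype(float)
    return mask, mask * X_l                                      # mask and masked STFT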

Paper Details

Authors:
Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard Stern
Submitted On:
7 May 2018 - 12:38am

Document Files

icassp_4465_poster.pdf

Cite

[1] Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard Stern, "SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3203. Accessed: Sep. 20, 2019.
