Robust Speech Recognition (SPE-ROBU)

A Speaker-Dependent Deep Learning Approach to Joint Speech Separation and Acoustic Modeling for Multi-Talker Automatic Speech Recognition


We propose a novel speaker-dependent (SD) approach to joint training of deep neural networks (DNNs) with an explicit speech separation structure for multi-talker speech recognition in a single-channel setting. First, a multi-condition training strategy is designed for an SD-DNN recognizer in multi-talker scenarios, which can significantly reduce decoding runtime and improve recognition accuracy over approaches that use speaker-independent DNN models with a complicated joint decoding framework.

Paper Details

Authors:
Yan-Hui Tu, Jun Du, Li-Rong Dai, Chin-Hui Lee
Submitted On:
15 October 2016 - 2:48am

Document Files

Yanhui_ISCSLP2016_oral.pdf

(318)

[1] Yan-Hui Tu, Jun Du, Li-Rong Dai, Chin-Hui Lee, "A Speaker-Dependent Deep Learning Approach to Joint Speech Separation and Acoustic Modeling for Multi-Talker Automatic Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1216. Accessed: Jun. 25, 2019.

Vector Taylor Series Expansion with Auditory Masking for Noise Robust Speech Recognition


In this paper, we address the problem of speech recognition in the presence of additive noise. We investigate the applicability and efficacy of auditory masking in devising a robust front end for noisy features. This is achieved by introducing a masking factor into the Vector Taylor Series (VTS) equations. The resultant first-order VTS approximation is used to compensate the parameters of a clean speech model, and a Minimum Mean Square Error (MMSE) estimate is used to estimate the clean speech
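
For orientation, the first-order VTS compensation that the abstract builds on can be sketched as follows. This is the standard textbook form in the log-spectral domain for additive noise only; it does not include the masking factor the authors introduce, whose exact form is not given in this abstract. The symbols (x for clean speech, n for noise, y for noisy speech, with Gaussian means \mu and covariances \Sigma) are assumptions chosen for illustration.

```latex
% Mismatch function (elementwise, log-spectral domain, additive noise):
y = x + \log\!\left(1 + e^{\,n - x}\right)

% Jacobian with respect to x, evaluated at the expansion point (\mu_x, \mu_n):
G = \operatorname{diag}\!\left(\frac{1}{1 + e^{\,\mu_n - \mu_x}}\right),
\qquad
\frac{\partial y}{\partial n} = I - G

% First-order Taylor expansion around (\mu_x, \mu_n):
y \approx \mu_x + \log\!\left(1 + e^{\,\mu_n - \mu_x}\right)
      + G\,(x - \mu_x) + (I - G)\,(n - \mu_n)

% Compensated parameters of a clean-speech Gaussian:
\mu_y \approx \mu_x + \log\!\left(1 + e^{\,\mu_n - \mu_x}\right),
\qquad
\Sigma_y \approx G\,\Sigma_x\,G^{\top} + (I - G)\,\Sigma_n\,(I - G)^{\top}
```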

Paper Details

Authors:
Biswajit Das, Ashish Panda
Submitted On:
14 October 2016 - 8:15am

Document Files

Paper17_BD.pdf

(300)

[1] Biswajit Das, Ashish Panda, "Vector Taylor Series Expansion with Auditory Masking for Noise Robust Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1193. Accessed: Jun. 25, 2019.

Vector Taylor Series Expansion with Auditory Masking for Noise Robust Speech Recognition


In this paper, we address the problem of speech recognition in the presence of additive noise. We investigate the applicability and efficacy of auditory masking in devising a robust front end for noisy features. This is achieved by introducing a masking factor into the Vector Taylor Series (VTS) equations. The resultant first-order VTS approximation is used to compensate the parameters of a clean speech model, and a Minimum Mean Square Error (MMSE) estimate is used to estimate the clean speech

Paper Details

Authors:
Ashish Panda
Submitted On:
14 October 2016 - 8:15am

Document Files

Paper17_BD.pdf

(331)

[1] Ashish Panda, "Vector Taylor Series Expansion with Auditory Masking for Noise Robust Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1192. Accessed: Jun. 25, 2019.

Employing Median Filtering to Enhance the Complex-valued Acoustic Spectrograms in Modulation Domain for Noise-robust Speech Recognition

Paper Details

Authors:
Hsin-Ju Hsieh, Berlin Chen, Jeih-weih Hung
Submitted On:
14 October 2016 - 6:28am

Document Files

ISCSLP_2016.pdf

(335)

[1] Hsin-Ju Hsieh, Berlin Chen, Jeih-weih Hung, "Employing Median Filtering to Enhance the Complex-valued Acoustic Spectrograms in Modulation Domain for Noise-robust Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1189. Accessed: Jun. 25, 2019.

Two-Stage Noise Aware Training Using Asymmetric Deep Denoising Autoencoder


Since deep neural network (DNN)-based acoustic models appeared, the performance of automatic speech recognition has improved greatly. Following this achievement, various studies on DNN-based techniques for noise robustness are also in progress. Among these approaches, the noise-aware training (NAT) technique, which aims to improve the inherent robustness of the DNN using noise estimates, has shown remarkable performance. However, despite this performance, we cannot be certain whether NAT is an optimal method for fully utilizing the inherent robustness of the DNN.

Paper Details

Authors:
Shin Jae Kang, Woo Hyun Kang, Nam Soo Kim
Submitted On:
17 March 2016 - 1:45am

Document Files

ICASSP2016_포스터_이강현_그래프2.pdf

(69)

[1] Shin Jae Kang, Woo Hyun Kang, Nam Soo Kim, "Two-Stage Noise Aware Training Using Asymmetric Deep Denoising Autoencoder", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/734. Accessed: Jun. 25, 2019.

SPEECH EMOTION RECOGNITION USING TRANSFER NON-NEGATIVE MATRIX FACTORIZATION


In practical situations, emotional speech utterances are often collected from different devices and under different conditions, which affects recognition performance. To address this issue, this paper presents a novel transfer non-negative matrix factorization (TNMF) method for cross-corpus speech emotion recognition. First, the NMF algorithm is adopted to learn a latent common feature space for the source and target datasets.
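
The first step described above, learning a common latent space for two corpora with NMF, might look roughly like the following sketch. It is plain NMF with Lee-Seung multiplicative updates applied to the column-wise concatenation of both datasets; it is not the authors' TNMF, and the transfer part of their method is not reproduced here. The function names `nmf` and `shared_latent_space`, the rank, and the iteration count are all hypothetical choices for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (features x samples) as W @ H.

    Uses multiplicative updates for the Frobenius objective.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Update the activations, then the basis; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def shared_latent_space(X_source, X_target, rank, **kwargs):
    """Learn one basis W for both corpora (assumption: simple concatenation).

    Returns the shared basis and the latent codes of each corpus.
    """
    V = np.hstack([X_source, X_target])
    W, H = nmf(V, rank, **kwargs)
    n_src = X_source.shape[1]
    return W, H[:, :n_src], H[:, n_src:]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_src = rng.random((40, 30))  # 40 non-negative features, 30 source utterances
    X_tgt = rng.random((40, 20))  # same features, 20 target utterances
    W, H_src, H_tgt = shared_latent_space(X_src, X_tgt, rank=5)
    print(W.shape, H_src.shape, H_tgt.shape)
```

Because both corpora are encoded against the same basis `W`, their latent codes `H_src` and `H_tgt` live in one feature space, which is the property the abstract's first step relies on.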

Paper Details

Authors:
Peng Song, Shifeng Ou, Wenming Zheng, Yun Jin, Li Zhao
Submitted On:
18 March 2016 - 10:46pm

Document Files

SRC_TNMF_PengSong.pdf

(388)

[1] Peng Song, Shifeng Ou, Wenming Zheng, Yun Jin, Li Zhao, "SPEECH EMOTION RECOGNITION USING TRANSFER NON-NEGATIVE MATRIX FACTORIZATION", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/700. Accessed: Jun. 25, 2019.