Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

END-TO-END FEEDBACK LOSS IN SPEECH CHAIN FRAMEWORK VIA STRAIGHT-THROUGH ESTIMATOR


The speech chain mechanism integrates automatic speech recognition (ASR) and text-to-speech synthesis (TTS) modules into a single cycle during training. In our previous work, we applied the speech chain mechanism as a semi-supervised learning method. It lets ASR and TTS assist each other when they receive unpaired data: each module infers the missing half of the pair, and the models are optimized with a reconstruction loss.
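The title refers to a straight-through estimator, a common way to pass gradients through a discrete choice such as the token an ASR module selects. The following is a minimal sketch of that general idea, not the authors' implementation: the names `ste_forward` and `ste_backward` are hypothetical, and using the softmax Jacobian as the surrogate gradient is an assumption made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ste_forward(logits):
    """Forward pass: emit a hard one-hot vector for the argmax.

    The softmax probabilities are returned as well, because the
    backward pass uses them as a differentiable stand-in.
    """
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    one_hot = [1.0 if i == k else 0.0 for i in range(len(probs))]
    return one_hot, probs


def ste_backward(grad_one_hot, probs):
    """Backward pass: treat the one-hot output as if it were the softmax.

    Applies the softmax Jacobian to the upstream gradient:
    dL/dz_i = p_i * (g_i - sum_j g_j * p_j).
    """
    dot = sum(g * p for g, p in zip(grad_one_hot, probs))
    return [p * (g - dot) for g, p in zip(grad_one_hot, probs)]


one_hot, probs = ste_forward([1.0, 3.0, 2.0])
grad_logits = ste_backward([0.5, -1.0, 2.0], probs)
```

In a speech chain setting, the upstream gradient would come from the TTS reconstruction loss, so the discrete ASR output no longer blocks the feedback signal.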

Paper Details

Authors:
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Submitted On:
14 May 2019 - 8:26pm

Document Files

ICASSP19_Poster_V1.pdf

(22)

[1] Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, "END-TO-END FEEDBACK LOSS IN SPEECH CHAIN FRAMEWORK VIA STRAIGHT-THROUGH ESTIMATOR", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4519. Accessed: Jul. 19, 2019.

PROMISING ACCURATE PREFIX BOOSTING FOR SEQUENCE-TO-SEQUENCE ASR

Paper Details

Submitted On:
14 May 2019 - 3:25am

Document Files

PAPB_icassp-expanded-v2.pdf

(14)

[1] "PROMISING ACCURATE PREFIX BOOSTING FOR SEQUENCE-TO-SEQUENCE ASR", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4502. Accessed: Jul. 19, 2019.

PROMISING ACCURATE PREFIX BOOSTING FOR SEQUENCE-TO-SEQUENCE ASR

Paper Details

Submitted On:
13 May 2019 - 7:22pm

Document Files

PAPB_icassp-expanded.pdf

(19)

[1] "PROMISING ACCURATE PREFIX BOOSTING FOR SEQUENCE-TO-SEQUENCE ASR", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4496. Accessed: Jul. 19, 2019.

Adversarial Speaker Adaptation


We propose a novel adversarial speaker adaptation (ASA) scheme, in which adversarial learning is applied to regularize the distribution of deep hidden features in a speaker-dependent (SD) deep neural network (DNN) acoustic model to be close to that of a fixed speaker-independent (SI) DNN acoustic model during adaptation. An additional discriminator network is introduced to distinguish the deep features generated by the SD model from those produced by the SI model.
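The adversarial setup described above can be sketched with two loss terms. This is a minimal illustration under stated assumptions, not the paper's formulation: the discriminator is assumed to output the probability that a feature came from the SD model, and the function names, the `acoustic_loss` argument, and the weight `lam` are hypothetical.

```python
import math


def discriminator_loss(p_on_sd_feats, p_on_si_feats):
    """Binary cross-entropy for the discriminator.

    p_on_sd_feats: discriminator outputs on SD-model features (target 1).
    p_on_si_feats: discriminator outputs on SI-model features (target 0).
    """
    eps = 1e-12  # guards against log(0)
    loss_sd = -sum(math.log(p + eps) for p in p_on_sd_feats) / len(p_on_sd_feats)
    loss_si = -sum(math.log(1.0 - p + eps) for p in p_on_si_feats) / len(p_on_si_feats)
    return loss_sd + loss_si


def sd_model_objective(acoustic_loss, disc_loss, lam=1.0):
    """Objective the SD model would minimize in this sketch.

    It fits the adaptation data (acoustic_loss) while making the
    discriminator's task harder, hence the minus sign on disc_loss.
    """
    return acoustic_loss - lam * disc_loss


confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
confident = discriminator_loss([0.9], [0.1])
```

A discriminator that cannot tell the two feature distributions apart sits near the `confused` value, which is the regime the regularizer pushes the SD model toward.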

Paper Details

Authors:
Zhong Meng, Jinyu Li, Yifan Gong
Submitted On:
12 May 2019 - 9:26pm

Document Files

asa_oral_v3.pptx

(21)

[1] Zhong Meng, Jinyu Li, Yifan Gong, "Adversarial Speaker Adaptation", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4475. Accessed: Jul. 19, 2019.

Conditional Teacher-Student Learning


Teacher-student (T/S) learning has been shown to be effective for a variety of problems such as domain adaptation and model compression. One shortcoming of T/S learning is that the teacher model, which is not always perfect, sporadically produces wrong guidance in the form of posterior probabilities that mislead the student model towards suboptimal performance.
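One way to limit the effect of wrong teacher guidance, sketched here as an assumption in the spirit of the title rather than as the paper's exact method, is to use the teacher's posteriors only when its top prediction matches the ground-truth label and to fall back to the hard label otherwise. The function names below are hypothetical.

```python
import math


def conditional_target(teacher_posteriors, true_label):
    """Pick the training target for one frame.

    If the teacher's most likely class equals the ground-truth label,
    return its soft posteriors; otherwise return a one-hot vector for
    the ground-truth label.
    """
    n = len(teacher_posteriors)
    teacher_pick = max(range(n), key=teacher_posteriors.__getitem__)
    if teacher_pick == true_label:
        return list(teacher_posteriors)
    return [1.0 if i == true_label else 0.0 for i in range(n)]


def cross_entropy(target, student_posteriors):
    """Cross-entropy between a target distribution and the student output."""
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(s + eps) for t, s in zip(target, student_posteriors))


teacher = [0.7, 0.2, 0.1]
soft = conditional_target(teacher, true_label=0)  # teacher is right: keep soft targets
hard = conditional_target(teacher, true_label=2)  # teacher is wrong: use the label
loss = cross_entropy(hard, [0.1, 0.1, 0.8])
```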

Paper Details

Authors:
Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong
Submitted On:
12 May 2019 - 9:23pm

Document Files

cts_poster.pptx

(23)

[1] Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong, "Conditional Teacher-Student Learning", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4472. Accessed: Jul. 19, 2019.

PROMISING ACCURATE PREFIX BOOSTING FOR SEQUENCE-TO-SEQUENCE ASR

Paper Details

Submitted On:
12 May 2019 - 3:13pm

Document Files

PAPB_icassp-expanded.pdf

(14)

[1] "PROMISING ACCURATE PREFIX BOOSTING FOR SEQUENCE-TO-SEQUENCE ASR", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4469. Accessed: Jul. 19, 2019.

MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION


The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between speech enhancement and ASR optimization objectives.

Paper Details

Authors:
Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
Submitted On:
10 May 2019 - 6:38pm

Document Files

poster file

(21)

manuscript file

(19)

[1] Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister, "MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4420. Accessed: Jul. 19, 2019.

FREQUENCY DOMAIN MULTI-CHANNEL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION


Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques do not always yield ASR accuracy improvement because the optimization criterion for speech enhancement is not directly relevant to the ASR objective. In this work, we develop new acoustic modeling techniques that optimize spatial filtering and long short-term memory (LSTM) layers from multi-channel (MC) input based on an ASR criterion directly.
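For orientation, spatial filtering on multi-channel input in the frequency domain can be illustrated with a classical filter-and-sum operation. This is a generic sketch, not the learned layers of the paper: the function name `filter_and_sum`, the single-frame layout, and the fixed weights are assumptions. In the approach described above, such weights would be trained with the ASR criterion instead of being set by hand.

```python
def filter_and_sum(channels, weights):
    """Apply a per-bin spatial filter to one multi-channel STFT frame.

    channels[c][f]: complex STFT value of channel c at frequency bin f.
    weights[c][f]:  complex filter weight for that channel and bin.
    Returns y[f] = sum over c of conj(weights[c][f]) * channels[c][f].
    """
    num_bins = len(channels[0])
    return [
        sum(w[f].conjugate() * x[f] for w, x in zip(weights, channels))
        for f in range(num_bins)
    ]


# Two microphones observing the same two-bin frame, averaged with equal weights.
frame = [[1 + 1j, 2 + 0j], [1 + 1j, 2 + 0j]]
equal_weights = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]
output = filter_and_sum(frame, equal_weights)
```

With identical channels and equal weights that sum to one, the output reproduces the input frame, which is a quick sanity check for the indexing.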

Paper Details

Authors:
Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
Submitted On:
10 May 2019 - 6:36pm

Document Files

poster file

(19)

manuscript file

(19)

[1] Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister, "FREQUENCY DOMAIN MULTI-CHANNEL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4419. Accessed: Jul. 19, 2019.

Improving Children Speech Recognition through Feature Learning from Raw Speech Signal

Paper Details

Authors:
S. Pavankumar Dubagunta, Selen Hande Kabil, Mathew Magimai Doss
Submitted On:
18 May 2019 - 6:39am

Document Files

ChildrenSpeechASR.pdf

(18)

ChildrenSpeechASR.pdf

(13)

[1] S. Pavankumar Dubagunta, Selen Hande Kabil, Mathew Magimai Doss, "Improving Children Speech Recognition through Feature Learning from Raw Speech Signal", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4357. Accessed: Jul. 19, 2019.

Segment-level training based on Confidence Measures for Hybrid HMM/ANN Speech Recognition

Paper Details

Authors:
S. Pavankumar Dubagunta, Mathew Magimai Doss
Submitted On:
10 May 2019 - 10:51am

Document Files

Poster___Segment_level_training.pdf

(13)

[1] S. Pavankumar Dubagunta, Mathew Magimai Doss, "Segment-level training based on Confidence Measures for Hybrid HMM/ANN Speech Recognition", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4351. Accessed: Jul. 19, 2019.
