
Audio and Acoustic Signal Processing

wav2letter++ : A Fast Open-Source Speech Recognition Framework


This paper introduces wav2letter++, a fast open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition.

Paper Details

Authors:
Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert
Submitted On:
13 May 2019 - 8:40am

Document Files

wav2letter++-poster.pdf

[1] Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert, "wav2letter++ : A Fast Open-Source Speech Recognition Framework", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4483. Accessed: Jul. 05, 2020.

Adversarial Speaker Adaptation


We propose a novel adversarial speaker adaptation (ASA) scheme, in which adversarial learning is applied to regularize the distribution of deep hidden features in a speaker-dependent (SD) deep neural network (DNN) acoustic model to be close to that of a fixed speaker-independent (SI) DNN acoustic model during adaptation. An additional discriminator network is introduced to distinguish the deep features generated by the SD model from those produced by the SI model.
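To make the adversarial regularization above concrete, here is a minimal sketch of the two competing objectives, assuming a binary discriminator that outputs the probability that a deep feature came from the SI model, and a hypothetical trade-off weight `lam`. The abstract does not give the exact loss; the names `discriminator_loss`, `sd_model_loss`, and `asr_loss` are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import List

def bce(p: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one discriminator output p = P(feature came from the SI model)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(p_si: List[float], p_sd: List[float]) -> float:
    """Mean BCE with SI features labelled 1 and SD features labelled 0.
    The discriminator is trained to MINIMIZE this value."""
    losses = [bce(p, 1) for p in p_si] + [bce(p, 0) for p in p_sd]
    return sum(losses) / len(losses)

def sd_model_loss(asr_loss: float, p_si: List[float], p_sd: List[float],
                  lam: float = 0.5) -> float:
    """Hypothetical objective for the SD model: keep its recognition loss
    (e.g. senone cross-entropy, an assumption) low while MAXIMIZING the
    discriminator loss (hence the minus sign), which pushes SD features
    toward the distribution of the fixed SI model's features."""
    return asr_loss - lam * discriminator_loss(p_si, p_sd)

# A discriminator at chance level (0.5 everywhere) has loss ln 2.
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 4))  # 0.6931
```

In this sketch the SI model stays fixed, so only the SD-feature term actually depends on the parameters being adapted.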

Paper Details

Authors:
Zhong Meng, Jinyu Li, Yifan Gong
Submitted On:
12 May 2019 - 9:26pm

Document Files

asa_oral_v3.pptx

[1] Zhong Meng, Jinyu Li, Yifan Gong, "Adversarial Speaker Adaptation", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4475. Accessed: Jul. 05, 2020.

Attentive Adversarial Learning for Domain-Invariant Training


Adversarial domain-invariant training (ADIT) has proven effective in suppressing the effects of domain variability in acoustic modeling and has led to improved performance in automatic speech recognition (ASR). In ADIT, an auxiliary domain classifier takes equally weighted deep features from a deep neural network (DNN) acoustic model as input and is trained to improve their domain invariance by optimizing an adversarial loss function.
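To illustrate the contrast between the equally weighted deep features described above and an attention-weighted alternative (suggested by the paper's title), the sketch below pools a sequence of deep feature vectors either uniformly or with softmax weights computed from hypothetical relevance scores. The specific attention form, and the functions `equal_pool` and `attentive_pool`, are assumptions for illustration, not the authors' design.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pool(features: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted sum of feature vectors, computed dimension by dimension."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def equal_pool(features: List[List[float]]) -> List[float]:
    """Every deep feature contributes equally (the plain ADIT setting above)."""
    n = len(features)
    return pool(features, [1.0 / n] * n)

def attentive_pool(features: List[List[float]], scores: List[float]) -> List[float]:
    """Hypothetical attentive variant: features with higher scores contribute more."""
    return pool(features, softmax(scores))
```

With equal scores the attentive variant reduces to uniform weighting; large score gaps concentrate the weight on a few features.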

Paper Details

Authors:
Zhong Meng, Jinyu Li, Yifan Gong
Submitted On:
12 May 2019 - 9:03pm

Document Files

aadit_poster.pptx

[1] Zhong Meng, Jinyu Li, Yifan Gong, "Attentive Adversarial Learning for Domain-Invariant Training", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4474. Accessed: Jul. 05, 2020.

Adversarial Speaker Verification


The use of deep networks to extract embeddings for speaker recognition has proven successful. However, such embeddings are susceptible to performance degradation due to mismatches among the training, enrollment, and test conditions. In this work, we propose an adversarial speaker verification (ASV) scheme to learn condition-invariant deep embeddings via adversarial multi-task training. In ASV, a speaker classification network and a condition identification network are jointly optimized to minimize the speaker classification loss while simultaneously performing a min-max optimization of the condition loss.
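A common way to implement a min-max optimization like the one described above in a single backward pass is a gradient reversal layer (GRL): it is the identity in the forward pass and multiplies incoming gradients by -λ in the backward pass. The abstract does not say that ASV uses a GRL, so treat this as an assumed mechanism. The sketch works on plain Python lists; `GradientReversal` and `embedding_gradient` are hypothetical names.

```python
from typing import List

class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, x: List[float]) -> List[float]:
        return list(x)  # embedding passes through unchanged

    def backward(self, grad_out: List[float]) -> List[float]:
        return [-self.lam * g for g in grad_out]  # flip and scale the gradient


def embedding_gradient(grad_from_speaker: List[float],
                       grad_from_condition: List[float],
                       grl: GradientReversal) -> List[float]:
    """Total gradient reaching the shared embedding: the speaker-loss gradient
    plus the REVERSED condition-loss gradient. A descent step on the embedding
    therefore lowers the speaker loss while raising the condition loss; the
    condition classifier above the GRL still receives its normal gradient and
    keeps minimizing the condition loss."""
    reversed_cond = grl.backward(grad_from_condition)
    return [gs + gc for gs, gc in zip(grad_from_speaker, reversed_cond)]
```

The design choice is that both players are updated by ordinary gradient descent; the sign flip at the GRL is what turns the shared embedding into the maximizing player.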

Paper Details

Authors:
Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
Submitted On:
12 May 2019 - 9:24pm

Document Files

asv_poster_v3.pptx

[1] Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong, "Adversarial Speaker Verification", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4473. Accessed: Jul. 05, 2020.

Conditional Teacher-Student Learning


Teacher-student (T/S) learning has been shown to be effective for a variety of problems, such as domain adaptation and model compression. One shortcoming of T/S learning is that the teacher model, which is not always perfect, sporadically produces wrong guidance in the form of posterior probabilities that mislead the student model toward suboptimal performance.
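The abstract stops before describing the proposed remedy, so the following is an assumed illustration of one way to avoid learning from wrong teacher guidance: the student is trained on the teacher's soft posteriors only when the teacher's top class matches the ground-truth label, and on the one-hot ground truth otherwise. `conditional_ts_loss` is a hypothetical name, and this is a sketch rather than the authors' exact method.

```python
import math
from typing import List

def cross_entropy(target: List[float], student: List[float], eps: float = 1e-12) -> float:
    """Cross-entropy between a target distribution and the student's posteriors."""
    return -sum(t * math.log(max(s, eps)) for t, s in zip(target, student))

def conditional_ts_loss(teacher: List[float], student: List[float], label: int) -> float:
    """Learn from the teacher only when it is right; otherwise fall back to the hard label."""
    teacher_top = max(range(len(teacher)), key=teacher.__getitem__)
    if teacher_top == label:
        target = teacher  # teacher is correct: use its soft posteriors
    else:
        target = [1.0 if k == label else 0.0 for k in range(len(teacher))]  # one-hot truth
    return cross_entropy(target, student)
```

For a whole utterance, this per-frame loss would typically be averaged over frames.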

Paper Details

Authors:
Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong
Submitted On:
12 May 2019 - 9:23pm

Document Files

cts_poster.pptx

[1] Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong, "Conditional Teacher-Student Learning", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4472. Accessed: Jul. 05, 2020.

Universal Acoustic Using Neural Mixture Models

Paper Details

Authors:
Amit Das, Jinyu Li, Yifan Gong, Changliang Lu
Submitted On:
12 May 2019 - 2:42am

Document Files

UAM_v3.pdf

[1] Amit Das, Jinyu Li, Yifan Gong, Changliang Lu, "Universal Acoustic Using Neural Mixture Models", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4458. Accessed: Jul. 05, 2020.

Geometric Invariants for Sparse Unknown View Tomography


In this paper, we study a 2D tomography problem for point source models with random unknown view angles. Rather than recovering the projection angles, we reconstruct the model through a set of rotation-invariant features that are estimated from the projection data. For a point source model, we show that these features reveal geometric information about the model, such as the radial and pairwise distances. This establishes a connection between unknown view tomography and the unassigned distance geometry problem (uDGP).
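For intuition about why radial and pairwise distances serve as rotation-invariant descriptors, the sketch below computes both for a small set of 2D point sources and checks that they are unchanged when the whole model is rotated about the origin. The distances are returned sorted, i.e. without knowing which points they belong to, which mirrors the "unassigned" setting of the uDGP. This only illustrates the invariance; it does not estimate the features from projection data as the paper does, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]

def rotate(points: List[Point], theta: float) -> List[Point]:
    """Rotate every point about the origin by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def radial_distances(points: List[Point]) -> List[float]:
    """Sorted distances from the origin (labels discarded)."""
    return sorted(math.hypot(x, y) for x, y in points)

def pairwise_distances(points: List[Point]) -> List[float]:
    """Sorted distances between all point pairs (assignment discarded)."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))

def close(a: List[float], b: List[float], tol: float = 1e-9) -> bool:
    """Element-wise comparison with a small floating-point tolerance."""
    return len(a) == len(b) and all(abs(u - v) <= tol for u, v in zip(a, b))
```

Usage: for `pts = [(1.0, 0.0), (0.0, 2.0), (-1.5, -0.5)]`, `close(pairwise_distances(pts), pairwise_distances(rotate(pts, 0.7)))` should hold, up to rounding.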

Paper Details

Authors:
Mona Zehni, Shuai Huang, Ivan Dokmanic, Zhizhen Zhao
Submitted On:
11 May 2019 - 7:56pm

Document Files

CryoPDF_poster.pdf

[1] Mona Zehni, Shuai Huang, Ivan Dokmanic, Zhizhen Zhao, "Geometric Invariants for Sparse Unknown View Tomography", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4456. Accessed: Jul. 05, 2020.

ONLINE ESTIMATION AND SMOOTHING OF A TARGET TRAJECTORY IN MIXED STATIONARY/MOVING CONDITIONS


A novel maximum likelihood trajectory estimation algorithm for targets in mixed stationary/moving conditions is presented. The proposed approach estimates the target's position and velocity over arbitrarily complex trajectories while explicitly accounting for possible stop-and-go motion. Moreover, a novel trajectory reconstruction method based on the theory of Bézier curves is developed for online smoothing of the trajectory; it retains the advantages of Bayesian smoothing while introducing only a fixed lag into the estimation process.
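Since the smoother above is built on Bézier curves, the sketch below shows, as background only, how a 2D Bézier curve can be evaluated with de Casteljau's algorithm (repeated linear interpolation between control points). It is not the authors' fixed-lag smoothing method; how control points would be chosen from position estimates is left open, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def de_casteljau(control: List[Point], t: float) -> Point:
    """Evaluate a Bezier curve at parameter t in [0, 1] by repeated linear interpolation."""
    if not control:
        raise ValueError("need at least one control point")
    pts = list(control)
    while len(pts) > 1:
        # Interpolate between each consecutive pair; one fewer point per level.
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def sample_curve(control: List[Point], n: int) -> List[Point]:
    """Return n + 1 curve points evenly spaced in the parameter t (n >= 1)."""
    return [de_casteljau(control, k / n) for k in range(n + 1)]
```

For the quadratic curve with control points (0, 0), (1, 2), (2, 0), the midpoint at t = 0.5 is (1.0, 1.0), and the curve starts and ends exactly at the first and last control points.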

Paper Details

Authors:
Angelo Coluccia, Alessio Fascista, Giuseppe Ricci
Submitted On:
11 May 2019 - 12:32pm

Document Files

Poster ICASSP 2019

[1] Angelo Coluccia, Alessio Fascista, Giuseppe Ricci, "ONLINE ESTIMATION AND SMOOTHING OF A TARGET TRAJECTORY IN MIXED STATIONARY/MOVING CONDITIONS", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4448. Accessed: Jul. 05, 2020.

Deep Learning for Classroom Activity Detection from Audio


Increasingly, post-secondary instructors are incorporating innovative teaching practices into their classrooms to improve student learning outcomes. In order to assess the effect of these techniques, it is helpful to quantify the types of activity being conducted in the classroom. Unfortunately, self-reporting is unreliable and manual annotation is tedious and scales poorly.

Paper Details

Authors:
Robin Cosbey, Allison Wusterbarth, Brian Hutchinson
Submitted On:
10 May 2019 - 4:33pm

Document Files

[POSTER] Deep Learning for Classroom Activity Detection from Audio

[1] Robin Cosbey, Allison Wusterbarth, Brian Hutchinson, "Deep Learning for Classroom Activity Detection from Audio", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4404. Accessed: Jul. 05, 2020.

Methodical Design and Trimming of Deep Learning Networks: Enhancing External BP learning with Internal Omnipresent-Supervision Training Paradigm


Back-propagation (BP) is now a classic learning paradigm whose supervision comes exclusively from the external (input/output) nodes. Consequently, BP is vulnerable to the curse of depth in (very) Deep Learning Networks (DLNs). This prompts us to advocate Internal Neuron's Learnability (INL) with (1) internal teacher labels (ITL) and (2) internal optimization metrics (IOM) for evaluating hidden layers/nodes. Conceptually, INL is a step beyond the notion of Internal Neuron's Explainability (INE), championed by

Paper Details

Authors:
S. Y. Kung, Zejiang Hou, Yuchen Liu
Submitted On:
10 May 2019 - 2:03pm

Document Files

ICASSP2019.pdf

[1] S. Y. Kung, Zejiang Hou, Yuchen Liu, "Methodical Design and Trimming of Deep Learning Networks: Enhancing External BP learning with Internal Omnipresent-Supervision Training Paradigm", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4384. Accessed: Jul. 05, 2020.
