Audio and Acoustic Signal Processing

Linearly Augmented Deep Neural Network


Deep neural networks (DNNs) are a powerful tool for many large vocabulary continuous speech recognition (LVCSR) tasks. Training a very deep network is a challenging problem, and pre-training techniques are needed to achieve the best results. In this paper, we propose a new type of network architecture, the Linearly Augmented Deep Neural Network (LA-DNN). This type of network augments each non-linear layer with a linear connection from the layer input to the layer output.
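The layer described above can be sketched in plain Python as follows: each layer outputs its usual non-linear response plus a linear projection of its own input. This is a minimal illustration under assumptions not taken from the paper: the sigmoid non-linearity, the Gaussian weight initialization, the layer sizes, and the choice of a full, bias-free matrix `L` for the linear connection are all hypothetical, as are the names `LinearlyAugmentedLayer` and `la_dnn_forward`.

```python
import math
import random


def sigmoid(v):
    # Numerically stable logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def matvec(M, x):
    # Multiply a matrix stored as a list of rows by a vector
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


class LinearlyAugmentedLayer:
    """Hidden layer computing y = sigmoid(W x + b) + L x (assumed form)."""

    def __init__(self, n_in, n_out, rng):
        scale = 1.0 / math.sqrt(n_in)
        # Weights of the usual non-linear branch
        self.W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out
        # Added linear input-to-output connection (assumption: full matrix, no bias)
        self.L = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]

    def forward(self, x):
        pre = matvec(self.W, x)
        lin = matvec(self.L, x)
        # Non-linear branch plus linear branch, summed element-wise
        return [sigmoid(p + bi) + li for p, bi, li in zip(pre, self.b, lin)]


def la_dnn_forward(x, layers):
    # Apply a stack of augmented layers to one input frame
    for layer in layers:
        x = layer.forward(x)
    return x


rng = random.Random(0)
layers = [LinearlyAugmentedLayer(40, 64, rng)]
layers += [LinearlyAugmentedLayer(64, 64, rng) for _ in range(3)]
frame = [rng.gauss(0.0, 1.0) for _ in range(40)]  # e.g. one 40-dim feature vector
output = la_dnn_forward(frame, layers)
print(len(output))  # 64
```

Setting `L` to zero recovers an ordinary sigmoid layer, so in this sketch the linear path is the only structural addition to a conventional DNN layer.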

Paper Details

Authors:
Pegah Ghahremani, Jasha Droppo, Michael L. Seltzer
Submitted On:
30 April 2016 - 7:54pm

Document Files

LinearlyAugmented-Icassp Presentation.pdf

(211 downloads)

[1] Pegah Ghahremani, Jasha Droppo, Michael L. Seltzer, "Linearly Augmented Deep Neural Network", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1101. Accessed: Oct. 19, 2017.

Recurrent neural networks for polyphonic sound event detection in real life recordings


Slides from the presentation held at ICASSP 2016 for the paper: Recurrent neural networks for polyphonic sound event detection in real life recordings

Paper Details

Authors:
Heikki Huttunen, Tuomas Virtanen
Submitted On:
4 April 2016 - 9:45am

Document Files

ICASSP_2016_slides.pdf

(257 downloads)

[1] Heikki Huttunen, Tuomas Virtanen, "Recurrent neural networks for polyphonic sound event detection in real life recordings", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1082. Accessed: Oct. 19, 2017.

Hardware Implementation of FIR/IIR Digital Filters Using Integral Stochastic Computation

Paper Details

Authors:
Francois Leduc-Primeau, Warren Gross
Submitted On:
23 March 2016 - 8:21pm

Document Files

ICASSP Pres.pdf

(202 downloads)

[1] Francois Leduc-Primeau, Warren Gross, "Hardware Implementation of FIR/IIR Digital Filters Using Integral Stochastic Computation", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1010. Accessed: Oct. 19, 2017.

INVESTIGATION OF SPEAKER EMBEDDINGS FOR CROSS-SHOW SPEAKER DIARIZATION

Paper Details

Authors:
Mickael Rouvier
Submitted On:
21 March 2016 - 9:02pm

Document Files

lif_poster_asru2015.pdf

(226 downloads)

[1] Mickael Rouvier, "INVESTIGATION OF SPEAKER EMBEDDINGS FOR CROSS-SHOW SPEAKER DIARIZATION", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/944. Accessed: Oct. 19, 2017.

Fast variational Bayesian signal recovery in the presence of Poisson-Gaussian Noise


This paper presents a new method for solving linear inverse problems in which the observations are corrupted by mixed Poisson-Gaussian noise.
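To make the problem setting concrete, the sketch below simulates the kind of degradation described above: a linear operator H applied to a non-negative signal x, Poisson noise on Hx, then additive zero-mean Gaussian noise, i.e. y = z + w with z ~ Poisson(Hx) and w ~ N(0, sigma^2). The 3-tap blur used for H, the unit Poisson gain, the test signal, and the helper names are illustrative assumptions. The sketch only generates observations; it does not implement the paper's variational Bayesian recovery method.

```python
import math
import random


def apply_blur(x, h):
    # Linear operator H: centred 1-D filtering with a symmetric kernel h,
    # zero padding at the borders, output the same length as x
    half = len(h) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = i + k - half
            if 0 <= j < len(x):
                acc += hk * x[j]
        out.append(acc)
    return out


def poisson_sample(lam, rng):
    # Knuth's multiplication method; adequate for the small intensities used here
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def observe(x, h, sigma, rng):
    """Simulate y = z + w with z ~ Poisson(H x) and w ~ N(0, sigma^2)."""
    return [poisson_sample(v, rng) + rng.gauss(0.0, sigma) for v in apply_blur(x, h)]


rng = random.Random(0)
x = [2.0] * 10 + [20.0] * 5 + [2.0] * 10  # non-negative intensity profile
h = [0.25, 0.5, 0.25]                      # hypothetical blur kernel
y = observe(x, h, sigma=1.0, rng=rng)
```

The Poisson term makes the noise variance depend on the signal intensity, while the Gaussian term does not; that mix is what distinguishes this setting from a purely Gaussian inverse problem.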

Paper Details

Authors:
Yosra Marnissi, Yuling Zheng, and Jean-Christophe Pesquet
Submitted On:
21 March 2016 - 7:25pm

Document Files

slides.pdf

(220 downloads)

[1] Yosra Marnissi, Yuling Zheng, and Jean-Christophe Pesquet, "Fast variational Bayesian signal recovery in the presence of Poisson-Gaussian Noise", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/939. Accessed: Oct. 19, 2017.

Simple Multi Frame Analysis Methods for Estimation of Amplitude Spectral Envelope in Singing Voice


SFA vs MFA

In the state of the art, the amplitude spectral envelope is commonly built from a single DFT frame.
Multiple Frame Analysis (MFA) has already been suggested for envelope estimation, but often with excessive complexity.
In this paper, two MFA-based methods are presented: one simplifies an existing Least Squares (LS) solution, and the other is based on simple linear interpolation.
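A rough illustration of why pooling frames can help, under assumptions not taken from the paper: harmonic peaks (frequency in Hz, amplitude in dB) are assumed to have been picked already in a few neighbouring frames. Because the fundamental frequency of a singing voice varies from frame to frame, the pooled peaks sample the envelope at more frequencies than a single frame does, and the envelope can then be read off by linear interpolation between them. The function name and the data are hypothetical, and this is not necessarily the authors' exact method.

```python
from bisect import bisect_left


def mfa_linear_envelope(frames_peaks, query_freqs):
    """Envelope (dB) at query_freqs by linear interpolation of pooled peaks.

    frames_peaks: one list of (freq_hz, amp_db) harmonic peaks per frame;
    peak picking is assumed to have been done beforehand.
    """
    # Pool the peaks of all analysed frames and sort them by frequency
    points = sorted(p for peaks in frames_peaks for p in peaks)
    freqs = [f for f, _ in points]
    amps = [a for _, a in points]
    envelope = []
    for q in query_freqs:
        i = bisect_left(freqs, q)
        if i == 0:
            envelope.append(amps[0])    # below the lowest peak: hold its value
        elif i == len(freqs):
            envelope.append(amps[-1])   # above the highest peak: hold its value
        else:
            # freqs[i - 1] < q <= freqs[i], so the denominator is positive
            t = (q - freqs[i - 1]) / (freqs[i] - freqs[i - 1])
            envelope.append(amps[i - 1] + t * (amps[i] - amps[i - 1]))
    return envelope


frames = [
    [(200.0, -10.0), (400.0, -20.0)],  # frame with f0 = 200 Hz
    [(220.0, -12.0), (440.0, -22.0)],  # neighbouring frame with f0 = 220 Hz
]
env = mfa_linear_envelope(frames, [150.0, 210.0, 300.0, 500.0])
```

With a single frame, the point at 210 Hz would fall between peaks 200 Hz apart; pooling the second frame adds a peak only 10 Hz away.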

Paper Details

Authors:
Gilles Degottex, Luc Ardaillon, Axel Roebel
Submitted On:
21 March 2016 - 11:37am

Document Files

SP-L2.2-2418-degottex.pdf

(179 downloads)

[1] Gilles Degottex, Luc Ardaillon, Axel Roebel, "Simple Multi Frame Analysis Methods for Estimation of Amplitude Spectral Envelope in Singing Voice", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/930. Accessed: Oct. 19, 2017.

UTD-CRSS System For The NIST 2015 Language Recognition I-Vector Machine Learning Challenge

Paper Details

Authors:
Chengzhu Yu, Chunlei Zhang, Shivesh Ranjan, Qian Zhang, Abhinav Misra, Finnian Kelly, and John H.L. Hansen
Submitted On:
20 March 2016 - 9:54am

Document Files

Final-2016-ICASSP-TEMPLATE-POSTER-OneBig-rev1-WIDE-1.ppt

(213 downloads)

[1] Chengzhu Yu, Chunlei Zhang, Shivesh Ranjan, Qian Zhang, Abhinav Misra, Finnian Kelly, and John H.L. Hansen, "UTD-CRSS System For The NIST 2015 Language Recognition I-Vector Machine Learning Challenge", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/875. Accessed: Oct. 19, 2017.

STRUCTURALLY-CONSTRAINED GRADIENT DESCENT FOR MATRIX FACTORIZATION IN HAPLOTYPE ASSEMBLY PROBLEMS

Paper Details

Authors:
Changxiao Cai, Sujay Sanghavi, Haris Vikalo
Submitted On:
19 March 2016 - 11:00pm

Document Files

poster.pdf

(287 downloads)

[1] Changxiao Cai, Sujay Sanghavi, Haris Vikalo, "STRUCTURALLY-CONSTRAINED GRADIENT DESCENT FOR MATRIX FACTORIZATION IN HAPLOTYPE ASSEMBLY PROBLEMS", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/854. Accessed: Oct. 19, 2017.

Audio Word Similarity for Clustering with Zero Resources based on iterative HMM Classification


Recent work on zero-resource word discovery makes intensive use of audio fragment clustering to find repeating speech patterns. In the absence of acoustic models, the clustering step traditionally relies on dynamic time warping (DTW) to compare two samples and thus suffers from the known limitations of this technique. We propose a new sample comparison method, called 'similarity by iterative classification', that exploits the modeling capacity of hidden Markov models (HMMs) without supervision.
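For reference, the dynamic time warping comparison named above as the traditional approach can be sketched as follows. This is the baseline, not the proposed HMM-based similarity; the Euclidean frame distance, the three allowed steps, and the normalization by the total sequence length are common choices assumed here.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors (e.g. MFCC frames).

    Local cost: Euclidean distance between frames.
    Allowed steps: (1, 0), (0, 1), (1, 1). No global path constraint.
    The accumulated cost is normalized by len(a) + len(b).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


x = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [2.0]]  # same pattern, one frame held longer
reversed_x = [[2.0], [1.0], [0.0]]
print(dtw_distance(x, stretched))   # 0.0
print(dtw_distance(x, reversed_x))  # larger than 0
```

The warping absorbs differences in speaking rate, but the score still rests entirely on a fixed frame-level distance with no model of the pattern itself.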

Paper Details

Authors:
Amélie Royer, Guillaume Gravier, Vincent Claveau
Submitted On:
19 March 2016 - 2:37pm

Document Files

Poster_ICASSP.pdf

(564 downloads)

[1] Amélie Royer, Guillaume Gravier, Vincent Claveau, "Audio Word Similarity for Clustering with Zero Resources based on iterative HMM Classification", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/829. Accessed: Oct. 19, 2017.

EQUALIZATION MATCHING OF SPEECH RECORDINGS IN REAL-WORLD ENVIRONMENTS


When different parts of speech content, such as voice-overs and narration, are recorded in real-world environments with different acoustic properties and background noise, the difference in sound quality between the recordings is typically quite audible and therefore undesirable. We propose an algorithm to equalize multiple such speech recordings so that they sound as if they were recorded in the same environment. Because the timbral characteristics of speech and background noise typically differ considerably, simple equalization matching results in a noticeable mismatch in the output signals.
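The simple equalization matching mentioned above can be sketched as a single per-bin gain curve that maps the long-term average magnitude spectrum of a target recording onto that of a reference. The sketch assumes magnitude spectra (e.g. STFT frames) have already been computed, and the function names are hypothetical; it illustrates plain equalization matching only, not the authors' proposed algorithm.

```python
def average_spectrum(frames):
    """Long-term average magnitude per frequency bin over a list of frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def naive_eq_gains(target_frames, reference_frames, eps=1e-12):
    """Per-bin gains mapping the target's average spectrum onto the reference's."""
    tgt = average_spectrum(target_frames)
    ref = average_spectrum(reference_frames)
    return [r / (t + eps) for t, r in zip(tgt, ref)]


def apply_gains(frames, gains):
    # The same gain curve is applied to every frame, speech and noise alike
    return [[mag * g for mag, g in zip(frame, gains)] for frame in frames]


target = [[1.0, 2.0, 4.0], [3.0, 2.0, 0.0]]     # 2 frames x 3 bins
reference = [[4.0, 1.0, 2.0], [4.0, 3.0, 2.0]]
gains = naive_eq_gains(target, reference)       # approximately [2.0, 1.0, 1.0]
matched = apply_gains(target, gains)
```

Because one gain curve is shared by speech and background noise, whose timbres differ, matching the overall average cannot match both components at once, which is consistent with the mismatch described above.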

Paper Details

Authors:
Francois G. Germain, Gautham J. Mysore, Takako Fujioka
Submitted On:
19 March 2016 - 10:11am

Document Files

poster_fg_icassp16.pdf

(186 downloads)

[1] Francois G. Germain, Gautham J. Mysore, Takako Fujioka, "EQUALIZATION MATCHING OF SPEECH RECORDINGS IN REAL-WORLD ENVIRONMENTS", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/812. Accessed: Oct. 19, 2017.
