ICASSP 2018

ICASSP is the world's largest and most comprehensive technical conference on signal processing and its applications. It provides a networking opportunity for professionals from around the world. The ICASSP 2018 conference features presentations by internationally renowned speakers and cutting-edge session topics.

Sufficiency quantification for seamless text-independent speaker enrollment


Text-independent speaker recognition (TI-SR) requires a lengthy enrollment process in which the user must dedicate time to creating a reliable model of their voice. Seamless enrollment is a highly attractive alternative: enrollment happens in the background and requires no dedicated time from the user. A key problem in a fully automated seamless enrollment process is determining whether a given utterance collection is sufficient for TI-SR. No known metric exists in the literature to quantify sufficiency.

Paper Details

Authors:
Gokcen Cilingir, Jonathan Huang, Mandar S Joshi, Narayan Biswal
Submitted On:
13 July 2018 - 3:38pm

Document Files

Poster presented at ICASSP 2018

(3 downloads)

Paper for ICASSP 2018

(3 downloads)

[1] Gokcen Cilingir, Jonathan Huang, Mandar S Joshi, Narayan Biswal, "Sufficiency quantification for seamless text-independent speaker enrollment", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3379. Accessed: Jul. 15, 2018.

HIGH-QUALITY NONPARALLEL VOICE CONVERSION BASED ON CYCLE-CONSISTENT ADVERSARIAL NETWORK

Paper Details

Authors:
Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba
Submitted On:
17 June 2018 - 4:42am

Document Files

poster.pdf

(28 downloads)

[1] Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba, "HIGH-QUALITY NONPARALLEL VOICE CONVERSION BASED ON CYCLE-CONSISTENT ADVERSARIAL NETWORK", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3233. Accessed: Jul. 15, 2018.

Self-paced mixture of t distribution model


The Gaussian mixture model (GMM) is a powerful probabilistic model for representing the probability distribution of observations in a population. However, the fit of a GMM can degrade significantly when the data contain outliers. Although some GMM variants (e.g., mixture of Laplace, mixture of t distributions) attempt to handle outliers, none of them sufficiently mitigates the effect of outliers that lie far from the centroids.

Paper Details

Authors:
Qingtao Tang, Li Niu, Tao Dai, Xi Xiao, Shu-Tao Xia
Submitted On:
27 May 2018 - 10:23pm

Document Files

icassp-landscape.pdf

(30 downloads)

[1] Qingtao Tang, Li Niu, Tao Dai, Xi Xiao, Shu-Tao Xia, "Self-paced mixture of t distribution model", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3210. Accessed: Jul. 15, 2018.

Unsupervised Learning of Semantic Audio Representations

Paper Details

Authors:
Submitted On:
24 May 2018 - 8:46pm

Document Files

ICASSP18_ Unsupervised Learning of Semantic Audio Representations.pdf

(47 downloads)

[1] "Unsupervised Learning of Semantic Audio Representations", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3208. Accessed: Jul. 15, 2018.

Low Complexity Joint Rate-Distortion Optimization of Prediction Units Couples for HEVC Intra Coding


HEVC is the latest block-based video compression standard, achieving 50% bitrate savings over H.264/AVC at the same perceptual quality. An HEVC encoder provides rate-distortion optimization coding tools for block-wise compression. Because of complexity limitations, rate-distortion optimization (RDO) is usually performed independently for each block, on the assumption that the resulting coding efficiency losses are negligible.

Paper Details

Authors:
Maxime Bichon, Julien Le Tanou, Michael Ropert, Wassim Hamidouche, Luce Morin, Lu Zhang
Submitted On:
9 May 2018 - 4:09am

Document Files

mbichon_ICASSP18_poster_portrait.pdf

(46 downloads)

[1] Maxime Bichon, Julien Le Tanou, Michael Ropert, Wassim Hamidouche, Luce Morin, Lu Zhang, "Low Complexity Joint Rate-Distortion Optimization of Prediction Units Couples for HEVC Intra Coding", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3206. Accessed: Jul. 15, 2018.

SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION


In this paper, we present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW), which selects a target source masked by a noise source using Angle of Arrival (AoA) information computed from phase differences. The RMS-PDCW algorithm chooses which masks to apply based on the localized sound source and on speech onset detection.

Paper Details

Authors:
Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard Stern
Submitted On:
7 May 2018 - 12:38am

Document Files

icassp_4465_poster.pdf

(59 downloads)

[1] Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard Stern, "SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3203. Accessed: Jul. 15, 2018.

SPECTRAL DISTORTION MODEL FOR TRAINING PHASE-SENSITIVE DEEP-NEURAL NETWORKS FOR FAR-FIELD SPEECH RECOGNITION


In this paper, we present an algorithm that introduces phase perturbation into the training database when training phase-sensitive deep neural network models. Traditional features such as log-mel or cepstral features do not carry any phase-relevant information. However, features such as raw-waveform or complex-spectrum features do contain phase-relevant information. Phase-sensitive features have the advantage of being able to detect differences in time of

Paper Details

Authors:
Chanwoo Kim, Tara Sainath, Arun Narayanan, Ananya Misra, Rajeev Nongpiur, Michiel Bacchiani
Submitted On:
7 May 2018 - 12:19am

Document Files

icassp_4404_poster.pdf

(50 downloads)

[1] Chanwoo Kim, Tara Sainath, Arun Narayanan, Ananya Misra, Rajeev Nongpiur, Michiel Bacchiani, "SPECTRAL DISTORTION MODEL FOR TRAINING PHASE-SENSITIVE DEEP-NEURAL NETWORKS FOR FAR-FIELD SPEECH RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3202. Accessed: Jul. 15, 2018.

WAVENET BASED LOW RATE SPEECH CODING


Traditional parametric coding of speech achieves low bit rates but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high-quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty.

Paper Details

Authors:
W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters
Submitted On:
4 May 2018 - 2:28pm

Document Files

WaveNetCoding_b.pdf

(62 downloads)

[1] W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters, "WAVENET BASED LOW RATE SPEECH CODING", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3201. Accessed: Jul. 15, 2018.

PLUG-IN MEASURE-TRANSFORMED QUASI-LIKELIHOOD RATIO TEST FOR RANDOM SIGNAL DETECTION


Recently, we developed a robust generalization of the Gaussian quasi-likelihood ratio test (GQLRT). This generalization, called measure-transformed GQLRT (MT-GQLRT), operates by selecting a Gaussian model that best empirically fits a transformed probability measure of the data. In this letter, a plug-in version of the MT-GQLRT is developed for robust detection of a random signal in nonspherical noise. The proposed detector is derived by plugging an empirical measure-transformed noise covariance, obtained from noise-only secondary data, into the MT-GQLRT.

Paper Details

Authors:
Nir Halay, Koby Todros
Submitted On:
2 May 2018 - 3:30pm

Document Files

ICASSP_2018_POSTER_VER_3.pdf

(45 downloads)

[1] Nir Halay, Koby Todros, "PLUG-IN MEASURE-TRANSFORMED QUASI-LIKELIHOOD RATIO TEST FOR RANDOM SIGNAL DETECTION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3200. Accessed: Jul. 15, 2018.

Acoustic modeling of speech waveform based on multi-resolution, neural network signal processing


Recently, several papers have demonstrated that neural networks (NNs) are able to perform feature extraction as part of the acoustic model. Motivated by the Gammatone feature extraction pipeline, in this paper we extend the waveform-based NN model with a second level of time-convolutional elements. The proposed extension generalizes the envelope extraction block and allows the model to learn multi-resolution representations.

Paper Details

Authors:
Zoltán Tüske, Ralf Schlüter, Hermann Ney
Submitted On:
2 May 2018 - 3:00pm

Document Files

slides-template.pdf

(63 downloads)

[1] Zoltán Tüske, Ralf Schlüter, Hermann Ney, "Acoustic modeling of speech waveform based on multi-resolution, neural network signal processing", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3199. Accessed: Jul. 15, 2018.
