Audio and Acoustic Signal Processing

Human and Machine Speaker Recognition on Short Trivial Events


In this paper, we collect a trivial-event speech database involving 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database by both human listeners and machines. In particular, the deep feature learning technique recently proposed by our group is used to analyze and recognize the trivial events, yielding acceptable equal error rates (EERs) ranging from 5% to 15% despite the extremely short durations (0.2-0.5 seconds) of these events. Among the event types compared, ‘hmm’ seems more speaker-discriminative than the others.
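The abstract reports results as equal error rates. As a reminder of what that figure measures, here is a minimal sketch that estimates an EER from two lists of trial scores. The function name, the threshold sweep, and the toy scores are illustrative assumptions; this is not the authors' evaluation code, and no data from the paper is used.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest to each other.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a genuine (same-speaker) trial scored below t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above t.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores (hypothetical): higher means "more likely the same speaker".
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine, impostor))
```

With real systems the two rates rarely cross exactly at an observed score, so toolkits usually interpolate between neighbouring thresholds; the sweep above simply picks the closest observed point.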


Paper Details

Authors:
Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai
Submitted On:
13 April 2018 - 6:58am

Document Files

trivial.pdf

(32 downloads)

[1] Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai, "Human and Machine Speaker Recognition on Short Trivial Events", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2687. Accessed: Jul. 23, 2018.

Human and Machine Speaker Recognition on Short Trivial Events


In this paper, we collect a trivial-event speech database involving 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database by both human listeners and machines. In particular, the deep feature learning technique recently proposed by our group is used to analyze and recognize the trivial events, yielding acceptable equal error rates (EERs) ranging from 5% to 15% despite the extremely short durations (0.2-0.5 seconds) of these events. Among the event types compared, ‘hmm’ seems more speaker-discriminative than the others.


Paper Details

Authors:
Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai
Submitted On:
13 April 2018 - 6:58am

Document Files

trivial.pdf

(37 downloads)

[1] Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai, "Human and Machine Speaker Recognition on Short Trivial Events", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2684. Accessed: Jul. 23, 2018.

Passive online geometry calibration of acoustic sensor networks


As we are surrounded by an increasing number of mobile devices equipped with wireless links and multiple microphones (e.g., smartphones, tablets, laptops, and hearing aids), using them collaboratively for acoustic processing is a promising platform for emerging applications.

Paper Details

Authors:
Axel Plinge, Gernot A. Fink, Sharon Gannot
Submitted On:
13 April 2018 - 5:46am

Document Files

pog-poster-v6.pdf

(34 downloads)

[1] Axel Plinge, Gernot A. Fink, Sharon Gannot, "Passive online geometry calibration of acoustic sensor networks", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2673. Accessed: Jul. 23, 2018.

DEEP CNN BASED FEATURE EXTRACTOR FOR TEXT-PROMPTED SPEAKER RECOGNITION


Deep learning is still not a very common tool in the speaker verification field. We study the performance of a deep convolutional neural network in the text-prompted speaker verification task. The prompted passphrase is segmented into word states (i.e., digits) so that each digit utterance is tested separately. We train a single high-level feature extractor for all states and use the cosine similarity metric for scoring. The key feature of our network is the Max-Feature-Map activation function, which acts as an embedded feature selector.
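The abstract names two building blocks without defining them: the Max-Feature-Map (MFM) activation and cosine-similarity scoring. The sketch below illustrates both in plain Python. It assumes the commonly used MFM formulation (split the channels into two halves and keep the element-wise maximum), which may differ in detail from the authors' network; the function names and toy vectors are hypothetical.

```python
import math


def max_feature_map(channels):
    """Max-Feature-Map over a list of equally sized channels.

    Assumed formulation: with 2n input channels, output channel k is the
    element-wise maximum of input channels k and k + n, so the layer halves
    the channel count and keeps the stronger of two competing responses.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    return [
        [max(a, b) for a, b in zip(channels[k], channels[k + half])]
        for k in range(half)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Four toy channels of length 2 collapse to two channels.
    print(max_feature_map([[1, 5], [4, 2], [3, 0], [0, 9]]))
    # Score a hypothetical enrollment embedding against a test embedding.
    print(cosine_similarity([0.2, 0.7, 0.1], [0.25, 0.6, 0.2]))
```

Because each output unit passes through only one of two candidate inputs, MFM behaves like a learned selector between feature maps, which fits the abstract's description of it as an embedded feature selector.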

Paper Details

Authors:
Oleg Kudashev, Vadim Shchemelinin, Ivan Kremnev, Galina Lavrentyeva
Submitted On:
13 April 2018 - 5:08am

Document Files

Novoselov_ICASSP-2018_validated.pdf

(33 downloads)

[1] Oleg Kudashev, Vadim Shchemelinin, Ivan Kremnev, Galina Lavrentyeva, "DEEP CNN BASED FEATURE EXTRACTOR FOR TEXT-PROMPTED SPEAKER RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2664. Accessed: Jul. 23, 2018.

Being low-rank in the time-frequency plane


When using optimization methods with matrix variables in signal processing and machine learning, it is customary to assume some low-rank prior on the targeted solution. Nonnegative matrix factorization of spectrograms is a case in point in audio signal processing. However, this low-rank prior is not straightforwardly related to complex matrices obtained from a short-time Fourier (or discrete Gabor) transform (STFT), which is generally defined and studied in terms of a modulation operator and a translation operator applied to a so-called window.
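For readers unfamiliar with the objects mentioned above, the following block writes them out in one common textbook notation. The symbols (window g, shift a, modulation frequency b, factor matrices W and H, inner rank K) are assumptions chosen for illustration and are not taken from the paper.

```latex
% Translation and modulation operators applied to a window g
(T_a g)(t) = g(t - a), \qquad (M_b g)(t) = e^{2\pi i b t}\, g(t)

% STFT of a signal x: inner product with the shifted, modulated window
\mathcal{V}_g x(a, b) = \langle x, M_b T_a g \rangle
  = \int x(t)\, \overline{g(t - a)}\, e^{-2\pi i b t}\, dt

% Low-rank prior used by NMF on a nonnegative spectrogram V (F bins, N frames)
V \approx W H, \qquad W \in \mathbb{R}_{+}^{F \times K},\quad
H \in \mathbb{R}_{+}^{K \times N},\quad K \ll \min(F, N)
```

The factorization constrains a nonnegative matrix such as a magnitude or power spectrogram, whereas the STFT coefficients themselves are complex, which is the gap the abstract points to.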

Paper Details

Authors:
Valentin Emiya, Ronan Hamon, Caroline Chaux
Submitted On:
13 April 2018 - 4:18am

Document Files

2018_icassp_poster.pdf

(33 downloads)

[1] Valentin Emiya, Ronan Hamon, Caroline Chaux, "Being low-rank in the time-frequency plane", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2651. Accessed: Jul. 23, 2018.

EADNET: EFFICIENT ARCHITECTURE FOR DECOMPOSED CONVOLUTIONAL NEURAL NETWORKS



Paper Details

Authors:
Fangxuan Sun; Jun Lin; Zhongfeng Wang
Submitted On:
13 April 2018 - 2:49am

Document Files

CPD.pdf

(33 downloads)

[1] Fangxuan Sun; Jun Lin; Zhongfeng Wang, "EADNET: EFFICIENT ARCHITECTURE FOR DECOMPOSED CONVOLUTIONAL NEURAL NETWORKS", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2626. Accessed: Jul. 23, 2018.

High-speed light field image formation analysis using wavefield modeling with flexible sampling


To understand image formation inside plenoptic systems, this paper proposes a wave-optics-based model that uses the Fresnel diffraction equation to propagate the whole object field into the plenoptic system. By using multiple partial propagations, the proposed model is much more flexible in how it samples the propagation planes. To verify its effectiveness, numerical simulations compare it with an existing wave-optics model under different optical configurations of plenoptic cameras.
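The abstract refers to the Fresnel diffraction equation without stating it. For reference, one standard textbook form is given below; the symbols (field U, wavelength lambda, wavenumber k, propagation distance z) are assumptions for illustration, and the paper's own notation and sampling scheme may differ.

```latex
% Fresnel (paraxial) propagation of a field from the plane z = 0 to distance z
U(x, y; z) = \frac{e^{ikz}}{i\lambda z}
  \iint U(x', y'; 0)\,
  \exp\!\left\{ \frac{ik}{2z}\left[ (x - x')^2 + (y - y')^2 \right] \right\}
  dx'\, dy',
\qquad k = \frac{2\pi}{\lambda}
```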

Paper Details

Authors:
Xin Jin, Li Liu, Qionghai Dai
Submitted On:
13 April 2018 - 2:49am

Document Files

ICASSP2018Poster.pdf

(29 downloads)

[1] Xin Jin, Li Liu, Qionghai Dai, "High-speed light field image formation analysis using wavefield modeling with flexible sampling", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2625. Accessed: Jul. 23, 2018.

High-speed light field image formation analysis using wavefield modeling with flexible sampling


To understand image formation inside plenoptic systems, this paper proposes a wave-optics-based model that uses the Fresnel diffraction equation to propagate the whole object field into the plenoptic system. By using multiple partial propagations, the proposed model is much more flexible in how it samples the propagation planes. To verify its effectiveness, numerical simulations compare it with an existing wave-optics model under different optical configurations of plenoptic cameras.

Paper Details

Authors:
Xin Jin, Li Liu, Qionghai Dai
Submitted On:
13 April 2018 - 2:49am

Document Files

ICASSP2018Poster.pdf

(38 downloads)

[1] Xin Jin, Li Liu, Qionghai Dai, "High-speed light field image formation analysis using wavefield modeling with flexible sampling", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2624. Accessed: Jul. 23, 2018.

ROLE OF PROSODIC FEATURES ON CHILDREN’S SPEECH RECOGNITION

Paper Details

Authors:
Hemant K. Kathania, S. Shahnawazuddin, Nagaraj Adiga and Waquar Ahmad
Submitted On:
13 April 2018 - 12:39am

Document Files

ICASSP_2018_poster_final.pdf

(36 downloads)

[1] Hemant K. Kathania, S. Shahnawazuddin, Nagaraj Adiga and Waquar Ahmad, "ROLE OF PROSODIC FEATURES ON CHILDREN’S SPEECH RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2588. Accessed: Jul. 23, 2018.

REDUCING MODEL COMPLEXITY FOR DNN BASED LARGE-SCALE AUDIO CLASSIFICATION

Paper Details

Authors:
Yuzhong WU, Tan LEE
Submitted On:
12 April 2018 - 11:39pm

Document Files

icassp2018_yzwu_poster_ver5.pdf

(37 downloads)

[1] Yuzhong WU, Tan LEE, "REDUCING MODEL COMPLEXITY FOR DNN BASED LARGE-SCALE AUDIO CLASSIFICATION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2572. Accessed: Jul. 23, 2018.