
Audio and Acoustic Signal Processing

MULTI-VIEW AUDIO-ARTICULATORY FEATURES FOR PHONETIC RECOGNITION ON RTMRI-TIMIT DATABASE


In this paper, we investigate the use of articulatory information, and more specifically real-time Magnetic Resonance Imaging (rtMRI) data of the vocal tract, to improve speech recognition performance. For our experiments, we use data from the rtMRI-TIMIT database. First, Scale Invariant Feature Transform (SIFT) features are extracted for each video frame. The SIFT descriptors of each frame are then transformed into a single histogram per image using the Bag of Visual Words methodology. Since this kind …
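
To make the feature-extraction step concrete, here is a minimal sketch of a SIFT + Bag-of-Visual-Words pipeline of the kind described above. It illustrates the general technique only, not the authors' exact setup; the OpenCV/scikit-learn calls and the vocabulary size are assumptions.

```python
# Sketch of SIFT + Bag-of-Visual-Words feature extraction for video frames.
# Illustrative only: vocabulary size (k=64) and data handling are assumptions.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sift_descriptors(frames):
    """Extract SIFT descriptors for each grayscale video frame."""
    sift = cv2.SIFT_create()
    per_frame = []
    for frame in frames:
        _, desc = sift.detectAndCompute(frame, None)
        per_frame.append(desc if desc is not None else np.empty((0, 128)))
    return per_frame

def bovw_histograms(per_frame_desc, k=64):
    """Cluster all descriptors into a visual vocabulary and encode each
    frame as a normalized histogram of visual-word counts."""
    all_desc = np.vstack([d for d in per_frame_desc if len(d)])
    vocab = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)
    hists = []
    for desc in per_frame_desc:
        hist = np.zeros(k)
        if len(desc):
            words = vocab.predict(desc)
            hist = np.bincount(words, minlength=k).astype(float)
            hist /= hist.sum()
        hists.append(hist)
    return np.array(hists)  # one histogram (articulatory feature vector) per frame
```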

Paper Details

Authors:
Ioannis Douros, Athanasios Katsamanis, Petros Maragos
Submitted On:
13 April 2018 - 2:13pm

Document Files

ICASSP_2018_poster_final.pdf


[1] Ioannis Douros, Athanasios Katsamanis, Petros Maragos, "MULTI-VIEW AUDIO-ARTICULATORY FEATURES FOR PHONETIC RECOGNITION ON RTMRI-TIMIT DATABASE", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2734. Accessed: Nov. 19, 2018.

A CONVERSATIONAL NEURAL LANGUAGE MODEL FOR SPEECH RECOGNITION IN DIGITAL ASSISTANTS


Speech recognition in digital assistants such as Google Assistant can potentially benefit from conversational context consisting of user queries and responses from the agent. We explore the use of recurrent Long Short-Term Memory (LSTM) neural language models (LMs) to model the conversations in a digital assistant. Our proposed methods effectively capture the context of previous utterances in a conversation without modifying the underlying LSTM architecture. We demonstrate a 4% relative improvement in recognition performance …

Paper Details

Authors:
Eunjoon Cho, Shankar Kumar
Submitted On:
13 April 2018 - 1:19pm

Document Files

conversation.pdf


[1] Eunjoon Cho, Shankar Kumar, "A CONVERSATIONAL NEURAL LANGUAGE MODEL FOR SPEECH RECOGNITION IN DIGITAL ASSISTANTS", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2732. Accessed: Nov. 19, 2018.

USING ACCELEROMETRIC AND GYROSCOPIC DATA TO IMPROVE BLOOD PRESSURE PREDICTION FROM PULSE TRANSIT TIME USING RECURRENT NEURAL NETWORK

Paper Details

Authors:
Shrimanti Ghosh, Ankur Banerjee, Nilanjan Ray, Peter W Wood, Pierre Boulanger, Raj Padwal
Submitted On:
13 April 2018 - 12:27pm

Document Files

ICASSP_2018_Poster.pdf


[1] Shrimanti Ghosh, Ankur Banerjee, Nilanjan Ray, Peter W Wood, Pierre Boulanger, Raj Padwal, "USING ACCELEROMETRIC AND GYROSCOPIC DATA TO IMPROVE BLOOD PRESSURE PREDICTION FROM PULSE TRANSIT TIME USING RECURRENT NEURAL NETWORK", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2727. Accessed: Nov. 19, 2018.

Grid-Free Direction-of-Arrival Estimation with Compressed Sensing and Arbitrary Antenna Arrays


We study the problem of direction-of-arrival (DOA) estimation for arbitrary antenna arrays. We formulate it as a continuous line spectral estimation problem and solve it under a sparsity prior without any gridding assumptions. Moreover, we incorporate the array's beampattern in the form of the Effective Aperture Distribution Function (EADF), which makes it possible to use arbitrary (synthetic as well as measured) antenna arrays. This generalizes known atomic-norm-based grid-free DOA estimation methods …

Paper Details

Authors:
Florian Roemer, Thomas Hotz, Giovanni Del Galdo
Submitted On:
13 April 2018 - 9:54am

Document Files

slides_main.pdf


[1] Florian Roemer, Thomas Hotz, Giovanni Del Galdo, "Grid-Free Direction-of-Arrival Estimation with Compressed Sensing and Arbitrary Antenna Arrays", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2705. Accessed: Nov. 19, 2018.

REAL-TIME TOTAL FOCUSING METHOD FOR ULTRASONIC IMAGING OF MULTILAYERED OBJECT

Paper Details

Authors:
Wenkai Cui, Kaihuai Qin
Submitted On:
13 April 2018 - 8:22am

Document Files

2265_lecture


[1] Wenkai Cui, Kaihuai Qin, "REAL-TIME TOTAL FOCUSING METHOD FOR ULTRASONIC IMAGING OF MULTILAYERED OBJECT", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2697. Accessed: Nov. 19, 2018.

REAL-TIME TOTAL FOCUSING METHOD IMAGING FOR ULTRASONIC INSPECTION OF THREE-DIMENSIONAL MULTILAYERED MEDIA

Paper Details

Authors:
Wenkai Cui, Kaihuai Qin
Submitted On:
13 April 2018 - 8:16am

Document Files

1502_Poster


[1] Wenkai Cui, Kaihuai Qin, "REAL-TIME TOTAL FOCUSING METHOD IMAGING FOR ULTRASONIC INSPECTION OF THREE-DIMENSIONAL MULTILAYERED MEDIA", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2695. Accessed: Nov. 19, 2018.

Human and Machine Speaker Recognition on Short Trivial Events


In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, by both human listeners and machines. In particular, the deep feature learning technique recently proposed by our group is utilized to analyze and recognize the trivial events, leading to acceptable equal error rates (EERs) ranging from 5% to 15% despite the extremely short durations (0.2-0.5 seconds) of these events. Comparing different types of events, ‘hmm’ appears to be more speaker-discriminative than the other types.
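
For reference, the EER metric quoted above can be computed from verification scores as in the short sketch below; the scores shown are toy values, not the paper's data.

```python
# Sketch of computing the equal error rate (EER) from verification scores,
# as used to evaluate speaker recognition on short trivial events.
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(scores, labels):
    """labels: 1 for target (same-speaker) trials, 0 for non-target trials."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # point where miss and false-alarm rates cross
    return (fnr[idx] + fpr[idx]) / 2.0

# Toy example with placeholder scores (e.g., cosine similarities of embeddings):
scores = np.array([0.91, 0.75, 0.62, 0.40, 0.33, 0.15])
labels = np.array([1, 1, 0, 1, 0, 0])
print(f"EER = {equal_error_rate(scores, labels):.2%}")
```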


Paper Details

Authors:
Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai
Submitted On:
13 April 2018 - 6:58am

Document Files

trivial.pdf


[1] Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai, "Human and Machine Speaker Recognition on Short Trivial Events", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2687. Accessed: Nov. 19, 2018.

Passive online geometry calibration of acoustic sensor networks


As we are surrounded by an increasing number of mobile devices equipped with wireless links and multiple microphones, e.g., smartphones, tablets, laptops and hearing aids, their collaborative use for acoustic processing offers a promising platform for emerging applications.

Paper Details

Authors:
Axel Plinge, Gernot A. Fink, Sharon Gannot
Submitted On:
13 April 2018 - 5:46am

Document Files

pog-poster-v6.pdf


[1] Axel Plinge, Gernot A. Fink, Sharon Gannot, "Passive online geometry calibration of acoustic sensor networks", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2673. Accessed: Nov. 19, 2018.

DEEP CNN BASED FEATURE EXTRACTOR FOR TEXT-PROMPTED SPEAKER RECOGNITION


Deep learning is still not a very common tool in the speaker verification field. We study deep convolutional neural network performance on the text-prompted speaker verification task. The prompted passphrase is segmented into word states (i.e., digits) to test each digit utterance separately. We train a single high-level feature extractor for all states and use the cosine similarity metric for scoring. The key feature of our network is the Max-Feature-Map activation function, which acts as an embedded feature selector.
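
The Max-Feature-Map activation mentioned above splits the channel dimension in half and keeps the element-wise maximum, which is what lets it act as an embedded feature selector. A minimal sketch follows; the convolutional block and layer sizes are illustrative assumptions, not the authors' architecture.

```python
# Sketch of the Max-Feature-Map (MFM) activation: split channels in two and
# take the element-wise maximum of the halves (a competitive feature selector).
import torch
import torch.nn as nn

class MaxFeatureMap(nn.Module):
    def forward(self, x):
        # x: (batch, 2*C, H, W) -> (batch, C, H, W)
        a, b = torch.chunk(x, 2, dim=1)
        return torch.max(a, b)

# Example: a small convolutional block using MFM instead of ReLU (sizes assumed).
block = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=5, padding=2),  # 64 channels out; MFM keeps 32
    MaxFeatureMap(),
    nn.MaxPool2d(2),
)
features = block(torch.randn(8, 1, 64, 64))  # -> (8, 32, 32, 32)
```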

Paper Details

Authors:
Oleg Kudashev, Vadim Shchemelinin, Ivan Kremnev, Galina Lavrentyeva
Submitted On:
13 April 2018 - 5:08am

Document Files

Novoselov_ICASSP-2018_validated.pdf


[1] Oleg Kudashev, Vadim Shchemelinin, Ivan Kremnev, Galina Lavrentyeva, "DEEP CNN BASED FEATURE EXTRACTOR FOR TEXT-PROMPTED SPEAKER RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2664. Accessed: Nov. 19, 2018.
