
ICASSP 2018

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications.

Distributed Model Construction in Radio Interferometric Calibration

Paper Details

Submitted On: 12 April 2018 - 3:13pm

Document Files

poster


[1] , "Distributed Model Construction in Radio Interferometric Calibration", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2475. Accessed: May. 19, 2019.
@article{2475-18,
url = {http://sigport.org/2475},
author = { },
publisher = {IEEE SigPort},
title = {Distributed Model Construction in Radio Interferometric Calibration},
year = {2018} }
TY - EJOUR
T1 - Distributed Model Construction in Radio Interferometric Calibration
AU -
PY - 2018
PB - IEEE SigPort
UR - http://sigport.org/2475
ER -
. (2018). Distributed Model Construction in Radio Interferometric Calibration. IEEE SigPort. http://sigport.org/2475
, 2018. Distributed Model Construction in Radio Interferometric Calibration. Available at: http://sigport.org/2475.
. (2018). "Distributed Model Construction in Radio Interferometric Calibration." Web.
1. . Distributed Model Construction in Radio Interferometric Calibration [Internet]. IEEE SigPort; 2018. Available from : http://sigport.org/2475

Advancing Acoustic-to-Word CTC Model


The acoustic-to-word model based on the connectionist temporal classification (CTC) criterion has been shown to be a natural end-to-end (E2E) model that directly targets words as output units. However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue: it can model only a limited number of words in the output layer and maps all remaining words to a single OOV output node. Hence, such a word-based CTC model can recognize only the frequent words modeled by the network's output nodes.
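For intuition, here is a minimal, hypothetical Python sketch (not the paper's code; the vocabulary size and sentences are made up) of how a word-based CTC output layer maps transcripts onto a fixed word inventory plus a shared OOV node, which is exactly why rare words become unrecognizable:

```python
# Minimal sketch of the word-based CTC output inventory: only the most
# frequent words get their own output node; everything else collapses
# into a single OOV node. All names and sizes here are illustrative.

from collections import Counter

def build_vocab(transcripts, vocab_size):
    """Keep the `vocab_size` most frequent words; reserve index 0 for the
    CTC blank symbol and index 1 for the shared OOV node."""
    counts = Counter(w for t in transcripts for w in t.split())
    words = [w for w, _ in counts.most_common(vocab_size)]
    return {w: i + 2 for i, w in enumerate(words)}  # 0=blank, 1=OOV

def encode(transcript, word2id, oov_id=1):
    # Every word outside the modeled vocabulary maps to the same OOV node,
    # so it can never be produced as a distinct word at decode time.
    return [word2id.get(w, oov_id) for w in transcript.split()]

transcripts = ["the cat sat", "the dog sat", "the axolotl sat"]
word2id = build_vocab(transcripts, vocab_size=3)
print(encode("the axolotl sat", word2id))  # rare word -> OOV id 1
```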

Paper Details

Submitted On: 12 April 2018 - 3:12pm

Document Files

AdvanceCTC_poster.pdf


[1] , "Advancing Acoustic-to-Word CTC Model", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2474. Accessed: May. 19, 2019.
@article{2474-18,
url = {http://sigport.org/2474},
author = { },
publisher = {IEEE SigPort},
title = {Advancing Acoustic-to-Word CTC Model},
year = {2018} }
TY - EJOUR
T1 - Advancing Acoustic-to-Word CTC Model
AU -
PY - 2018
PB - IEEE SigPort
UR - http://sigport.org/2474
ER -
. (2018). Advancing Acoustic-to-Word CTC Model. IEEE SigPort. http://sigport.org/2474
, 2018. Advancing Acoustic-to-Word CTC Model. Available at: http://sigport.org/2474.
. (2018). "Advancing Acoustic-to-Word CTC Model." Web.
1. . Advancing Acoustic-to-Word CTC Model [Internet]. IEEE SigPort; 2018. Available from : http://sigport.org/2474

DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING


In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components of a far-field speaker system. Specifically, we use teacher-student (T/S) learning to adapt a well-trained close-talk production AM to far-field conditions using parallel close-talk and simulated far-field data. We also use T/S learning to compress a large KWS model into a small one that fits the device's computational budget. Because it requires no transcriptions, T/S learning makes good use of untranscribed data to boost model performance in both AM adaptation and KWS model compression.
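As a rough illustration of the T/S objective, the sketch below is our assumption of the standard formulation, not the authors' implementation; shapes and data are synthetic. The student is trained to match the teacher's frame posteriors via a KL divergence, which is what removes the need for transcriptions:

```python
# Minimal numpy sketch of teacher-student learning: the student's output
# posteriors on far-field frames are matched to the teacher's posteriors
# on the parallel close-talk frames. No labels appear anywhere.

import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ts_loss(teacher_logits, student_logits, eps=1e-12):
    """KL(teacher || student), averaged over frames. In training, the
    gradient would flow into the student network only."""
    p = softmax(teacher_logits)  # soft labels from the teacher
    q = softmax(student_logits)
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

# Illustrative shapes: 100 frames, 4000 senone outputs.
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 4000))  # teacher logits on close-talk audio
s = rng.normal(size=(100, 4000))  # student logits on far-field audio
print(ts_loss(t, s))
```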

Paper Details

Submitted On: 12 April 2018 - 3:03pm

Document Files

speaker_poster.pdf


[1] , "DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2473. Accessed: May. 19, 2019.
@article{2473-18,
url = {http://sigport.org/2473},
author = { },
publisher = {IEEE SigPort},
title = {DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING},
year = {2018} }
TY - EJOUR
T1 - DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING
AU -
PY - 2018
PB - IEEE SigPort
UR - http://sigport.org/2473
ER -
. (2018). DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING. IEEE SigPort. http://sigport.org/2473
, 2018. DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING. Available at: http://sigport.org/2473.
. (2018). "DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING." Web.
1. . DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING [Internet]. IEEE SigPort; 2018. Available from : http://sigport.org/2473

Exploring CTC-network derived features with conventional hybrid system

Paper Details

Submitted On: 12 April 2018 - 2:55pm

Document Files

icassp2018.pdf


[1] , "Exploring CTC-network derived features with conventional hybrid system", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2472. Accessed: May. 19, 2019.
@article{2472-18,
url = {http://sigport.org/2472},
author = { },
publisher = {IEEE SigPort},
title = {Exploring CTC-network derived features with conventional hybrid system},
year = {2018} }
TY - EJOUR
T1 - Exploring CTC-network derived features with conventional hybrid system
AU -
PY - 2018
PB - IEEE SigPort
UR - http://sigport.org/2472
ER -
. (2018). Exploring CTC-network derived features with conventional hybrid system. IEEE SigPort. http://sigport.org/2472
, 2018. Exploring CTC-network derived features with conventional hybrid system. Available at: http://sigport.org/2472.
. (2018). "Exploring CTC-network derived features with conventional hybrid system." Web.
1. . Exploring CTC-network derived features with conventional hybrid system [Internet]. IEEE SigPort; 2018. Available from : http://sigport.org/2472

EVALUATING MULTIEXPOSURE FUSION USING IMAGE INFORMATION

Paper Details

Authors: Hisham Rahman, Rajiv Soundararajan, and R. Venkatesh Babu
Submitted On: 12 April 2018 - 2:49pm

Document Files

mefa0.pdf


Cite: Hisham Rahman, Rajiv Soundararajan, and R. Venkatesh Babu, "EVALUATING MULTIEXPOSURE FUSION USING IMAGE INFORMATION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2471. Accessed: May 19, 2019.

CONTENT-BASED REPRESENTATIONS OF AUDIO USING SIAMESE NEURAL NETWORKS


In this paper, we focus on the problem of content-based retrieval for audio, which aims to retrieve all semantically similar audio recordings for a given audio clip query. This problem is similar to query by example of audio, which aims to retrieve media samples from a database that are similar to a user-provided example. We propose a novel approach which encodes the audio into a vector representation using Siamese Neural Networks. The goal is to obtain an encoding that is similar for files belonging to the same audio
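The following toy Python sketch (our illustration; the paper's actual encoder and loss may differ) shows the defining Siamese property, a single set of encoder weights shared by both branches, together with a standard contrastive loss:

```python
# Minimal Siamese sketch: both clips in a pair are encoded by the SAME
# weights, so semantically similar clips land near each other in the
# embedding space. The single linear layer is purely for brevity.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(128, 64))  # shared weights for BOTH branches

def encode(features):
    """Map an audio feature vector (e.g., 128-d spectrogram statistics)
    to a 64-d embedding; every clip goes through this same function."""
    return np.tanh(features @ W)

def contrastive_loss(x1, x2, same_class, margin=1.0):
    # Pull same-class pairs together; push different-class pairs apart
    # until they are at least `margin` away (standard contrastive form).
    d = np.linalg.norm(encode(x1) - encode(x2))
    return d**2 if same_class else max(0.0, margin - d)**2

a, b = rng.normal(size=128), rng.normal(size=128)
print(contrastive_loss(a, b, same_class=True))
print(contrastive_loss(a, b, same_class=False))
```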

Paper Details

Authors: Pranay Manocha, Rohan Badlani, Anurag Kumar, Ankit Shah, Benjamin Elizalde, Bhiksha Raj
Submitted On: 14 April 2018 - 3:50am

Document Files

ICASSP2018_Pranay.pdf


Cite: Pranay Manocha, Rohan Badlani, Anurag Kumar, Ankit Shah, Benjamin Elizalde, Bhiksha Raj, "CONTENT-BASED REPRESENTATIONS OF AUDIO USING SIAMESE NEURAL NETWORKS", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2470. Accessed: May 19, 2019.

Document Quality Estimation using Spatial Frequency Response


Current Document Image Quality Assessment (DIQA) algorithms directly relate Optical Character Recognition (OCR) accuracy to document quality in order to build supervised learning frameworks. This direct correlation has two major limitations: (a) OCR may be affected by factors independent of the quality of the capture, and (b) it cannot account for blur variations within an image. An alternative is to quantify the quality of capture using human judgement; however, such judgements are subjective and prone to error.
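To see why per-region measurement matters, here is a hedged toy sketch (not the paper's spatial-frequency-response method; the patch size and the Laplacian-variance proxy are our assumptions) that scores sharpness patch by patch, exposing blur variation that a single global score would hide:

```python
# Toy patch-wise sharpness map: each patch is scored by the variance of
# its Laplacian response, a common blur proxy. A spread of scores across
# patches indicates spatially varying blur within one capture.

import numpy as np

LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def laplacian_var(patch):
    """Variance of the Laplacian response; low values suggest blur."""
    h, w = patch.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(patch[i:i+3, j:j+3] * LAPLACIAN)
    return out.var()

def patch_sharpness_map(image, patch=64):
    """Sharpness score per non-overlapping patch of a grayscale image."""
    h, w = image.shape
    return np.array([[laplacian_var(image[i:i+patch, j:j+patch])
                      for j in range(0, w - patch + 1, patch)]
                     for i in range(0, h - patch + 1, patch)])

img = np.random.default_rng(0).random((128, 128))  # stand-in document image
print(patch_sharpness_map(img))
```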

Paper Details

Authors: Pranjal Kumar Rai, Sajal Maheshwari, Vineet Gandhi
Submitted On: 13 April 2018 - 2:24am

Document Files

rai_ICASSP.pdf


Cite: Pranjal Kumar Rai, Sajal Maheshwari, Vineet Gandhi, "Document Quality Estimation using Spatial Frequency Response", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2469. Accessed: May 19, 2019.

Distributed TDOA-based indoor source localisation

Paper Details

Authors: Nikolay Dian Gaubitch, Richard Heusdens
Submitted On: 12 April 2018 - 2:25pm

Document Files

ICASSP_Wangyang.pdf


Cite: Nikolay Dian Gaubitch, Richard Heusdens, "Distributed TDOA-based indoor source localisation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2468. Accessed: May 19, 2019.

Domain Adversarial Training for Accented Speech Recognition


In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. To reduce the mismatch between labeled source-domain data (“standard” accent) and unlabeled target-domain data (heavy accents), we augment the learning objective of a Kaldi TDNN network with a DAT objective that encourages the model to learn accent-invariant features.
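Schematically, DAT is usually realized with a gradient-reversal layer; the minimal sketch below (our illustration in numpy; the paper trains a Kaldi TDNN) shows the key step, the domain-classifier gradient entering the shared feature extractor negated and scaled by a weight lambda:

```python
# Schematic sketch of the DAT gradient flow: features are trained to
# predict senones while a gradient-reversal layer makes the same
# features bad at predicting the domain (standard vs. accented),
# which is what encourages accent-invariant features.

import numpy as np

def dat_feature_gradient(grad_senone, grad_domain, lam=0.1):
    """Gradient reaching the shared feature extractor. The senone-branch
    gradient passes through unchanged; the domain-branch gradient is
    REVERSED and scaled by lambda, so the features are pushed to confuse
    the domain classifier rather than help it."""
    return grad_senone - lam * grad_domain

# Illustrative gradients w.r.t. a 256-d shared feature vector.
rng = np.random.default_rng(0)
g_senone = rng.normal(size=256)  # from the ASR (senone) loss
g_domain = rng.normal(size=256)  # from the domain-classifier loss
print(dat_feature_gradient(g_senone, g_domain)[:5])
```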

Paper Details

Authors: Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie
Submitted On: 17 April 2018 - 4:42pm

Document Files

icassp_slides_snsun_v6.pdf


Cite: Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie, "Domain Adversarial Training for Accented Speech Recognition", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2467. Accessed: May 19, 2019.

Geometric Information Based Monaural Speech Separation Using Deep Neural Network

Paper Details

Authors: Yang Sun, Jonathon A. Chambers, Syed Mohsen Naqvi
Submitted On: 12 April 2018 - 2:15pm

Document Files

P4.11.pdf


Cite: Yang Sun, Jonathon A. Chambers, Syed Mohsen Naqvi, "Geometric Information Based Monaural Speech Separation Using Deep Neural Network", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2466. Accessed: May 19, 2019.
