
Audio Processing Systems

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders


We present Mockingjay, a new speech representation learning approach in which bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech. Previous speech representation methods learn by conditioning on past frames and predicting information about future frames, whereas Mockingjay is designed to predict the current frame by jointly conditioning on both past and future contexts.
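
The idea of predicting a frame from both its past and future context can be illustrated with a small masked-frame reconstruction sketch. This is not the authors' implementation: the masking ratio, the L1 loss, and the neighbour-averaging predictor (a stand-in for the Transformer encoder) are assumptions chosen only to keep the example runnable, and all function names are hypothetical.

```python
import numpy as np

def mask_frames(features, mask_ratio=0.15, rng=None):
    """Zero out a random subset of frames; return the masked copy and a boolean mask."""
    rng = np.random.default_rng() if rng is None else rng
    num_frames = features.shape[0]
    num_masked = max(1, int(round(mask_ratio * num_frames)))
    idx = rng.choice(num_frames, size=num_masked, replace=False)
    mask = np.zeros(num_frames, dtype=bool)
    mask[idx] = True
    masked = features.copy()
    masked[mask] = 0.0
    return masked, mask

def bidirectional_predict(masked, mask, context=2):
    """Stand-in for the encoder (assumption): predict each masked frame as the
    mean of the unmasked frames within `context` steps on BOTH sides of it."""
    num_frames = masked.shape[0]
    pred = masked.copy()
    for t in np.flatnonzero(mask):
        lo, hi = max(0, t - context), min(num_frames, t + context + 1)
        neighbours = [i for i in range(lo, hi) if not mask[i]]
        if neighbours:
            pred[t] = masked[neighbours].mean(axis=0)
    return pred

def masked_l1_loss(pred, target, mask):
    """L1 reconstruction error measured only on the masked frames."""
    return float(np.abs(pred[mask] - target[mask]).mean())

# Usage: 50 frames of 8-dimensional features (random placeholder data).
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))
masked, mask = mask_frames(frames, rng=rng)
loss = masked_l1_loss(bidirectional_predict(masked, mask), frames, mask)
print(f"masked-frame L1 loss: {loss:.3f}")
```

In a real system the averaging predictor would be replaced by a trained encoder, and the loss would drive its parameter updates.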

Paper Details

Authors:
Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-yi Lee
Submitted On:
15 May 2020 - 10:18pm

Document Files

Presentation Slides

(14)

[1] Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-yi Lee, "Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5364. Accessed: Jul. 04, 2020.

TASK-AWARE MEAN TEACHER METHOD FOR LARGE SCALE WEAKLY LABELED SEMI-SUPERVISED SOUND EVENT DETECTION

Paper Details

Authors:
Jie Yan, Yan Song, Li-Rong Dai, Ian McLoughlin
Submitted On:
15 May 2020 - 1:40am

Document Files

icassp2020_yanjie.pdf

(16)

[1] Jie Yan, Yan Song, Li-Rong Dai, Ian McLoughlin, "TASK-AWARE MEAN TEACHER METHOD FOR LARGE SCALE WEAKLY LABELED SEMI-SUPERVISED SOUND EVENT DETECTION", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5336. Accessed: Jul. 04, 2020.

PACO and PCO-DCT: Patch Consensus and Its Application To Inpainting


Many signal processing methods break the target signal into overlapping patches, process them separately, and then stitch them back to produce an output. How to merge the resulting patches at the overlaps is central to such methods. We propose a novel framework for this type of problem based on the idea that estimated patches should coincide at the overlaps (consensus), and develop an algorithm for solving the general problem. In particular, an efficient method for projecting patches onto the consensus constraint is presented.
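
As a rough illustration of the consensus idea, the sketch below projects a set of overlapping 1-D patches onto the set of patch collections that agree at the overlaps. It relies on a standard fact: the orthogonal projection onto that set is obtained by averaging the patches where they overlap and then re-extracting patches from the averaged signal. The 1-D setting, the function names, and the patch width and stride are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def extract_patches(signal, width, stride):
    """Cut a 1-D signal into overlapping patches of the given width."""
    starts = range(0, len(signal) - width + 1, stride)
    return np.stack([signal[s:s + width] for s in starts])

def stitch(patches, length, stride):
    """Merge patches into one signal by averaging all values that land on a sample."""
    width = patches.shape[1]
    total = np.zeros(length)
    count = np.zeros(length)
    for k, patch in enumerate(patches):
        s = k * stride
        total[s:s + width] += patch
        count[s:s + width] += 1
    return total / np.maximum(count, 1)  # avoid dividing by zero on uncovered samples

def project_onto_consensus(patches, length, stride):
    """Return the closest patch set (in the least-squares sense) whose overlaps coincide."""
    width = patches.shape[1]
    return extract_patches(stitch(patches, length, stride), width, stride)

# Usage: four independently estimated patches of width 4, stride 2, on a length-10 signal.
rng = np.random.default_rng(0)
estimates = rng.normal(size=(4, 4))
consistent = project_onto_consensus(estimates, length=10, stride=2)
print(np.allclose(consistent[0][2:], consistent[1][:2]))
```

Applying the projection a second time leaves the patches unchanged, which is the defining property of a projection.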

Paper Details

Authors:
Ignacio Ramirez, Ignacio Hounie
Submitted On:
14 May 2020 - 10:07am

Document Files

Presentation

(13)

[1] Ignacio Ramirez, Ignacio Hounie, "PACO and PCO-DCT: Patch Consensus and Its Application To Inpainting", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5300. Accessed: Jul. 04, 2020.

High Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification

Paper Details

Submitted On:
14 May 2020 - 8:07am

Document Files

icassp_baixue.pdf

(12)

[1] "High Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5291. Accessed: Jul. 04, 2020.

Transfer Learning from Youtube Soundtracks to Tag Arctic Ecoacoustic Recordings

Paper Details

Authors:
Dara Pir, Richard So, Michael I Mandel
Submitted On:
13 May 2020 - 9:35pm

Document Files

ICASSP 2020 presentation-2.pdf

(20)

[1] Dara Pir, Richard So, Michael I Mandel, "Transfer Learning from Youtube Soundtracks to Tag Arctic Ecoacoustic Recordings", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5181. Accessed: Jul. 04, 2020.

SOUND EVENT DETECTION VIA DILATED CONVOLUTIONAL RECURRENT NEURAL NETWORKS

Paper Details

Authors:
Yanxiong Li, Mingle Liu, Konstantinos Drossos, Tuomas Virtanen
Submitted On:
13 May 2020 - 7:34pm

Document Files

ICASSP2020 Poster_PaperID#4901_Final.pdf

(13)

[1] Yanxiong Li, Mingle Liu, Konstantinos Drossos, Tuomas Virtanen, "SOUND EVENT DETECTION VIA DILATED CONVOLUTIONAL RECURRENT NEURAL NETWORKS", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5166. Accessed: Jul. 04, 2020.

HIERARCHY-AWARE LOSS FUNCTION ON A TREE STRUCTURED LABEL SPACE FOR AUDIO EVENT DETECTION

Paper Details

Authors:
Arindam Jati, Naveen Kumar, Ruxin Chen, Panayiotis Georgiou
Submitted On:
14 May 2019 - 7:11am

Document Files

aed_hal

(91)

[1] Arindam Jati, Naveen Kumar, Ruxin Chen, Panayiotis Georgiou, "HIERARCHY-AWARE LOSS FUNCTION ON A TREE STRUCTURED LABEL SPACE FOR AUDIO EVENT DETECTION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4481. Accessed: Jul. 04, 2020.

Modeling nonlinear audio effects with end-to-end deep neural networks


Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling this type of effect unit are optimized for a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning architecture for generic black-box modeling of audio processors with long-term memory. We explore the capabilities of
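
For readers unfamiliar with the term, a tremolo is a simple example of a time-varying (modulation-based) effect: its gain is driven by a low-frequency oscillator (LFO). The sketch below shows only what such an effect does to a signal. It is not the neural model proposed in the paper, and the function name and parameter values are hypothetical.

```python
import numpy as np

def tremolo(signal, sample_rate, rate_hz=5.0, depth=0.5):
    """Apply a gain that varies periodically over time.

    rate_hz: LFO frequency in Hz (assumed value for illustration).
    depth:   0.0 leaves the signal unchanged; 1.0 modulates the gain down to zero.
    """
    t = np.arange(len(signal)) / sample_rate
    lfo = 0.5 * (1.0 + np.sin(2.0 * np.pi * rate_hz * t))  # oscillates in [0, 1]
    gain = 1.0 - depth * lfo                               # oscillates in [1 - depth, 1]
    return signal * gain

# Usage: modulate one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2.0 * np.pi * 440.0 * np.arange(sr) / sr)
wet = tremolo(tone, sr)
print(wet.shape)
```

A black-box model of such an effect would be trained on pairs like `(tone, wet)` without access to the gain curve itself.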

Paper Details

Authors:
Emmanouil Benetos, Joshua D. Reiss
Submitted On:
10 May 2019 - 12:06pm

Document Files

ICASSP___Presentation_Martinez_Ramirez.pdf

(134)

[1] Emmanouil Benetos, Joshua D. Reiss, "Modeling nonlinear audio effects with end-to-end deep neural networks", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4368. Accessed: Jul. 04, 2020.

CNN Based Two-Stage Multi-Resolution End-to-End Model for Singing Melody Extraction


Inspired by human hearing perception, we propose a two-stage multi-resolution end-to-end model for singing melody extraction. A convolutional neural network (CNN) forms the core of the proposed model and generates multi-resolution representations. The 1-D and 2-D multi-resolution analyses of the waveform and the spectrogram-like graph are carried out successively, using 1-D and 2-D CNN kernels of different lengths and sizes.
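
The first stage described above, 1-D analysis of the waveform with kernels of different lengths, can be sketched as follows. In the paper the kernels are learned by the CNN; here fixed, normalized Hann windows stand in for them, and the kernel lengths are assumed values, so this shows only the shape of the computation.

```python
import numpy as np

def multi_resolution_1d(waveform, kernel_lengths=(8, 32, 128)):
    """Filter one waveform with kernels of several lengths.

    Returns an array of shape (len(kernel_lengths), len(waveform)):
    short kernels keep fine temporal detail, long kernels give a smoother view.
    """
    outputs = []
    for length in kernel_lengths:
        kernel = np.hanning(length)   # fixed stand-in for a learned kernel (assumption)
        kernel /= kernel.sum()        # unit gain so the channels are comparable
        outputs.append(np.convolve(waveform, kernel, mode="same"))
    return np.stack(outputs)

# Usage: analyse 1000 samples of placeholder noise at three resolutions.
rng = np.random.default_rng(0)
representation = multi_resolution_1d(rng.normal(size=1000))
print(representation.shape)
```

The stacked channels form a spectrogram-like array that a second, 2-D stage could then process.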

Paper Details

Authors:
Bo-Jun Li, Tai-Shih Chi
Submitted On:
9 May 2019 - 1:00pm

Document Files

ICASSP2019_MINGTSO.pdf

(84)

[1] Bo-Jun Li, Tai-Shih Chi, "CNN Based Two-Stage Multi-Resolution End-to-End Model for Singing Melody Extraction", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4223. Accessed: Jul. 04, 2020.

Contextual Speech Recognition with Difficult Negative Training Examples

Paper Details

Authors:
Uri Alon, Golan Pundak, Tara N. Sainath
Submitted On:
7 May 2019 - 9:07pm

Document Files

poster.pdf

(103)

[1] Uri Alon, Golan Pundak, Tara N. Sainath, "Contextual Speech Recognition with Difficult Negative Training Examples", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3977. Accessed: Jul. 04, 2020.
