
Audio Processing Systems

HIERARCHY-AWARE LOSS FUNCTION ON A TREE STRUCTURED LABEL SPACE FOR AUDIO EVENT DETECTION

Paper Details

Authors:
Arindam Jati, Naveen Kumar, Ruxin Chen, Panayiotis Georgiou
Submitted On:
14 May 2019 - 7:11am
Cite

Document Files

aed_hal


[1] Arindam Jati, Naveen Kumar, Ruxin Chen, Panayiotis Georgiou, "HIERARCHY-AWARE LOSS FUNCTION ON A TREE STRUCTURED LABEL SPACE FOR AUDIO EVENT DETECTION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4481. Accessed: Dec. 12, 2019.

Modeling nonlinear audio effects with end-to-end deep neural networks


Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling these types of effect units are optimized for one very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning architecture for generic black-box modeling of audio processors with long-term memory. We explore the capabilities of …
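The conv-plus-recurrent idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the kernels are random rather than learned, the recurrence is a leaky integrator standing in for an LSTM/GRU, and names like `toy_effect_model` are hypothetical.

```python
import numpy as np

def conv_frontend(x, kernels):
    """Convolve the waveform with a bank of 1-D kernels (learned in a real model)."""
    return np.stack([np.convolve(x, k, mode="same") for k in kernels])

def recurrent_memory(features, alpha=0.9):
    """Leaky-integrator recurrence h[t] = a*h[t-1] + (1-a)*f[t]: a crude
    stand-in for the recurrent layer that gives the model long-term memory."""
    h = np.zeros(features.shape[0])
    out = np.empty_like(features)
    for t in range(features.shape[1]):
        h = alpha * h + (1.0 - alpha) * features[:, t]
        out[:, t] = h
    return out

def toy_effect_model(x, kernels, mix):
    """Black-box effect sketch: conv front end, recurrent memory, linear mixdown."""
    feats = conv_frontend(x, kernels)
    mem = recurrent_memory(feats)
    return mix @ mem  # mix feature channels back to one output waveform

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * np.arange(4096) / 44100)  # 440 Hz test tone
kernels = [rng.standard_normal(64) * 0.1 for _ in range(4)]
mix = rng.standard_normal(4) * 0.25
y = toy_effect_model(x, kernels, mix)
```

In a trained model the kernels, recurrence weights, and mixdown would all be fit end-to-end to input/output recordings of the target effect unit.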

Paper Details

Authors:
Emmanouil Benetos, Joshua D. Reiss
Submitted On:
10 May 2019 - 12:06pm
Cite

Document Files

ICASSP___Presentation_Martinez_Ramirez.pdf


[1] Emmanouil Benetos, Joshua D. Reiss, "Modeling nonlinear audio effects with end-to-end deep neural networks", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4368. Accessed: Dec. 12, 2019.

CNN Based Two-Stage Multi-Resolution End-to-End Model for Singing Melody Extraction


Inspired by human hearing perception, we propose a two-stage multi-resolution end-to-end model for singing melody extraction in this paper. A convolutional neural network (CNN) is the core of the proposed model and generates multi-resolution representations. The 1-D and 2-D multi-resolution analyses on the waveform and the spectrogram-like representation are carried out successively using 1-D and 2-D CNN kernels of different lengths and sizes.
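The 1-D part of that idea, analyzing the same waveform with kernels of several lengths so that each length picks up structure at a different time scale, can be sketched in NumPy. This is illustrative only: real CNN kernels are learned, whereas here fixed Hann windows stand in for them, and `multi_resolution_1d` is a hypothetical name.

```python
import numpy as np

def multi_resolution_1d(wave, kernel_lengths=(8, 32, 128)):
    """Apply 1-D kernels of several lengths to the waveform: short kernels
    keep fine temporal detail, long kernels capture coarser structure.
    Returns one feature channel per resolution."""
    channels = []
    for length in kernel_lengths:
        k = np.hanning(length)
        k /= k.sum()  # fixed smoothing kernel standing in for a learned one
        channels.append(np.convolve(np.abs(wave), k, mode="same"))
    return np.stack(channels)  # shape: (n_resolutions, n_samples)

sr = 8000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 220 * t)  # a 220 Hz "sung" tone
feats = multi_resolution_1d(wave)
```

A learned model would stack such multi-resolution channels and pass them to later layers; the 2-D stage in the paper applies the same idea with kernels of different sizes on the spectrogram-like representation.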

Paper Details

Authors:
Bo-Jun Li, Tai-Shih Chi
Submitted On:
9 May 2019 - 1:00pm
Cite

Document Files

ICASSP2019_MINGTSO.pdf


[1] Bo-Jun Li, Tai-Shih Chi, "CNN Based Two-Stage Multi-Resolution End-to-End Model for Singing Melody Extraction", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4223. Accessed: Dec. 12, 2019.

Contextual Speech Recognition with Difficult Negative Training Examples

Paper Details

Authors:
Uri Alon, Golan Pundak, Tara N. Sainath
Submitted On:
7 May 2019 - 9:07pm
Cite

Document Files

poster.pdf


[1] Uri Alon, Golan Pundak, Tara N. Sainath, "Contextual Speech Recognition with Difficult Negative Training Examples", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3977. Accessed: Dec. 12, 2019.

Exploring CTC-network derived features with conventional hybrid system

Paper Details

Submitted On:
12 April 2018 - 2:55pm
Cite

Document Files

icassp2018.pdf


[1] "Exploring CTC-network derived features with conventional hybrid system", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2472. Accessed: Dec. 12, 2019.

Learning Environmental Sounds with End-to-end Convolutional Neural Network


Environmental sound classification (ESC) is usually conducted with handcrafted features such as the log-mel feature. Meanwhile, end-to-end classification systems perform feature extraction jointly with classification and have achieved success, particularly in image classification. In the same manner, if environmental sounds could be learned directly from the raw waveform, we could extract new features that are effective for classification yet could not have been designed by hand, and these features could improve classification performance.
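The contrast drawn in the abstract, a handcrafted spectral front end versus raw-waveform input, can be made concrete with a small NumPy sketch. This is an illustration of the two input pipelines, not the paper's network; a log-mel feature would additionally apply a mel filterbank, and both function names are hypothetical.

```python
import numpy as np

def log_spectrogram(wave, frame=256, hop=128):
    """Handcrafted baseline: framed, Hann-windowed log power spectrum
    (a log-mel front end would apply a mel filterbank on top of this)."""
    n = 1 + (len(wave) - frame) // hop
    win = np.hanning(frame)
    frames = np.stack([wave[i * hop : i * hop + frame] * win for i in range(n)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)

def raw_waveform_frames(wave, frame=256, hop=128):
    """End-to-end alternative: feed raw frames directly and let the first
    convolutional layer learn its own filterbank from the data."""
    n = 1 + (len(wave) - frame) // hop
    return np.stack([wave[i * hop : i * hop + frame] for i in range(n)])

wave = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)
handcrafted = log_spectrogram(wave)       # fixed, human-designed feature
learnable_in = raw_waveform_frames(wave)  # input for a learned feature extractor
```

The handcrafted path fixes the representation before any learning happens; the raw-waveform path leaves that choice to the network, which is exactly where a feature "that could not have been designed by humans" could emerge.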

Paper Details

Authors:
Yuji Tokozume, Tatsuya Harada
Submitted On:
3 March 2017 - 12:53am
Cite

Document Files

Poster


[1] Yuji Tokozume, Tatsuya Harada, "Learning Environmental Sounds with End-to-end Convolutional Neural Network", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1599. Accessed: Dec. 12, 2019.

Cluster-Based Senone Selection for the Efficient Calculation of Deep Neural Network Acoustic Models


This is the oral presentation at ISCSLP; for more information, please refer to the paper:

Jun-Hua Liu, Zhen-Hua Ling, Si Wei, Guo-Ping Hu, Li-Rong Dai, "Cluster-Based Senone Selection for the Efficient Calculation of Deep Neural Network Acoustic Models", ISCSLP, 2016.

Paper Details

Authors:
Jun-Hua Liu, Zhen-Hua Ling, Si Wei, Guo-Ping Hu, Li-Rong Dai
Submitted On:
11 October 2016 - 10:00pm
Cite

Document Files

20161001_dnn_cluster_v2.pptx


[1] Jun-Hua Liu, Zhen-Hua Ling, Si Wei, Guo-Ping Hu, Li-Rong Dai, "Cluster-Based Senone Selection for the Efficient Calculation of Deep Neural Network Acoustic Models", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1156. Accessed: Dec. 12, 2019.

Acoustic detection and localization of impulsive events in urban environments

Paper Details

Authors:
Momin Uppal, Sabeeh Irfan Ahmad, Hassan Shahbaz, Hassam Noor
Submitted On:
31 July 2016 - 2:03pm
Cite

Document Files

SPM Student submission_Tahir.zip


[1] Momin Uppal, Sabeeh Irfan Ahmad, Hassan Shahbaz, Hassam Noor, "Acoustic detection and localization of impulsive events in urban environments", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1140. Accessed: Dec. 12, 2019.

LEARNING COMPACT STRUCTURAL REPRESENTATIONS FOR AUDIO EVENTS USING REGRESSOR BANKS


[Figure: bank of regressors]

We introduce a new learned descriptor for audio signals which is efficient for event representation. The entries of the descriptor are produced by evaluating a set of regressors on the input signal. The regressors are class-specific and trained using the random regression forests framework. Given an input signal, each regressor estimates the onset and offset positions of the target event. The estimation confidence scores output by a regressor are then used to quantify how the target event aligns with the temporal structure of the corresponding category.
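The descriptor construction described above, evaluating a bank of class-specific regressors and collecting their confidence scores, can be sketched in NumPy. This is a toy stand-in, not the paper's random-regression-forest method: template correlation replaces the trained forests, and `toy_regressor` / `regressor_bank_descriptor` are hypothetical names.

```python
import numpy as np

def toy_regressor(signal, template):
    """Stand-in for one trained class-specific regressor: slides a class
    template over the signal and returns (onset, offset, confidence)."""
    corr = np.correlate(np.abs(signal), template, mode="valid")
    onset = int(np.argmax(corr))          # estimated event onset
    offset = onset + len(template)        # estimated event offset
    confidence = float(corr.max() / (corr.sum() + 1e-12))
    return onset, offset, confidence

def regressor_bank_descriptor(signal, templates):
    """Evaluate every regressor in the bank and stack the confidence scores
    into a fixed-length descriptor: one entry per event category."""
    return np.array([toy_regressor(signal, t)[2] for t in templates])

rng = np.random.default_rng(1)
# A toy "event": silence, a 50-sample burst, silence again.
signal = np.concatenate([np.zeros(100), np.ones(50), np.zeros(100)])
templates = [np.ones(50), np.hanning(30), rng.random(20)]  # one per class
desc = regressor_bank_descriptor(signal, templates)
```

As in the paper's framework, the descriptor's length is fixed by the number of categories in the bank, so signals of any duration map to comparable fixed-size representations.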

Paper Details

Authors:
Huy Phan, Marco Maass, Lars Hertel, Radoslaw Mazur, Ian McLoughlin, Alfred Mertins
Submitted On:
16 March 2016 - 9:03am
Cite

Document Files

1838_poster.pdf


[1] Huy Phan, Marco Maass, Lars Hertel, Radoslaw Mazur, Ian McLoughlin, Alfred Mertins, "LEARNING COMPACT STRUCTURAL REPRESENTATIONS FOR AUDIO EVENTS USING REGRESSOR BANKS", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/712. Accessed: Dec. 12, 2019.

Temporal Alignment for Deep Neural Networks

Paper Details

Submitted On:
23 February 2016 - 1:44pm
Cite

Document Files

GlobalSIP2015(1).pdf


[1] "Temporal Alignment for Deep Neural Networks", IEEE SigPort, 2015. [Online]. Available: http://sigport.org/277. Accessed: Dec. 12, 2019.
