
Audio Processing Systems

Learning Environmental Sounds with End-to-end Convolutional Neural Network


Environmental sound classification (ESC) is usually based on handcrafted features such as the log-mel feature. In contrast, end-to-end classification systems perform feature extraction jointly with classification and have been particularly successful in image classification. In the same spirit, if environmental sounds could be learned directly from raw waveforms, we could extract new features that are effective for classification yet could not have been designed by hand, and these features could improve classification performance.
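
For concreteness, the sketch below shows what such an end-to-end model can look like: a small 1-D CNN whose strided convolutions operate directly on the raw waveform as a learned filter bank. This is a minimal illustration, not the authors' architecture; the layer sizes, the one-second 16 kHz input, and the 50-class output are assumptions.

```python
# Minimal sketch of an end-to-end 1-D CNN that classifies raw audio
# waveforms directly. NOT the authors' model; all layer sizes and the
# 50-class output are illustrative assumptions.
import torch
import torch.nn as nn

class RawWaveformCNN(nn.Module):
    def __init__(self, n_classes: int = 50):
        super().__init__()
        self.features = nn.Sequential(
            # Early strided convolutions act as a learned filter bank,
            # replacing the handcrafted log-mel front end.
            nn.Conv1d(1, 32, kernel_size=64, stride=2), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.MaxPool1d(8),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),  # pool over time -> fixed-size vector
            nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):  # x: (batch, 1, n_samples)
        return self.classifier(self.features(x))

# Usage: a batch of 4 one-second clips at an assumed 16 kHz sample rate.
logits = RawWaveformCNN()(torch.randn(4, 1, 16000))
print(logits.shape)  # torch.Size([4, 50])
```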

Paper Details

Authors: Yuji Tokozume, Tatsuya Harada
Submitted On: 3 March 2017 - 12:53am
Document Year: 2017
Short Link: http://sigport.org/1599
Document Files: poster1.pdf (poster)


Cluster-Based Senone Selection for the Efficient Calculation of Deep Neural Network Acoustic Models


This is the oral presentation at ISCSLP 2016; for more information, please refer to the paper:

Jun-Hua Liu, Zhen-Hua Ling, Si Wei, Guo-Ping Hu, Li-Rong Dai, "Cluster-Based Senone Selection for the Efficient Calculation of Deep Neural Network Acoustic Models", ISCSLP, 2016.

Paper Details

Authors: Jun-Hua Liu, Zhen-Hua Ling, Si Wei, Guo-Ping Hu, Li-Rong Dai
Submitted On: 11 October 2016 - 10:00pm
Document Year: 2016
Short Link: http://sigport.org/1156
Document Files: 20161001_dnn_cluster_v2.pptx


Acoustic detection and localization of impulsive events in urban environments

Paper Details

Authors: Momin Uppal, Sabeeh Irfan Ahmad, Hassan Shahbaz, Hassam Noor
Submitted On: 31 July 2016 - 2:03pm
Document Year: 2016
Short Link: http://sigport.org/1140
Document Files: SPM Student submission_Tahir.zip


Learning Compact Structural Representations for Audio Events Using Regressor Banks


[Figure: Bank of regressors]

We introduce a new learned descriptor for audio signals that is efficient for event representation. The entries of the descriptor are produced by evaluating a set of regressors on the input signal. The regressors are class-specific and trained within the random regression forest framework. Given an input signal, each regressor estimates the onset and offset positions of its target event. The confidence scores output by a regressor are then used to quantify how well the target event aligns with the temporal structure of the corresponding category.
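
A rough sketch of the idea follows, assuming fixed-length feature vectors per signal, one random regression forest per event class trained to predict (onset, offset), and the spread of per-tree predictions as a crude confidence proxy. None of these choices come from the paper; they only illustrate how a bank of class-specific regressors can yield a per-class descriptor.

```python
# Sketch of a regressor-bank descriptor. NOT the authors' implementation;
# the confidence proxy (negative variance of per-tree predictions) and the
# fixed-length feature vectors are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_regressor_bank(features_per_class, targets_per_class):
    """One forest per class; targets are (onset, offset) pairs in seconds."""
    bank = []
    for X, y in zip(features_per_class, targets_per_class):
        forest = RandomForestRegressor(n_estimators=50, random_state=0)
        forest.fit(X, y)
        bank.append(forest)
    return bank

def describe(bank, x):
    """Descriptor: one confidence entry per class for a single signal x."""
    x = x.reshape(1, -1)
    entries = []
    for forest in bank:
        # Per-tree (onset, offset) predictions; low spread suggests the
        # signal matches the temporal structure this class's forest learned.
        preds = np.stack([t.predict(x)[0] for t in forest.estimators_])
        entries.append(-preds.var(axis=0).mean())
    return np.array(entries)

# Toy usage with random data: 3 classes, 20 signals each, 64-dim features.
rng = np.random.default_rng(0)
Xs = [rng.normal(size=(20, 64)) for _ in range(3)]
ys = [rng.uniform(0, 1, size=(20, 2)) for _ in range(3)]
bank = train_regressor_bank(Xs, ys)
print(describe(bank, rng.normal(size=64)).shape)  # (3,)
```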

Paper Details

Authors: Huy Phan, Marco Maass, Lars Hertel, Radoslaw Mazur, Ian McLoughlin, Alfred Mertins
Submitted On: 16 March 2016 - 9:03am
Document Year: 2016
Short Link: http://sigport.org/712
Document Files: 1838_poster.pdf


Temporal Alignment for Deep Neural Networks

Paper Details

Submitted On: 23 February 2016 - 1:44pm
Document Year: 2015
Short Link: http://sigport.org/277
Document Files: GlobalSIP2015(1).pdf


Natural Sound Rendering for Headphones: Integration of signal processing techniques


With the strong growth of assistive and personal listening devices, natural sound rendering over headphones is becoming a necessity for prolonged listening in multimedia and virtual reality applications. The aim of natural sound rendering is to recreate sound scenes with spatial and timbral quality as close to natural as possible, so as to achieve a truly immersive listening experience. Rendering natural sound over headphones, however, presents many challenges. This tutorial article presents signal processing techniques that address these challenges and assist human listening.

Paper Details

Authors: Kaushik Sunder, Ee-Leng Tan
Submitted On: 23 February 2016 - 1:44pm
Document Year: 2015
Short Link: http://sigport.org/166
Document Files: SPM2015manuscript-Natural Sound Rendering for Headphones.pdf


Linear estimation based primary-ambient extraction for stereo audio signals (slides)


Audio signals for moving pictures and video games are often linear combinations of primary and ambient components. In spatial audio analysis-synthesis, these mixed signals are usually decomposed into primary and ambient components to facilitate flexible spatial rendering and enhancement. Existing approaches such as principal component analysis (PCA) and least squares (LS) are widely used to perform this decomposition from stereo signals.
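
As an illustration of the PCA route mentioned above, the sketch below decomposes one stereo frame by projecting both channels onto the dominant eigenvector of the 2x2 channel correlation matrix, taking the residual as the ambient estimate. This follows the generic PCA formulation rather than the authors' linear-estimation framework, and the frame length and toy mixture are assumptions.

```python
# Minimal sketch of PCA-based primary-ambient extraction on one stereo
# frame. Generic textbook PCA, not necessarily the authors' method; the
# frame length and test mixture below are illustrative assumptions.
import numpy as np

def pae_pca(stereo_frame):
    """stereo_frame: (2, N). Returns (primary, ambient), each (2, N)."""
    X = stereo_frame
    R = X @ X.T                         # 2x2 channel correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    w = eigvecs[:, np.argmax(eigvals)]  # dominant direction = primary
    primary = np.outer(w, w @ X)        # project both channels onto w
    ambient = X - primary               # residual is the ambient estimate
    return primary, ambient

# Toy usage: a common source panned across channels plus uncorrelated noise.
rng = np.random.default_rng(0)
src = np.sin(2 * np.pi * 440 * np.arange(4096) / 44100.0)
X = np.vstack([0.9 * src, 0.5 * src]) + 0.1 * rng.normal(size=(2, 4096))
p, a = pae_pca(X)
print(p.shape, a.shape)  # (2, 4096) (2, 4096)
```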

Paper Details

Authors: Ee Leng Tan
Submitted On: 23 February 2016 - 1:43pm
Document Year: 2015
Short Link: http://sigport.org/159
Document Files: ASLP14slides_Linear estimation based primary-ambient extraction for stereo audio signals-short.pdf


Linear estimation based primary-ambient extraction for stereo audio signals


Audio signals for moving pictures and video games are often linear combinations of primary and ambient components. In spatial audio analysis-synthesis, these mixed signals are usually decomposed into primary and ambient components to facilitate flexible spatial rendering and enhancement. Existing approaches such as principal component analysis (PCA) and least squares (LS) are widely used to perform this decomposition from stereo signals.

Paper Details

Authors: Ee Leng Tan
Submitted On: 23 February 2016 - 1:43pm
Document Year: 2015
Short Link: http://sigport.org/158
Document Files: ASLP14manuscript_Linear estimation based primary-ambient extraction for stereo audio signals.pdf
