
Audio-Based Multimedia Event Detection Using Deep Recurrent Neural Networks

Abstract: 

Multimedia event detection (MED) is the task of detecting specified events (e.g. birthday party, making a sandwich) in a large collection of video clips. While visual features and automatic speech recognition typically provide the strongest evidence for this task, non-speech audio can also contribute useful information, such as crowds cheering, engine noise, or animal sounds.

MED is typically formulated as a two-stage process: the first stage generates clip-level feature representations, often by aggregating frame-level features; the second stage performs binary or multi-class classification to decide whether a given event occurs in a video clip. Both stages are usually performed "statically", i.e. using only local temporal information or order-agnostic bag-of-words models.
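The static two-stage baseline described above can be sketched as follows. This is an illustrative stand-in, not the paper's exact setup: the feature dimensions are arbitrary, average pooling stands in for the aggregation step, and a linear sigmoid scorer stands in for the SVM baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: aggregate frame-level features into a fixed-size clip-level
# representation. Average pooling is one common "static" aggregator; a
# bag-of-words histogram over quantized frames is another.
def clip_representation(frame_features):
    """frame_features: (n_frames, feat_dim) -> (feat_dim,) clip vector."""
    return frame_features.mean(axis=0)

# Stage 2: binary classification of the clip vector (event present or not).
# A linear sigmoid scorer stands in for the SVM baseline mentioned above.
def classify_clip(clip_vec, weights, bias):
    score = clip_vec @ weights + bias
    return 1.0 / (1.0 + np.exp(-score))  # probability the event occurs

# Toy clip: 200 frames of 40-dimensional acoustic features (random here).
frames = rng.normal(size=(200, 40))
w, b = rng.normal(size=40), 0.0
prob = classify_clip(clip_representation(frames), w, b)
```

Note that the pooling step discards all temporal ordering, which is exactly the limitation the recurrent approach below addresses.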

In this paper, we introduce longer-range temporal information with deep recurrent neural networks (RNNs) for both stages. We classify each audio frame into a set of semantic units called "noisemes"; the sequence of frame-level confidence distributions is used as a variable-length clip-level representation. Such confidence vector sequences are then fed into long short-term memory (LSTM) networks for clip-level classification. We observe improvements in both frame-level and clip-level performance compared to SVM and feed-forward neural network baselines.
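The clip-level stage of this approach can be sketched as an LSTM run over the sequence of frame-level noiseme confidence vectors, with a classifier on the final hidden state. This is a minimal numpy sketch, not the paper's implementation: the number of noisemes, the hidden size, and the random weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell: all four gates computed from [h, x] at once."""
    def __init__(self, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        scale = 1.0 / np.sqrt(input_dim + hidden_dim)
        self.W = rng.normal(scale=scale,
                            size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([h, x]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input/forget/output gates
        c = f * c + i * np.tanh(g)                     # update cell state
        h = o * np.tanh(c)                             # new hidden state
        return h, c

def classify_sequence(cell, confidences, w_out, b_out):
    """Run the LSTM over a variable-length sequence of frame-level noiseme
    confidence vectors and classify the clip from the final hidden state."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in confidences:
        h, c = cell.step(x, h, c)
    return sigmoid(h @ w_out + b_out)  # probability the event occurs

n_noisemes, hidden = 17, 32  # dimensions are illustrative, not from the paper
cell = LSTMCell(n_noisemes, hidden)
seq = rng.random(size=(150, n_noisemes))  # 150 frames of confidence vectors
prob = classify_sequence(cell, seq, rng.normal(size=hidden), 0.0)
```

Because the LSTM consumes the frames in order, the clip-level decision can exploit temporal structure (e.g. a sequence of noisemes characteristic of an event) that bag-of-words aggregation throws away.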


Paper Details

Authors:
Yun Wang, Leonardo Neves, Florian Metze
Submitted On:
17 March 2016 - 4:13pm
Type:
Presentation Slides
Presenter's Name:
Yun Wang
Document Year:
2016
Cite

Document Files

2016.03 For ICASSP.ppt



[1] Yun Wang, Leonardo Neves, Florian Metze, "Audio-Based Multimedia Event Detection Using Deep Recurrent Neural Networks", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/753. Accessed: Aug. 24, 2017.