
Long Short-term Memory Recurrent Neural Network based Segment Features for Music Genre Classification

Citation Author(s):
Jia Dai, Shan Liang, Wei Xue, Chongjia Ni, Wenju Liu
Submitted by:
Wenju Liu
Last updated:
14 October 2016 - 9:18am
Document Type:
Presentation Slides
Document Year:
Jia Dai


In conventional frame-feature-based music genre
classification methods, the audio data is represented by
independent frames and the sequential nature of the audio is
ignored entirely. If the sequential information is well modeled and
combined, the classification performance can be significantly
improved. The long short-term memory (LSTM) recurrent
neural network (RNN), which uses a set of special memory
cells to model long-range feature sequences, has been
successfully used for many sequence labeling and sequence
prediction tasks. In this paper, we propose the LSTM RNN
based segment features for music genre classification. The
LSTM RNN is used to learn a frame-level representation, the LSTM frame
feature. The segment features are the statistics of the frame features
in each segment. Furthermore, the LSTM segment feature
is combined with the segment representation of the initial frame
feature to obtain the fusional segment feature. The evaluation
on the ISMIR database shows that the LSTM segment feature
performs better than the frame feature. Overall, the fusional
segment feature achieves 89.71% classification accuracy,
an improvement of about 4.19% over the baseline model using a deep
neural network (DNN). This significant improvement shows the
effectiveness of the proposed segment feature.
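
The abstract describes segment features as statistics of the frame features in each segment, and the fusional feature as a combination of two such segment representations. The sketch below illustrates that idea under several assumptions that the abstract does not state: the statistics are taken to be the per-dimension mean and standard deviation, segments are fixed-length sliding windows, and the combination is a plain concatenation. The function names, segment length, hop size, and feature dimensions are all hypothetical, and random arrays stand in for real LSTM outputs and initial frame features.

```python
import numpy as np


def segment_statistics(frames, seg_len, hop):
    """Summarize each window of frames by its mean and standard deviation.

    frames: array of shape (num_frames, dim).
    Returns an array of shape (num_segments, 2 * dim).
    Mean/std are an assumed choice of statistics, not taken from the paper.
    """
    if frames.shape[0] < seg_len:
        raise ValueError("need at least seg_len frames")
    feats = []
    for start in range(0, frames.shape[0] - seg_len + 1, hop):
        seg = frames[start:start + seg_len]
        feats.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
    return np.stack(feats)


def fuse_segment_features(lstm_frames, initial_frames, seg_len=4, hop=2):
    """Concatenate segment statistics of two frame-level feature streams.

    Concatenation is an assumed fusion rule for illustration only.
    Both streams must have the same number of frames.
    """
    lstm_seg = segment_statistics(lstm_frames, seg_len, hop)
    init_seg = segment_statistics(initial_frames, seg_len, hop)
    return np.concatenate([lstm_seg, init_seg], axis=1)


# Placeholder data: 10 frames of 13-dim initial features and 8-dim LSTM features.
rng = np.random.default_rng(0)
initial_frames = rng.normal(size=(10, 13))
lstm_frames = rng.normal(size=(10, 8))

fused = fuse_segment_features(lstm_frames, initial_frames)
print(fused.shape)  # (4, 42): 4 segments, 2*8 + 2*13 values each
```

Each row of `fused` would then be passed to a classifier; which classifier and which statistics the authors actually use is not specified in the abstract.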
