Long Short-term Memory Recurrent Neural Network based Segment Features for Music Genre Classification

In conventional frame-feature-based music genre
classification methods, audio data is represented by
independent frames and the sequential nature of audio is
ignored entirely. If this sequential information is modeled
well and incorporated, classification performance can be
significantly improved. The long short-term memory (LSTM) recurrent
neural network (RNN), which uses a set of special memory
cells to model long-range feature sequences, has been
successfully used for many sequence labeling and sequence
prediction tasks. In this paper, we propose LSTM RNN
based segment features for music genre classification. The
LSTM RNN is used to learn a frame-level representation, the
LSTM frame feature. The segment features are statistics of the frame features
in each segment. Furthermore, the LSTM segment feature
is combined with the segment representation of the initial frame
feature to obtain the fusional segment feature. Evaluation
on the ISMIR database shows that the LSTM segment feature
performs better than the frame feature. Overall, the fusional
segment feature achieves 89.71% classification accuracy,
about a 4.19% improvement over the baseline model using a deep
neural network (DNN). This significant improvement shows the
effectiveness of the proposed segment feature.
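
As a minimal sketch of the segment-feature idea described above, the following Python snippet summarizes a sequence of frame feature vectors by per-segment statistics and then fuses two segment representations. The choice of mean and standard deviation as the statistics, the non-overlapping segmentation, the segment length, and fusion by concatenation are all assumptions made for illustration; the abstract does not specify them. The function names `segment_features` and `fuse` are hypothetical, and the toy inputs stand in for real LSTM outputs and initial frame features.

```python
from statistics import fmean, pstdev


def segment_features(frames, seg_len):
    """Summarize each non-overlapping segment of frame vectors.

    Assumption: the statistics are the per-dimension mean and
    population standard deviation; trailing frames that do not
    fill a whole segment are dropped.
    """
    segments = []
    for start in range(0, len(frames) - seg_len + 1, seg_len):
        chunk = frames[start:start + seg_len]
        dims = list(zip(*chunk))  # transpose: one tuple per feature dimension
        means = [fmean(d) for d in dims]
        stds = [pstdev(d) for d in dims]
        segments.append(means + stds)
    return segments


def fuse(lstm_segments, initial_segments):
    """Concatenate two segment representations, segment by segment.

    Assumption: fusion is plain concatenation of the two vectors.
    """
    if len(lstm_segments) != len(initial_segments):
        raise ValueError("both inputs must cover the same segments")
    return [a + b for a, b in zip(lstm_segments, initial_segments)]


# Toy data: 4 frames with 2-dim "initial" features and 1-dim "LSTM" features.
initial_frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 6.0], [7.0, 10.0]]
lstm_frames = [[0.0], [2.0], [4.0], [4.0]]

initial_seg = segment_features(initial_frames, seg_len=2)
lstm_seg = segment_features(lstm_frames, seg_len=2)
fused = fuse(lstm_seg, initial_seg)
```

Each fused vector here has six entries: the mean and standard deviation of the one-dimensional toy LSTM feature, followed by the means and standard deviations of the two-dimensional toy initial feature. In practice the fused vectors would be passed to a classifier, which this sketch leaves out.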

ISCSLP2016_JiaDai_pptA4.pdf
