
- Learning to rank music tracks using triplet loss
Most music streaming services rely on automatic recommendation algorithms to exploit their large music catalogs. These algorithms aim to retrieve a ranked list of music tracks based on their similarity to a target track. In this work, we propose a method for direct recommendation based on the audio content, without explicitly tagging the music tracks. To that aim, we propose several strategies for mining triplets from ranked lists, and we train a convolutional neural network to learn the similarity via a triplet loss.
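As a hedged sketch of the idea (the exact network and mining strategies are those of the paper; the function names here are hypothetical), a triplet loss over embedding vectors and one simple mining strategy from a ranked list can be written as:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: the anchor embedding should be closer to
    the positive track's embedding than to the negative's, by a margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def mine_triplet(ranked_ids, query_id, pos_rank=0, neg_rank=-1):
    """Hypothetical mining from a ranked list: treat a highly ranked
    track as the positive and a low-ranked track as the negative."""
    return query_id, ranked_ids[pos_rank], ranked_ids[neg_rank]
```

Minimizing this loss pulls similar tracks together in the embedding space until the ranking constraint is satisfied with the chosen margin.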

- Music Boundary Detection Based on a Hybrid Deep Model of Novelty, Homogeneity, Repetition and Duration
Current state-of-the-art music boundary detection methods rely on local features, an approach that fails to explicitly incorporate the statistical properties of the detected segments. This paper presents a boundary detection method that simultaneously considers a fitness measure based on the boundary posterior probability, the likelihood of the segment duration sequence, and the acoustic consistency within a segment.
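A minimal sketch of how such complementary criteria might be combined into a single fitness value (the weights and score names are hypothetical, not the paper's exact formulation):

```python
import numpy as np

def segmentation_score(boundary_post, duration_loglik, homogeneity,
                       weights=(1.0, 1.0, 1.0)):
    """Combine log boundary posteriors, a segment-duration
    log-likelihood, and a within-segment homogeneity score into one
    fitness value for a candidate segmentation."""
    w_b, w_d, w_h = weights
    return (w_b * float(np.sum(np.log(boundary_post)))
            + w_d * duration_loglik
            + w_h * homogeneity)
```

A search over candidate boundary sets would then keep the segmentation that maximizes this score.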

- Modeling Melodic Feature Dependency with Modularized Variational Auto-Encoder
Automatic melody generation has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melodies has turned out to be highly challenging. This paper introduces 1) a new variant of variational autoencoder (VAE), where the model structure is designed in a modularized manner in order to model polyphonic and dynamic music with domain knowledge, and 2) a hierarchical encoding/decoding strategy, which explicitly models the dependency between melodic features.
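The modularized design builds on standard VAE machinery; as a sketch, the reparameterization step at the heart of any VAE (assumed here for illustration, not taken from the paper) is:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so that
    gradients can flow through the sampling step of a VAE encoder."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps
```

Each module in a modularized VAE would apply this trick to its own latent variables, letting the decoder condition one melodic feature on another.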

- Intonation: a Dataset of Quality Vocal Performances Refined by Spectral Clustering on Pitch Congruence
We introduce the "Intonation" dataset of amateur vocal performances with a tendency for good intonation, collected from Smule, Inc. The dataset can be used for music information retrieval tasks such as autotuning, query by humming, and singing style analysis. It is available upon request on the Stanford CCRMA DAMP website. We describe a semi-supervised approach to selecting the audio recordings from a larger collection of performances based on intonation patterns.

- Enhancing Music Features by Knowledge Transfer from User-Item Log Data

- Modeling nonlinear audio effects with end-to-end deep neural networks
Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling these types of effect units are optimized for a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning architecture for generic black-box modeling of audio processors with long-term memory. We explore the capabilities of
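To make the modeling target concrete, a minimal example of such a time-varying, modulation-based effect (a simple tremolo, assumed here for illustration) is:

```python
import numpy as np

def tremolo(x, sr=44100, rate_hz=5.0, depth=0.5):
    """Periodic amplitude modulation: a low-frequency oscillator
    scales the signal's gain between (1 - depth) and 1."""
    t = np.arange(len(x)) / sr
    lfo = 1.0 - depth * 0.5 * (1.0 + np.sin(2.0 * np.pi * rate_hz * t))
    return x * lfo
```

A black-box neural model would be trained on input/output pairs of such a processor without any knowledge of its internal circuit.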

In recent years, singing voice separation systems have shown increased performance due to the use of supervised training. The design of the training dataset is known to be a crucial factor in the performance of such systems. We investigate how the characteristics of the training dataset impact the separation performance of state-of-the-art singing voice separation algorithms. We show that separation quality and diversity are two important and complementary assets of a good training dataset. We also provide insights on possible transforms for data augmentation for this task.
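As a hedged example of one such transform (the gain range and the function name are illustrative, not the paper's exact augmentations), randomly rescaling the sources and rebuilding the mixture yields new, consistent training pairs:

```python
import numpy as np

def remix_pair(vocals, accompaniment, rng):
    """Data augmentation for separation training: rescale each source
    with a random gain and rebuild the mixture, so the ground-truth
    sources stay consistent with the new input mixture."""
    g_v = rng.uniform(0.5, 1.5)
    g_a = rng.uniform(0.5, 1.5)
    vocals = g_v * vocals
    accompaniment = g_a * accompaniment
    return vocals + accompaniment, vocals, accompaniment
```

Because the targets are rescaled along with the mixture, the augmented pair remains a valid supervision signal.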

- CNN Based Two-Stage Multi-Resolution End-to-End Model for Singing Melody Extraction
Inspired by human hearing perception, we propose a two-stage multi-resolution end-to-end model for singing melody extraction in this paper. A convolutional neural network (CNN) is the core of the proposed model, generating multi-resolution representations. The 1-D and 2-D multi-resolution analyses on the waveform and a spectrogram-like representation are successively carried out using 1-D and 2-D CNN kernels of different lengths and sizes.
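A toy sketch of the multi-resolution idea (fixed averaging kernels standing in for the learned 1-D CNN kernels of different lengths):

```python
import numpy as np

def multi_resolution_1d(waveform, kernel_lengths=(16, 64, 256)):
    """Convolve the waveform with kernels of several lengths; shorter
    kernels capture fine temporal detail, longer ones coarser trends."""
    return [np.convolve(waveform, np.ones(k) / k, mode="valid")
            for k in kernel_lengths]
```

In the learned model, each resolution's kernels are trained rather than fixed, and the resulting representations feed the later stages of the network.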

- End-to-End Lyrics Alignment Using An Audio-to-Character Recognition Model
Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval, intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to break down the problem by combining sub-modules such as vocal separation and detection. Furthermore, training has required fine-grained annotations to be available in some form.
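Audio-to-character recognition models typically rest on CTC-style per-frame outputs; a minimal sketch of the decoding step behind such alignment (the blank symbol and function name are illustrative) is:

```python
def ctc_collapse(frame_chars, blank="-"):
    """Collapse per-frame character predictions by merging repeated
    characters and dropping blanks, as in CTC decoding; the frame
    index at which each character first appears gives a time alignment."""
    out, prev = [], None
    for c in frame_chars:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return "".join(out)
```

Training with a CTC-style objective is what lets such a model learn alignment from song-level transcripts without fine-grained annotations.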

- Deep Polyphonic ADSR Piano Note Transcription