End-to-End Lyrics Alignment Using An Audio-to-Character Recognition Model

Abstract: 

Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval, intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to break the problem down by combining numerous sub-modules, including vocal separation and detection. Furthermore, training these systems has required fine-grained annotations to be available in some form. Here, we present a novel system based on a modified Wave-U-Net architecture, which predicts character probabilities directly from raw audio using learnt multi-scale representations of the various signal components. There are no sub-modules whose interdependencies need to be optimized. Our training procedure is designed to work with the weak, line-level annotations available in the real world. With a mean alignment error of 0.35 s on a standard dataset, our system outperforms the state of the art by an order of magnitude.
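
To illustrate how frame-wise character probabilities can be turned into timings, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the alignment is decoded. The sketch assumes a simple monotonic Viterbi alignment in which each audio frame is assigned to exactly one lyric character, and it assumes that the alignment error is the mean absolute difference between predicted and reference start times. The function names forced_align and mean_alignment_error, and the toy probabilities, are hypothetical.

```python
import math


def forced_align(log_probs, targets):
    """Return the start frame of each target character.

    log_probs: list of T frames, each a list of per-character log-probabilities.
    targets:   list of N character indices, in lyric order.
    Assumption: every frame belongs to exactly one target character and the
    path may only stay on a character or advance to the next one.
    """
    T, N = len(log_probs), len(targets)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per target character")

    neg_inf = -math.inf
    score = [[neg_inf] * N for _ in range(T)]
    advanced = [[False] * N for _ in range(T)]  # True if path came from n - 1
    score[0][0] = log_probs[0][targets[0]]

    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else neg_inf
            if advance > stay:
                best, advanced[t][n] = advance, True
            else:
                best = stay
            score[t][n] = best + log_probs[t][targets[n]]

    # Backtrack from the last frame and last character to recover start frames.
    starts = [0] * N
    n = N - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][n]:
            starts[n] = t
            n -= 1
    return starts


def mean_alignment_error(predicted, reference):
    """Mean absolute difference between predicted and reference times (seconds)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Toy example: 6 frames, 3 characters, each character dominant for 2 frames.
hi, lo = math.log(0.8), math.log(0.1)
toy_log_probs = [
    [hi, lo, lo], [hi, lo, lo],
    [lo, hi, lo], [lo, hi, lo],
    [lo, lo, hi], [lo, lo, hi],
]
toy_starts = forced_align(toy_log_probs, [0, 1, 2])
```

To convert the returned start frames to seconds, multiply each by the frame hop duration of the model output, a value the abstract does not state. The resulting times can then be compared with reference annotations using mean_alignment_error.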

Paper Details

Authors:
Daniel Stoller, Simon Durand, Sebastian Ewert
Submitted On:
17 May 2019 - 5:14am
Short Link:
Type:
Presentation Slides
Event:
Presenter's Name:
Daniel Stoller
Paper Code:
AASP-L7
Document Year:
2019

Document Files

ICASSP 2019 Slides as presented in the session


Demo video for presentation slides

[1] Daniel Stoller, Simon Durand, Sebastian Ewert, "End-to-End Lyrics Alignment Using An Audio-to-Character Recognition Model", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4220. Accessed: Dec. 14, 2019.