Multi-modal Analysis of Music: a large-scale Evaluation
Abstract
Multimedia data by definition comprises several
different types of content. Music in particular has audio at
its core, text in the form of lyrics, images in the form of album
covers, and video in the form of music videos. Yet, in many Music
Information Retrieval applications, only the audio content is
utilised. A few recent studies have, however, shown the usefulness
of also incorporating other modalities; in most of these studies,
textual information, in the form of song lyrics or artist biographies,
was employed. Following this direction, the contribution of
this paper is a large-scale evaluation of the combination of audio
and text (lyrics) features for genre classification, on a database
comprising over 20,000 songs. We briefly present the audio and
lyrics features employed, and provide an in-depth discussion of
the experimental results.