The Million Song Dataset (MSD) [1], a collection of one million Western popular music pieces, has enabled large-scale research on many MIR applications. The dataset comes with a set of features extracted by the former API of The Echo Nest, which include tempo, loudness, the timings of fade-in and fade-out, and MFCC-like features for a number of segments.
The dataset, however, does not provide an easy way to download the audio files, so researchers are essentially limited to the features provided with it. Using a content provider for which the MSD contains links with unique IDs into its internal database, we downloaded audio samples, mostly in the form of 30- or 60-second snippets. From these samples we subsequently extracted and provide a multitude of features, to allow comparison between approaches.
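The Echo Nest features ship as one HDF5 file per track. As an illustration, the following minimal Python sketch reads a few of them with h5py; it assumes the standard MSD per-track layout (an /analysis/songs table plus per-segment arrays) and uses a placeholder file name.

```python
# Minimal sketch: reading Echo Nest features from a single MSD per-track
# HDF5 file. Assumes the standard MSD layout (/analysis/songs table plus
# /analysis/segments_timbre array); the file name below is a placeholder.
import h5py

def read_track_features(path):
    """Return tempo, loudness, fade timings and the MFCC-like segment timbre matrix."""
    with h5py.File(path, "r") as f:
        songs = f["analysis/songs"][0]  # one row per track
        return {
            "tempo": float(songs["tempo"]),
            "loudness": float(songs["loudness"]),
            "end_of_fade_in": float(songs["end_of_fade_in"]),
            "start_of_fade_out": float(songs["start_of_fade_out"]),
            # 12-dimensional MFCC-like timbre vector per segment
            "segments_timbre": f["analysis/segments_timbre"][()],
        }

if __name__ == "__main__":
    feats = read_track_features("TRAXLZU12903D05F94.h5")  # placeholder track file
    print(feats["tempo"], feats["segments_timbre"].shape)
```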
- Alexander Schindler
- Rudolf Mayer
- Thomas Lidy
- Peter Knees
- Andreas Rauber
To allow for popular tasks in Music Information Retrieval research, such as musical genre classification, we further provide a categorisation of a subset of the collection into genres obtained from the All Music Guide (allmusic.com).
AllMusic is a Web portal built on a large music information database that includes album reviews, artist biographies, and expert tags assigning albums to genres, styles, moods, and themes. Data collection follows the procedure of \cite{schindler2012}. Metadata was collected automatically from AllMusic using a web-scraping script that queries for artist-release combinations via direct string matching. From the resulting album Web pages, the genre, style, mood, and theme tags were collected (see Table \ref{tab:tags_overview}).
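As an illustration of this scraping step, the sketch below queries a search page for an artist-release combination, keeps only exact string matches, and then collects the tags from the album page. The search endpoint and CSS selectors are hypothetical placeholders, not AllMusic's actual page structure; the sketch only conveys the string-matching logic.

```python
# Illustrative sketch of the scraping step. The search URL pattern and the
# CSS selectors are hypothetical placeholders -- the real page structure
# differs and changes over time.
import requests
from bs4 import BeautifulSoup
from urllib.parse import quote

BASE = "https://www.allmusic.com"

def find_album_url(artist, album):
    """Search for an artist-release combination and return the exactly matching album page."""
    resp = requests.get(f"{BASE}/search/albums/{quote(artist + ' ' + album)}")  # placeholder endpoint
    soup = BeautifulSoup(resp.text, "html.parser")
    for hit in soup.select(".album"):  # placeholder selector
        hit_title = hit.select_one(".title").get_text(strip=True)
        hit_artist = hit.select_one(".artist").get_text(strip=True)
        # direct string matching on the artist-release combination
        if hit_title.lower() == album.lower() and hit_artist.lower() == artist.lower():
            return BASE + hit.select_one("a")["href"]
    return None

def scrape_tags(album_url):
    """Collect genre, style, mood and theme tags from an album page."""
    soup = BeautifulSoup(requests.get(album_url).text, "html.parser")
    return {
        field: [a.get_text(strip=True) for a in soup.select(f".{field} a")]  # placeholder selectors
        for field in ("genre", "styles", "moods", "themes")
    }
```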
- The Genres Tag-Set is the largest collection; its distribution is heavily skewed towards the label 'Pop/Rock', and it also contains cross-genre tags such as Holiday and Children.
- The Styles Tag-Set complements the genres and provides a more detailed description of musical characteristics such as rhythm, harmony, and instrumentation.
- The Moods Tag-Set contains tags depicting the emotions expressed by the music.
- The Themes Tag-Set describes cross-genre scenarios or occasions such as holiday, party, or Christmas.
We further provide a number of training/test splits that should ensure the comparability of experiments (a minimal sketch follows the list). In detail, these are:
- "Traditional" split with 2/3 training and 1/3 testing data
- Split with a large proportion of training data
- Split with a small proportion of training data
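For comparable results the distributed split files should be used rather than regenerating partitions. Purely as an illustration of the "traditional" partition, the sketch below shows how a stratified 2/3 training / 1/3 testing split can be derived from a genre assignment using scikit-learn; the track IDs and labels are dummy data.

```python
# Illustration of a stratified 2/3 training / 1/3 testing partition.
# This does NOT reproduce the distributed split files; it only shows the idea.
from sklearn.model_selection import train_test_split

def traditional_split(track_ids, labels, seed=42):
    """Stratified 2/3 / 1/3 split preserving the genre distribution."""
    return train_test_split(
        track_ids,
        labels,
        test_size=1 / 3,
        stratify=labels,   # keep the skewed genre distribution in both sets
        random_state=seed,
    )

# Example with dummy data
ids = [f"TR{i:07d}" for i in range(9)]
genres = ["Pop/Rock"] * 6 + ["Jazz"] * 3
train_ids, test_ids, y_train, y_test = traditional_split(ids, genres)
```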
- Alexander Schindler and Peter Knees. Multi-task music representation learning from multi-label embeddings. In Proceedings of the International Conference on Content-Based Multimedia Indexing (CBMI 2019), Dublin, Ireland, September 4-6, 2019.
- Alexander Schindler, Thomas Lidy, and Andreas Rauber. Comparing shallow versus deep neural network architectures for automatic music genre classification. In Proceedings of the 9th Forum Media Technology (FMT 2016), St. Pölten, Austria, November 23-24, 2016.
- Alexander Schindler and Andreas Rauber. Capturing the temporal domain in Echonest features for improved classification effectiveness. In Adaptive Multimedia Retrieval, Lecture Notes in Computer Science, Copenhagen, Denmark, October 24-25, 2012. Springer.
- Alexander Schindler, Rudolf Mayer, and Andreas Rauber. Facilitating comprehensive benchmarking experiments on the Million Song Dataset. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012), pages 469-474, Porto, Portugal, October 8-12, 2012.
- Alexander Schindler. Million Song Dataset integration into the Clubmixer framework. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), Miami, USA, October 24-28, 2011.