Datasets
- Children's Song Dataset (CSD): a collection of 50 Korean and 50 English songs sung by one Korean female professional pop singer. Each song is recorded in two separate keys, resulting in a total of 200 audio recordings. Each recording is paired with a MIDI transcription and with lyric annotations at both the grapheme level and the phoneme level.
- dim-sim: a collection of user-annotated music similarity triplet ratings used to evaluate music similarity search and related algorithms. Our similarity ratings are linked to the Million Song Dataset (MSD) and were collected for the following paper.
- K-pop Vocal Tagging (KVT): a collection of 6,787 vocal segments from 466 K-pop songs, annotated with 70 vocal tags that describe characteristics such as timbre, technique, pitch range, genre, and gender.
- The DJ Mix Dataset: metadata for DJ mixes performed by human DJs and for the tracks played in those mixes. We extracted the metadata from MixesDB and carefully defined it.
- EMOPIA: a multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, intended to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs, with clip-level emotion labels annotated by four dedicated annotators. Because a song may contribute more than one clip, the clips can also be used for song-level analysis.
- YM2413-MDB: a 1980s FM video game music dataset with multi-label emotion annotations. It includes 669 audio and MIDI files of music from 1980s Sega and MSX PC games that used the YM2413, a programmable sound generator based on FM synthesis. The collected game music is arranged with a subset of 15 monophonic instruments and one drum instrument. The files were converted from binary commands of the YM2413 sound chip. Each song was labeled with 19 emotion tags by two annotators and validated by three verifiers to obtain refined tags.
- Extended Cleaned tag and Artist-Level Stratified split (ECALS): an extended-tag version of the CALS (cleaned and artist-level stratified) split for the Million Song Dataset (MSD). Unlike the previous CALS split, it provides a 1,054-tag vocabulary and caption-level tag sequences instead of the small 50-tag vocabulary.
- LP-MusicCaps: an LLM-based Pseudo Music Caption dataset for text-to-music and music-to-text tasks. The music-to-caption pairs were constructed through tag-to-caption generation, using three existing multi-label tag datasets and four task instructions. The data sources are MusicCaps, MagnaTagATune, and the ECALS subset of the Million Song Dataset.
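
As an illustration of how triplet ratings such as those in dim-sim can be used to evaluate a similarity model, the sketch below counts how often a model's embedding distances agree with annotator judgments. It is a minimal example under stated assumptions: the function name `triplet_agreement`, the `(anchor, similar, dissimilar)` tuple layout, and the toy embeddings are all hypothetical and do not reflect the dataset's actual file format.

```python
import math

def triplet_agreement(embeddings, triplets):
    """Fraction of triplets where the model places the 'similar' track
    closer to the anchor than the 'dissimilar' track.

    embeddings: dict mapping a track id to a list of floats (hypothetical layout)
    triplets:   list of (anchor, similar, dissimilar) track ids (hypothetical layout)
    """
    if not triplets:
        raise ValueError("no triplets given")
    hits = 0
    for anchor, similar, dissimilar in triplets:
        d_sim = math.dist(embeddings[anchor], embeddings[similar])
        d_dis = math.dist(embeddings[anchor], embeddings[dissimilar])
        # Count the triplet as a hit when the model agrees with the annotators.
        hits += d_sim < d_dis
    return hits / len(triplets)

# Toy data, invented purely for illustration.
toy_embeddings = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
toy_triplets = [("a", "b", "c"), ("a", "c", "b")]
score = triplet_agreement(toy_embeddings, toy_triplets)
```

Euclidean distance is used here only for simplicity; any distance or similarity measure that fits the model under evaluation could be substituted.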
Source Code and Software Libraries
- PyTSMod: a Python library of Time-Scale Modification (TSM) algorithms, including Overlap-Add (OLA), Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA), Waveform Similarity Overlap-Add (WSOLA), Phase Vocoder (PV), and TSM based on harmonic-percussive source separation (HPTSM).
- JDC vocal melody extraction: a pre-trained deep neural network model for singing voice detection and melody extraction.
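
To give a sense of the simplest of the TSM algorithms listed above, here is a minimal sketch of plain Overlap-Add (OLA) time stretching in pure Python. It is not PyTSMod's implementation: the function name `ola_stretch`, its parameters, and the choice of a Hann window with 50% synthesis overlap are assumptions made for illustration.

```python
import math

def ola_stretch(x, s, frame_len=1024):
    """Naive OLA time stretch: the output is about s times as long as x.

    x: list of float samples (mono); s: stretch factor (> 0).
    """
    if s <= 0:
        raise ValueError("stretch factor must be positive")
    syn_hop = frame_len // 2          # fixed hop between output frames
    ana_hop = syn_hop / s             # hop between input frames
    # Periodic Hann window; at 50% overlap its shifted copies sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    out_len = int(round(len(x) * s))
    n_frames = max(1, math.ceil(out_len / syn_hop))
    y = [0.0] * (n_frames * syn_hop + frame_len)
    norm = [0.0] * len(y)
    for m in range(n_frames):
        a = int(round(m * ana_hop))   # where this frame starts in the input
        start = m * syn_hop           # where it is placed in the output
        for n in range(frame_len):
            idx = a + n
            sample = x[idx] if idx < len(x) else 0.0  # zero-pad past the end
            y[start + n] += window[n] * sample
            norm[start + n] += window[n]
    # Divide by the accumulated window weight to undo the windowing gain.
    return [y[i] / norm[i] if norm[i] > 1e-8 else 0.0 for i in range(out_len)]

# Stretch a constant test signal to 1.5 times its length.
stretched = ola_stretch([1.0] * 4000, 1.5)
```

Plain OLA copies frames without aligning their waveforms, so it tends to introduce phase discontinuities on tonal material; WSOLA, TD-PSOLA, and the phase vocoder each address that limitation in a different way.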
Courses and Tutorial (by Prof. Juhan Nam)
Address: 291 Daehak-ro, Yuseong-gu, Daejeon (34141)
N25 #3236, KAIST, South Korea