Music and Audio Computing Lab

Datasets

  • Children's Song Dataset (CSD): a collection of 50 Korean and 50 English songs sung by a professional female Korean pop singer. Each song is recorded in two separate keys, resulting in 200 audio recordings in total. Each recording is paired with a MIDI transcription and lyric annotations at both the grapheme and phoneme level (a loading sketch follows this list).
  • dim-sim: a collection of user-annotated music similarity triplet ratings used to evaluate music similarity search and related algorithms. The ratings are linked to tracks in the Million Song Dataset (MSD); an evaluation sketch follows this list.
  • K-pop Vocal Tagging (KVT): a collection of 6,787 vocal segments from 466 K-pop songs annotated with 70 vocal tags describing characteristics such as timbre, singing technique, pitch range, genre, and gender.
  • The DJ Mix Dataset: metadata of DJ mixes performed by human DJs and of the tracks played within each mix. The metadata was extracted from MixesDB and carefully curated.
  • EMOPIA: a multi-modal (audio and MIDI) dataset focusing on perceived emotion in pop piano music, intended to facilitate research on music emotion. It contains 1,087 music clips from 387 songs with clip-level emotion labels annotated by four dedicated annotators. Since a song may contribute more than one clip, the dataset can also be used for song-level analysis.
  • YM2413-MDB: an 80s FM video game music dataset with multi-label emotion annotations. It includes 669 audio and MIDI files of music from 1980s Sega and MSX PC games that use the YM2413, a programmable sound generator based on FM synthesis. The game music is arranged with a subset of 15 monophonic instruments and one drum instrument, converted from the binary commands of the YM2413 sound chip. Each song is labeled with 19 emotion tags by two annotators and validated by three verifiers to obtain refined tags.
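
As a data-structure illustration for the CSD entry above, the following sketch loads one song's audio, MIDI transcription, and grapheme-level lyrics in Python. The file names and directory layout are assumptions for illustration only; consult the dataset release for the actual structure.

    # Minimal sketch: pairing one CSD recording with its MIDI and lyric files.
    # Paths below are hypothetical placeholders, not the documented layout.
    import soundfile as sf
    import pretty_midi

    audio, sr = sf.read('CSD/english/wav/en001a.wav')            # one recording (one of the two keys)
    midi = pretty_midi.PrettyMIDI('CSD/english/mid/en001a.mid')  # note-level transcription
    notes = midi.instruments[0].notes                            # onset/offset/pitch per sung note
    with open('CSD/english/lyric/en001a.txt') as f:              # grapheme-level lyric annotation
        graphemes = f.read().split()

    print(len(notes), 'notes,', len(graphemes), 'grapheme tokens')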
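
For the dim-sim entry above, the sketch below shows how human triplet ratings are typically used to score a similarity model: the model "agrees" with a triplet when the clip it places closer to the anchor matches the raters' choice. The column names and the embed() function are hypothetical stand-ins, not the dataset's actual schema.

    # Minimal sketch: triplet agreement between an embedding model and human ratings.
    import numpy as np
    import pandas as pd

    def embed(track_id):
        """Hypothetical placeholder: return an embedding vector for an MSD track."""
        rng = np.random.default_rng(abs(hash(track_id)) % (2**32))
        return rng.standard_normal(128)

    # Hypothetical triplet table: which of clip_a/clip_b raters judged closer to the anchor.
    triplets = pd.DataFrame({
        'anchor':       ['TRAAAAA1', 'TRAAAAA2'],
        'clip_a':       ['TRBBBBB1', 'TRBBBBB2'],
        'clip_b':       ['TRCCCCC1', 'TRCCCCC2'],
        'human_choice': ['clip_a',   'clip_b'],
    })

    def triplet_agreement(df):
        correct = 0
        for _, row in df.iterrows():
            a, pa, pb = embed(row['anchor']), embed(row['clip_a']), embed(row['clip_b'])
            model_choice = 'clip_a' if np.linalg.norm(a - pa) < np.linalg.norm(a - pb) else 'clip_b'
            correct += int(model_choice == row['human_choice'])
        return correct / len(df)

    print(f'triplet agreement: {triplet_agreement(triplets):.2f}')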


Source Code and Software Libraries

  • PyTSMod: a Python library of time-scale modification (TSM) algorithms, including Overlap-Add (OLA), Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA), Waveform Similarity Overlap-Add (WSOLA), the Phase Vocoder (PV), and TSM based on harmonic-percussive source separation (HPTSM). A short usage sketch follows this list.
  • JDC vocal melody extraction: a pre-trained deep neural network model for singing voice detection and melody extraction.
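
The sketch below illustrates typical PyTSMod usage for the entry above: time-stretching a file with WSOLA so that duration changes while pitch is preserved. It assumes a wsola(x, s) function taking an audio array and a scalar stretch ratio; check the PyTSMod repository for the exact function names and signatures.

    # Minimal sketch: time-stretch an audio file with PyTSMod's WSOLA (assumed API).
    import soundfile as sf
    import pytsmod as tsm

    x, sr = sf.read('input.wav')             # 'input.wav' is a placeholder file name
    x_slow = tsm.wsola(x, 1.25)              # stretch to 1.25x the original duration
    sf.write('output_slow.wav', x_slow, sr)  # pitch is unchanged; only the time axis is scaled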


Courses and Tutorials (by Prof. Juhan Nam)

Address: 291 Daehak-ro, Yuseong-gu, Daejeon (34141)

N25 #3236, KAIST, South Korea