Music and Audio Computing Lab

Vocal Melody Extraction and Transcription


Main Contributor: Sangeun Kum

Vocal melody extraction is the task of identifying the pitch contour of the singing voice in mixed audio. Vocal melody transcription processes this pitch information further, detecting musical notes along with beat or other rhythmic information. We have developed deep neural network models for both tasks and investigated training methods that address the scarcity of labeled data.
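To make the difference between the two tasks concrete, here is a minimal sketch of how a frame-level pitch contour (the output of extraction) might be grouped into note events (the output of transcription). It is a simple rule-based illustration, not the neural network models described here. The function names, the 10 ms hop size, the minimum note length, and the convention that a value of 0 Hz marks an unvoiced frame are all assumptions made for this example, and beat or rhythm information is ignored.

```python
import math


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def contour_to_notes(f0_hz, hop_s=0.01, min_frames=3):
    """Group a frame-level pitch contour into (onset_s, offset_s, midi_pitch) notes.

    Assumptions for this sketch: frames with f0 <= 0 are unvoiced, each voiced
    frame is rounded to the nearest semitone, and runs shorter than
    `min_frames` are discarded as spurious.
    """
    notes = []
    current = None  # MIDI pitch of the run in progress (None = unvoiced)
    start = 0       # frame index where the current run began
    # A trailing unvoiced frame acts as a sentinel that closes the last run.
    for i, f0 in enumerate(list(f0_hz) + [0.0]):
        pitch = round(hz_to_midi(f0)) if f0 > 0 else None
        if pitch != current:
            if current is not None and i - start >= min_frames:
                notes.append((start * hop_s, i * hop_s, current))
            current = pitch
            start = i
    return notes


# Hypothetical contour: silence, a sustained A4, a gap, a C5, one stray frame.
contour = [0.0, 0.0, 440.0, 441.0, 439.0, 440.0, 0.0,
           523.25, 523.25, 523.25, 330.0, 0.0]
notes = contour_to_notes(contour)
for onset, offset, pitch in notes:
    print(f"{onset:.2f}s - {offset:.2f}s  MIDI {pitch}")
```

With this contour the single 330 Hz frame is dropped for being shorter than the minimum length, leaving one note at MIDI pitch 69 and one at 72. Fixed-threshold segmentation like this breaks down with vibrato and pitch glides, which is one reason note-level transcription is treated as a learning problem.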


Figure: Estimated vocal melodic pitch from mixed audio


Related Publications

  • Pseudo-Label Transfer from Frame-Level to Note-Level in a Teacher-Student Framework for Singing Transcription from Polyphonic Music
    Sangeun Kum, Jongpil Lee, Keunhyoung Luke Kim, Taehyoung Kim, and Juhan Nam
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022 (accepted)
  • Semi-Supervised Learning Using Teacher-Student Models for Vocal Melody Extraction
    Sangeun Kum, Jing-Hua Lin, Li Su, and Juhan Nam
    Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), 2020 [paper]
  • Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks
    Sangeun Kum and Juhan Nam
    Applied Sciences, 2019 [paper] [code]
  • Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks
    Sangeun Kum, Changheun Oh, and Juhan Nam
    Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016 [paper] [website]