Music and Audio Computing Lab | CT

Vocal Melody Extraction and Transcription in Polyphonic Music

Main Contributor: Sangeun Kum

Vocal melody extraction is the task of identifying the pitch contour of the singing voice in mixed (polyphonic) audio. Vocal melody transcription processes this pitch information further to detect musical notes, along with beat or other rhythmic information. We have developed deep neural network models for both tasks and have investigated training methods that address the scarcity of labeled data.
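To illustrate how the frame-level output of melody extraction relates to the note-level output of transcription, here is a minimal rule-based sketch. It is not the method used in our models, which are neural networks, and it leaves out beat and rhythm analysis. The function names, the 10 ms hop size, and the 50 ms minimum note duration are all assumptions chosen for illustration.

```python
import math


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def contour_to_notes(f0_hz, hop_sec=0.01, min_dur_sec=0.05):
    """Group a frame-level pitch contour into (onset_sec, offset_sec, midi_pitch) notes.

    Frames with f0 <= 0 are treated as unvoiced. Consecutive voiced frames
    that round to the same semitone are merged into one note, and notes
    shorter than min_dur_sec are discarded. (Illustrative assumption, not
    the approach of the publications listed on this page.)
    """
    notes = []
    current = None  # MIDI pitch of the run being tracked, or None if unvoiced
    start = 0       # frame index where the current run began
    # A trailing unvoiced frame acts as a sentinel that flushes the last run.
    for i, f0 in enumerate(list(f0_hz) + [0.0]):
        pitch = round(hz_to_midi(f0)) if f0 > 0 else None
        if pitch != current:
            if current is not None and (i - start) * hop_sec >= min_dur_sec:
                notes.append((start * hop_sec, i * hop_sec, current))
            current, start = pitch, i
    return notes


# Hypothetical contour: silence, A4, silence, C5, then a very short A3 blip.
contour = [0.0] * 2 + [440.0] * 10 + [0.0] * 3 + [523.25] * 8 + [220.0] * 2
notes = contour_to_notes(contour)
```

With this input, the two sustained pitches become notes and the two-frame blip is dropped as too short. Real singing contains vibrato, glides, and estimation errors, which is why simple quantization like this tends to fragment notes and why learned note-level models are of interest.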

Estimated Vocal Melodic Pitch from Mixed Audio

Related Publications

Pseudo-Label Transfer from Frame-Level to Note-Level in a Teacher-Student Framework for Singing Transcription from Polyphonic Music
Sangeun Kum, Jongpil Lee, Keunhyoung Luke Kim, Taehyoung Kim, and Juhan Nam
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022 [paper]
Semi-Supervised Learning Using Teacher-Student Models for Vocal Melody Extraction
Sangeun Kum, Jing-Hua Lin, Li Su, and Juhan Nam
Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), 2020 [paper]
Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks
Sangeun Kum and Juhan Nam
Applied Sciences, 2019 [paper] [code]
Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks
Sangeun Kum, Changheun Oh, and Juhan Nam
Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016 [paper] [website]