Music and Audio Computing Lab

Multimodal Music Retrieval


Main Contributors: Seungheon Doh, Jeong Choi

Artists express their music not only through sound but also through lyrics, album covers, artist images, and music videos. We listen to music with our ears, but we also see its visual elements and read its lyrics, reviews, and comments. In this research topic, we explore the multimodality of music and propose novel learning models for cross-modal retrieval in music listening.




Music and Text

With the growth of online music streaming services, users have to find music that suits their taste in large-scale music databases. Semantic search by text is an essential feature that lets users find songs that meet their needs. We are currently working on various query-by-text methods that associate audio embeddings with word embeddings. For example:


  1. Unseen query retrieval: Search for music with query words that have no annotations
  2. Sentence query retrieval: Search for music with a sentence-level query
  3. Multi-lingual query retrieval: Search for music with queries in multiple languages
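
As a rough illustration of the query-by-text idea, the sketch below ranks tracks by cosine similarity between a text query embedding and audio embeddings that are assumed to live in the same space. It is not the models from the publications listed on this page: the 3-dimensional vectors, the track names, and the helpers `embed_query` and `rank_tracks` are hypothetical, and averaging word vectors is only a naive stand-in for a real sentence-level query encoder.

```python
import math

# Hypothetical toy embeddings. In practice these would come from trained
# word and audio encoders that share one embedding space.
WORD_EMBEDDINGS = {
    "calm": [0.9, 0.1, 0.0],
    "energetic": [0.0, 0.2, 0.9],
}
TRACK_EMBEDDINGS = {
    "track_a": [0.8, 0.2, 0.1],
    "track_b": [0.1, 0.1, 0.95],
    "track_c": [0.5, 0.5, 0.5],
}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed_query(text, word_embeddings):
    """Average the vectors of the known words in the query (naive assumption)."""
    vectors = [word_embeddings[w] for w in text.lower().split() if w in word_embeddings]
    if not vectors:
        raise ValueError("no known words in query")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank_tracks(query_embedding, track_embeddings, top_k=2):
    """Return the top_k (track_id, score) pairs, most similar first."""
    scored = [
        (track_id, cosine_similarity(query_embedding, emb))
        for track_id, emb in track_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = rank_tracks(embed_query("calm", WORD_EMBEDDINGS), TRACK_EMBEDDINGS)
for track_id, score in results:
    print(f"{track_id}: {score:.3f}")
```

Because tracks are ranked only by their distance to the query vector, any word that can be placed in the embedding space can serve as a query, even if no track was ever annotated with it.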



Related Publications

  • Million Song Search: Web Interface for Semantic Music Search Using Musical Word Embedding
    Seungheon Doh, Jongpil Lee, and Juhan Nam
    Late Breaking Demo in the 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021
  • Musical Word Embedding: Bridging the Gap between Listening Contexts and Music
    Seungheon Doh, Jongpil Lee, Tae Hong Park, and Juhan Nam
    Machine Learning for Media Discovery Workshop, International Conference on Machine Learning (ICML), 2020 [paper] [website]
  • Zero-shot Learning for Audio-based Music Classification and Tagging
    Jeong Choi, Jongpil Lee, Jiyoung Park, and Juhan Nam
    Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), 2019 [paper] [code]


Music and Image

Images are often more effective than words at expressing the sentiment of music. We explore various multimodal methods that retrieve music using an image as the query, or retrieve cover images for a given music playlist.