Music and Audio Computing Lab

Symbolic Music Generation


Main Contributors: Eunjin Choi, Hayeon Bang

Objective: Our work covers the full pipeline of symbolic music research, from collecting new datasets and analyzing musical content to generating scores with data-driven models.

Background: Symbolic Music Generation

Music generation has attracted researchers' attention for decades. Automatic music composition has a long history, dating back at least to the musical dice games of the 18th century, such as the one attributed to Mozart, and for most of that history it was studied with rule-based approaches. With the recent revival of deep learning, however, music generation has increasingly been pursued with data-driven approaches. The process of music generation can be divided into three stages: score generation, performance generation, and sound generation. Among these, symbolic music generation aims to generate a score, which is a notation-based representation of music.

Our group pursues symbolic music research across the entire spectrum: we build new datasets, develop models for music analysis and understanding, and explore generative approaches to music composition. Our ultimate goal is to build music understanding and generation systems that are musically intuitive and easy to control, so that people can guide them naturally with musical knowledge, emotional intent, or structural constraints.


Topic 1: Symbolic Music Generation

Our generative research spans two complementary directions: music-theory-grounded generation and emotion-based generation.

Music-Theory-Grounded Generation

One route toward more musical and controllable generation is to incorporate music theory as explicit conditioning. We have studied this across styles, from classical music, where rules are most clearly defined and must be strictly obeyed, to pop music, where conventions are looser and more flexible. In the classical domain, our work on four-part Bach chorale generation teaches models to respect fundamental counterpoint rules such as avoiding parallel fifths and octaves. Moving toward pop music, D3PIA tackles lead-sheet-conditioned accompaniment generation: given a melody and chord symbols, the model generates a piano arrangement using a discrete denoising diffusion probabilistic model (D3PM) that operates on a discrete note state space at the piano-roll level.
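
The parallel-motion rule mentioned above is easy to state in code. The sketch below flags parallel fifths and octaves between voices given as lists of MIDI pitches, one note per chord. It is an illustrative checker only, not the method used in the chorale paper; the function names and the input format are assumptions, and how a model is taught to avoid these motions is not shown.

```python
from itertools import combinations
from typing import Dict, List, Tuple

PERFECT_FIFTH = 7      # semitones, taken modulo the octave
UNISON_OR_OCTAVE = 0


def find_parallel_motions(voice_a: List[int], voice_b: List[int]) -> List[Tuple[int, str]]:
    """Return (step index, kind) for each parallel fifth/octave between two voices.

    Each voice is a list of MIDI pitches, one per chord. Step i compares chord i
    with chord i + 1. Compound intervals (e.g. a twelfth) count as fifths.
    """
    if len(voice_a) != len(voice_b):
        raise ValueError("voices must have the same number of chords")
    found = []
    for i in range(len(voice_a) - 1):
        interval_now = abs(voice_a[i] - voice_b[i]) % 12
        interval_next = abs(voice_a[i + 1] - voice_b[i + 1]) % 12
        step_a = voice_a[i + 1] - voice_a[i]
        step_b = voice_b[i + 1] - voice_b[i]
        # Parallel motion: both voices move, and in the same direction.
        similar_motion = step_a * step_b > 0
        if similar_motion and interval_now == interval_next:
            if interval_now == PERFECT_FIFTH:
                found.append((i, "parallel fifth"))
            elif interval_now == UNISON_OR_OCTAVE:
                found.append((i, "parallel octave/unison"))
    return found


def check_chorale(voices: Dict[str, List[int]]) -> List[Tuple[str, str, int, str]]:
    """Check every pair of voices (e.g. soprano, alto, tenor, bass)."""
    report = []
    for (name_a, a), (name_b, b) in combinations(voices.items(), 2):
        for step, kind in find_parallel_motions(a, b):
            report.append((name_a, name_b, step, kind))
    return report
```

For example, `check_chorale({"soprano": [67, 69], "alto": [64, 65], "bass": [60, 62]})` reports a single parallel fifth between soprano and bass at step 0 (G over C moving to A over D). Taking the absolute interval keeps the check correct when voices cross.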


Related Publications

  • D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation from Lead Sheet
    Eunjin Choi, Hounsu Kim, Hayeon Bang, Taegyun Kwon, Juhan Nam
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026
    [paper] [github] [demo]
  • Teaching Chorale Generation Model to Avoid Parallel Motions
    Eunjin Choi, Hyerin Kim, Juhan Nam, Dasaem Jeong
    Proceedings of the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR), 2023
    [paper]
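
To make the discrete-diffusion idea behind D3PIA more concrete, the sketch below shows the forward (noising) process of a D3PM with a uniform transition matrix applied to a piano roll. With per-step matrices Q_t = (1 - beta_t) I + (beta_t / K) 11^T, the cumulative marginal is q(x_t | x_0) = alpha_bar_t * onehot(x_0) + (1 - alpha_bar_t) / K, where alpha_bar_t is the product of (1 - beta_s) for s = 1..t. Several details are assumptions for illustration and are not taken from D3PIA: binary on/off cells, the uniform (rather than, say, absorbing) transition, and the user-supplied `betas` schedule. The lead-sheet conditioning and the learned reverse denoiser are not shown.

```python
import random
from typing import List, Optional

PianoRoll = List[List[int]]  # rows = pitches, columns = time frames


def alpha_bar(betas: List[float], t: int) -> float:
    """Probability that a cell has not been resampled after t steps."""
    keep = 1.0
    for beta in betas[:t]:
        keep *= 1.0 - beta
    return keep


def q_marginal(x0_state: int, keep: float, num_states: int) -> List[float]:
    """q(x_t | x_0) for one cell: keep * onehot(x_0) + (1 - keep) / K."""
    return [
        (keep if s == x0_state else 0.0) + (1.0 - keep) / num_states
        for s in range(num_states)
    ]


def q_sample(x0: PianoRoll, t: int, betas: List[float], num_states: int = 2,
             rng: Optional[random.Random] = None) -> PianoRoll:
    """Corrupt a clean piano roll x_0 into x_t in one shot.

    Each cell keeps its value with probability alpha_bar_t; otherwise it is
    redrawn uniformly from all K states, which reproduces q_marginal above.
    """
    rng = rng or random.Random()
    keep = alpha_bar(betas, t)
    return [
        [cell if rng.random() < keep else rng.randrange(num_states) for cell in row]
        for row in x0
    ]
```

Sampling x_t directly from x_0 is what makes training efficient: the denoiser can be trained on a randomly chosen step t without simulating every intermediate step.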

Emotion-Based Generation

EMOPIA is a multi-modal pop piano dataset annotated with emotion quadrants based on Russell's valence-arousal model, enabling generation conditioned on a target emotional state. YM2413-MDB extends emotion-annotated music research to the game music domain: it covers multi-instrumental video game music from the Sega Master System and uses a tag-based annotation method for game-specific emotional and stylistic labels. Together, these works explore how emotional semantics can guide symbolic music generation across diverse musical styles.
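
Quadrant-based emotion labels can be illustrated in a few lines. The sketch below maps a continuous (valence, arousal) point to one of the four quadrants of the valence-arousal plane and prepends a control token to a token sequence, one common way to condition an autoregressive generator on a target emotion. The quadrant numbering follows the usual counterclockwise convention (Q1 = positive valence, high arousal); the token names, the centring of both axes at zero, and the tie-breaking on the axes are assumptions, not the EMOPIA codebase's format.

```python
from typing import List


def russell_quadrant(valence: float, arousal: float) -> str:
    """Map a valence-arousal point (both centred at 0) to a quadrant label.

    Q1: positive valence, high arousal    Q2: negative valence, high arousal
    Q3: negative valence, low arousal     Q4: positive valence, low arousal
    Points exactly on an axis go to the non-negative side (an assumption).
    """
    if arousal >= 0:
        return "Q1" if valence >= 0 else "Q2"
    return "Q4" if valence >= 0 else "Q3"


def add_emotion_condition(tokens: List[str], quadrant: str) -> List[str]:
    """Prepend a hypothetical emotion control token to an event sequence."""
    if quadrant not in {"Q1", "Q2", "Q3", "Q4"}:
        raise ValueError(f"unknown quadrant: {quadrant}")
    return [f"<EMOTION_{quadrant}>"] + tokens
```

At generation time, the same control token is placed at the start of the sequence so that the model continues in the requested emotional region.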


Related Publications

  • YM2413-MDB: A Multi-Instrumental FM Video Game Music Dataset with Emotion Annotations
    Eunjin Choi, Yoonjin Chung, Seolhee Lee, Jong Ik Jeon, Taegyun Kwon, Juhan Nam
    Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR), 2022
    [paper] [github] [demo]
  • EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation
    Hsiao-Tzu Hung, Joann Ching, Seungheon Doh, Nabin Kim, Juhan Nam, Yi-Hsuan Yang
    Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021
    [paper] [github] [demo]


Topic 2: Symbolic Music Understanding & Analysis

Generating musically meaningful output requires not only generative models but also a deep understanding of musical content. Analysis modules capable of recognizing structure, style, and semantics are essential building blocks for controllable, high-quality music generation systems. For this reason, our group actively pursues symbolic music understanding and analysis alongside generation.

A key focus is learning multimodal representations that connect the audio, symbolic (MIDI), and text modalities. PIAST provides the dataset foundation: a multimodal piano dataset with aligned audio, MIDI, and text annotations. Building on it, PianoBind trains a joint embedding model that maps all three modalities into a shared latent space via contrastive learning, enabling cross-modal retrieval and downstream music understanding tasks.

This contrastive embedding approach also proves useful beyond retrieval. In our study on de-duplicating the Lakh MIDI Dataset, one of the most widely used symbolic music corpora, we apply learned symbolic music embeddings to systematically detect duplicate entries at scale, providing a cleaned dataset and guidelines for more reliable research.
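
The contrastive objective behind a joint embedding of this kind can be sketched with a CLIP-style symmetric InfoNCE loss: within a batch, the i-th audio, MIDI, and text embeddings are treated as the same piece (positives), and every other pairing is a negative. This is a generic formulation under stated assumptions, not PianoBind's implementation: the modality encoders are omitted, the three pairwise losses are simply averaged, and the temperature of 0.07 is illustrative. Plain Python is used for readability; real training would use a tensor library.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def cosine_similarities(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """sim[i][j] = cosine similarity between a[i] and b[j]."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def _cross_entropy_on_diagonal(sim: Matrix, temperature: float) -> float:
    """Mean over rows of -log softmax(row / temperature)[i]; target = diagonal."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(sim)


def info_nce(a, b, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: a[i] and b[i] form the positive pair."""
    sim = cosine_similarities(a, b)
    sim_t = [list(col) for col in zip(*sim)]  # b -> a direction
    return 0.5 * (_cross_entropy_on_diagonal(sim, temperature)
                  + _cross_entropy_on_diagonal(sim_t, temperature))


def trimodal_loss(audio, midi, text, temperature: float = 0.07) -> float:
    """Average the pairwise losses so all three modalities share one space."""
    return (info_nce(audio, midi, temperature)
            + info_nce(audio, text, temperature)
            + info_nce(midi, text, temperature)) / 3.0
```

Minimizing this loss pulls matching audio, MIDI, and text embeddings together and pushes mismatched ones apart, which is what makes nearest-neighbor search across modalities meaningful afterward.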


Related Publications

  • PianoBind: A Multimodal Joint Embedding Model for Pop-piano Music
    Hayeon Bang, Eunjin Choi, Seungheon Doh, Juhan Nam
    Proceedings of the 26th International Society for Music Information Retrieval Conference (ISMIR), 2025
    [paper] [github] [demo]
  • On the De-duplication of the Lakh MIDI Dataset
    Eunjin Choi, Hyerin Kim, Jiwoo Ryu, Juhan Nam, Dasaem Jeong
    Proceedings of the 26th International Society for Music Information Retrieval Conference (ISMIR), 2025
    [paper] [github] [demo]
  • PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text
    Hayeon Bang, Eunjin Choi, Michael Finch, Seungheon Doh, Seolhee Lee, Gyeong-Hoon Lee, Juhan Nam
    3rd Workshop on NLP for Music and Audio (NLP4MusA), 2024
    [paper] [github] [demo]
  • Bridging Audio and Symbolic Piano Data through a Web-Based Annotation Interface
    Seolhee Lee, Eunjin Choi, Joonhyung Bae, Hyerin Kim, Eita Nakamura, Dasaem Jeong, Juhan Nam
    Extended Abstracts for the Late-Breaking Demo Session of the 24th International Society for Music Information Retrieval Conference (ISMIR), 2023
    [paper]
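
The embedding-based de-duplication described under Topic 2 can be illustrated with a simple sketch: compare embeddings of MIDI files by cosine similarity and merge files above a threshold into duplicate groups with a union-find structure, so that grouping is transitive. The threshold of 0.95, the file-ID keys, and the all-pairs search are assumptions; the study's actual embedding model, matching criteria, and thresholds are described in the paper and are not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def group_duplicates(embeddings: Dict[str, Sequence[float]],
                     threshold: float = 0.95) -> List[List[str]]:
    """Group file IDs whose embeddings are near-identical (transitively)."""
    ids = list(embeddings)
    parent = {i: i for i in ids}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All-pairs comparison: merge the groups of any two similar files.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for x in ids:
        groups.setdefault(find(x), []).append(x)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

The double loop is quadratic in the number of files; at the scale of a full corpus, an approximate nearest-neighbor index would typically replace it, while the union-find grouping stays the same.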