Music and Audio Computing Lab

Neural Voice Processing


Main Contributors: Jaekwon Im

The acoustic characteristics of audio are determined by various recording conditions, such as the type and position of microphones, room acoustics, sample rate, and ambient noise. This research aims to manipulate these conditions in recorded audio.



Recording Environment Transfer of Speech

Properly setting up recording conditions, including microphone type and placement, room acoustics, and ambient noise, is essential to obtaining the desired acoustic characteristics of speech. For instance, voice-overs and audiobooks require a clean environment with high-quality microphones and a non-reverberant space. Automated dialog replacement (ADR), on the other hand, requires post-production work to replicate the acoustic qualities that convey the environment in which the dialog takes place. Although recording conditions clearly matter, producing speech audio in a targeted recording environment requires considerable professional knowledge and effort. To tackle this problem, we introduce "recording environment transfer," which transforms input speech so that it has the recording conditions of a reference speech recording.



Related Publications

  • DIFFRENT: A Diffusion Model for Recording Environment Transfer of Speech
    Jaekwon Im and Juhan Nam
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
    [demo]
  • Neural Vocoder Feature Estimation for Dry Singing Voice Separation
    Jaekwon Im, Soonbeom Choi, Sangeon Yong, and Juhan Nam
    Proceedings of the 14th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2022
    [paper] [demo]