Acoustic Transfer and Enhancement

Main Contributors: Jaekwon Im

Objective: Use generative models for acoustic transfer and enhancement to improve the listening experience.

Table of Contents:
Background: Acoustic Transfer and Enhancement
Topic 1: Recording Environment Transfer
Topic 2: Audio Super-Resolution

Background: Acoustic Transfer and Enhancement

Acoustic characteristics can vary depending on the recording environment (e.g., microphone and room acoustics) and processing methods. The appropriate acoustic setup depends on the intended use of the audio. However, achieving the desired acoustic quality using traditional methods often requires substantial effort and expert knowledge. To address this challenge, we explore acoustic transfer and enhancement using generative models.

Topic 1: Recording Environment Transfer

Diff-R-EN-T introduces a diffusion-based framework for recording environment transfer in speech. The model separates speech content from recording conditions and generates audio that preserves the original speech while adapting it to a new acoustic environment.
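
To make the idea of "the same speech in a different recording environment" concrete, the sketch below uses a simplified signal model that is an assumption for illustration and is not taken from the paper: a recording is treated as the clean signal convolved with an impulse response (standing in for microphone and room acoustics) plus additive noise. The function names, the synthetic impulse responses, and all parameter values are hypothetical.

```python
import math
import random


def synthetic_impulse_response(length, decay, seed=0):
    """Toy impulse response: a direct path followed by exponentially decaying noise."""
    rng = random.Random(seed)
    ir = [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    ir[0] = 1.0  # direct sound
    return ir


def convolve(signal, kernel):
    """Full linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def simulate_recording(clean, impulse_response, noise_level, seed=0):
    """Assumed environment model: clean signal convolved with an impulse response, plus noise."""
    rng = random.Random(seed)
    wet = convolve(clean, impulse_response)
    return [x + noise_level * rng.gauss(0.0, 1.0) for x in wet]


# Stand-in for speech content: a 200 Hz tone sampled at 8 kHz for 50 ms.
sample_rate = 8000
clean = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]

# Same content rendered in two made-up environments.
dry_room = simulate_recording(
    clean, synthetic_impulse_response(8, decay=1.0), noise_level=0.0
)
live_room = simulate_recording(
    clean, synthetic_impulse_response(400, decay=0.01, seed=1), noise_level=0.01, seed=2
)

print(len(clean), len(dry_room), len(live_room))
```

Under this toy model, recording environment transfer would correspond to mapping dry_room to something that sounds like live_room without access to clean or to the impulse responses. Real environments also involve effects this sketch ignores, such as nonlinear microphone behavior and codec artifacts.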

Related Publications

DIFFRENT: A Diffusion Model for Recording Environment Transfer of Speech
Jaekwon Im and Juhan Nam
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
[paper] [demo]

Topic 2: Audio Super-Resolution

Audio super-resolution aims to reconstruct high-frequency components that are lost due to bandwidth limitations or compression.
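
As a rough illustration of what gets lost, the sketch below band-limits a two-tone test signal with a low-pass filter and measures each tone before and after. Simulating bandwidth limitation with a windowed-sinc low-pass filter is an assumption made here for illustration, not a description of how FlashSR or SAGA-SR prepare their inputs; the helper names, the 4 kHz cutoff, and the tone frequencies are likewise hypothetical.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc low-pass coefficients, normalized to unit DC gain."""
    fc = cutoff_hz / sample_rate  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Convolve, keeping only outputs where the filter fully overlaps the signal."""
    k = len(taps)
    return [
        sum(taps[j] * signal[i + k - 1 - j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def tone_amplitude(signal, freq_hz, sample_rate):
    """Estimate one sinusoid's amplitude by correlating with cosine and sine."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


sample_rate = 48000
low_hz, high_hz = 1000, 12000

# Full-band signal: one tone below and one tone above the assumed 4 kHz cutoff.
full_band = [
    math.sin(2 * math.pi * low_hz * i / sample_rate)
    + math.sin(2 * math.pi * high_hz * i / sample_rate)
    for i in range(4900)
]

# Band-limited version: 4900 - 101 + 1 = 4800 samples, a whole number of periods of both tones.
band_limited = apply_fir(full_band, lowpass_fir(4000, sample_rate))

for name, sig in (("full band", full_band[:4800]), ("band-limited", band_limited)):
    low = tone_amplitude(sig, low_hz, sample_rate)
    high = tone_amplitude(sig, high_hz, sample_rate)
    print(f"{name}: {low_hz} Hz amplitude {low:.3f}, {high_hz} Hz amplitude {high:.3f}")
```

The 1 kHz tone passes through almost unchanged while the 12 kHz tone is almost entirely removed. A super-resolution model has to regenerate plausible content in that emptied band from the low-frequency content that remains.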

FlashSR is a one-step diffusion-based model designed for versatile audio super-resolution. By utilizing a diffusion distillation method, FlashSR performs restoration efficiently while maintaining high perceptual quality.

To further improve restoration performance, SAGA-SR incorporates semantic and acoustic guidance into the super-resolution process, enabling the model to generate audio that better aligns with both the content and acoustic characteristics of the original signal.

Related Publications

FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation
Jaekwon Im and Juhan Nam
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
[paper] [code] [demo]

SAGA-SR: Semantically and Acoustically Guided Audio Super-Resolution
Jaekwon Im and Juhan Nam
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026
[paper] [demo]