VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a novel approach to restoring the quality of speech recordings using flow-matching transformers. It addresses a wide range of degradations, including reverberation, noise, compression artifacts, and low sampling rates.
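For readers unfamiliar with flow matching, the sampling side of the idea can be sketched in a few lines. This is a toy illustration only, not VoiceRestore's code: `euler_sample`, `toy_velocity`, and the three-element `target` vector are hypothetical names and values chosen for the example. It assumes the straight-line probability path commonly used in flow matching, and it replaces the trained transformer with the exact velocity field toward one known target so the script runs on its own.

```python
import random


def euler_sample(velocity, x0, steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) never hits zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical "clean" feature vector standing in for a restored spectrogram frame.
target = [0.5, -1.0, 2.0]


def toy_velocity(x, t):
    """Exact velocity of the path x_t = (1 - t) * noise + t * target.

    ASSUMPTION: a real system would use a trained network here, conditioned
    on the degraded recording; this closed form only works because the
    target is known in advance.
    """
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in target]  # starting point at t = 0
restored = euler_sample(toy_velocity, noise, steps=10)
print(restored)
```

Because the toy velocity field is constant along the straight path, the Euler integration lands on `target` up to floating-point error for any number of steps. With a learned velocity field the step count becomes a trade-off between speed and accuracy.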

Repo: GitHub repository with inference code and a pre-trained model

Key Features

Audio Restoration Demo

Full Degradation

Degraded Audio
Mel-log spectrogram of fully degraded audio
Restored Audio
Mel-log spectrogram of restored fully degraded audio

Distortion

Degraded Audio
Mel-log spectrogram of distorted audio
Restored Audio
Mel-log spectrogram of restored distorted audio

Reverb Effect

Degraded Audio
Mel-log spectrogram of reverb-affected audio
Restored Audio
Mel-log spectrogram of restored reverb-affected audio

16kHz Noisy Sample

Degraded Audio
Mel-log spectrogram of 16kHz degraded audio
Restored Audio
Mel-log spectrogram of restored 16kHz audio

Resemble-Enhance Comparison

Full Degradation

Original Degraded
Mel-log spectrogram of fully degraded audio
Resemble-Enhance
Mel-log spectrogram of Resemble-Enhance restored fully degraded audio
VoiceRestore
Mel-log spectrogram of VoiceRestore restored fully degraded audio

Combinations - Reverb, Distortion, Random Cut

Original Degraded
Mel-log spectrogram of audio degraded by reverb, distortion, and random cuts
Resemble-Enhance
Mel-log spectrogram of Resemble-Enhance restoration of the combined-degradation audio
VoiceRestore
Mel-log spectrogram of VoiceRestore restoration of the combined-degradation audio

CMGAN Comparison

Full Degradation

Original Degraded
Mel-log spectrogram of fully degraded audio
CMGAN Restored
Mel-log spectrogram of CMGAN restored fully degraded audio
VoiceRestore
Mel-log spectrogram of VoiceRestore restored fully degraded audio

16kHz Sample

Original Degraded
Mel-log spectrogram of 16kHz degraded audio
CMGAN Restored
Mel-log spectrogram of CMGAN restored 16kHz audio
VoiceRestore
Mel-log spectrogram of VoiceRestore restored 16kHz audio

Reverb Effect

Original Degraded
Mel-log spectrogram of reverb-affected audio
CMGAN Restored
Mel-log spectrogram of CMGAN restored reverb-affected audio
VoiceRestore
Mel-log spectrogram of VoiceRestore restored reverb-affected audio