VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
VoiceRestore is a novel approach to speech recording quality restoration using flow-matching transformers. It addresses a wide range of degradations including reverberation, noise, compression artifacts, and low sampling rates.
Repo: Github Repository with inference code and pre-trained model
Key Features
- Unified model for diverse speech recording restoration tasks
- Leverages conditional flow matching and classifier-free guidance
- State-of-the-art performance across multiple degradation types
- Strong generalization to unseen degradation combinations
Audio Restoration Demo
- VoiceRestore uses BigVGAN2 24khz pre-trained checkpoint to generate audio from spectrograms
- Generation was done with 32 flow steps and cfg of 1.0
Full Degradation
Degraded Audio

Restored Audio

Distortion
Degraded Audio

Restored Audio

Reverb Effect
Degraded Audio

Restored Audio

16kHz Noisy Sample
Degraded Audio

Restored Audio

Resemble-Enhance Comparison
- Resemble-Enhance is used with default settings provided in the github repository.
Full Degradation
Original Degraded

Resemble-Enhance

VoiceRestore

Combinations - Reverb, Distortion, Random Cut
Original Degraded

Resemble-Enhance

VoiceRestore

CMGAN Comparison
- CMGAN is used with default settings provided in the github repository.
Full Degradation
Original Degraded

CMGAN Restored

VoiceRestore

16kHz Sample
Original Degraded

CMGAN Restored

VoiceRestore

Reverb Effect
Original Degraded

CMGAN Restored

VoiceRestore
