VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
VoiceRestore is a novel approach to speech recording quality restoration using flow-matching transformers. It addresses a wide range of degradations including reverberation, noise, compression artifacts, and low sampling rates.
Repo: Github Repository with inference code and pre-trained model
Key Features
- Unified model for diverse speech recording restoration tasks
- Leverages conditional flow matching and classifier-free guidance
- State-of-the-art performance across multiple degradation types
- Strong generalization to unseen degradation combinations
Audio Restoration Demo
- VoiceRestore uses BigVGAN2 24khz pre-trained checkpoint to generate audio from spectrograms
- Generation was done with 32 flow steps and cfg of 1.0
Full Degradation
Degraded Audio
Restored Audio
Distortion
Degraded Audio
Restored Audio
Reverb Effect
Degraded Audio
Restored Audio
16kHz Noisy Sample
Degraded Audio
Restored Audio
Resemble-Enhance Comparison
- Resemble-Enhance is used with default settings provided in the github repository.
Full Degradation
Original Degraded
Resemble-Enhance
VoiceRestore
Combinations - Reverb, Distortion, Random Cut
Original Degraded
Resemble-Enhance
VoiceRestore
CMGAN Comparison
- CMGAN is used with default settings provided in the github repository.