Audio samples of DiffWave

(Unofficial) TensorFlow implementation of DiffWave (Zhifeng Kong et al., 2020)

LJ Speech-1.1 Neural Vocoder

Audio samples are conditioned on ground-truth mel-spectrograms.

Ground-truth | DiffWave (channels=64, T=20, 500k steps) | DiffWave (channels=64, T=20, 1M steps)

Denoising Diffusion process of DiffWave

Audio samples from the denoising and diffusion processes. The diffusion process adds noise steadily at each step.
t=0 is the denoised audio; t=N is Gaussian noise.
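
As a rough illustration of how noise accumulates over the diffusion steps, the sketch below applies the standard closed-form forward step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps to a waveform. The function names, the linear beta schedule, and its range are assumptions made for this example; they are not taken from this implementation.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Hypothetical linear noise schedule; the range is an assumed example."""
    return np.linspace(beta_start, beta_end, num_steps)


def diffuse(signal, t, betas, rng):
    """Sample x_t from x_0 using the closed-form forward diffusion step.

    t=0 returns the clean signal unchanged; larger t mixes in more noise.
    """
    if t == 0:
        return signal.copy()
    # alpha_bar_t is the cumulative product of (1 - beta_s) for s = 1..t.
    alpha_bar = np.prod(1.0 - betas[:t])
    noise = rng.standard_normal(signal.shape)
    return np.sqrt(alpha_bar) * signal + np.sqrt(1.0 - alpha_bar) * noise


# Usage: noise a 440 Hz test tone at the steps listed in the table.
rng = np.random.default_rng(0)
betas = linear_beta_schedule(num_steps=20)
time_axis = np.arange(22050) / 22050.0
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * time_axis)
noisy = {t: diffuse(tone, t, betas, rng) for t in (20, 10, 5, 4, 3, 2, 1, 0)}
```

Each entry in `noisy` corresponds to one row of a diffusion column: the clean tone at t=0, and progressively noisier versions as t grows. How close the final step gets to pure Gaussian noise depends on the schedule chosen.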

Listening from t=2 down to t=0 is suggested, after turning down the volume.

steps | Ground-truth (Diffusion) | DiffWave 500k steps | DiffWave 1M steps
t=20
t=10
t=5
t=4
t=3
t=2
t=1
t=0