Audio samples of DiffWave
(Unofficial) TensorFlow implementation of DiffWave (Zhifeng Kong et al., 2020)
LJ Speech-1.1 Neural Vocoder
Audio samples are conditioned on ground-truth mel-spectrograms.
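Ground-truth | DiffWave (channels=64, T=20, 500k steps) | DiffWave (channels=64, T=20, 1M steps)
[audio] | [audio] | [audio]
[audio] | [audio] | [audio]
[audio] | [audio] | [audio]
[audio] | [audio] | [audio]
[audio] | [audio] | [audio]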
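The DiffWave columns above are produced by running the reverse (denoising) process from Gaussian noise, with the mel-spectrogram as conditioning. The following is a minimal NumPy sketch of DDPM-style ancestral sampling that follows the sampling update described in the DiffWave paper. It is not this repository's TensorFlow code. `eps_model`, `linear_schedule`, and the schedule values (beta linearly spaced from 1e-4 to 0.5 over T=20 steps) are hypothetical. A real vocoder would call the trained network and upsample the mel-spectrogram to the audio rate.

```python
import numpy as np


def linear_schedule(T=20, beta_start=1e-4, beta_end=0.5):
    # Hypothetical linear noise schedule; the repository's actual betas may differ.
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bar_t for t = 1..T
    return betas, alphas, alpha_bars


def sample(eps_model, mel, num_samples, T=20, seed=0):
    """Reverse (denoising) process conditioned on a mel-spectrogram.

    eps_model(x_t, t, mel) is a hypothetical noise predictor standing in for
    the trained DiffWave network; t is 1-based (t = T, ..., 1).
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = linear_schedule(T)
    x = rng.standard_normal(num_samples)  # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        i = t - 1  # 0-based index into the schedule arrays
        eps = eps_model(x, t, mel)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])
        if t > 1:
            # sigma_t^2 = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t
            var = (1.0 - alpha_bars[i - 1]) / (1.0 - alpha_bars[i]) * betas[i]
            x = mean + np.sqrt(var) * rng.standard_normal(num_samples)
        else:
            x = mean  # no noise is added at the final step
    return x
```

With a dummy predictor such as `lambda x, t, mel: np.zeros_like(x)`, the function runs end to end but outputs noise, since nothing has been learned. With a perfect noise predictor for a known waveform, the final step recovers that waveform exactly.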
Diffusion and denoising processes of DiffWave
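Audio samples from the diffusion process, which gradually adds noise to the ground-truth audio, and from the denoising process of DiffWave, which gradually removes it.
t=0 corresponds to the denoised audio and t=T (here T=20) to Gaussian noise.
We suggest turning down the volume and listening from t=2 to t=0.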
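steps | Ground-truth (diffusion) | DiffWave 500k steps | DiffWave 1M steps
t=20 | [audio] | [audio] | [audio]
t=10 | [audio] | [audio] | [audio]
t=5 | [audio] | [audio] | [audio]
t=4 | [audio] | [audio] | [audio]
t=3 | [audio] | [audio] | [audio]
t=2 | [audio] | [audio] | [audio]
t=1 | [audio] | [audio] | [audio]
t=0 | [audio] | [audio] | [audio]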
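The ground-truth diffusion column needs no model, because the forward process has a closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the product of (1 - beta_s) for s = 1..t and eps is standard Gaussian noise. The sketch below applies this formula at the table's time steps. It uses a placeholder sine tone rather than an LJ Speech utterance. The schedule (beta linearly spaced from 1e-4 to 0.5 over T=20 steps) and the helper `diffuse` are hypothetical and may not match this implementation.

```python
import numpy as np

T = 20
# Hypothetical linear schedule; the repository's actual betas may differ.
betas = np.linspace(1e-4, 0.5, T)
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bar_t for t = 1..T


def diffuse(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; t=0 returns a copy of x0."""
    if t == 0:
        return x0.copy()
    a = alpha_bars[t - 1]  # schedule arrays are 0-based, t is 1-based
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * rng.standard_normal(x0.shape)


rng = np.random.default_rng(0)
sr = 22050  # LJ Speech sampling rate
# Placeholder 1-second 440 Hz tone standing in for a real utterance.
x0 = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)

for t in [20, 10, 5, 4, 3, 2, 1, 0]:
    xt = diffuse(x0, t, alpha_bars, rng)
    scale = 1.0 if t == 0 else np.sqrt(alpha_bars[t - 1])
    print(f"t={t:2d}  signal scale={scale:.3f}  rms={np.sqrt(np.mean(xt ** 2)):.3f}")
```

Under this assumed schedule, only a few percent of the signal amplitude survives at t=20. This is why the top rows sound like noise and the signal becomes audible only in the last few steps.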