WaveNet
E39544
WaveNet is a deep generative neural network architecture for raw audio that produces highly natural-sounding speech and other audio signals.
Statements (51)
| Predicate | Object |
|---|---|
| instanceOf | autoregressive model; deep generative model; neural network architecture; text-to-speech model |
| application | music generation; neural vocoder for parametric TTS; voice conversion |
| arxivId | 1609.03499 |
| basedOn | causal convolutional neural networks |
| designedFor | audio signal modeling; raw audio generation; speech synthesis |
| developedBy | DeepMind; Google DeepMind |
| hasProperty | high computational cost at inference; highly natural-sounding speech output; parallelization across time is limited by autoregressive structure |
| improvedUpon | concatenative text-to-speech systems; parametric HMM-based TTS systems |
| inputType | discretized audio waveform samples |
| inspiredBy | PixelCNN |
| introducedBy | Aaron van den Oord; Alex Graves; Heiga Zen; Karen Simonyan; Nal Kalchbrenner; Oriol Vinyals; Sander Dieleman |
| introducedIn | 2016 |
| introducedInPaper | WaveNet: A Generative Model for Raw Audio |
| language | Python |
| ledTo | Parallel WaveNet; WaveGlow; WaveRNN; neural vocoder architectures |
| outputType | probability distribution over next audio sample |
| publishedAt | arXiv |
| relatedTo | PixelRNN; autoregressive image models |
| supports | general raw waveform modeling; music audio generation; speaker-conditioned speech synthesis; text-conditioned speech synthesis |
| trainingObjective | cross-entropy loss over quantized samples; maximum likelihood estimation |
| usedIn | Google Assistant text-to-speech; Google Cloud Text-to-Speech |
| uses | autoregressive sample-by-sample prediction; conditional generative modeling; dilated causal convolutions; softmax output over quantized audio samples |
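
The `uses`, `inputType`, `outputType`, and `trainingObjective` statements above can be illustrated with a minimal pure-Python sketch. It is not the DeepMind implementation. It assumes 8-bit mu-law quantization (256 classes) and a kernel size of 2 with dilations doubling from 1 to 512, as described in the WaveNet paper. All function names, and the uniform `logits_fn` stand-in for a trained network, are hypothetical.

```python
import math
import random

MU = 255  # assumption: 8-bit mu-law, giving 256 output classes


def mu_law_encode(x, mu=MU):
    """Quantize a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(q, mu=MU):
    """Approximate inverse of mu_law_encode."""
    y = 2 * q / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def causal_dilated_conv(signal, weights, dilation):
    """out[t] = sum_k weights[k] * signal[t - k * dilation]; never reads the future."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start count as zero padding
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(dilations, kernel_size=2):
    """Number of input samples visible to one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the true quantized class."""
    return -math.log(softmax(logits)[target])


def generate(logits_fn, n_samples, seed=0):
    """Autoregressive sampling: each new class is drawn given all previous ones."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_samples):
        probs = softmax(logits_fn(history))
        history.append(rng.choices(range(len(probs)), weights=probs)[0])
    return history


if __name__ == "__main__":
    print(mu_law_encode(0.0))                             # class for silence
    print(receptive_field([2 ** i for i in range(10)]))   # one stack, dilations 1..512
    print(generate(lambda history: [0.0] * 256, 5))       # stand-in for a trained model
```

The `generate` loop also shows why the `hasProperty` row lists high inference cost: each sample needs a full forward pass conditioned on everything generated so far, so the steps cannot run in parallel across time.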