WaveNet

E39544

WaveNet is a deep generative neural network architecture for raw audio that produces highly natural-sounding speech and other audio signals.


Statements (51)
Predicate Object
instanceOf autoregressive model
deep generative model
neural network architecture
text-to-speech model
application music generation
neural vocoder for parametric TTS
voice conversion
arxivId 1609.03499
basedOn causal convolutional neural networks
designedFor audio signal modeling
raw audio generation
speech synthesis
developedBy DeepMind
Google DeepMind
hasProperty high computational cost at inference
highly natural-sounding speech output
parallelization across time at inference is limited by autoregressive structure
improvedUpon concatenative text-to-speech systems
parametric HMM-based TTS systems
inputType discretized audio waveform samples
inspiredBy PixelCNN
introducedBy Aaron van den Oord
Alex Graves
Heiga Zen
Karen Simonyan
Nal Kalchbrenner
Oriol Vinyals
Sander Dieleman
introducedIn 2016
introducedInPaper WaveNet: A Generative Model for Raw Audio
language Python
ledTo Parallel WaveNet
WaveGlow
WaveRNN
neural vocoder architectures
outputType probability distribution over next audio sample
publishedAt arXiv
relatedTo PixelRNN
autoregressive image models
supports general raw waveform modeling
music audio generation
speaker-conditioned speech synthesis
text-conditioned speech synthesis
trainingObjective cross-entropy loss over quantized samples
maximum likelihood estimation
usedIn Google Assistant text-to-speech
Google Cloud Text-to-Speech
uses autoregressive sample-by-sample prediction
conditional generative modeling
dilated causal convolutions
softmax output over quantized audio samples
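
The entries "dilated causal convolutions" and "softmax output over quantized audio samples" can be illustrated with a minimal pure-Python sketch. It is not DeepMind's implementation. It assumes 256 quantization levels via mu-law companding (as described in the WaveNet paper) and a single-channel convolution without gating or residual connections. The function names mu_law_encode, mu_law_decode, causal_dilated_conv and receptive_field are hypothetical, chosen only for this example.

```python
import math

MU = 255  # assumption: 256 classes, i.e. 8-bit mu-law companding


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range samples
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((compressed + 1.0) / 2.0 * mu))


def mu_law_decode(index, mu=MU):
    """Approximate inverse of mu_law_encode (quantization is lossy)."""
    compressed = 2.0 * index / mu - 1.0
    magnitude = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(magnitude, compressed)


def causal_dilated_conv(signal, weights, dilation):
    """Compute y[t] = sum_k weights[k] * signal[t - k * dilation].

    Positions before the start of the signal count as zero, so y[t]
    never depends on samples later than t (the causal property).
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With kernel size 2 and dilations 1, 2, 4, 8, receptive_field returns 16. Doubling the dilation at each layer grows the context exponentially with depth while the parameter count grows only linearly, which is the usual motivation for dilation in this kind of architecture.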

Referenced by (2)
Subject (surface form when different) Predicate
DeepMind developed
WaveNet ("WaveNet: A Generative Model for Raw Audio") introducedInPaper
