WaveNet

E39544

autoregressive model deep generative model neural network architecture text-to-speech model

WaveNet is a deep generative neural network architecture for raw audio that produces highly natural-sounding speech and other audio signals.

Aliases (1)

WaveNet: A Generative Model for Raw Audio ×1

Statements (51)

Predicate	Object
instanceOf	autoregressive model → deep generative model → neural network architecture → text-to-speech model →
application	music generation → neural vocoder for parametric TTS → voice conversion →
arxivId	1609.03499 →
basedOn	causal convolutional neural networks →
designedFor	audio signal modeling → raw audio generation → speech synthesis →
developedBy	DeepMind → Google DeepMind →
hasProperty	high computational cost at inference → highly natural-sounding speech output → parallelization across time is limited by autoregressive structure →
improvedUpon	concatenative text-to-speech systems → parametric HMM-based TTS systems →
inputType	discretized audio waveform samples →
inspired	PixelCNN →
introducedBy	Aaron van den Oord → Alex Graves → Heiga Zen → Karen Simonyan → Nal Kalchbrenner → Oriol Vinyals → Sander Dieleman →
introducedIn	2016 →
introducedInPaper	WaveNet: A Generative Model for Raw Audio →
language	Python →
ledTo	Parallel WaveNet → WaveGlow → WaveRNN → neural vocoder architectures →
outputType	probability distribution over next audio sample →
publishedAt	arXiv →
relatedTo	PixelRNN → autoregressive image models →
supports	general raw waveform modeling → music audio generation → speaker-conditioned speech synthesis → text-conditioned speech synthesis →
trainingObjective	cross-entropy loss over quantized samples → maximum likelihood estimation →
usedIn	Google Assistant text-to-speech → Google Cloud Text-to-Speech →
uses	autoregressive sample-by-sample prediction → conditional generative modeling → dilated causal convolutions → softmax output over quantized audio samples →

Referenced by (2)

Subject (surface form when different)	Predicate
DeepMind →	developed
WaveNet ("WaveNet: A Generative Model for Raw Audio") →	introducedInPaper