WaveNet
E39544
WaveNet is a deep generative neural network architecture for raw audio that produces highly natural-sounding speech and other audio signals.
All labels observed (3)
| Label | Occurrences |
|---|---|
| WaveNet (canonical) | 12 |
| WaveNet: A Generative Model for Raw Audio | 3 |
| WaveNet voices | 1 |
Statements (51)
| Predicate | Object |
|---|---|
| instanceOf | autoregressive model; deep generative model; neural network architecture; text-to-speech model |
| application | music generation; neural vocoder for parametric TTS; voice conversion |
| arxivId | 1609.03499 |
| basedOn | causal convolutional neural networks |
| designedFor | audio signal modeling; raw audio generation; speech synthesis |
| developedBy | DeepMind (also observed with surface form: Google DeepMind) |
| hasProperty | high computational cost at inference; highly natural-sounding speech output; parallelization across time is limited by autoregressive structure |
| improvedUpon | concatenative text-to-speech systems; parametric HMM-based TTS systems |
| inputType | discretized audio waveform samples |
| inspiredBy | PixelCNN |
| introducedBy | Aaron van den Oord; Alex Graves; Heiga Zen; Karen Simonyan; Nal Kalchbrenner; Oriol Vinyals; Sander Dieleman |
| introducedIn | 2016 |
| introducedInPaper | WaveNet (surface form: WaveNet: A Generative Model for Raw Audio) |
| language | Python |
| ledTo | Parallel WaveNet; WaveGlow; WaveRNN; neural vocoder architectures |
| outputType | probability distribution over next audio sample |
| publishedAt | arXiv |
| relatedTo | PixelRNN; autoregressive image models |
| supports | general raw waveform modeling; music audio generation; speaker-conditioned speech synthesis; text-conditioned speech synthesis |
| trainingObjective | cross-entropy loss over quantized samples; maximum likelihood estimation |
| usedIn | Google Assistant (surface form: Google Assistant text-to-speech); Google Cloud Text-to-Speech |
| uses | autoregressive sample-by-sample prediction; conditional generative modeling; dilated causal convolutions; softmax output over quantized audio samples |
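
The statements above list dilated causal convolutions among the techniques WaveNet uses. As a rough illustration of that idea only, the pure-Python sketch below applies a small stack of such convolutions to a unit impulse. The function names, the kernel size of 2, the all-ones weights, and the dilation schedule `[1, 2, 4, 8]` are assumptions chosen for the example, not details taken from this page, and the sketch omits everything else a real model would need (gating, residual connections, the softmax output).

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends
    on any input that comes after time t (the causal property).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical stack: kernel size 2, dilation doubling at each layer.
dilations = [1, 2, 4, 8]
signal = [0.0] * 32
signal[10] = 1.0  # unit impulse at t = 10

out = signal
for d in dilations:
    out = causal_dilated_conv(out, [1.0, 1.0], d)

# Causality check: no output before the impulse is affected.
assert all(v == 0.0 for v in out[:10])

print(receptive_field(2, dilations))  # 16
```

With these assumed settings the impulse spreads forward over 16 samples (t = 10 through t = 25) and never backward, which is why four layers are enough to cover a 16-sample context. Doubling the dilation per layer makes the context grow exponentially with depth while each layer still does a fixed amount of work per sample.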
Referenced by (16)
Full triples — surface form annotated when it differs from this entity's canonical label.
this entity surface form:
WaveNet: A Generative Model for Raw Audio
this entity surface form:
WaveNet voices