Common Voice

E405890

Common Voice is an open-source, crowdsourced dataset of voice recordings created to help train and improve speech recognition technologies for diverse languages and accents.

All labels observed (1)

Label Occurrences
Common Voice canonical 1

Statements (68)

Predicate Object
instanceOf Mozilla project
crowdsourced project
open-source project
speech dataset
accessPolicy freely downloadable
collectionMethod web-based recording interface
crowdsourced true
dataFormat audio recordings
text transcripts
dataType read speech
validated speech clips
developer Mozilla Foundation
focus accent diversity
language diversity
open data for speech technology
genre multilingual corpus
speech recognition dataset
hasLanguageCoverage Arabic
Basque
Bengali
Breton
Catalan
Chinese
Dutch
English
Esperanto
French
Galician language
surface form: Galician

German
Hindi
Indonesian
Irish language
surface form: Irish

Italian
Japanese
Kabyle
Kinyarwanda
Korean
Polish
Portuguese language
surface form: Portuguese

Russian
Scottish Gaelic
Spanish
Swahili language
surface form: Swahili

Swedish language
surface form: Swedish

Tamil
Tatar language
surface form: Tatar

Telugu
Thai
Turkish
Urdu language
surface form: Urdu

Vietnamese
Welsh
hostedAt voice.mozilla.org
inception 2017
license CC0
maintainer Mozilla Foundation
openSource true
purpose improving speech recognition for diverse accents
supporting low-resource languages
training automatic speech recognition systems
supportsMetadata recording locale
speaker accent
speaker age
speaker gender
targetUser developers
machine learning practitioners
researchers
validationMethod crowdsourced listening and voting
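
The listing above repeats a predicate only on its first row; the rows that follow carry additional objects for the same predicate. As a rough illustration of that shape, the sketch below stores a hand-copied subset of the statements as (predicate, object) pairs and groups them by predicate. The names `STATEMENTS` and `group_by_predicate` are hypothetical and are not part of any API behind this page.

```python
from collections import defaultdict

# Hand-copied subset of the statements listed above, as (predicate, object) pairs.
# The variable and function names are illustrative assumptions, not a real API.
STATEMENTS = [
    ("instanceOf", "speech dataset"),
    ("license", "CC0"),
    ("inception", "2017"),
    ("hasLanguageCoverage", "Welsh"),
    ("hasLanguageCoverage", "Kabyle"),
    ("supportsMetadata", "speaker age"),
    ("supportsMetadata", "speaker gender"),
]


def group_by_predicate(statements):
    """Collect the objects of each predicate, preserving listing order."""
    grouped = defaultdict(list)
    for predicate, obj in statements:
        grouped[predicate].append(obj)
    return dict(grouped)


grouped = group_by_predicate(STATEMENTS)
print(grouped["hasLanguageCoverage"])
```

Grouping this way mirrors how the page presents multi-valued predicates such as hasLanguageCoverage, while single-valued ones such as license end up as one-element lists.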

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Mozilla develops Common Voice