WMT English-French dataset
E899019
The WMT English-French dataset is a large-scale parallel corpus of English–French sentence pairs widely used as a benchmark for training and evaluating machine translation systems.
All labels observed (1)
| Label | Occurrences |
|---|---|
| WMT English-French dataset (canonical) | 1 |
Statements (47)
| Predicate | Object |
|---|---|
| instanceOf | English–French corpus; bilingual dataset; machine translation benchmark; parallel corpus |
| accessMethod | download from WMT or associated sites |
| alignmentType | parallel sentences |
| associatedEvent | Conference on Machine Translation |
| benchmarkFor | neural machine translation systems; statistical machine translation systems |
| benchmarkLevel | state-of-the-art comparison |
| benchmarkStatus | standard benchmark in MT research |
| contains | sentence pairs |
| curatedBy | Workshop on Machine Translation organizers |
| dataFormat | plain text; tokenized text |
| dataType | text |
| domain | general-domain text |
| evaluationMetric | BLEU; COMET; chrF |
| evaluationSetting | shared task evaluation campaigns |
| field | machine translation; natural language processing |
| granularity | sentence-level alignment |
| includes | development set; test set; training set |
| languagePair | English–French |
| license | research use (varies by component corpus) |
| modality | written language |
| origin | crawled and curated web and text sources |
| scale | large-scale |
| sourceLanguage | English |
| targetLanguage | French |
| task | sentence-level translation |
| timeSpan | updated annually in WMT campaigns |
| typicalModel | sequence-to-sequence models |
| typicalPreprocessing | subword segmentation (e.g. BPE); tokenization; truecasing |
| typicalUse | supervised learning |
| usedBy | academic researchers; industry MT practitioners |
| usedFor | benchmarking translation quality; machine translation evaluation; machine translation training |
| usedIn | WMT shared tasks |
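
The statements above describe the data as plain text with sentence-level alignment. A minimal sketch of reading such a corpus follows, assuming the common convention of two files in which line N of the English file corresponds to line N of the French file. The function name `read_parallel` and the file names in the usage note are hypothetical, and the actual file layout of a given WMT release may differ.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (English, French) sentence pairs from two line-aligned text files.

    Assumption: both files are UTF-8 plain text with one sentence per line,
    and line N of one file is the translation of line N of the other.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, which exposes
        # a line-count mismatch instead of silently truncating.
        for line_no, (en, fr) in enumerate(zip_longest(src, tgt), start=1):
            if en is None or fr is None:
                raise ValueError(
                    f"files differ in length at line {line_no}; "
                    "alignment cannot be trusted"
                )
            en, fr = en.strip(), fr.strip()
            # Skip pairs where either side is empty after stripping.
            if en and fr:
                yield en, fr
```

With hypothetical files `train.en` and `train.fr`, `for en, fr in read_parallel("train.en", "train.fr"): ...` streams the pairs without loading the whole corpus into memory, which matters for a large-scale corpus. Raising on a length mismatch is a design choice: a single missing line shifts every later pair out of alignment.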
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.