word2vec
E906310
distributional semantics model
natural language processing technique
neural network-based representation learning method
word embedding model
word2vec is a neural network-based technique, widely used in natural language processing, for learning dense vector representations of words that capture semantic and syntactic relationships.
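
As a rough illustration of how relationships are read off dense vectors, the sketch below compares words by cosine similarity and solves an analogy by vector offset. The three-dimensional vectors and the helper names `cosine` and `analogy` are hypothetical and hand-picked for this example; they are not trained word2vec embeddings, which are learned from text.

```python
import math

# Hypothetical hand-picked 3-d vectors for illustration only; real word2vec
# embeddings are learned from unlabeled text and are much higher-dimensional.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# king - man + woman: the nearest remaining word in this toy space
result = analogy("king", "man", "woman", TOY_VECTORS)
print(result)
```

With trained embeddings the same nearest-neighbour search would run over the whole vocabulary rather than five words.
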
Observed surface forms (1)
| Surface form | Occurrences |
|---|---|
| word2vec algorithm | 1 |
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | distributional semantics model; natural language processing technique; neural network-based representation learning method; word embedding model |
| basedOn | distributional hypothesis; neural networks |
| captures | semantic relationships between words; syntactic relationships between words |
| category | unsupervised learning |
| developedAt | Google |
| developedBy | Tomas Mikolov |
| domain | computational linguistics; natural language processing |
| embeddingDimension | typically 100–300 |
| exampleProperty | king - man + woman ≈ queen |
| hasArchitecture | Continuous Bag-of-Words (CBOW); Skip-gram |
| implementedIn | Gensim; PyTorch; TensorFlow |
| inputUnit | word tokens |
| inspired | GloVe; fastText; many neural word embedding methods |
| introducedInPaper | Efficient Estimation of Word Representations in Vector Space |
| introducedInYear | 2013 |
| language | C (original implementation); Python (reference implementations) |
| license | Apache-style open source (original code) |
| optimizationTechnique | hierarchical softmax; negative sampling |
| output | word embeddings |
| popularized | vector arithmetic on words |
| representationType | continuous vector space; dense vectors |
| scalesTo | billions of tokens |
| supports | large vocabularies |
| task | learning dense vector representations of words |
| trainingDataType | unlabeled text corpora |
| trainingObjective | predict context words from target word (Skip-gram); predict target word from context (CBOW) |
| usedFor | feature extraction for NLP models; information retrieval; machine translation (as component); semantic clustering; text classification; word analogy tasks; word similarity |
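
The two training objectives listed above differ mainly in how training examples are cut from a token sequence. The following is a minimal sketch of that difference, assuming a fixed symmetric context window; the function names `skipgram_pairs` and `cbow_examples` are hypothetical, and the sketch leaves out details of the original implementation such as randomized window sizes, frequent-word subsampling, and the negative sampling or hierarchical softmax step.

```python
def skipgram_pairs(tokens, window=2):
    """Build (target, context) pairs: Skip-gram predicts each context word from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target position itself
                pairs.append((target, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Build (context_words, target) examples: CBOW predicts the target from its context."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:  # a single-token sequence has no context to learn from
            examples.append((context, target))
    return examples


tokens = "the quick brown fox".split()
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)

for target, context in sg:
    print("skip-gram:", target, "->", context)
for context, target in cb:
    print("cbow:", context, "->", target)
```

Skip-gram produces one example per (target, context) pair, while CBOW produces one example per position with the whole context as input, which is one reason CBOW tends to train faster on the same corpus.
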
Referenced by (2)
Full triples — surface form annotated when it differs from this entity's canonical label.
this entity surface form: word2vec algorithm