Paragraph Vector
E899023
Paragraph Vector is an unsupervised learning algorithm that generates fixed-length vector representations for variable-length texts such as sentences, paragraphs, and documents, enabling them to be used effectively in machine learning tasks.
Observed surface forms (1)
| Surface form | Occurrences |
|---|---|
| Paragraph Vector Distributed Bag of Words | 1 |
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | document embedding method; neural network model; unsupervised learning algorithm |
| alsoKnownAs | Doc2Vec |
| basedOn | Word2Vec |
| captures | semantic similarity between texts; syntactic regularities to some extent |
| citationCountCategory | highly cited in NLP literature |
| developedBy | Quoc V. Le; Tomas Mikolov |
| embeddingSpace | continuous vector space |
| extends | distributed word representations |
| field | machine learning; natural language processing |
| fullNameOfVariant | PV-DBOW: Distributed Bag of Words version of Paragraph Vector; PV-DM: Distributed Memory model of Paragraph Vectors |
| hasLimitation | less effective than modern transformer-based embeddings on many tasks; training can be computationally expensive on large corpora |
| hasVariant | PV-DBOW; PV-DM |
| implementedIn | DL4J; Gensim; TensorFlow (custom implementations) |
| influenced | subsequent research on document embeddings |
| inputType | variable-length text |
| inspiredBy | neural language models |
| introducedInPaper | Distributed Representations of Sentences and Documents |
| languageAgnostic | true |
| learningParadigm | unsupervised learning |
| optimizationMethod | stochastic gradient descent |
| predecessorOf | more advanced document embedding methods |
| publicationYear | 2014 |
| publishedAtConference | ICML 2014 |
| representationType | fixed-length vector |
| represents | documents as dense vectors; paragraphs as dense vectors; sentences as dense vectors |
| supportsTask | document clustering; information retrieval; recommendation; semantic similarity; sentiment analysis; text classification |
| trainingDataRequirement | large unlabeled text corpora |
| uses | distributed representations |
| usesTrainingObjective | predicting context words; predicting target words; predicting words from paragraph id |
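
The training objective "predicting words from paragraph id" with stochastic gradient descent can be illustrated with a small sketch of the PV-DBOW idea. This is a toy under stated assumptions, not the authors' implementation and not the API of Gensim or DL4J: it uses a full softmax over a tiny vocabulary (practical implementations typically rely on hierarchical softmax or negative sampling), and the function names `train_pv_dbow` and `cosine`, the example documents, and all hyperparameters are hypothetical choices made for illustration.

```python
import math
import random


def train_pv_dbow(docs, dim=8, epochs=200, lr=0.05, seed=0):
    """Toy PV-DBOW: each paragraph vector learns to predict its own words.

    Assumption: full softmax over the vocabulary, which is only feasible
    for very small vocabularies.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_index = {word: i for i, word in enumerate(vocab)}

    # One trainable fixed-length vector per paragraph id.
    doc_vecs = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]
    # Output (softmax) weights, one row per vocabulary word.
    out_w = [[0.0] * dim for _ in vocab]

    for _ in range(epochs):
        for d, doc in enumerate(docs):
            v = doc_vecs[d]
            for word in doc:
                target = word_index[word]
                # Softmax over word scores given the paragraph vector.
                scores = [sum(a * b for a, b in zip(v, w)) for w in out_w]
                top = max(scores)
                exps = [math.exp(s - top) for s in scores]
                total = sum(exps)
                probs = [e / total for e in exps]

                # Cross-entropy gradients, then one SGD step.
                grad_v = [0.0] * dim
                for j, w in enumerate(out_w):
                    err = probs[j] - (1.0 if j == target else 0.0)
                    for k in range(dim):
                        grad_v[k] += err * w[k]
                        w[k] -= lr * err * v[k]
                for k in range(dim):
                    v[k] -= lr * grad_v[k]
    return doc_vecs, vocab


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical variable-length inputs: two related texts and one unrelated.
docs = [
    ["cat", "sat", "mat", "cat", "purred"],
    ["cat", "purred", "mat", "sat"],
    ["stock", "market", "fell", "stock", "prices"],
]
vectors, vocab = train_pv_dbow(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Each input text, whatever its length, ends up as a vector of the same dimension, so texts can be compared with cosine similarity or passed to a downstream classifier. In this sketch the two texts that share vocabulary should score as more similar to each other than to the unrelated one.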
Referenced by (2)
Full triples — surface form annotated when it differs from this entity's canonical label.
this entity surface form: Paragraph Vector Distributed Bag of Words