Paragraph Vector

E899023

Paragraph Vector is an unsupervised learning algorithm that generates fixed-length vector representations for variable-length texts such as sentences, paragraphs, and documents, enabling them to be used effectively in machine learning tasks.

Observed surface forms (1)

Surface form Occurrences
Paragraph Vector Distributed Bag of Words 1

Statements (48)

Predicate Object
instanceOf document embedding method
neural network model
unsupervised learning algorithm
alsoKnownAs Doc2Vec
basedOn Word2Vec
captures semantic similarity between texts
syntactic regularities to some extent
citationCountCategory highly cited in NLP literature
developedBy Quoc V. Le
Tomas Mikolov
embeddingSpace continuous vector space
extends distributed word representations
field machine learning
natural language processing
fullNameOfVariant PV-DBOW: Distributed Bag of Words version of Paragraph Vector
PV-DM: Distributed Memory model of Paragraph Vectors
hasLimitation less effective than modern transformer-based embeddings on many tasks
training can be computationally expensive on large corpora
hasVariant PV-DBOW
PV-DM
implementedIn DL4J
Gensim
TensorFlow (custom implementations)
influenced subsequent research on document embeddings
inputType variable-length text
inspiredBy neural language models
introducedInPaper Distributed Representations of Sentences and Documents
languageAgnostic true
learningParadigm unsupervised learning
optimizationMethod stochastic gradient descent
predecessorOf more advanced document embedding methods
publicationYear 2014
publishedAtConference ICML 2014
representationType fixed-length vector
represents documents as dense vectors
paragraphs as dense vectors
sentences as dense vectors
supportsTask document clustering
information retrieval
recommendation
semantic similarity
sentiment analysis
text classification
trainingDataRequirement large unlabeled text corpora
uses distributed representations
usesTrainingObjective predicting context words
predicting target words
predicting words from paragraph id
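
Illustrative sketch (not part of the recorded statements): the "predicting words from paragraph id" objective listed above, optimized with stochastic gradient descent, can be shown with a toy PV-DBOW trainer in plain Python. The function names (train_pv_dbow, softmax, cosine), the three-document corpus, and the hyperparameters are assumptions made for illustration. A full softmax is used for simplicity; practical implementations typically rely on cheaper approximations such as hierarchical softmax or negative sampling.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_pv_dbow(docs, dim=8, epochs=100, lr=0.1, seed=0):
    """Toy PV-DBOW: each paragraph id is trained to predict its own words."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    # One trainable vector per paragraph id (small random init).
    doc_vecs = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]
    # Softmax output weights, one row per vocabulary word (zero init).
    out = [[0.0] * dim for _ in vocab]

    for _ in range(epochs):
        for doc_id, doc in enumerate(docs):
            vec = doc_vecs[doc_id]
            for word in doc:
                scores = [sum(w * x for w, x in zip(row, vec)) for row in out]
                probs = softmax(scores)
                # Gradient of cross-entropy w.r.t. the scores: p - one_hot(target).
                errs = probs[:]
                errs[index[word]] -= 1.0
                # Gradient for the paragraph vector, taken before `out` changes.
                grad_vec = [
                    sum(e * row[k] for e, row in zip(errs, out)) for k in range(dim)
                ]
                # SGD step on the output weights, then on the paragraph vector.
                for e, row in zip(errs, out):
                    for k in range(dim):
                        row[k] -= lr * e * vec[k]
                for k in range(dim):
                    vec[k] -= lr * grad_vec[k]
    return doc_vecs


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical toy corpus: the first two documents share all their words.
docs = [
    ["cat", "sat", "mat"],
    ["cat", "mat", "sat"],
    ["stock", "market", "fell"],
]
vectors = train_pv_dbow(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Each document ends up with a fixed-length dense vector regardless of its length. In this toy setting, the two documents that share their words should score a higher cosine similarity with each other than with the unrelated third document, which is the property the semantic similarity and clustering tasks listed above rely on.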

Referenced by (2)

Full triples — surface form annotated when it differs from this entity's canonical label.

Distributed Representations of Sentences and Documents proposesMethod Paragraph Vector
this entity surface form: Paragraph Vector Distributed Bag of Words