Paragraph Vector

E899023

document embedding method; neural network model; unsupervised learning algorithm

Paragraph Vector is an unsupervised learning algorithm that learns fixed-length vector representations for variable-length texts such as sentences, paragraphs, and documents, so that they can be used as inputs to machine learning methods that expect fixed-size feature vectors.
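
The description above can be made concrete with a toy sketch. It is not the reference implementation and not the API of any library listed on this page. It is a minimal pure-Python illustration of the PV-DBOW idea (a document id is trained to predict the words of that document), using plain stochastic gradient descent and a full softmax. The names `train_pv_dbow` and `cosine`, the hyperparameters, and the three example texts are assumptions made for illustration.

```python
import math
import random


def train_pv_dbow(docs, dim=8, epochs=200, lr=0.05, seed=0):
    """Toy PV-DBOW-style trainer (illustrative sketch, not a library API).

    docs: list of token lists. Returns (doc_vecs, vocab), where doc_vecs
    holds one vector of length `dim` per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_index = {w: i for i, w in enumerate(vocab)}
    # One trainable vector per document, one output vector per vocabulary word.
    doc_vecs = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]
    out_vecs = [[0.0] * dim for _ in vocab]

    for _ in range(epochs):
        for d, doc in enumerate(docs):
            dv = doc_vecs[d]
            for word in doc:
                target = word_index[word]
                # Softmax over the whole vocabulary, conditioned on the document vector.
                scores = [sum(a * b for a, b in zip(dv, ov)) for ov in out_vecs]
                top = max(scores)
                exps = [math.exp(s - top) for s in scores]
                total = sum(exps)
                grad_doc = [0.0] * dim
                for j, ov in enumerate(out_vecs):
                    # Cross-entropy gradient w.r.t. score j: p_j - y_j.
                    err = exps[j] / total - (1.0 if j == target else 0.0)
                    for k in range(dim):
                        grad_doc[k] += err * ov[k]  # uses ov[k] before its update
                        ov[k] -= lr * err * dv[k]
                for k in range(dim):
                    dv[k] -= lr * grad_doc[k]
    return doc_vecs, vocab


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical example texts of different lengths.
texts = [
    "good great film",
    "great good film film",
    "bad awful plot",
]
tokenized = [t.split() for t in texts]
doc_vecs, vocab = train_pv_dbow(tokenized, dim=8)

for text, vec in zip(texts, doc_vecs):
    print(len(text.split()), "tokens ->", len(vec), "dimensions")
print("similarity(0, 1):", cosine(doc_vecs[0], doc_vecs[1]))
print("similarity(0, 2):", cosine(doc_vecs[0], doc_vecs[2]))
```

Each text ends up with a vector of the same dimension regardless of its token count, which is the "fixed-length" property the description refers to. On a corpus this small the similarity values are only indicative; texts that share vocabulary would typically be expected to land closer together than texts that do not.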

Observed surface forms (1)

Surface form	Occurrences
Paragraph Vector Distributed Bag of Words	1

Statements (48)

Predicate	Object
instanceOf	document embedding method; neural network model; unsupervised learning algorithm
alsoKnownAs	Doc2Vec
basedOn	Word2Vec
captures	semantic similarity between texts; syntactic regularities to some extent
citationCountCategory	highly cited in NLP literature
developedBy	Quoc V. Le; Tomas Mikolov
embeddingSpace	continuous vector space
extends	distributed word representations
field	machine learning; natural language processing
fullNameOfVariant	PV-DBOW: Distributed Bag of Words version of Paragraph Vector; PV-DM: Distributed Memory model of Paragraph Vectors
hasLimitation	less effective than modern transformer-based embeddings on many tasks; training can be computationally expensive on large corpora
hasVariant	PV-DBOW; PV-DM
implementedIn	DL4J; Gensim; TensorFlow (custom implementations)
influenced	subsequent research on document embeddings
inputType	variable-length text
inspiredBy	neural language models
introducedInPaper	Distributed Representations of Sentences and Documents
languageAgnostic	true
learningParadigm	unsupervised learning
optimizationMethod	stochastic gradient descent
predecessorOf	more advanced document embedding methods
publicationYear	2014
publishedAtConference	ICML 2014
representationType	fixed-length vector
represents	documents as dense vectors; paragraphs as dense vectors; sentences as dense vectors
supportsTask	document clustering; information retrieval; recommendation; semantic similarity; sentiment analysis; text classification
trainingDataRequirement	large unlabeled text corpora
uses	distributed representations
usesTrainingObjective	predicting context words; predicting target words; predicting words from paragraph id
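
The usesTrainingObjective values can be read against the two variants named under hasVariant. The formulas below are a sketch of that reading, not a statement taken from this page: the symbols T (number of words in the text), k (context window size), w_t (word at position t), d (paragraph id), W (word vector matrix), D (paragraph vector matrix), U and b (softmax weights and bias) are notation assumed here for illustration, and the PV-DBOW line in particular is a simplified form.

```latex
% PV-DM: predict the target word w_t from its surrounding context words
% together with the paragraph d the window was taken from.
\[
  \max \; \frac{1}{T} \sum_{t=k}^{T-k}
  \log p\left(w_t \mid w_{t-k}, \ldots, w_{t+k}, d\right)
\]

% Softmax over the vocabulary. h averages or concatenates the context word
% vectors (columns of W) and the paragraph vector (column of D).
\[
  p\left(w_t \mid w_{t-k}, \ldots, w_{t+k}, d\right)
  = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}},
  \qquad
  y = b + U\, h\left(w_{t-k}, \ldots, w_{t+k}, d;\; W, D\right)
\]

% PV-DBOW (simplified): predict words sampled from paragraph d using only
% its paragraph vector D_d, with no context words as input.
\[
  \max \; \sum_{w \,\in\, \mathrm{sample}(d)} \log p\left(w \mid d\right),
  \qquad
  y = b + U\, D_d
\]
```

Under this reading, "predicting target words" corresponds most naturally to the PV-DM objective and "predicting words from paragraph id" to PV-DBOW; both are optimized with stochastic gradient descent, as listed under optimizationMethod.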

Referenced by (2)

Full triples — surface form annotated when it differs from this entity's canonical label.

Distributed Representations of Sentences and Documents → introduces → Paragraph Vector

Distributed Representations of Sentences and Documents → proposesMethod → Paragraph Vector

this entity surface form: Paragraph Vector Distributed Bag of Words