Latent Dirichlet Allocation

E898981

Latent Dirichlet Allocation is a generative probabilistic model commonly used in natural language processing to discover latent topics within large collections of documents.

Statements (59)

Predicate Object
instanceOf Bayesian model
bag-of-words model
generative probabilistic model
topic model
unsupervised learning method
appliedIn bioinformatics text analysis
digital humanities
news article analysis
scientific literature analysis
social media analysis
assumes bag-of-words representation of documents
documents are mixtures of topics
topics are distributions over words
basedOn Dirichlet distribution
multinomial distribution
differsFrom probabilistic latent semantic analysis by using Dirichlet priors
evaluationMetric perplexity
topic coherence
extends probabilistic latent semantic analysis
field machine learning
natural language processing
statistics
hasAbbreviation LDA
hasComponent topic distribution per document
word distribution per topic
hasHyperparameter alpha
beta
hyperparameterAlphaControls document-topic sparsity
hyperparameterBetaControls topic-word sparsity
implementedIn Gensim
MALLET
Stan
scikit-learn
inferenceMethod collapsed Gibbs sampling
expectation-maximization
online variational Bayes
variational inference
input corpus of documents
introducedBy Andrew Y. Ng
David M. Blei
Michael I. Jordan
introducedInPaper Latent Dirichlet Allocation
output set of topics
topic proportions for each document
word distribution for each topic
publicationYear 2003
publishedIn Journal of Machine Learning Research
relatedTo latent semantic analysis
probabilistic latent semantic analysis
requires predefined number of topics
usedFor content-based recommendation
dimensionality reduction
document classification preprocessing
document clustering
feature extraction
information retrieval
recommender systems
text mining
topic discovery
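
Illustrative sketch (generative process). The statements above say that documents are mixtures of topics, that topics are distributions over words, and that both are governed by Dirichlet priors with hyperparameters alpha and beta. The Python sketch below shows one way that generative story could be simulated using only the standard library. The function names, the symmetric priors, and the fixed document length are assumptions made for illustration; they do not come from this entry or from any particular library.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, num_topics, num_docs, doc_length, alpha, beta, seed=0):
    """Simulate a bag-of-words corpus (hypothetical helper, symmetric priors assumed)."""
    rng = random.Random(seed)
    # Word distribution per topic, drawn from Dirichlet(beta).
    topic_word = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    docs, doc_topic = [], []
    for _ in range(num_docs):
        # Topic distribution per document, drawn from Dirichlet(alpha).
        theta = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(doc_length):
            # Pick a topic for this token, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topic_word[z])[0])
        docs.append(words)
        doc_topic.append(theta)
    return docs, doc_topic, topic_word
```

Smaller values of alpha tend to concentrate each document on fewer topics, and smaller values of beta tend to concentrate each topic on fewer words, which matches the sparsity roles listed for the two hyperparameters. Word order within a document carries no meaning here, in line with the bag-of-words assumption.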

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Gibbs sampling usedIn Latent Dirichlet Allocation
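
Illustrative sketch (collapsed Gibbs sampling). Collapsed Gibbs sampling is listed above as one inference method for this model. The sketch below is a minimal, unoptimized Python version, assuming symmetric alpha and beta, documents encoded as lists of integer word ids, and a fixed number of sweeps with no convergence check. The function name and parameters are hypothetical, and it is not the implementation used by any of the listed libraries.

```python
import random


def collapsed_gibbs_lda(docs, vocab_size, num_topics, alpha, beta,
                        iterations=50, seed=0):
    """Estimate document-topic and topic-word distributions (illustrative only)."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens in doc d with topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w has topic k
    topic_total = [0] * num_topics                               # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimates from the final counts.
    theta = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
             for counts, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
           for k in range(num_topics)]
    return theta, phi
```

The two returned structures correspond to the outputs listed for the model: topic proportions for each document (theta) and a word distribution for each topic (phi). The number of topics must be supplied up front, as the entry notes.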