class lda.LDA(n_topics, n_iter=2000, alpha=0.1, eta=0.01, random_state=None, refresh=10)

Latent Dirichlet allocation using collapsed Gibbs sampling

n_topics : int

Number of topics

n_iter : int, default 2000

Number of sampling iterations

alpha : float, default 0.1

Dirichlet parameter for distribution over topics

eta : float, default 0.01

Dirichlet parameter for distribution over words

random_state : int or RandomState, optional

The generator used for the initial topics.


>>> import numpy
>>> X = numpy.array([[1,1], [2, 1], [3, 1], [4, 1], [5, 8], [6, 1]])
>>> import lda
>>> model = lda.LDA(n_topics=2, random_state=0, n_iter=100)
>>> model.fit(X) #doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
>>> model.components_
array([[ 0.85714286,  0.14285714],
       [ 0.45      ,  0.55      ]])
>>> model.loglikelihood() #doctest: +ELLIPSIS

`components_` : array, shape = [n_topics, n_features]

Point estimate of the topic-word distributions (Phi in literature)

`topic_word_` :

Alias for components_

`nzw_` : array, shape = [n_topics, n_features]

Matrix of counts recording topic-word assignments in final iteration.

`ndz_` : array, shape = [n_samples, n_topics]

Matrix of counts recording document-topic assignments in final iteration.

`doc_topic_` : array, shape = [n_samples, n_topics]

Point estimate of the document-topic distributions (Theta in literature)

`nz_` : array, shape = [n_topics]

Array of topic assignment counts in final iteration.
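The relationship between the count attributes and the point estimates can be illustrated with a small plain-Python sketch. It assumes the usual smoothed estimate, in which the symmetric Dirichlet parameter (`eta` for topic-word counts, `alpha` for document-topic counts) is added to every count before each row is normalized. The helper `normalize_rows` and the count values are hypothetical and are not part of the library.

```python
def normalize_rows(counts, prior):
    """Add a symmetric pseudocount to every cell, then scale each row to sum to 1."""
    smoothed = [[c + prior for c in row] for row in counts]
    return [[v / sum(row) for v in row] for row in smoothed]

# Hypothetical final-iteration counts: 2 topics, 3 words, 2 documents, 10 tokens.
nzw = [[4, 1, 0],
       [0, 2, 3]]   # topic-word counts, shape [n_topics, n_features]
ndz = [[5, 0],
       [0, 5]]      # document-topic counts, shape [n_samples, n_topics]

topic_word = normalize_rows(nzw, prior=0.01)  # plays the role of components_ (eta)
doc_topic = normalize_rows(ndz, prior=0.1)    # plays the role of doc_topic_ (alpha)
nz = [sum(row) for row in nzw]                # plays the role of nz_
```

Each row of `topic_word` is a distribution over words and each row of `doc_topic` is a distribution over topics, which is why their second dimensions are n_features and n_topics respectively.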

fit(self, X, y=None)

Fit the model with X.

X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features. Sparse matrix allowed.

self : object

Returns the instance itself.
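As an illustration of the expected input layout, the sketch below builds a document-term count matrix of shape (n_samples, n_features) from tokenized documents. The helper `build_count_matrix` and the toy corpus are hypothetical; in practice the nested list would typically be converted to an integer array before being passed to fit.

```python
def build_count_matrix(tokenized_docs):
    """Return (X, vocab) where X[i][j] counts occurrences of vocab[j] in document i."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    X = [[0] * len(vocab) for _ in tokenized_docs]
    for i, doc in enumerate(tokenized_docs):
        for tok in doc:
            X[i][index[tok]] += 1
    return X, vocab

docs = [["cat", "dog", "cat"], ["dog", "fish"]]
X, vocab = build_count_matrix(docs)
# vocab is ['cat', 'dog', 'fish'] and X is [[2, 1, 0], [0, 1, 1]]
```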

fit_transform(self, X, y=None)

Apply dimensionality reduction on X

X : array-like, shape (n_samples, n_features)

New data, where n_samples is the number of samples and n_features is the number of features. Sparse matrix allowed.

doc_topic : array-like, shape (n_samples, n_topics)

Point estimate of the document-topic distributions

transform(self, X, max_iter=20, tol=1e-16)

Transform the data X according to previously fitted model

X : array-like, shape (n_samples, n_features)

New data, where n_samples is the number of samples and n_features is the number of features.

max_iter : int, optional

Maximum number of iterations in iterated-pseudocount estimation.

tol : double, optional

Tolerance value used in stopping condition.

doc_topic : array-like, shape (n_samples, n_topics)

Point estimate of the document-topic distributions

_transform_single(self, doc, max_iter, tol)

Transform a single document according to the previously fit model

doc : 1D numpy array of integers

Each element represents a word in the document

max_iter : int

Maximum number of iterations in iterated-pseudocount estimation.

tol : double

Tolerance value used in stopping condition.

doc_topic : 1D numpy array of length n_topics

Point estimate of the topic distributions for document
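The iterated-pseudocount estimation named above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: `topic_word` stands in for a fitted components_ matrix, `alpha` for the document-topic Dirichlet parameter, and both function names are hypothetical. Each token keeps a soft topic assignment that is repeatedly re-estimated from the word likelihood and from the soft counts of all other tokens in the document.

```python
def counts_to_word_indices(row):
    """Expand a count row into one word index per token, e.g. [2, 0, 1] -> [0, 0, 2]."""
    return [w for w, n in enumerate(row) for _ in range(n)]

def transform_single(doc, topic_word, alpha, max_iter=20, tol=1e-16):
    """Estimate the topic distribution of one non-empty document of word indices."""
    if not doc:
        raise ValueError("document must contain at least one token")
    n_topics = len(topic_word)
    # pzs[i][k]: current estimate of p(topic k | token i); starts at zero.
    pzs = [[0.0] * n_topics for _ in doc]
    for _ in range(max_iter + 1):  # the first pass only initializes pzs
        totals = [sum(row[k] for row in pzs) for k in range(n_topics)]
        new = []
        for i, w in enumerate(doc):
            # Word likelihood times pseudocounts from all other tokens plus alpha.
            weights = [topic_word[k][w] * (totals[k] - pzs[i][k] + alpha)
                       for k in range(n_topics)]
            s = sum(weights)
            new.append([v / s for v in weights])
        delta = sum(abs(a - b) for ra, rb in zip(new, pzs) for a, b in zip(ra, rb))
        pzs = new
        if delta < tol:  # stop once the soft assignments no longer change
            break
    totals = [sum(row[k] for row in pzs) for k in range(n_topics)]
    grand = sum(totals)
    return [t / grand for t in totals]

topic_word = [[0.9, 0.1],   # topic 0 favours word 0
              [0.1, 0.9]]   # topic 1 favours word 1
doc = counts_to_word_indices([3, 0])  # three tokens of word 0
theta = transform_single(doc, topic_word, alpha=0.1)
```

For this toy document, which contains only word 0, the estimate puts most of its mass on topic 0. The `max_iter` and `tol` arguments play the same roles as the parameters documented above.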

_fit(self, X)

Fit the model to the data X

X : array-like, shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features. Sparse matrix allowed.

_initialize(self, X)

loglikelihood(self)

Calculate complete log likelihood, log p(w,z)

Formula used is log p(w,z) = log p(w|z) + log p(z)
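The decomposition log p(w,z) = log p(w|z) + log p(z) can be evaluated from the count matrices alone. The sketch below assumes the standard collapsed form, in which the topic-word and document-topic distributions are integrated out under symmetric Dirichlet priors, so that each term is a sum of log-gamma expressions over the rows of `nzw` and `ndz`. The function names are hypothetical and the code is not taken from the library.

```python
from math import lgamma, log

def log_marginal(counts, prior):
    """Log probability of one particular sequence of draws with these category
    counts, with the categorical parameter integrated out under a symmetric
    Dirichlet(prior)."""
    n, total = len(counts), sum(counts)
    return (lgamma(n * prior) - lgamma(total + n * prior)
            + sum(lgamma(c + prior) - lgamma(prior) for c in counts))

def complete_loglikelihood(nzw, ndz, alpha, eta):
    """log p(w, z) = log p(w | z) + log p(z), computed from count matrices."""
    log_p_w_given_z = sum(log_marginal(row, eta) for row in nzw)   # one term per topic
    log_p_z = sum(log_marginal(row, alpha) for row in ndz)         # one term per document
    return log_p_w_given_z + log_p_z

# One topic, two words, one document holding a single token of word 0.
# With eta = 1 the word is uniform over 2 choices and the topic is certain,
# so the result is log(1/2).
ll = complete_loglikelihood([[1, 0]], [[1]], alpha=0.1, eta=1.0)
expected = log(0.5)
```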

_sample_topics(self, rands)

Samples all topic assignments. Called once per iteration.
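One sweep of collapsed Gibbs sampling can be sketched as below. The sketch makes several assumptions that the text above does not state: the corpus is stored as parallel token arrays (`WS` for word ids, `DS` for document ids, `ZS` for current topics), `rands` is a sequence of pre-generated uniform numbers in [0, 1), and each token's topic is drawn with probability proportional to (nzw + eta) / (nz + n_words * eta) * (ndz + alpha) after its own assignment is removed from the counts. All names are hypothetical.

```python
import random

def sample_topics(WS, DS, ZS, nzw, ndz, nz, alpha, eta, rands):
    """Resample the topic of every token once, updating all counts in place."""
    n_topics, n_words = len(nzw), len(nzw[0])
    for i, (w, d) in enumerate(zip(WS, DS)):
        old = ZS[i]
        # Remove the token's current assignment from the counts.
        nzw[old][w] -= 1
        ndz[d][old] -= 1
        nz[old] -= 1
        # Unnormalized conditional probability of each topic for this token.
        weights = [(nzw[k][w] + eta) / (nz[k] + n_words * eta) * (ndz[d][k] + alpha)
                   for k in range(n_topics)]
        # Inverse-CDF draw using a pre-generated uniform number.
        target = rands[i % len(rands)] * sum(weights)
        new, acc = 0, weights[0]
        while acc <= target and new < n_topics - 1:
            new += 1
            acc += weights[new]
        # Record the new assignment.
        ZS[i] = new
        nzw[new][w] += 1
        ndz[d][new] += 1
        nz[new] += 1

# Toy corpus: 5 tokens, 3 words, 2 documents, 2 topics.
WS = [0, 0, 1, 2, 2]
DS = [0, 0, 0, 1, 1]
ZS = [0, 1, 0, 1, 0]
n_topics, n_words, n_docs = 2, 3, 2
nzw = [[0] * n_words for _ in range(n_topics)]
ndz = [[0] * n_topics for _ in range(n_docs)]
nz = [0] * n_topics
for w, d, z in zip(WS, DS, ZS):
    nzw[z][w] += 1
    ndz[d][z] += 1
    nz[z] += 1

rng = random.Random(0)
rands = [rng.random() for _ in WS]
sample_topics(WS, DS, ZS, nzw, ndz, nz, alpha=0.1, eta=0.01, rands=rands)
```

A full fit under these assumptions would repeat such a sweep n_iter times; the total number of tokens, and therefore the sum of every count matrix, is unchanged by a sweep.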
