lda.utils

Module Contents

Functions

check_random_state(seed)
matrix_to_lists(doc_word) : Convert a (sparse) matrix of counts into arrays of word and doc indices
lists_to_matrix(WS, DS) : Convert array of word (or topic) and document indices to doc-term array
dtm2ldac(dtm, offset=0) : Convert a document-term matrix into an LDA-C formatted file
ldac2dtm(stream, offset=0) : Convert an LDA-C formatted file to a document-term array
lda.utils.PY2
lda.utils.zip
lda.utils.logger
lda.utils.check_random_state(seed)
lda.utils.matrix_to_lists(doc_word)

Convert a (sparse) matrix of counts into arrays of word and doc indices

Parameters:
doc_word : array or sparse matrix (D, V)

document-term matrix of counts

Returns:
(WS, DS) : tuple of two arrays

WS[k] contains the kth word in the corpus.
DS[k] contains the document index for the kth word.
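
The expansion described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper name `matrix_to_lists_sketch` is hypothetical, it works on a dense list of rows rather than a NumPy array or sparse matrix, and it assumes entries are emitted document by document, then in word-index order within each document.

```python
def matrix_to_lists_sketch(doc_word):
    """Expand a dense count matrix (list of rows) into parallel index lists.

    Hypothetical helper for illustration only; not part of lda.utils.
    """
    WS, DS = [], []
    for d, row in enumerate(doc_word):
        for w, count in enumerate(row):
            # Emit one entry per occurrence of word w in document d.
            WS.extend([w] * count)
            DS.extend([d] * count)
    return WS, DS


doc_word = [
    [2, 0, 1],  # document 0: word 0 twice, word 2 once
    [0, 3, 0],  # document 1: word 1 three times
]
WS, DS = matrix_to_lists_sketch(doc_word)
print(WS)  # [0, 0, 2, 1, 1, 1]
print(DS)  # [0, 0, 0, 1, 1, 1]
```

Both lists have one entry per token, so their length equals the sum of all counts in the matrix.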

lda.utils.lists_to_matrix(WS, DS)

Convert arrays of word (or topic) and document indices into a document-term array

Parameters:
WS, DS : two arrays of equal length

WS[k] contains the kth word in the corpus.
DS[k] contains the document index for the kth word.

Returns:
doc_word : array (D, V)

document-term array of counts
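
The inverse conversion can be sketched in pure Python as follows. The helper name `lists_to_matrix_sketch` is hypothetical, and the sketch assumes that D and V are inferred from the largest document and word indices present; it returns nested lists rather than a NumPy array.

```python
def lists_to_matrix_sketch(WS, DS):
    """Rebuild a dense (D, V) count matrix from parallel index lists.

    Hypothetical helper for illustration only; not part of lda.utils.
    Assumes both lists are non-empty and have the same length.
    """
    if len(WS) != len(DS):
        raise ValueError("WS and DS must have the same length")
    n_docs = max(DS) + 1   # assumption: D inferred from the largest doc index
    n_words = max(WS) + 1  # assumption: V inferred from the largest word index
    doc_word = [[0] * n_words for _ in range(n_docs)]
    for w, d in zip(WS, DS):
        # Each (word, document) pair contributes one count.
        doc_word[d][w] += 1
    return doc_word


matrix = lists_to_matrix_sketch([0, 0, 2, 1, 1, 1], [0, 0, 0, 1, 1, 1])
print(matrix)  # [[2, 0, 1], [0, 3, 0]]
```

Under that shape assumption, words or documents with indices above the largest one observed cannot be recovered, so a round trip may drop trailing all-zero rows or columns.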

lda.utils.dtm2ldac(dtm, offset=0)

Convert a document-term matrix into an LDA-C formatted file

Parameters:
dtm : array of shape (N, V)

Returns:
doclines : iterable of LDA-C lines suitable for writing to file

Notes

If a format similar to SVMLight is desired, an offset of 1 may be used.
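
For reference, each LDA-C line has the form `M term_1:count_1 ... term_M:count_M`, where M is the number of distinct terms in the document. The sketch below shows how such lines could be produced and what `offset` changes. The generator name `dtm_to_ldac_lines` is hypothetical, it operates on nested lists rather than arrays, and rejecting all-zero rows is a choice made for this sketch.

```python
def dtm_to_ldac_lines(dtm, offset=0):
    """Yield one LDA-C formatted line per row of a dense count matrix.

    Hypothetical helper for illustration only; not part of lda.utils.
    """
    for i, row in enumerate(dtm):
        # Keep only terms that occur, shifting term ids by `offset`.
        pairs = [(j + offset, cnt) for j, cnt in enumerate(row) if cnt > 0]
        if not pairs:
            # Sketch assumption: an empty document is treated as an error.
            raise ValueError("row {} has all zero entries".format(i))
        body = " ".join("{}:{}".format(term, cnt) for term, cnt in pairs)
        yield "{} {}".format(len(pairs), body)


dtm = [[2, 0, 1], [0, 3, 0]]
print(list(dtm_to_ldac_lines(dtm)))            # ['2 0:2 2:1', '1 1:3']
print(list(dtm_to_ldac_lines(dtm, offset=1)))  # ['2 1:2 3:1', '1 2:3']
```

With `offset=1` the term ids start at 1 instead of 0, which is what makes the output resemble SVMLight feature indices.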

lda.utils.ldac2dtm(stream, offset=0)

Convert an LDA-C formatted file to a document-term array

Parameters:
stream : file object

File yielding unicode strings in LDA-C format.

Returns:
dtm : array of shape (N, V)

Notes

If a format similar to SVMLight is the source, an offset of 1 may be used.
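
Parsing in the other direction can be sketched as below. The helper name `ldac_to_dtm_sketch` is hypothetical, the result is a nested list rather than an array, and the sketch assumes the vocabulary size V is inferred from the largest term id seen in the stream.

```python
import io


def ldac_to_dtm_sketch(stream, offset=0):
    """Parse LDA-C lines from a text stream into a dense count matrix.

    Hypothetical helper for illustration only; not part of lda.utils.
    """
    docs = []
    n_words = 0
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        n_unique = int(fields[0])
        counts = {}
        for pair in fields[1:]:
            term, cnt = pair.split(":")
            # Undo the offset so stored term ids are zero-based.
            counts[int(term) - offset] = int(cnt)
        if len(counts) != n_unique:
            raise ValueError("declared term count does not match line contents")
        if counts:
            n_words = max(n_words, max(counts) + 1)
        docs.append(counts)
    # Fill in zeros for terms a document does not contain.
    return [[doc.get(j, 0) for j in range(n_words)] for doc in docs]


stream = io.StringIO("2 0:2 2:1\n1 1:3\n")
print(ldac_to_dtm_sketch(stream))  # [[2, 0, 1], [0, 3, 0]]
```

Passing `offset=1` reads files whose term ids start at 1, mapping them back to zero-based columns.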