lda.utils

Attributes

PY2

zip

logger

Functions

check_random_state(seed)

matrix_to_lists(doc_word)

Convert a (sparse) matrix of counts into arrays of word and doc indices

lists_to_matrix(WS, DS)

Convert arrays of word (or topic) and document indices into a document-term array

dtm2ldac(dtm[, offset])

Convert a document-term matrix into the lines of an LDA-C formatted file

ldac2dtm(stream[, offset])

Convert an LDA-C formatted file to a document-term array

Module Contents

lda.utils.PY2
lda.utils.zip
lda.utils.logger
lda.utils.check_random_state(seed)
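No docstring is rendered for `check_random_state`. Its name and `seed` parameter suggest it follows the common scikit-learn convention of normalizing a seed argument into a random-number generator. That reading is an assumption, not something this page states. Below is a minimal sketch of that convention. It uses the standard library's `random.Random` in place of NumPy's `RandomState`, and `check_random_state_sketch` is a hypothetical name.

```python
# Hypothetical sketch of the scikit-learn-style seed convention, using the
# standard library's random.Random in place of NumPy's RandomState.
import numbers
import random

# Shared generator handed out when no seed is given (an assumption of this sketch).
_SHARED_RNG = random.Random()


def check_random_state_sketch(seed):
    """Turn None, an int, or an existing generator into a random.Random."""
    if seed is None:
        return _SHARED_RNG  # reuse one shared, unseeded generator
    if isinstance(seed, numbers.Integral):
        return random.Random(seed)  # fresh generator, reproducible from the seed
    if isinstance(seed, random.Random):
        return seed  # already a generator: pass it through unchanged
    raise ValueError("{!r} cannot be used as a random seed.".format(seed))


# The same integer seed gives the same draws; a generator is returned as-is.
assert check_random_state_sketch(42).random() == check_random_state_sketch(42).random()
rng = random.Random(1)
assert check_random_state_sketch(rng) is rng
```

Accepting all three forms lets callers pass a plain integer for reproducibility, an existing generator to share state, or nothing at all.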
lda.utils.matrix_to_lists(doc_word)

Convert a (sparse) matrix of counts into arrays of word and doc indices

Parameters:
doc_word : array or sparse matrix (D, V)

document-term matrix of counts, with D documents (rows) and V vocabulary terms (columns)

Returns:
(WS, DS) : tuple of two arrays

WS[k] contains the vocabulary index of the kth word (token) in the corpus.
DS[k] contains the document index of the kth word.
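The transformation can be illustrated with a small pure-Python sketch that works on a dense list-of-lists matrix. This is not the library's implementation, which accepts NumPy arrays and SciPy sparse matrices. `matrix_to_lists_sketch` is a hypothetical name. A count of n in cell (d, v) expands into n tokens, each with word index v and document index d.

```python
# Hypothetical pure-Python sketch; not lda.utils' actual implementation.
def matrix_to_lists_sketch(doc_word):
    """Expand a dense document-term count matrix (list of rows) into token lists.

    Returns (WS, DS): WS[k] is the word index of the kth token and DS[k] is
    the document it belongs to. Tokens are ordered by document, then by word.
    """
    WS, DS = [], []
    for d, row in enumerate(doc_word):
        for v, count in enumerate(row):
            # A count of n contributes n identical (word, doc) tokens.
            WS.extend([v] * count)
            DS.extend([d] * count)
    return WS, DS


doc_word = [
    [2, 0, 1],  # document 0: word 0 twice, word 2 once
    [0, 3, 0],  # document 1: word 1 three times
]
WS, DS = matrix_to_lists_sketch(doc_word)
print(WS)  # [0, 0, 2, 1, 1, 1]
print(DS)  # [0, 0, 0, 1, 1, 1]
```

Both output lists have one entry per token, so their length equals the sum of all counts in the matrix.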

lda.utils.lists_to_matrix(WS, DS)

Convert arrays of word (or topic) and document indices into a document-term array

Parameters:
(WS, DS) : tuple of two arrays

WS[k] contains the vocabulary index of the kth word (token) in the corpus.
DS[k] contains the document index of the kth word.

Returns:
doc_word : array (D, V)

document-term array of counts
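The inverse direction can be sketched the same way: each (word, document) pair increments one cell of the matrix. This is a hypothetical pure-Python sketch (`lists_to_matrix_sketch`), not the library's code. It assumes D and V are inferred as one more than the largest document and word index. Under that assumption, trailing all-zero rows or columns cannot be recovered.

```python
# Hypothetical pure-Python sketch; not lda.utils' actual implementation.
def lists_to_matrix_sketch(WS, DS):
    """Count (word, doc) pairs back into a dense D x V matrix (list of rows).

    D and V are inferred as one more than the largest document and word
    index (an assumption of this sketch).
    """
    D = max(DS) + 1
    V = max(WS) + 1
    doc_word = [[0] * V for _ in range(D)]
    for w, d in zip(WS, DS):
        doc_word[d][w] += 1  # one token of word w in document d
    return doc_word


WS = [0, 0, 2, 1, 1, 1]
DS = [0, 0, 0, 1, 1, 1]
print(lists_to_matrix_sketch(WS, DS))  # [[2, 0, 1], [0, 3, 0]]
```

Only the indices matter here. Passing per-token topic assignments in place of WS would therefore yield document-topic counts, which is presumably what "(or topic)" in the summary refers to.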

lda.utils.dtm2ldac(dtm, offset=0)

Convert a document-term matrix into the lines of an LDA-C formatted file

Parameters:
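dtm : array of shape (N, V)

document-term matrix of counts, with N documents (rows) and V vocabulary terms (columns)

offset : int, optional (default 0)

value added to each zero-based term index when writing

Returns:
doclines : iterable of str

LDA-C formatted lines, one per document, suitable for writing to a file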

Notes

If a format similar to SVMLight is desired, an offset of 1 may be used.
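An LDA-C document line has the form `M term:count term:count ...`, where M is the number of distinct terms in the document. The following is a hypothetical pure-Python sketch (`dtm2ldac_sketch`) that produces such lines from a dense list-of-lists matrix. It is not the library's implementation. Rejecting all-zero rows is a choice made for this sketch.

```python
# Hypothetical pure-Python sketch; not lda.utils' actual implementation.
def dtm2ldac_sketch(dtm, offset=0):
    """Yield one LDA-C line per row of a dense document-term matrix.

    Each line is "M i:c i:c ...", where M is the number of distinct terms in
    the document and each i:c pair is (term index + offset, count).
    """
    for d, row in enumerate(dtm):
        pairs = [(v + offset, c) for v, c in enumerate(row) if c > 0]
        if not pairs:
            # Sketch's choice: refuse to emit a line for an empty document.
            raise ValueError("row {} has all zero entries".format(d))
        body = " ".join("{}:{}".format(v, c) for v, c in pairs)
        yield "{} {}".format(len(pairs), body)


dtm = [[2, 0, 1], [0, 3, 0]]
print(list(dtm2ldac_sketch(dtm)))            # ['2 0:2 2:1', '1 1:3']
print(list(dtm2ldac_sketch(dtm, offset=1)))  # ['2 1:2 3:1', '1 2:3']
```

The lines produced by this sketch carry no trailing newline, so a newline must be added when writing them to a file.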

lda.utils.ldac2dtm(stream, offset=0)

Convert an LDA-C formatted file to a document-term array

Parameters:
stream : file object

File yielding unicode strings in LDA-C format.

offset : int, optional (default 0)

value subtracted from each term index read from the file

Returns:
dtm : array of shape (N, V)

Notes

If a format similar to SVMLight is the source, an offset of 1 may be used.
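Parsing reverses the process. The hypothetical pure-Python sketch below (`ldac2dtm_sketch`) reads lines from any iterable of strings, such as an open text file or `io.StringIO`, and subtracts `offset` from each term index. It checks that the leading count matches the number of pairs and infers V from the largest term index seen. These checks and the inference of V are assumptions of the sketch, not documented behaviour of the library.

```python
# Hypothetical pure-Python sketch; not lda.utils' actual implementation.
import io


def ldac2dtm_sketch(stream, offset=0):
    """Parse LDA-C lines from a text stream into a dense matrix (list of rows).

    V is inferred as one more than the largest term index seen, so trailing
    vocabulary terms that never occur cannot be recovered from the file alone.
    """
    docs = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        n_unique = int(fields[0])
        pairs = []
        for field in fields[1:]:
            term, count = field.split(":")
            v = int(term) - offset  # undo the offset to get a 0-based column
            if v < 0:
                raise ValueError("term index {} is below offset {}".format(term, offset))
            pairs.append((v, int(count)))
        if n_unique != len(pairs):
            raise ValueError("expected {} pairs, found {}".format(n_unique, len(pairs)))
        docs.append(pairs)
    V = 1 + max((v for pairs in docs for v, _ in pairs), default=-1)
    dtm = [[0] * V for _ in docs]
    for d, pairs in enumerate(docs):
        for v, count in pairs:
            dtm[d][v] = count
    return dtm


stream = io.StringIO("2 0:2 2:1\n1 1:3\n")
print(ldac2dtm_sketch(stream))  # [[2, 0, 1], [0, 3, 0]]
```

The negative-index check matters with `offset=1`: without it, a zero-based file read as one-based would silently wrap index -1 to the last column.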