lda.utils
Module Contents
Functions
check_random_state(seed)
matrix_to_lists(doc_word): Convert a (sparse) matrix of counts into arrays of word and document indices.
lists_to_matrix(WS, DS): Convert arrays of word (or topic) and document indices to a document-term array.
dtm2ldac(dtm, offset=0): Convert a document-term matrix into an LDA-C formatted file.
ldac2dtm(stream, offset=0): Convert an LDA-C formatted file to a document-term array.
lda.utils.PY2
lda.utils.zip
lda.utils.logger
lda.utils.check_random_state(seed)
lda.utils.matrix_to_lists(doc_word)
Convert a (sparse) matrix of counts into arrays of word and document indices.
Parameters:
- doc_word : array or sparse matrix (D, V)
  Document-term matrix of counts.
Returns:
- (WS, DS) : tuple of two arrays
  WS[k] contains the kth word in the corpus; DS[k] contains the document index for the kth word.
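The expansion described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: `matrix_to_lists_sketch` is a hypothetical name, plain nested lists stand in for arrays, and the token order (row by row, then word by word) is an assumption.

```python
def matrix_to_lists_sketch(doc_word):
    """Expand a dense count matrix (list of rows) into parallel index lists.

    Hypothetical stand-in for illustration; not lda.utils.matrix_to_lists.
    """
    WS, DS = [], []
    for d, row in enumerate(doc_word):
        for w, count in enumerate(row):
            # A count of c for word w in document d contributes c tokens.
            WS.extend([w] * count)
            DS.extend([d] * count)
    return WS, DS


doc_word = [
    [2, 0, 1],  # document 0: word 0 twice, word 2 once
    [0, 3, 0],  # document 1: word 1 three times
]
WS, DS = matrix_to_lists_sketch(doc_word)
print(WS)  # [0, 0, 2, 1, 1, 1]
print(DS)  # [0, 0, 0, 1, 1, 1]
```

Both lists have one entry per token, so their length equals the sum of all counts in the matrix.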
lda.utils.lists_to_matrix(WS, DS)
Convert arrays of word (or topic) and document indices to a document-term array.
Parameters:
- (WS, DS) : tuple of two arrays
  WS[k] contains the kth word in the corpus; DS[k] contains the document index for the kth word.
Returns:
- doc_word : array (D, V)
  Document-term array of counts.
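The reverse direction amounts to counting (document, word) pairs. The following is a sketch under stated assumptions rather than the library's code: `lists_to_matrix_sketch` is a hypothetical name, the result is a nested list instead of an array, and D and V are assumed to be inferred from the largest indices present.

```python
def lists_to_matrix_sketch(WS, DS):
    """Count (document, word) pairs into a dense D x V list of rows.

    Hypothetical stand-in for illustration; not lda.utils.lists_to_matrix.
    """
    D = max(DS) + 1  # number of documents, assumed from the largest index
    V = max(WS) + 1  # vocabulary size, assumed from the largest index
    doc_word = [[0] * V for _ in range(D)]
    for w, d in zip(WS, DS):
        doc_word[d][w] += 1
    return doc_word


WS = [0, 0, 2, 1, 1, 1]
DS = [0, 0, 0, 1, 1, 1]
matrix = lists_to_matrix_sketch(WS, DS)
print(matrix)  # [[2, 0, 1], [0, 3, 0]]
```

With sizes inferred this way, trailing documents or words that never occur cannot be recovered from the index lists alone.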
lda.utils.dtm2ldac(dtm, offset=0)
Convert a document-term matrix into an LDA-C formatted file.
Parameters:
- dtm : array of shape (N, V)
Returns:
- doclines : iterable of LDA-C lines suitable for writing to a file
Notes
If a format similar to SVMLight is desired, an offset of 1 may be used.
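To make the output format and the effect of `offset` concrete, here is a minimal sketch. It assumes the usual LDA-C layout, in which each line holds the number of unique terms in a document followed by `term:count` pairs. `dtm2ldac_sketch` is a hypothetical name and works on nested lists; it is not the library function.

```python
def dtm2ldac_sketch(dtm, offset=0):
    """Yield one LDA-C line per document: '<n_unique> <term>:<count> ...'.

    Hypothetical stand-in for illustration; not lda.utils.dtm2ldac.
    """
    for row in dtm:
        # Keep only terms that occur, shifting term indices by `offset`.
        pairs = [(term + offset, count) for term, count in enumerate(row) if count > 0]
        fields = [str(len(pairs))] + ["{}:{}".format(t, c) for t, c in pairs]
        yield " ".join(fields)


dtm = [[2, 0, 1], [0, 3, 0]]
lines = list(dtm2ldac_sketch(dtm))
print(lines)  # ['2 0:2 2:1', '1 1:3']

# offset=1 gives 1-based term indices, as in SVMLight-style files.
lines_one_based = list(dtm2ldac_sketch(dtm, offset=1))
print(lines_one_based)  # ['2 1:2 3:1', '1 2:3']
```

How the sketch treats an all-zero row (it emits `0`) is an arbitrary choice here and may differ from the library's behavior.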
lda.utils.ldac2dtm(stream, offset=0)
Convert an LDA-C formatted file to a document-term array.
Parameters:
- stream : file object
  File yielding unicode strings in LDA-C format.
Returns:
- dtm : array of shape (N, V)
Notes
If a format similar to SVMLight is the source, an offset of 1 may be used.
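Parsing can be sketched in the same spirit. The example below assumes the usual LDA-C layout (a unique-term count followed by `term:count` pairs) and assumes the vocabulary size is inferred from the largest term index seen. `ldac2dtm_sketch` is a hypothetical name that returns nested lists; it is not the library function.

```python
import io


def ldac2dtm_sketch(stream, offset=0):
    """Parse LDA-C lines from a text stream into a dense list of rows.

    Hypothetical stand-in for illustration; not lda.utils.ldac2dtm.
    """
    docs = []
    n_terms = 0
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts = {}
        for pair in fields[1:]:  # fields[0] is the number of unique terms
            term, count = pair.split(":")
            counts[int(term) - offset] = int(count)
        docs.append(counts)
        n_terms = max(n_terms, max(counts, default=-1) + 1)
    # Fill a zero matrix with the parsed counts.
    dtm = [[0] * n_terms for _ in docs]
    for d, counts in enumerate(docs):
        for term, count in counts.items():
            dtm[d][term] = count
    return dtm


stream = io.StringIO("2 0:2 2:1\n1 1:3\n")
dtm_from_text = ldac2dtm_sketch(stream)
print(dtm_from_text)  # [[2, 0, 1], [0, 3, 0]]
```

Passing `offset=1` subtracts 1 from every term index, so a 1-based source maps back onto 0-based columns.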