lda.utils
Attributes
- PY2
- zip
- logger
Functions
- check_random_state(seed)
- matrix_to_lists(doc_word): Convert a (sparse) matrix of counts into arrays of word and doc indices
- lists_to_matrix(WS, DS): Convert array of word (or topic) and document indices to doc-term array
- dtm2ldac(dtm, offset=0): Convert a document-term matrix into an LDA-C formatted file
- ldac2dtm(stream, offset=0): Convert an LDA-C formatted file to a document-term array
Module Contents
- lda.utils.PY2
- lda.utils.zip
- lda.utils.logger
- lda.utils.check_random_state(seed)
- lda.utils.matrix_to_lists(doc_word)
Convert a (sparse) matrix of counts into arrays of word and doc indices
- Parameters:
- doc_word: array or sparse matrix (D, V)
document-term matrix of counts
- Returns:
- (WS, DS): tuple of two arrays
WS[k] contains the kth word in the corpus; DS[k] contains the document index for the kth word.
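The expansion from counts to parallel index lists can be illustrated with a small pure-Python sketch. This is not the library's implementation: `matrix_to_lists_sketch` is a hypothetical helper, and it assumes a dense matrix given as a list of rows rather than a NumPy or sparse matrix.

```python
def matrix_to_lists_sketch(doc_word):
    """Expand a dense count matrix (list of rows) into parallel index lists.

    Hypothetical illustration only; not part of lda.utils.
    """
    WS, DS = [], []
    for d, row in enumerate(doc_word):
        for w, count in enumerate(row):
            # Emit one (word index, document index) pair per occurrence.
            WS.extend([w] * count)
            DS.extend([d] * count)
    return WS, DS


# Two documents over a three-word vocabulary.
doc_word = [[2, 0, 1],
            [0, 1, 1]]
WS, DS = matrix_to_lists_sketch(doc_word)
print(WS)  # [0, 0, 2, 1, 2]
print(DS)  # [0, 0, 0, 1, 1]
```

Both lists have one entry per token, so their length equals the sum of all counts in the matrix.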
- lda.utils.lists_to_matrix(WS, DS)
Convert array of word (or topic) and document indices to doc-term array
- Parameters:
- (WS, DS): tuple of two arrays
WS[k] contains the kth word in the corpus; DS[k] contains the document index for the kth word.
- Returns:
- doc_word: array (D, V)
document-term array of counts
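The reverse direction, counting index pairs back into a document-term array, might look like the following pure-Python sketch. `lists_to_matrix_sketch` is a hypothetical helper, and it assumes the array dimensions are inferred from the largest document and word indices present.

```python
def lists_to_matrix_sketch(WS, DS):
    """Count (word, document) index pairs into a dense D x V list of rows.

    Hypothetical illustration only; not part of lda.utils.
    """
    # Assumption: dimensions are inferred from the largest indices seen.
    D = max(DS) + 1
    V = max(WS) + 1
    doc_word = [[0] * V for _ in range(D)]
    for w, d in zip(WS, DS):
        doc_word[d][w] += 1
    return doc_word


WS = [0, 0, 2, 1, 2]
DS = [0, 0, 0, 1, 1]
doc_word = lists_to_matrix_sketch(WS, DS)
print(doc_word)  # [[2, 0, 1], [0, 1, 1]]
```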
- lda.utils.dtm2ldac(dtm, offset=0)
Convert a document-term matrix into an LDA-C formatted file
- Parameters:
- dtm: array of shape N,V
- Returns:
- doclines: iterable of LDA-C lines suitable for writing to file
Notes
If a format similar to SVMLight is desired, an offset of 1 may be used.
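For orientation, an LDA-C line holds the number of unique terms in a document followed by `term:count` pairs. The sketch below produces such lines from a dense list-of-rows matrix; `dtm2ldac_sketch` is a hypothetical helper, not the library function, and it assumes `offset` is simply added to each term index.

```python
def dtm2ldac_sketch(dtm, offset=0):
    """Yield one LDA-C style line per document row.

    Hypothetical illustration only; not part of lda.utils.
    """
    for row in dtm:
        # Keep only terms that occur; shift indices by the assumed offset.
        pairs = [(term + offset, count) for term, count in enumerate(row) if count > 0]
        fields = [str(len(pairs))] + ["{}:{}".format(t, c) for t, c in pairs]
        yield " ".join(fields)


dtm = [[2, 0, 1],
       [0, 1, 1]]
lines = list(dtm2ldac_sketch(dtm))
print(lines)  # ['2 0:2 2:1', '2 1:1 2:1']

# With offset=1 the term indices start at 1, as in SVMLight-like files.
lines_offset = list(dtm2ldac_sketch(dtm, offset=1))
print(lines_offset)  # ['2 1:2 3:1', '2 2:1 3:1']
```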
- lda.utils.ldac2dtm(stream, offset=0)
Convert an LDA-C formatted file to a document-term array
- Parameters:
- stream: file object
File yielding unicode strings in LDA-C format.
- Returns:
- dtm: array of shape N,V
Notes
If a format similar to SVMLight is the source, an offset of 1 may be used.
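Reading LDA-C lines back into a document-term array can be sketched in pure Python as follows. `ldac2dtm_sketch` is a hypothetical helper, not the library function; it assumes `offset` is subtracted from each term index and that the vocabulary size is inferred from the largest term index seen.

```python
import io


def ldac2dtm_sketch(stream, offset=0):
    """Parse LDA-C style lines into a dense list-of-rows count matrix.

    Hypothetical illustration only; not part of lda.utils.
    """
    docs = []
    n_terms = 0
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # fields[0] is the unique-term count; the rest are term:count pairs.
        counts = {}
        for pair in fields[1:]:
            term, count = pair.split(":")
            counts[int(term) - offset] = int(count)
        docs.append(counts)
        if counts:
            n_terms = max(n_terms, max(counts) + 1)
    # Assumption: vocabulary size is the largest term index plus one.
    dtm = [[0] * n_terms for _ in docs]
    for d, counts in enumerate(docs):
        for term, count in counts.items():
            dtm[d][term] = count
    return dtm


stream = io.StringIO("2 0:2 2:1\n2 1:1 2:1\n")
dtm = ldac2dtm_sketch(stream)
print(dtm)  # [[2, 0, 1], [0, 1, 1]]
```

Passing `offset=1` to the sketch would map 1-based term indices, as found in SVMLight-like files, back to 0-based columns.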