parallel processing - How to parallelize the topicmodels R package
I have a set of documents (~50,000) that I have converted to a corpus, and I have been building LDA objects with the topicmodels package in R. Unfortunately, testing more than 150 topics takes several hours.
So far, I have found that I can test several different cluster sizes at the same time:
    library(topicmodels)
    library(plyr)
    library(foreach)
    library(doMC)
    registerDoMC(5)  # use 5 cores

    dtm  # my DocumentTermMatrix

    seq <- seq(200, 500, by = 50)  # topic counts to try
    models <- llply(seq, function(d) { LDA(dtm, d) }, .parallel = TRUE)

Is there a way to parallelize the LDA function itself so that it runs faster (instead of running several LDAs at once)?
I am not familiar with the LDA function, but say you have divided the corpus into 16 pieces and put each piece in a list named corpus16list.
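As a rough sketch of that splitting step, assuming the pieces are taken row-wise (one row per document): the small count matrix below is a hypothetical stand-in for a real DocumentTermMatrix, which supports the same row indexing, and the round-robin assignment is just one possible choice. Note that each piece is later fitted as its own independent model.

```r
# Hypothetical stand-in for a DocumentTermMatrix: 50 documents x 8 terms.
set.seed(1)
dtm <- matrix(rpois(50 * 8, lambda = 2), nrow = 50)

n_pieces <- 16

# Assign documents to pieces round-robin: 1, 2, ..., 16, 1, 2, ...
piece_id <- rep_len(seq_len(n_pieces), nrow(dtm))
row_groups <- split(seq_len(nrow(dtm)), piece_id)

# One sub-matrix per piece; drop = FALSE keeps single-row pieces as matrices.
corpus16list <- lapply(row_groups, function(rows) dtm[rows, , drop = FALSE])
```

Every document ends up in exactly one element of corpus16list, so the pieces together cover the whole corpus.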
To run it in parallel, you would do something like the following:
    library(doParallel)
    cl <- makeCluster(16)  # 16 processors
    registerDoParallel(cl)

    # now start the chains
    nchains <- 16
    my_k <- 6  # or a vector with 16 elements
    # my_control is your own LDA control list
    results_list <- foreach(i = 1:nchains, .packages = c("topicmodels")) %dopar% {
      result <- LDA(corpus16list[[i]], k = my_k, control = my_control)
      return(result)
    }
    stopCluster(cl)  # release the workers when done

The result is results_list, a list of the 16 outputs from your 16 chains. You can join them as you see fit, or use a .combine function in foreach (which is beyond the scope of this question).
To send different values of control, k, or whatever else you need to each chain, index them with i.
This code should work on Windows and Linux, with however many cores you need.
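To illustrate using i to pick a different value per task, here is a minimal sketch. The names k_values and fit_one are hypothetical; fit_one is only a placeholder so the example runs without a corpus. In real use you would call LDA(..., k = k_values[i], control = my_control) inside the loop and pass .packages = "topicmodels" to foreach.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)  # 2 workers are enough for this sketch
registerDoParallel(cl)

k_values <- c(5, 10, 15, 20)  # hypothetical topic counts, one per task

# Hypothetical placeholder for the real model fit, e.g. LDA(dtm, k = k).
fit_one <- function(k) list(k = k)

# Each iteration uses i to look up its own value of k.
results_list <- foreach(i = seq_along(k_values)) %dopar% {
  fit_one(k_values[i])
}

stopCluster(cl)  # release the workers
```

The same indexing works for a list of control settings, for example my_controls[[i]], if each task needs its own configuration.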