parallel processing - How to parallelize topicmodels R package -

April 15, 2011

I have a set of documents (~50,000) that I have converted to a corpus, and I have been building LDA objects with the topicmodels package in R. Unfortunately, testing more than 150 topics takes several hours.

So far, I have found that I can test several different cluster sizes (numbers of topics) at the same time:

  library(topicmodels)
  library(plyr)
  library(foreach)
  library(doMC)

  registerDoMC(5)  # use 5 cores

  # dtm is my DocumentTermMatrix
  topic_counts <- seq(200, 500, by = 50)

  # Fit one LDA model per topic count, one model per core
  models <- llply(topic_counts, function(d) LDA(dtm, d), .parallel = TRUE)

Is there a way to parallelize the LDA function itself so that it runs faster (instead of running several LDAs at one time)?

I am not familiar with the LDA function, but say you have divided the corpus into 16 pieces and put each piece in a list named corpus16list.
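As a minimal sketch of how such a list could be built, the rows can be divided into 16 blocks of indices and subset once per block. Here `toy_dtm` is a hypothetical stand-in (a plain count matrix) so the snippet runs on its own, and `row_blocks` is an illustrative name; with a real DocumentTermMatrix, row subsetting with `dtm[rows, ]` should play the same role.

```r
# Stand-in for a real DocumentTermMatrix: 50 "documents" x 8 "terms".
# (toy_dtm is a hypothetical placeholder; substitute your own dtm.)
set.seed(1)
toy_dtm <- matrix(rpois(50 * 8, lambda = 2), nrow = 50)

# splitIndices() from the 'parallel' package (shipped with R) divides
# 1..n into roughly equal contiguous blocks of row indices.
row_blocks <- parallel::splitIndices(nrow(toy_dtm), 16)

# Subset the rows once per block to get the list of pieces.
corpus16list <- lapply(row_blocks, function(rows) toy_dtm[rows, , drop = FALSE])

length(corpus16list)                          # number of pieces
sum(vapply(corpus16list, nrow, integer(1)))   # total documents across pieces
```

Contiguous blocks are the simplest choice; if the documents are ordered (by date or source, say), shuffling the row indices first may give more comparable pieces.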

To run it in parallel, you would typically do something like the following:

  library(doParallel)

  cl <- makeCluster(16)  # 16 processors
  registerDoParallel(cl)

  # Now start the chains
  nchains <- 16
  my_k <- 6  # or a vector with 16 elements

  # my_control is your own list of LDA control settings
  result_list <- foreach(i = 1:nchains, .packages = c("topicmodels")) %dopar% {
    # The value of the last expression is what each chain returns
    LDA(corpus16list[[i]], k = my_k, control = my_control)
  }

  stopCluster(cl)  # release the workers when done

The result is result_list, a list holding the 16 outputs from your 16 chains, which you can join as you like. Or use a .combine function in the foreach call (which is beyond the scope of this question).
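For readers curious about `.combine`, here is a small hedged sketch rather than a recommended workflow. It assumes the topicmodels, foreach, and doParallel packages are installed, uses the `AssociatedPress` sample data shipped with topicmodels, and picks 40 documents, 4 pieces, 2 workers, and `k = 2` purely for illustration. Because topics fitted on separate pieces are not aligned with one another, it collects one log-likelihood per piece instead of trying to merge the models.

```r
library(topicmodels)
library(foreach)
library(doParallel)

# Small sample so the sketch finishes quickly (illustrative sizes only).
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:40, ]
pieces <- parallel::splitIndices(nrow(dtm), 4)  # 4 blocks of row indices

cl <- makeCluster(2)  # 2 workers; raise this to match your cores
registerDoParallel(cl)

# .combine = c concatenates the per-piece values into one numeric vector
# instead of returning a list.
loglik <- foreach(rows = pieces, .packages = "topicmodels", .combine = c) %dopar% {
  fit <- LDA(dtm[rows, ], k = 2, control = list(seed = 1))
  as.numeric(logLik(fit))
}

stopCluster(cl)
loglik  # one log-likelihood per piece
```

Other combiners such as `rbind` or a custom function follow the same pattern; which one makes sense depends on what each chain returns.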

You can use i to send different values to control, k, or whatever else you need.

This code should work on both Windows and Linux, with however many cores you need.

Raj T
