Parallelization #4

Hi,

Thank you for developing this wonderful tool. Just curious, are you planning to add parallel computing options for the function?

Rui
Thanks for your interest @ruiye88. Have you tried the current implementation on your data set? Is it too slow? Could you tell us a little bit more about the size of your data set?
Hi Peter, thanks for the quick response. I tried a test run on a subset of my dataset (~1500 cells, maxiter1 = 100, maxiter2 = 50, maxiter3 = 50) and it takes about 20-30 minutes. My full dataset has ~50K cells. Do you have an estimate of how long it might take to run the full dataset? Also, I'm assuming most users may want to run multiple Kmax values to compare the results, so it would be really helpful if parallel computation could be implemented.
@ruiye88 In the paper, we ran on a dataset containing ~35,000 cells, which is quite comparable to your dataset. Does your counts matrix have a high proportion of zeros and, if so, is it encoded as a sparse matrix? My understanding is that if your Y matrix has many rows and is sparse, gbcd will run faster. In particular, it runs the more efficient method that does not compute the (dense) N x N covariance matrix when this condition is satisfied:

2 * ncol(Y) * mean(Y > 0) < nrow(Y)

For us, the more efficient implementation ran on the dataset with ~35,000 cells in about 20 h.

You could potentially also run multiple Kmax values in parallel (e.g., using mclapply), although that may use a lot of memory. There is also some support for parallel computation in the current implementation if you have installed R with a version of the BLAS library that supports multithreading, such as OpenBLAS or Intel MKL; that should speed things up a bit, although it is more important to make sure your data are encoded properly as a sparse matrix. Hope this helps.
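A minimal sketch (not part of the package itself) of the two suggestions above: checking the sparsity condition, and fitting several Kmax values in parallel with mclapply. It assumes the main fitting function is fit_gbcd(Y, Kmax = ...); please double-check the function name and arguments against your installed version of gbcd.

```r
library(Matrix)
library(parallel)
library(gbcd)

## Y is assumed to be a cells x genes counts matrix stored as a sparse
## dgCMatrix, e.g. Y <- Matrix(counts, sparse = TRUE).

## (1) Check the condition quoted above; nnzero(Y)/prod(dim(Y)) equals
## mean(Y > 0) for a non-negative counts matrix.
prop_nonzero   <- Matrix::nnzero(Y) / prod(dim(Y))
efficient_path <- 2 * ncol(Y) * prop_nonzero < nrow(Y)
message("Uses the faster (no dense N x N covariance) path: ", efficient_path)

## (2) Fit several candidate Kmax values in parallel. Each fit may use
## a lot of memory, so keep mc.cores modest; mclapply forks the R
## process, so this works on Linux/macOS but not Windows.
## fit_gbcd() and its Kmax argument are assumptions taken from the
## package documentation.
Kmax_values <- c(10, 20, 30)
fits <- mclapply(Kmax_values,
                 function(k) fit_gbcd(Y, Kmax = k),
                 mc.cores = length(Kmax_values))
names(fits) <- paste0("Kmax_", Kmax_values)
```

The results can then be compared across the fitted Kmax values, e.g. by inspecting the loadings of each fit in `fits`.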