Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of obs/exp pileups is significantly lower than of obs pileups with nproc > 1 #536

Open
dmitrymyl opened this issue Aug 21, 2024 · 0 comments

Comments

@dmitrymyl
Copy link

dmitrymyl commented Aug 21, 2024

Hey! I've been running pileups, both observed and observed/expected, with cooltools.pileup. I noticed that obs/exp pileups are generated 20x slower than obs pileups (as for my data) when setting nproc=8: it took 40s to create a single obs/exp pileup and only 2s to create a single obs pileup. But when setting nproc=1, the speed is almost the same: both pileups took ~12s to run.

I explored the issue, and it seems like the problem is with the multiprocessing provider. cooltools uses multiprocess library internally, while I also tried multiprocessing and concurrent.futures from stdlib. The benchmarking results for obs/exp pileup creation are below:

Provider of the map functor nproc time
stdlib (map) 1 11.5s
concurrent.futures.ProcessPoolExecutor 1 15.8s
concurrent.futures.ProcessPoolExecutor 8 4.52s
multiprocess 8 21s
multiprocessing 8 3.31s

Both multiprocessing and concurrent.futures outperform multiprocess with nproc=8. Also note that the stdlib map in single-core mode is 2x faster than multiprocess.Pool.map with nproc=8. I suspect the reason is in how expected_df is serialized and/or passed to child processes in multiprocess.

multiprocessing uses pickle for serialization, while multiprocess uses dill. I checked that the serialization speed and byte length of the serialized expected_df are very similar between dill and pickle. So the problem might be with how multiprocess implements Pool or Queue.

Anyway, it seems like changing the multiprocessing framework would be beneficial for the performance of cooltools, especially when using CLI. Within API, users can provide a custom map_functor (but it is broken, see #535). I would be glad to start a discussion on this matter.

P.S. I don't know how reproducible the issue is, but it is in case of my data: 8 coolers, 2 different resolutions, various numbers of features and nproc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant