How to run SEACells efficiently on large-scale dataset #70

Open
koh2ng0 opened this issue Jul 22, 2024 · 2 comments

Comments

@koh2ng0

koh2ng0 commented Jul 22, 2024

Hi,

First of all, thank you for developing this excellent package. I have tried running SEACells on our large-scale dataset (~270K cells). While it performed well, it was too slow, taking almost 3 days and 3 hours for model training over 50 iterations.

I tried two approaches: GPU and CPU.

  1. with GPU
    I attempted to run SEACells on the GPU with the following command:
import SEACells

model = SEACells.core.SEACells(adata,
                               build_kernel_on=build_kernel_on,
                               n_SEACells=n_SEACells,
                               n_waypoint_eigs=n_waypoint_eigs,
                               convergence_epsilon=1e-5,
                               use_gpu=True)

However, I encountered the following error:

"OutOfMemoryError: Out of memory allocating 6,121,777,152 bytes (allocated so far: 32,323,490,304 bytes)."
We have 3 GPUs, each with 32,768 MiB of memory. I assumed this would be sufficient, so I'm not sure why this error occurred.
(Screenshot of the error traceback, 2024-07-22 15:33.)
Could you advise on how to resolve this issue? Additionally, is it possible to utilize more than one GPU for this process? (A single-GPU memory-check sketch follows the CPU example below.)

  2. with CPU
    While this works, it takes an excessive amount of time.
model = SEACells.core.SEACells(adata,
                               build_kernel_on='X_scVI',
                               n_SEACells=n_SEACells,
                               n_waypoint_eigs=n_waypoint_eigs,
                               convergence_epsilon=1e-5,
                               use_sparse=True)
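
For reference, constructing the model is followed by the usual tutorial-style training calls; the sketch below is how that sequence looks in my runs, with the iteration cap matching the 50 iterations mentioned above (treat the exact values as assumptions to tune, not recommendations):

# Build the cell-cell kernel matrix, seed the archetypes, then run the iterative fit;
# with ~270K cells the fit loop is where essentially all of the runtime goes.
model.construct_kernel_matrix()
model.initialize_archetypes()
model.fit(min_iter=10, max_iter=50)  # max_iter matches the 50 iterations described above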

Could you recommend solutions to improve the time and memory efficiency for running SEACells on large-scale datasets?
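
On the GPU question above: the error text looks like a CuPy allocator message, and CuPy places allocations on a single current device by default, so having three GPUs does not add usable memory unless the code explicitly distributes work across them. Below is a minimal sketch, an assumption about the setup rather than anything SEACells prescribes, for pinning the process to one card and checking its free memory before training:

import os

# Expose a single physical GPU to this process *before* importing CuPy/SEACells,
# so every allocation lands on one card that is actually free (index 0 is just an example).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import cupy as cp

free_bytes, total_bytes = cp.cuda.runtime.memGetInfo()
print(f"Visible GPU: {free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB")

# The failed request (~6.1 GB on top of ~32.3 GB already allocated) exceeds one
# 32 GiB (~34.4 GB) card on its own, so the working set has to shrink (fewer cells
# per run, or the sparse CPU kernel) rather than relying on the other idle GPUs.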

Thank you for your assistance.

@kjtreese

I'm hoping someone has some input on this, because I'm running into the same issue with a dataset of 240K cells. We're splitting it into smaller chunks, but it still takes up so much memory and time. We want metacells of smaller sizes to match (at least as closely as possible) the ones we have already made manually, so I'm setting the number of SEACells to 1000+, but it's just so slow.
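
In case comparing notes helps, the chunked runs described above can be sketched roughly as below, with adata being the full AnnData object from this thread; the "sample" obs column, the cells-per-metacell target, and n_waypoint_eigs=10 are placeholders for this illustration, not values prescribed by SEACells:

import SEACells

CELLS_PER_METACELL = 75  # placeholder target size; tune to match the manually curated metacells

models = {}
for sample in adata.obs["sample"].unique():          # "sample" is a placeholder grouping column
    ad_chunk = adata[adata.obs["sample"] == sample].copy()
    n_seacells = max(1, ad_chunk.n_obs // CELLS_PER_METACELL)
    model = SEACells.core.SEACells(ad_chunk,
                                   build_kernel_on='X_scVI',
                                   n_SEACells=n_seacells,
                                   n_waypoint_eigs=10,
                                   convergence_epsilon=1e-5,
                                   use_sparse=True)
    # ...then construct_kernel_matrix / initialize_archetypes / fit as in the sketch above.
    models[sample] = model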

@li-xuyang28

I saw that sparse-matrix support on GPU was planned/proposed at some point but never implemented. Is there any current plan for that to happen? I would certainly love to see this tool become scalable.
