[Feature]: Streaming DANDI:000541 takes a long time #1889

Open
3 tasks done
rly opened this issue Apr 12, 2024 · 0 comments
rly commented Apr 12, 2024

What would you like to see added to PyNWB?

From @dysprague: Looping through all files in dandiset 000541 and extracting the NeuroPAL images takes ~33 minutes. There are 21 files, each on the order of 2 GB. This is much slower than for the other dandisets that also have NeuroPAL images (e.g., 000714, 000692, and 000776). The problem occurs when streaming with both PyNWB and MatNWB.

On my computer and connection, it is actually faster to download a file and open it locally than to stream it.
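For anyone who wants to reproduce the timing, here is a minimal sketch of the streaming loop, following the fsspec-based approach from the PyNWB streaming tutorial. The helper names (`timed`, `iter_nwb_urls`, `stream_and_extract`) and the `extract` callback are hypothetical; the callback stands in for whatever code pulls out the NeuroPAL images, which is not shown in this issue. The third-party imports live inside the functions so the timing helper can be used on its own.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def iter_nwb_urls(dandiset_id, version="draft"):
    """Yield (asset_path, s3_url) for every .nwb asset in a dandiset."""
    from dandi.dandiapi import DandiAPIClient

    with DandiAPIClient() as client:
        dandiset = client.get_dandiset(dandiset_id, version)
        for asset in dandiset.get_assets():
            if asset.path.endswith(".nwb"):
                url = asset.get_content_url(follow_redirects=1, strip_query=True)
                yield asset.path, url


def stream_and_extract(url, extract):
    """Open a remote NWB file over HTTP and apply extract(nwbfile).

    `extract` is a caller-supplied (hypothetical) function. It must copy
    the data it needs into memory, because the file is closed on return.
    """
    import fsspec
    import h5py
    import pynwb

    fs = fsspec.filesystem("http")
    with fs.open(url, "rb") as f:
        with h5py.File(f, "r") as h5:
            with pynwb.NWBHDF5IO(file=h5, mode="r") as io:
                return extract(io.read())
```

Usage would look something like `for path, url in iter_nwb_urls("000541"): _, seconds = timed(stream_and_extract, url, my_extract)`, where `my_extract` is your own function. Running the same loop on 000714, 000692, or 000776 should give a per-file comparison.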

I suspect it has to do with the fact that this dandiset has one set of 960 PlaneSegmentation tables for the "CalciumSeriesSegmentation" ImageSegmentation group, another set of 960 for the "CalciumSeriesSegmentationdNMF" ImageSegmentation group, and another set of 960 for the "NeuronIDs/ImageSegmentation" group. Each table represents the segmentation at a particular time point. That is a lot of groups.
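To put rough numbers on this, here is a back-of-the-envelope calculation using only the figures quoted above. It assumes (unverified) that every file contains all three sets of tables and that reading those tables accounts for essentially all of the streaming time, so treat the per-table figure as an upper-bound illustration rather than a measurement.

```python
# Figures quoted in this issue
sets_of_tables = 3        # three ImageSegmentation groups
tables_per_set = 960      # one PlaneSegmentation table per time point
n_files = 21
total_seconds = 33 * 60   # ~33 minutes for the whole dandiset

total_tables = sets_of_tables * tables_per_set          # 2880
seconds_per_file = total_seconds / n_files              # about 94.3 s

# ASSUMPTION: each file holds all 2880 tables and they dominate read time
ms_per_table = 1000 * seconds_per_file / total_tables   # about 32.7 ms

print(total_tables, round(seconds_per_file, 1), round(ms_per_table, 1))
```

Under those assumptions, even a few tens of milliseconds per table is enough to explain the observed time, which is consistent with per-group overhead being the bottleneck.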

Is your feature request related to a problem?

No response

What solution would you like?

Provide a recommendation for how to reorganize this data for more efficient streaming. I can do this, but I first need to look more closely at what changes across the tables and ImageSegmentation groups. It may be possible to combine all of this into one or two PlaneSegmentation tables with a column for the time sample.
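The combined layout could look roughly like the sketch below. It uses plain Python lists of dicts as stand-ins for the per-time-point tables, not the PyNWB API, and the column names (`roi_label`, `time_sample`) are hypothetical since I have not yet checked what the real tables contain.

```python
def combine_tables(tables):
    """Flatten a list of per-time-point tables into one table.

    tables[i] is the list of ROI rows (dicts) for time sample i.
    Each output row gains a 'time_sample' column recording its origin.
    """
    combined = []
    for time_sample, rows in enumerate(tables):
        for row in rows:
            combined.append({"time_sample": time_sample, **row})
    return combined


def rows_at(combined, time_sample):
    """Recover the rows that belonged to a single time point."""
    return [row for row in combined if row["time_sample"] == time_sample]


# Two toy time points with hypothetical ROI labels
per_time_point = [
    [{"roi_label": "a"}],
    [{"roi_label": "a"}, {"roi_label": "b"}],
]
combined = combine_tables(per_time_point)
second = rows_at(combined, 1)
```

In PyNWB, the extra column could presumably be declared on a single PlaneSegmentation with `add_column`, so that a reader touches one group and a handful of datasets instead of 960 groups per ImageSegmentation.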

Do you have any interest in helping implement the feature?

Yes.


rly added the labels "category: enhancement" (improvements of code or code behavior) and "priority: low" (alternative solution already working and/or relevant to only specific user(s)) on Apr 12, 2024
rly self-assigned this on Apr 12, 2024
rly modified the milestones: Future, 2.8.0 on Apr 12, 2024
stephprince modified the milestones: 2.8.0, 2.9.0 on Jul 23, 2024
rly modified the milestones: 2.9.0, Next Major Release - 3.0 on Sep 5, 2024
2 participants