At the moment, yt employs a global index to locate data. For many datasets (incl. RAMSES), data are spread across files with a mapping that relies on some space-filling curve (e.g. the Hilbert curve or the Z-curve).
This mapping is typically lightweight and could be used as an intermediate coarse-grained index (similar to what has been done for SPH, as far as I understand).
Having a lightweight coarse-grained index (with no false negatives!) would allow it to be copied to however many MPI tasks yt is running on. As a second step, the current index (from file to position in file) would allow finer refinement (do we need to read any data? where is it located on disk?). This would allow distributing the list of files to be read to the different tasks deterministically, based on their intersection with the coarse index; each task would subsequently read the files that actually intersect.
Some of the code required for this is already in place for the RAMSES dataset (yt-project/yt#4734 for having a two-level index, yt-project/yt#4730 for how one could use this to parallelize I/O).
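
To make the proposal more concrete, here is a minimal sketch of the two-step idea in plain Python. It is not yt's API and not what the PRs above implement: the function names (`morton3`, `files_intersecting`, `files_for_task`) and the `file_starts` layout are hypothetical, and it uses a Z-curve for brevity where RAMSES uses a Hilbert curve. It assumes each file owns one contiguous range of curve keys and that the selection is an axis-aligned box of coarse cells.

```python
from bisect import bisect_right


def morton3(ix, iy, iz, bits):
    """Z-curve key: interleave the low `bits` bits of three cell coordinates."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def files_intersecting(box_lo, box_hi, file_starts, bits):
    """Coarse index lookup (hypothetical helper).

    box_lo, box_hi: inclusive integer bounds of the selection, in coarse cells.
    file_starts: sorted first key of each file; file i owns the keys in
                 [file_starts[i], file_starts[i + 1]).
    Every coarse cell touched by the box is visited, so a file that holds
    data in the box cannot be missed (no false negatives); false positives
    are left to the finer per-file index.
    """
    hits = set()
    for ix in range(box_lo[0], box_hi[0] + 1):
        for iy in range(box_lo[1], box_hi[1] + 1):
            for iz in range(box_lo[2], box_hi[2] + 1):
                key = morton3(ix, iy, iz, bits)
                hits.add(bisect_right(file_starts, key) - 1)
    return sorted(hits)


def files_for_task(candidates, rank, size):
    """Deterministic round-robin split: every task computes the same answer
    from the same replicated coarse index, so no communication is needed."""
    return candidates[rank::size]


if __name__ == "__main__":
    # 4 x 4 x 4 coarse cells (bits=2), 64 keys split evenly over 4 files.
    file_starts = [0, 16, 32, 48]
    candidates = files_intersecting((0, 0, 0), (0, 3, 0), file_starts, bits=2)
    for rank in range(2):
        print(rank, files_for_task(candidates, rank, size=2))
```

In a real run, `rank` and `size` would come from the MPI communicator, and each task would then use the existing file-level index to decide what, if anything, to read from its files.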