to_dask() not lazy when simplecache:: in urlpath #73
Comments
Whilst this may be possible, it would be tricky. Dask wants to open the file to assess the chunking; in theory, that could be done on the original remote file, with caching happening only when the data is actually loaded. There is a block-wise cacher in fsspec, which downloads only the parts of a file that are accessed, as they are accessed, but that only works with a library that expects Python file-like objects (i.e., there's a reason to call open_local: the library wants a real local file). You could do something with FUSE, where the file looks real to the OS but uses block-wise chunking internally. I'm pretty sure this kind of thing has never been tried.
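To make the block-wise idea concrete, here is a minimal standard-library sketch, not fsspec's actual implementation. `BlockCachedFile` and `fetch_range` are hypothetical names: the callable stands in for a remote byte-range request, and the in-memory `REMOTE` bytes stand in for the remote file. It also shows why this only helps a reader that accepts a Python file-like object, since nothing here ever exists as a real local file.

```python
class BlockCachedFile:
    """Hypothetical read-only file-like object that fetches blocks on demand."""

    def __init__(self, fetch_range, size, block_size=8):
        self.fetch_range = fetch_range  # callable(start, end) -> bytes
        self.size = size
        self.block_size = block_size
        self.blocks = {}  # block index -> cached bytes
        self.pos = 0

    def _block(self, index):
        # Fetch a block only the first time it is touched.
        if index not in self.blocks:
            start = index * self.block_size
            end = min(start + self.block_size, self.size)
            self.blocks[index] = self.fetch_range(start, end)
        return self.blocks[index]

    def seek(self, offset, whence=0):
        if whence == 0:
            pos = offset
        elif whence == 1:
            pos = self.pos + offset
        elif whence == 2:
            pos = self.size + offset
        else:
            raise ValueError("invalid whence")
        self.pos = max(0, min(pos, self.size))
        return self.pos

    def tell(self):
        return self.pos

    def read(self, length=-1):
        end = self.size if length < 0 else min(self.pos + length, self.size)
        if end <= self.pos:
            return b""
        first = self.pos // self.block_size
        last = (end - 1) // self.block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        offset = self.pos - first * self.block_size
        out = data[offset:offset + (end - self.pos)]
        self.pos = end
        return out


# Stand-in for a remote file; each call below models one range request.
REMOTE = bytes(range(32))
requests = []


def fetch_range(start, end):
    requests.append((start, end))
    return REMOTE[start:end]


f = BlockCachedFile(fetch_range, size=len(REMOTE), block_size=8)
f.seek(10)
chunk = f.read(4)  # touches only the block covering bytes 8..15
```

Reading four bytes triggers a single block fetch rather than a download of the whole file, and a later read within the same block is served from the cache. A library that insists on a local path cannot use an object like this, which is the constraint described above.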
When loading with `to_dask` with caching as in pangeo-data/pangeo-datastore#113, `fsspec.open_local` first downloads the whole dataset and then opens the data in `xarray`, still with chunks but only after having spent the time on downloading.

Is there a way to circumvent this in `intake-xarray`, or is this a consequence of `fsspec` caching that cannot be changed for `intake-xarray`?

It would be great to just do `to_dask()` without spending the time to download, and to cache only when `xarray` runs `compute`.