Python bindings: add a osgeo.gdal_fsspec module that on import will register GDAL VSI file system handlers as fsspec AbstractFileSystem #10985

Open
wants to merge 3 commits into base: master

rouault
Member

@rouault rouault commented Oct 10, 2024

This enables using GDAL virtual file systems with other libraries in the Python ecosystem that accept fsspec paths.

"""Module exposing GDAL Virtual File Systems (VSI) as a "gdalvsi" fsspec implementation.

Importing "osgeo.gdal_fsspec" requires the Python "fsspec"
(https://filesystem-spec.readthedocs.io/en/latest/) module to be available.

A generic "gdalvsi" fsspec protocol is available. All GDAL VSI file names must be
simply prefixed with "gdalvsi://". For example:

  • "gdalvsi://data/byte.tif" to access relative file "data/byte.tif"
  • "gdalvsi:///home/user/byte.tif" to access absolute file "/home/user/byte.tif"
  • "gdalvsi:///vsimem/byte.tif" (note the 3 slashes) to access VSIMem file "/vsimem/byte.tif"
  • "gdalvsi:///vsicurl/https://example.com/byte.tif" (note the 3 slashes) to access "https://example.com/byte.tif" through /vsicurl/
    """

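The prefix rule described in the docstring above can be illustrated with a small pure-Python sketch. This is not the code from the pull request: the helper names `gdalvsi_url_to_gdal_path` and `gdal_path_to_gdalvsi_url` are hypothetical, and the sketch only assumes what the docstring states, namely that a GDAL file name is obtained by dropping the leading "gdalvsi://".

```python
# Hypothetical helpers illustrating the "gdalvsi://" prefix rule.
# They do not import GDAL or fsspec and are not part of osgeo.gdal_fsspec.

GDALVSI_PREFIX = "gdalvsi://"


def gdalvsi_url_to_gdal_path(url: str) -> str:
    """Return the GDAL file name encoded in a gdalvsi:// URL."""
    if not url.startswith(GDALVSI_PREFIX):
        raise ValueError(f"not a gdalvsi URL: {url!r}")
    # Everything after the prefix is the GDAL file name, unchanged.
    return url[len(GDALVSI_PREFIX):]


def gdal_path_to_gdalvsi_url(path: str) -> str:
    """Prefix a GDAL file name so that it becomes a gdalvsi:// URL."""
    return GDALVSI_PREFIX + path


if __name__ == "__main__":
    for url in (
        "gdalvsi://data/byte.tif",
        "gdalvsi:///home/user/byte.tif",
        "gdalvsi:///vsimem/byte.tif",
        "gdalvsi:///vsicurl/https://example.com/byte.tif",
    ):
        print(url, "->", gdalvsi_url_to_gdal_path(url))
```

The third slash in the /vsimem/ and /vsicurl/ examples is simply the leading "/" of the GDAL path itself, which is why relative paths need only two slashes.
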
@rouault rouault added this to the 3.11.0 milestone Oct 10, 2024
@rouault rouault changed the title Python bindings: add a osgeo.gdal_fsspec that on import will register GDAL VSI file system handlers as fsspec AbstractFileSystem Python bindings: add a osgeo.gdal_fsspec module that on import will register GDAL VSI file system handlers as fsspec AbstractFileSystem Oct 10, 2024
@rouault rouault force-pushed the gdal_fsspec branch 6 times, most recently from 1f35d07 to 6f80f00 Compare October 11, 2024 12:44
@sgillies
Contributor

@rouault can we do without so many new forms of identifiers? As a community, I think we want fewer ways to reference the same thing, not more ways to reference the same thing.

Like, I imagine the fsspec usage would be:

fs = fsspec.filesystem("gdalvsi")
fs.open("/vsicurl/https://example.com/foo.tif")

The existing /vsi*/ files could remain the one way to reference datasets for GDAL, if we want.

Is "vsizip://my.zip/foo.tif" only for fsspec or will it work with GDALOpen too?

@rouault
Member Author

rouault commented Oct 11, 2024

> can we do without so many new forms of identifiers?

I had the same hesitation, and I would be OK with having just a single "gdalvsi" fsspec protocol taking GDAL paths, as you suggest.

> Is "vsizip://my.zip/foo.tif" only for fsspec or will it work with GDALOpen too?

No, that would be only for fsspec, and only if we keep the "sub-classed" vsiXXX fsspec protocols.

rouault added a commit to rouault/gdal that referenced this pull request Oct 12, 2024
…file system

In c889f14 (3.10.0dev), we introduced a (C++) bridge from GDAL VSI
to the Arrow file system that used the 'vsi://' prefix. But as we are
also going to introduce, at the Python level, a bridge from GDAL VSI
to fsspec (OSGeo#10985) that is going to use a 'gdalvsi://' prefix,
it is better to use the same prefix.
@rouault
Member Author

rouault commented Oct 12, 2024

Pull request simplified to register a single "gdalvsi" fsspec protocol, as suggested by @sgillies.

@coveralls
Collaborator

Coverage Status

coverage: 69.435% (-0.01%) from 69.449%
when pulling 0b07ea7 on rouault:gdal_fsspec
into 8d9a397 on OSGeo:master.
