Rename bands as variables using long_name attribute #736

Open
lopezvoliver opened this issue Jan 24, 2024 · 9 comments
Labels
proposal Idea for a new feature.

Comments

@lopezvoliver

The typical GeoTIFF images that I use have a description for each band, which rioxarray sets as the long_name attribute.

By default, images are read as an xarray.DataArray with band, y, and x dimensions, and the long_name attribute is a tuple containing the description of each band.

Using band_as_variable=True (available since version 0.13) instead returns an xarray.Dataset with dimensions y and x and N data variables, one per band. The variables are simply named band_1, band_2, ..., band_N, which makes sense. Each data variable also carries its own long_name attribute.

I think it would be useful to have another optional keyword argument that renames the data variables to their long_name (description). Currently, this can be done as follows:

import rioxarray

image = rioxarray.open_rasterio("myMultiBandImage.tif", band_as_variable=True)
image = image.rename({band:image[band].attrs["long_name"] for band in image})

Having an option to do this directly with rioxarray.open_rasterio would be useful:

image = rioxarray.open_rasterio("myMultiBandImage.tif", band_as_variable=True, rename_bands=True)

Of course, for backwards compatibility it should be set to False by default and it only makes sense when used with band_as_variable=True.
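A slightly more defensive take on the renaming one-liner above, sketched as a plain-Python helper. The name long_name_rename_map is hypothetical and not part of rioxarray. It only renames bands that actually have a non-empty string long_name, and it refuses to produce duplicate names. It takes a plain {variable: attrs} mapping so the logic can be checked without opening a file; the commented lines show how it could plug into the Dataset returned by open_rasterio(..., band_as_variable=True).

```python
from collections.abc import Mapping


def long_name_rename_map(attrs_by_var: Mapping[str, Mapping]) -> dict[str, str]:
    """Build a {band_N: long_name} mapping suitable for Dataset.rename().

    Hypothetical helper: bands without a non-empty string long_name keep
    their default name, and a ValueError is raised if the renaming would
    produce duplicate variable names.
    """
    rename = {}
    for var, attrs in attrs_by_var.items():
        long_name = attrs.get("long_name")
        # Skip missing, non-string, or blank descriptions.
        if isinstance(long_name, str) and long_name.strip():
            rename[var] = long_name
    # Check the final set of names, including bands that were not renamed.
    final_names = [rename.get(var, var) for var in attrs_by_var]
    duplicates = sorted({n for n in final_names if final_names.count(n) > 1})
    if duplicates:
        raise ValueError(f"renaming would create duplicate names: {duplicates}")
    return rename


# Usage with a Dataset from open_rasterio(..., band_as_variable=True):
# attrs = {var: image[var].attrs for var in image.data_vars}
# image = image.rename(long_name_rename_map(attrs))
```

Note that this keeps the descriptions verbatim, so it is only appropriate when the long_name values are already safe variable names.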

@lopezvoliver lopezvoliver added the proposal Idea for a new feature. label Jan 24, 2024
@snowman2
Member

snowman2 commented Jan 24, 2024

#600 (comment)

The variable name should only contain alphanumeric characters and underscores. The band description could potentially be a sentence with any characters. This ensures consistency and stability.
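One way to get names that satisfy this constraint (only alphanumeric characters and underscores) is to sanitize the descriptions before renaming. The sketch below uses a hypothetical helper, safe_variable_names, which is not a rioxarray API. The rules it applies are assumptions chosen for illustration: collapse disallowed characters to underscores, fall back to band_N for empty descriptions, prefix names that start with a digit, and add a numeric suffix to duplicates.

```python
import re
from collections.abc import Iterable
from typing import Optional


def safe_variable_names(descriptions: Iterable[Optional[str]]) -> list[str]:
    """Turn band descriptions into unique names made of [A-Za-z0-9_] only."""
    names: list[str] = []
    seen: set[str] = set()
    for index, description in enumerate(descriptions, start=1):
        # Collapse each run of disallowed characters into one underscore,
        # then trim underscores from both ends.
        name = re.sub(r"[^0-9A-Za-z_]+", "_", description or "").strip("_")
        if not name:
            # Missing or unusable description: use the default band_N name.
            name = f"band_{index}"
        elif name[0].isdigit():
            # Assumed rule: avoid names that start with a digit.
            name = f"band_{name}"
        # Append _2, _3, ... until the name is unique.
        candidate, suffix = name, 2
        while candidate in seen:
            candidate = f"{name}_{suffix}"
            suffix += 1
        seen.add(candidate)
        names.append(candidate)
    return names


# Usage with a Dataset from open_rasterio(..., band_as_variable=True):
# variables = list(image.data_vars)
# new_names = safe_variable_names(image[v].attrs.get("long_name") for v in variables)
# image = image.rename(dict(zip(variables, new_names)))
```

Because this mapping is lossy, the original description is still available in each variable's long_name attribute after renaming.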

@snowman2
Member

I don't think this is too terrible for users to do if they have a safe long_name:

image = image.rename({band:image[band].attrs["long_name"] for band in image})

@RichardScottOZ
Contributor

Maybe we could just put this in the docs as a tip/example?

@RichardScottOZ
Contributor

@snowman2 - I could probably mine stuff I have done for generic examples, if a general notebook on writing (or reading) rasters would be useful.

e.g. this is what it looks like as an ERS grid, this is it with LZW compression, and so on.

@RichardScottOZ
Contributor

e.g. maybe extending something like this: https://corteva.github.io/rioxarray/html/examples/convert_to_raster.html

for things that I would have liked to see when I first came across that sort of info many moons ago

@snowman2
Member

Those documentation contributions would be great 👍

@RichardScottOZ
Contributor

Ok, will see what I can do shortly!

@RichardScottOZ
Contributor

Some things like this? #753

@NiklasPhabian

NiklasPhabian commented Jul 11, 2024

I don't think this is too terrible for users to do if they have a safe long_name:

image = image.rename({band:image[band].attrs["long_name"] for band in image})

While that is certainly true, I still find it confusing that a roundtrip (xarray.Dataset -> GeoTIFF -> xarray.Dataset) does not give back the same dataset. When rioxarray writes a dataset to a GeoTIFF, the data variable names are written out as the band descriptions. So when reading the GeoTIFF back in, it would be consistent to use the descriptions as the data variable names.

In that sense, we could also give the behavior of open_rasterio() with band_as_variable=False a second thought. In that case, you get back a DataArray whose long_name attribute contains a tuple of the band descriptions. That tuple, of course, has the same length as the band dimension and could/should be considered the coordinates of this 3rd dimension.
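Treating the descriptions as coordinates of the band dimension can already be approximated on the user side. The sketch below uses a hypothetical helper, band_labels, that only checks that the long_name tuple lines up with the band dimension. The commented lines then attach the labels with xarray's assign_coords and, optionally, swap_dims. Treating a bare string as the description of a single band is an assumption, not documented rioxarray behavior.

```python
from collections.abc import Sequence
from typing import Union


def band_labels(band_values: Sequence, long_name: Union[str, Sequence[str]]) -> list[str]:
    """Return the band descriptions as a list aligned with the band dimension."""
    # Assumption: treat a plain string as the description of a single band.
    labels = [long_name] if isinstance(long_name, str) else list(long_name)
    if len(labels) != len(band_values):
        raise ValueError(f"{len(labels)} descriptions for {len(band_values)} bands")
    return labels


# Usage with a DataArray from open_rasterio(...) (band_as_variable=False):
# labels = band_labels(da["band"].values, da.attrs["long_name"])
# da = da.assign_coords(description=("band", labels))  # keep the integer band index
# da = da.swap_dims({"band": "description"})           # optionally index by label
```

Adding the labels as a separate description coordinate, rather than overwriting band, keeps the 1-based band numbers available for anything that still expects them.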

I don't want to derail this too much, but I am curious about your thoughts here: since GeoTIFFs don't really have the notion of multiple datasets (just 'bands' / 'channels', i.e. a 3rd dimension), I wonder whether writing a Dataset to a GeoTIFF even makes conceptual sense. It might be more consistent to allow writing only 2D or 3D DataArrays. In the latter case, the 3rd dimension's coordinates could/should be used as the band descriptions. Right now, when writing a 3D DataArray to a GeoTIFF, the 3rd dimension's coordinates are discarded.


4 participants