Improvements to band_tags #799

Open
neda-dtu opened this issue Aug 1, 2024 · 4 comments
Labels
proposal Idea for a new feature.

Comments

@neda-dtu

neda-dtu commented Aug 1, 2024

I was working on adding GDAL metadata to a GeoTIFF file, as it is needed to get the minimum and maximum values when using geotiff.js. I tried adding STATISTICS_MINIMUM and STATISTICS_MAXIMUM in a few ways but wasn't able to get it to work until I found the undocumented `band_tags` key for the `tags` argument of `to_raster`.

  1. It would be great to have documentation of this feature, perhaps with an example. I could add a mention of it in the docstring. Is there other documentation that you think would be helpful?

  2. As described in "Envi header information is stripped on write" #635, tags are flattened into attributes on read, so `band_tags` do not round-trip between read and write. It seems that xarray allows dictionary-based attributes. Would this be a feature of interest? I guess it would need a toggle on read, as NetCDF doesn't allow dicts as attributes.
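One way such a toggle could work, sketched under the assumption that nested attributes are only a problem when writing formats like NetCDF: keep `band_tags` as a list of dicts (one per band) in memory and serialize it to a JSON string only for NetCDF. The helpers `encode_band_tags` and `decode_band_tags` are hypothetical; nothing like them exists in rioxarray.

```python
import json
from typing import Any, Dict


def encode_band_tags(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of attrs where a list-of-dicts `band_tags` entry is
    replaced by a JSON string, so every attribute value is NetCDF-safe.

    Hypothetical helper, not part of rioxarray.
    """
    out = dict(attrs)
    if "band_tags" in out:
        out["band_tags"] = json.dumps(out["band_tags"])
    return out


def decode_band_tags(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Inverse of encode_band_tags: parse a JSON `band_tags` string back
    into a list with one tag dict per band."""
    out = dict(attrs)
    if isinstance(out.get("band_tags"), str):
        out["band_tags"] = json.loads(out["band_tags"])
    return out


attrs = {
    "AREA_OR_POINT": "Area",
    "band_tags": [{"STATISTICS_MINIMUM": "0", "STATISTICS_MAXIMUM": "100"}],
}
encoded = encode_band_tags(attrs)
print(type(encoded["band_tags"]).__name__)  # str
assert decode_band_tags(encoded) == attrs
```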

@neda-dtu neda-dtu added the proposal Idea for a new feature. label Aug 1, 2024
@snowman2
Member

snowman2 commented Aug 3, 2024

I wonder if it would be helpful to have an interface to writing tags to the Dataset/DataArray similar to rasterio's: https://rasterio.readthedocs.io/en/stable/topics/tags.html#writing-tags

@snowman2
Member

snowman2 commented Aug 3, 2024

@neda-dtu, mind providing examples of what your failed attempts looked like?

@neda-dtu
Author

neda-dtu commented Aug 5, 2024

I am not sure that I follow how the Dataset/DataArray tags would be handled similarly to rasterio. Would this imply adding an additional attribute to hold the tags?

Here are my successful and failed examples. I used the attached test file from the geotiff.js repository: nt_20201024_f18_nrt_s.tif.zip

The metadata from gdalinfo looks like:

```text
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-3950000.000, 4350000.000) ( 42d14'27.21"W, 39d13'51.20"S)
Lower Left  (-3950000.000,-3950000.000) (135d 0' 0.00"W, 41d26'49.04"S)
Upper Right ( 3950000.000, 4350000.000) ( 42d14'27.21"E, 39d13'51.20"S)
Lower Right ( 3950000.000,-3950000.000) (135d 0' 0.00"E, 41d26'49.04"S)
Center      (       0.000,  200000.000) (  0d 0' 0.01"E, 88d 9'14.17"S)
Band 1 Block=316x6 Type=Float32, ColorInterp=Gray
  Min=0.000 Max=100.000
  Minimum=0.000, Maximum=100.000, Mean=28.560, StdDev=39.350
  NoData Value=-3.4e+38
  Metadata:
    STATISTICS_MAXIMUM=100
    STATISTICS_MEAN=28.560288669249
    STATISTICS_MINIMUM=0
    STATISTICS_STDDEV=39.349526064368
```

My first attempt was to just roundtrip the file:

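```python
import xarray as xr
import rioxarray  # noqa: F401 (registers the .rio accessor)

# Read the single-band GeoTIFF as a DataArray
ds = xr.open_dataarray("nt_20201024_f18_nrt_s.tif")
# Write it back out; the second positional argument is the driver
ds.rio.to_raster("nt_roundtrip.tif", "COG")
```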

This resulted in the statistics metadata not being assigned to the band; it was written as dataset-level metadata instead. The nodata value was assigned to the band, and a new band Description field was added.

```text
Metadata:
  STATISTICS_MAXIMUM=100
  STATISTICS_MEAN=28.560288669249
  STATISTICS_MINIMUM=0
  STATISTICS_STDDEV=39.349526064368
  AREA_OR_POINT=Area
Image Structure Metadata:
  LAYOUT=COG
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-3950000.000, 4350000.000) ( 42d14'27.21"W, 39d13'51.20"S)
Lower Left  (-3950000.000,-3950000.000) (135d 0' 0.00"W, 41d26'49.04"S)
Upper Right ( 3950000.000, 4350000.000) ( 42d14'27.21"E, 39d13'51.20"S)
Lower Right ( 3950000.000,-3950000.000) (135d 0' 0.00"E, 41d26'49.04"S)
Center      (       0.000,  200000.000) (  0d 0' 0.01"E, 88d 9'14.17"S)
Band 1 Block=512x512 Type=Float32, ColorInterp=Gray
  Description = band_data
  NoData Value=-3.4e+38
```

My next attempt was to write the whole Dataset to a raster, to try to differentiate between the band tags and the dataset tags. Of course, that didn't work, as only DataArrays can be written.

I then tried adding the TIFF tag for GDAL metadata (42112) directly, with the XML included, because tiffinfo showed that the XML from the original file had an extra sample attribute on each item compared to the round-tripped file.

Original file (tiffinfo):

```text
  GDAL Metadata: <GDALMetadata>
  <Item name="STATISTICS_MAXIMUM" sample="0">100</Item>
  <Item name="STATISTICS_MEAN" sample="0">28.560288669249</Item>
  <Item name="STATISTICS_MINIMUM" sample="0">0</Item>
  <Item name="STATISTICS_STDDEV" sample="0">39.349526064368</Item>
</GDALMetadata>
```

Round-tripped file (tiffinfo):

```text
  GDAL Metadata: <GDALMetadata>
  <Item name="STATISTICS_MAXIMUM">100</Item>
  <Item name="STATISTICS_MEAN">28.560288669249</Item>
  <Item name="STATISTICS_MINIMUM">0</Item>
  <Item name="STATISTICS_STDDEV">39.349526064368</Item>
  <Item name="DESCRIPTION" sample="0" role="description">band_data</Item>
</GDALMetadata>
```

This didn't work, as TIFF tags aren't meant to be added directly, and the XML wasn't parsed correctly.
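The difference between the two tiffinfo dumps can be made explicit by parsing the XML. The sketch below uses only the standard library and assumes, as the dumps suggest, that an item with `sample="N"` belongs to the band at 0-based index N while an item without `sample` is dataset-level; items with a `role` attribute (such as the band description) are skipped. `split_gdal_metadata` is a hypothetical helper for inspection only, not a way to write the tag.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# GDAL metadata XML as printed by tiffinfo for the original file
ORIGINAL_XML = """<GDALMetadata>
  <Item name="STATISTICS_MAXIMUM" sample="0">100</Item>
  <Item name="STATISTICS_MEAN" sample="0">28.560288669249</Item>
  <Item name="STATISTICS_MINIMUM" sample="0">0</Item>
  <Item name="STATISTICS_STDDEV" sample="0">39.349526064368</Item>
</GDALMetadata>"""


def split_gdal_metadata(
    xml_text: str,
) -> Tuple[Dict[str, str], Dict[int, Dict[str, str]]]:
    """Split GDAL metadata items into dataset-level and per-band tags."""
    root = ET.fromstring(xml_text)
    dataset_tags: Dict[str, str] = {}
    band_tags: Dict[int, Dict[str, str]] = {}
    for item in root.iter("Item"):
        if item.get("role") is not None:
            continue  # e.g. role="description" is not a plain tag
        name, value = item.get("name"), item.text or ""
        sample = item.get("sample")
        if sample is None:
            dataset_tags[name] = value  # no sample attribute: dataset-level
        else:
            band_tags.setdefault(int(sample), {})[name] = value
    return dataset_tags, band_tags


dataset_tags, band_tags = split_gdal_metadata(ORIGINAL_XML)
print(dataset_tags)  # {}
print(band_tags[0]["STATISTICS_MAXIMUM"])  # 100
```

Run on the round-tripped XML, the same function would put all four STATISTICS_* items into the dataset-level dict, which matches the gdalinfo output above.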

Finally, I dug around in the rioxarray code to see why the description field was correctly added to the band, which is when I found that band_tags worked.

I updated my code to add the statistics I needed as band tags:

```python
import xarray as xr
import rioxarray  # noqa: F401 (registers the .rio accessor)

ds = xr.open_dataarray("nt_20201024_f18_nrt_s.tif")

# Tags for band 1; band_tags is a list with one dict per band
tags = {}
tags["STATISTICS_MINIMUM"] = float(ds.min())
tags["STATISTICS_MAXIMUM"] = float(ds.max())

ds.rio.to_raster("nt_band_tags.tif", "COG", tags={"band_tags": [tags]})
```
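Since `band_tags` is a list with one dict per band, the single-band example above extends naturally to multi-band rasters. A stdlib-only sketch of building that list, assuming each band is available as a plain sequence of numbers; `build_band_tags` is a hypothetical helper, and the use of the population standard deviation is an assumption that may not match exactly how GDAL computes STATISTICS_STDDEV.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence


def build_band_tags(
    bands: Sequence[Sequence[float]], nodata: Optional[float] = None
) -> List[Dict[str, float]]:
    """Return one STATISTICS_* dict per band, in band order.

    NaN and `nodata` values are ignored; each band is assumed to contain at
    least one valid value (min() raises ValueError otherwise).
    """
    band_tags = []
    for values in bands:
        valid = [v for v in values if not math.isnan(v) and v != nodata]
        band_tags.append(
            {
                "STATISTICS_MINIMUM": min(valid),
                "STATISTICS_MAXIMUM": max(valid),
                "STATISTICS_MEAN": statistics.fmean(valid),
                # Population standard deviation (an assumption, see above)
                "STATISTICS_STDDEV": statistics.pstdev(valid),
            }
        )
    return band_tags


# Two made-up bands; -3.4e38 stands in for the file's NoData value
tags = build_band_tags([[0.0, 50.0, 100.0, -3.4e38], [1.0, 3.0]], nodata=-3.4e38)
print(tags[0]["STATISTICS_MAXIMUM"])  # 100.0
```

The resulting list would then be passed the same way as in the single-band example, i.e. `tags={"band_tags": tags}`.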

Then I got the idea of moving the attributes into a band_tags attribute on the DataArray so that they survive the round trip, which worked in my test.

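```python
import xarray as xr
import rioxarray  # noqa: F401 (registers the .rio accessor)

ds = xr.open_dataarray("nt_20201024_f18_nrt_s.tif")
# Copy the flattened attributes that were read from the file
attrs = dict(ds.attrs)
ds.attrs = {
    # Keep the dataset-level tag at the top level...
    "AREA_OR_POINT": attrs.pop("AREA_OR_POINT"),
    # ...and move the remaining (band statistics) attributes to band 1
    "band_tags": [attrs],
}
ds.rio.to_raster("nt_band_attrs.tif", "COG")
```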
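The attribute shuffle above can be wrapped in a small function. This is a hypothetical helper sketched on a plain dict, assuming (as in the snippet) a single-band raster, since the flattened attrs no longer record which band a tag came from. Any other attribute that should stay at the top level has to be listed in `dataset_keys`.

```python
from typing import Any, Dict, Iterable


def attrs_to_band_tags(
    attrs: Dict[str, Any], dataset_keys: Iterable[str] = ("AREA_OR_POINT",)
) -> Dict[str, Any]:
    """Return new attrs with `dataset_keys` kept at the top level and every
    other attribute moved into a one-element `band_tags` list (band 1).

    Hypothetical helper; assumes a single-band raster.
    """
    keep = set(dataset_keys)
    new_attrs = {k: v for k, v in attrs.items() if k in keep}
    new_attrs["band_tags"] = [{k: v for k, v in attrs.items() if k not in keep}]
    return new_attrs


read_attrs = {
    "AREA_OR_POINT": "Area",
    "STATISTICS_MINIMUM": "0",
    "STATISTICS_MAXIMUM": "100",
}
print(attrs_to_band_tags(read_attrs))
# {'AREA_OR_POINT': 'Area', 'band_tags': [{'STATISTICS_MINIMUM': '0', 'STATISTICS_MAXIMUM': '100'}]}
```

Applied to the DataArray, this would be `ds.attrs = attrs_to_band_tags(ds.attrs)` before calling `to_raster`. Unlike `attrs.pop`, it leaves the input dict untouched and does not raise `KeyError` when `AREA_OR_POINT` is absent.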

@snowman2
Member

snowman2 commented Aug 5, 2024

The idea would be to add ds.rio.write_tags or something similar, so you don't have to worry about the internals.
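A rough sketch of what such a helper might do, written as a plain function over an attrs dict so it runs without rioxarray. It borrows rasterio's `update_tags` convention (`bidx=0` for dataset-level tags, 1-based band indexes otherwise) and stores band tags under `band_tags`, the attribute that the last example in this thread showed `to_raster` picking up. The name `write_tags` and its behavior are hypothetical; no such method exists in rioxarray as of this thread.

```python
from typing import Any, Dict


def write_tags(attrs: Dict[str, Any], bidx: int = 0, **tags: Any) -> Dict[str, Any]:
    """Return a copy of `attrs` with `tags` added.

    bidx=0 adds dataset-level tags at the top level; bidx>=1 adds tags to
    that (1-based) band inside the "band_tags" list. Hypothetical helper.
    """
    if bidx < 0:
        raise ValueError("bidx must be >= 0")
    out = dict(attrs)
    if bidx == 0:
        out.update(tags)
        return out
    # Copy each band's dict so the input attrs are never mutated
    band_tags = [dict(t) for t in out.get("band_tags", [])]
    # Pad with empty dicts so band `bidx` has a slot
    while len(band_tags) < bidx:
        band_tags.append({})
    band_tags[bidx - 1].update(tags)
    out["band_tags"] = band_tags
    return out


attrs = write_tags({"AREA_OR_POINT": "Area"}, 1, STATISTICS_MINIMUM=0, STATISTICS_MAXIMUM=100)
print(attrs["band_tags"])  # [{'STATISTICS_MINIMUM': 0, 'STATISTICS_MAXIMUM': 100}]
```

Returning a new dict rather than mutating mirrors how rioxarray's existing `write_*` methods return a new object, which would keep a real `write_tags` consistent with them.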
