Hi, here is some code that I have written to perform this task:
The issue with this code is that it fills up my memory. I have tried using different chunk sizes on the opened tiff and dropping the NODATA values. Has anyone performed a similar task using rioxarray and could point me to the right way of doing this? Ultimately I want the data to be stored to parquet, but that's another problem.
Sam
Replies: 1 comment 1 reply
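import pandas
import rioxarray

# Open lazily; cache=False keeps windows that were already read from piling up in memory.
da = rioxarray.open_rasterio(f"{BASE_PATH}large_tiff.tif", cache=False, mask_and_scale=True)
windows = [...]
out_data = []
for window in windows:
    subset = da.rio.isel_window(window)
    out_data.append(
        subset
        # drop_vars is an xarray method, so it has to run before to_dataframe().
        .drop_vars("spatial_ref")
        # open_rasterio returns a DataArray here; "value" is an illustrative column name.
        .to_dataframe(name="value")
        .reset_index()
        # mask_and_scale=True turns NODATA into NaN, so dropna removes those pixels.
        .dropna(subset=["value"])
    )
pandas.concat(out_data).to_parquet(f"{BASE_PATH}tiff_selection.parquet")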
`rioxarray.open_rasterio` lazily loads the data by default. You can take advantage of this in your process and read the data in windows (or you could just select subsets of the array by slicing). To prevent your memory from filling up, I recommend using the `cache=False` kwarg when opening the file.
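For the windowed or sliced reads mentioned above, one possible way to enumerate the pieces is a small generator that tiles the raster's row/column grid. This is only a sketch: the helper name `iter_tile_slices` and the tile size are made up for illustration and are not part of rioxarray.

```python
from typing import Iterator, Tuple


def iter_tile_slices(
    height: int, width: int, tile: int
) -> Iterator[Tuple[slice, slice]]:
    """Yield (row_slice, col_slice) pairs that cover a height x width grid.

    Hypothetical helper for illustration; not a rioxarray function.
    """
    if tile <= 0:
        raise ValueError("tile must be a positive integer")
    for row_start in range(0, height, tile):
        # Clamp the last row band so it never runs past the raster edge.
        rows = slice(row_start, min(row_start + tile, height))
        for col_start in range(0, width, tile):
            # Clamp the last column band in the same way.
            cols = slice(col_start, min(col_start + tile, width))
            yield rows, cols


# Example: a 5 x 7 grid split into tiles of at most 4 x 4 cells.
tiles = list(iter_tile_slices(height=5, width=7, tile=4))
```

Assuming the opened array uses the usual `y` and `x` dimension names, each pair could then be applied as `da.isel(y=rows, x=cols)`, or converted with `rasterio.windows.Window.from_slices(rows, cols)` if you prefer `da.rio.isel_window`. Processing and writing one tile at a time, rather than collecting every tile in a list first, is what keeps the memory footprint bounded.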