[WIP] Add daymet #213

Draft · wants to merge 47 commits into master

Changes from all commits · 47 commits
1d74af4
Test
yuvipanda Oct 28, 2022
532209c
Remove enums
yuvipanda Oct 28, 2022
2bc3674
Fucking scope shit
yuvipanda Oct 28, 2022
1dbf425
Till last year
yuvipanda Oct 28, 2022
42573b1
Make it a dict
yuvipanda Oct 28, 2022
a494b48
debug
yuvipanda Oct 28, 2022
d2d43d7
Use cmr
yuvipanda Oct 28, 2022
3208078
Support earthdata login
yuvipanda Oct 28, 2022
d65ab19
Split things into one per region per variable
yuvipanda Oct 28, 2022
961835b
Set nitems_per_input
yuvipanda Oct 28, 2022
3c0a3e5
Fix name
yuvipanda Oct 28, 2022
34f2cbc
Don't shard by region
yuvipanda Oct 28, 2022
2c8e8c4
Try a different nitems_per_file
yuvipanda Oct 28, 2022
0dc6e0a
pass in nitems elsewhere
yuvipanda Oct 28, 2022
be9590d
Try this
yuvipanda Oct 28, 2022
5a50ed2
Try just 1 input per chunk
yuvipanda Oct 28, 2022
5c6b152
Merge all vars into one store
yuvipanda Oct 28, 2022
f65b42e
Try going one level deeper
yuvipanda Oct 28, 2022
eb4c12d
Fuck serialization
yuvipanda Oct 28, 2022
4537e10
Pass as args
yuvipanda Oct 28, 2022
77ae336
Fix ordering
yuvipanda Oct 28, 2022
9cc0228
Try amend
yuvipanda Oct 28, 2022
3c9058c
XarrayZarr can't handle more than 1 MergeDim apparently
yuvipanda Oct 28, 2022
0d7df7b
Revert "XarrayZarr can't handle more than 1 MergeDim apparently"
yuvipanda Oct 28, 2022
dcfff1e
Revert "Revert "XarrayZarr can't handle more than 1 MergeDim apparent…
yuvipanda Oct 28, 2022
e5243ad
Revert "Revert "Revert "XarrayZarr can't handle more than 1 MergeDim …
yuvipanda Oct 29, 2022
f353f41
Try
yuvipanda Oct 29, 2022
0c9d57e
Try dict
yuvipanda Oct 29, 2022
1096f8a
Recipe name
yuvipanda Oct 29, 2022
717edef
Try partial
yuvipanda Oct 29, 2022
758864b
Debug
yuvipanda Oct 29, 2022
107b150
Try multiple concat dims
yuvipanda Oct 29, 2022
83d2263
What if we return none
yuvipanda Oct 29, 2022
1f8feb8
Fuck
yuvipanda Oct 29, 2022
3fe8f19
Try just na
yuvipanda Oct 29, 2022
e6c215f
blah
yuvipanda Oct 29, 2022
87ee3a5
fadf
yuvipanda Oct 29, 2022
e383f2a
try just one thing
yuvipanda Oct 29, 2022
6e1a899
try target chunks
yuvipanda Oct 29, 2022
d0608b2
ldsf
yuvipanda Oct 29, 2022
5d5e269
Simplify so recipe works for 1 region, 1 variable
yuvipanda Nov 9, 2022
3caff10
Create a zarr output per variable for hi
yuvipanda Nov 9, 2022
24eb1fb
Produce one recipe per region per variable
yuvipanda Nov 9, 2022
89971c7
Adopt recipe to use beam-refactor
yuvipanda Feb 17, 2023
cff9b83
Explicitly install required packages
yuvipanda Feb 17, 2023
9f70560
Generate all variables for na
yuvipanda Feb 17, 2023
de5b8ee
Use / as separator in subpath
yuvipanda Feb 17, 2023
25 changes: 25 additions & 0 deletions recipes/daymet/meta.yaml
@@ -0,0 +1,25 @@
title: Daymet
description: >
  Daily Surface Weather and Climatological Summaries (Daymet) provides
  long-term, continuous, gridded estimates of daily weather and climatology
  variables by interpolating and extrapolating ground-based observations through
  statistical modeling techniques. The Daymet data products provide driver data
  for biogeochemical terrestrial modeling and have myriad applications in many
  Earth science, natural resource, biodiversity, and agricultural research
  areas. Daymet weather variables include daily minimum and maximum temperature,
  precipitation, vapor pressure, shortwave radiation, snow water equivalent, and
  day length produced on a 1 km x 1 km gridded surface over continental North
  America and Hawaii from 1980 and over Puerto Rico from 1950 through the end of
  the most recent full calendar year.
pangeo_forge_version: '0.9.0'
pangeo_notebook_version: '2022.06.02'
recipes:
  dict_object: recipe:recipes
provenance:
  license: 'No constraints on data access or use.'
maintainers:
  - name: 'Charles Stern'
    orcid: '0000-0002-4078-0852'
    github: cisaacstern
bakery:
  id: 'pangeo-ldeo-nsf-earthcube'
96 changes: 96 additions & 0 deletions recipes/daymet/recipe.py
@@ -0,0 +1,96 @@
import netrc
import os
from functools import partial

import aiohttp
import apache_beam as beam
from pangeo_forge_cmr import get_cmr_granule_links

from pangeo_forge_recipes import patterns
from pangeo_forge_recipes.patterns import pattern_from_file_sequence
from pangeo_forge_recipes.transforms import OpenURLWithFSSpec, OpenWithXarray, StoreToZarr

# We need to provide EarthData credentials to fetch the files.
# The credentials of the currently logged-in user are used, and are passed on to the cloud
# as well when the operation is scaled out. This should be automated with a machine identity
# in the future.
# Go here to set up a .netrc file: https://disc.gsfc.nasa.gov/data-access
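# A minimal ~/.netrc entry for Earthdata Login looks roughly like this (placeholder
# values, shown only as an illustration):
#
#     machine urs.earthdata.nasa.gov
#         login <earthdata-username>
#         password <earthdata-password>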
username, _, password = netrc.netrc().authenticators('urs.earthdata.nasa.gov')
client_kwargs = {
    'auth': aiohttp.BasicAuth(username, password),
    'trust_env': True,
}
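# Note: `client_kwargs` is currently only referenced by the commented-out
# fsspec_open_kwargs further down; the pipeline's authentication is handled by
# OpenURLWithEarthDataLogin below.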


class OpenURLWithEarthDataLogin(OpenURLWithFSSpec):
    def expand(self, *args, **kwargs):
        auth_kwargs = {}
        if 'EARTHDATA_LOGIN_TOKEN' in os.environ:
            auth_kwargs = {
                'headers': {'Authorization': f'Bearer {os.environ["EARTHDATA_LOGIN_TOKEN"]}'}
            }
        elif os.path.exists(os.environ.get('NETRC', os.path.expanduser('~/.netrc'))):
            # FIXME: Actually support the NETRC environment variable
            username, _, password = netrc.netrc().authenticators('urs.earthdata.nasa.gov')
            auth_kwargs = {
                'auth': aiohttp.BasicAuth(username, password)
            }
        if auth_kwargs:
            if self.open_kwargs is None:
                self.open_kwargs = auth_kwargs
            else:
                self.open_kwargs.update(auth_kwargs)
        return super().expand(*args, **kwargs)


# Get the daymet latest version data
shortname = 'Daymet_Daily_V4R1_2129'

# Query NASA's Common Metadata Repository (CMR) for the download URLs of all granules
# (files) in the collection identified by `shortname`.
all_files = get_cmr_granule_links(shortname)

split_files = {
    'hi': {'files': {}, 'kwargs': {'inputs_per_chunk': 1}},
    'pr': {'files': {}, 'kwargs': {'inputs_per_chunk': 1}},
    'na': {
        'files': {},
        # `subset_inputs` splits each *input* file into *n* chunks (dynamically calculating
        # the size of the chunks) while *reading*. This helps only with large input files that
        # would otherwise be too big to read into memory; it does not affect the output in any way.
        # `target_chunks` describes the *size* (in number of items) to use when chunking the
        # *output* zarr store, and is what determines reading speeds.
        'kwargs': {'subset_inputs': {'time': 365}, 'target_chunks': {'time': 14}},
    },
}
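
# Rough arithmetic behind the `na` kwargs above (note: these kwargs are defined here but not
# passed into the Beam pipeline below): each yearly file holds 365 daily time steps, so
# subset_inputs={'time': 365} would split reads into roughly one-day pieces, while
# target_chunks={'time': 14} would write an output store chunked in 14-day pieces.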

for f in all_files:
    # File URLs look like
    # https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4R1/data/daymet_v4_daily_hi_vp_2021.nc,
    # or, more generally,
    # https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4R1/data/daymet_v4_daily_<region>_<variable>_<year>.nc
    # The region is one of hi, na or pr. There is one file per region, per variable, per year.
    region, var, year = f.rsplit("/", 1)[1].rsplit(".", 1)[0].rsplit("_", 3)[1:]
    split_files[region]['files'].setdefault(var, []).append(f)
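
# Worked example of the parsing above, using the file name from the comment:
#   'daymet_v4_daily_hi_vp_2021.nc'.rsplit('.', 1)[0]  -> 'daymet_v4_daily_hi_vp_2021'
#   'daymet_v4_daily_hi_vp_2021'.rsplit('_', 3)[1:]    -> ['hi', 'vp', '2021']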


recipes = {}

# The set of variables present in any region's file listing
all_vars = set(k for region in split_files for k in split_files[region]['files'])

# Only generate recipes for the `na` (continental North America) region for now;
# `hi` (Hawaii) and `pr` (Puerto Rico) files are collected above but not yet built.
region = 'na'

for var in all_vars:
    pattern = pattern_from_file_sequence(
        split_files[region]['files'][var],
        concat_dim="time",
        nitems_per_file=365,
        # fsspec_open_kwargs={'engine': 'netcdf4'}
        # fsspec_open_kwargs={'backend_kwargs': {'storage_options': {'client_kwargs': client_kwargs}}, 'engine': 'h5netcdf'},
    )
    recipe = (
        beam.Create(pattern.items())
        | OpenURLWithEarthDataLogin()
        | OpenWithXarray(xarray_open_kwargs={'chunks': 'auto'})
        | StoreToZarr(
            target_subpath=f'{region}/{var}',
            target_chunks={'time': 128},
            combine_dims=pattern.combine_dim_keys,
        )
    )

    recipes[f'{region}-{var}'] = recipe
2 changes: 2 additions & 0 deletions recipes/daymet/requirements.txt
@@ -0,0 +1,2 @@
pangeo-forge-cmr
git+https://github.com/yuvipanda/pangeo-forge-recipes@subpath