-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash computation fails for recipe file patterns containing aiohttp
objects
#443
Comments
@derekocallaghan, thanks for this detailed description, which made it a lot easier for me to understand this issue and the possible solutions.
I would agree with this. The most expedient fix here seems like excluding pangeo-forge-recipes/pangeo_forge_recipes/patterns.py Lines 299 to 300 in 3441d94
to something like root = {
"fsspec_open_kwargs": {
k: v
for k, v in pattern.fsspec_open_kwargs.items()
# Let's put a comment here describing why we exclude only
# `client_kwargs` (because it can contain hard-to-hash objects)
# so that we remember why we did this.
if k != "client_kwargs"
},
...
} My suggestion of taking this "shortcut" (while memorializing the decision in a comment) is motivated by:
If, once hashes are used in Pangeo Forge Cloud, we then realize that Make sense, @rabernat? If so, I'd invite @derekocallaghan to make a PR for this (if you're interested!). |
Thanks @cisaacstern, I'll create a PR for this next week. |
I'm currently working on the PR for this including corresponding test cases, and it seems that pangeo-forge-recipes/pangeo_forge_recipes/serialization.py Lines 36 to 43 in fe5d1f7
In [55]: dictionary
Out[55]:
{'fsspec_open_kwargs': {'client_kwargs': {'auth': BasicAuth(login='test', password='test', encoding='latin1')}},
'file_type': <FileType.netcdf4: 'netcdf4'>,
'nitems_per_file': {'time': 1}}
In [56]: dumps(
...: dictionary,
...: default=either_encode_or_hash,
...: ensure_ascii=False,
...: sort_keys=True,
...: indent=None,
...: separators=(",", ":"),
...: )
Out[56]: '{"file_type":"netcdf4","fsspec_open_kwargs":{"client_kwargs":{"auth":["test","test","latin1"]}},"nitems_per_file":{"time":1}}' I'll still exclude |
I'll take a look to see whether this is still an issue following the |
Thanks @derekocallaghan 🙏 |
Recipes that use
aiohttp
objects infile_pattern.fsspec_open_kwargs["client_kwargs"]
are currently failing as the hash can't be computed, e.g.:aiohttp.ClientTimeout
, details and suggested workarounds here.aiohttp.BasicAuth
, details here.Although this has been mentioned in the recent hash computation PR #429, it's unrelated to the associated changes that now compute the recipe hash at construction. Instead, the issue lies with
pangeo-forge-recipes.serialization.either_encode_or_hash()
when recipe file patterns containaiohttp
objects infsspec_open_kwargs["client_kwargs"]
.pangeo-forge-recipes/pangeo_forge_recipes/serialization.py
Lines 10 to 25 in fe5d1f7
Example recipe
aiohttp
usage (CCMP) here:The following code snippet will reproduce the problems encountered in the CCMP and Daymet recipes linked above:
FilePattern
has its ownsha256()
implementation, and althoughfsspec_open_kwargs
(andclient_kwargs
) is currently included, perhaps this could be excluded (similar to the patternformat_function
and combine dims currently excluded), or particularkwargs
(e.g. anyaiohttp
objects) could be excluded fromfsspec_open_kwargs
prior to serialization. I guess timeouts/credentials values are something that aren't necessary for a recipe hash?pangeo-forge-recipes/pangeo_forge_recipes/patterns.py
Lines 296 to 308 in 3441d94
In the case of
aiohttp.BasicAuth
, it includesencode()
, which could potentially be used e.g. adding something likeelif hasattr(obj, "encode"):
toeither_encode_or_hash()
. However, nothing similar exists foraiohttp.ClientTimeout
(I assume this is true of otheraiohttp
classes, although I haven't checked), so it may simply be easiest to exclude allaiohttp
objects from the file pattern hash.The text was updated successfully, but these errors were encountered: