Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid serialization blues by computing + caching the hash. #429

Merged
merged 5 commits into from
Oct 26, 2022

Conversation

alxmrs
Copy link
Contributor

@alxmrs alxmrs commented Oct 26, 2022

@alxmrs alxmrs marked this pull request as ready for review October 26, 2022 17:29
@andersy005
Copy link
Member

@alxmrs, should we go ahead and merge this?

@alxmrs
Copy link
Contributor Author

alxmrs commented Oct 26, 2022

Sure thing! I'll do the honors :)

@alxmrs alxmrs merged commit beeb15a into pangeo-forge:master Oct 26, 2022
@alxmrs alxmrs deleted the cache-hash branch October 26, 2022 21:30
yuvipanda added a commit to pangeo-forge/pangeo-forge-runner that referenced this pull request Oct 26, 2022
Brings in support for
ttps://github.com/pangeo-forge/pangeo-forge-recipes/pull/429,
but will stop working with any prior versions
@cisaacstern
Copy link
Member

Thanks @alxmrs! What an elegant solution to this problem.

@derekocallaghan
Copy link
Contributor

Hi @cisaacstern and @alxmrs, fyi recipes that use aiohttp objects in file_pattern.fsspec_open_kwargs["client_kwargs"] are currently failing as the hash can't be computed:

FilePattern has its own sha256() implementation, and although fsspec_open_kwargs (and client_kwargs) is currently included, perhaps this could be excluded (similar to the pattern format_function and combine dims currently excluded), or particular kwargs (e.g. any aiohttp instances) could be excluded from fsspec_open_kwargs prior to serialization. I guess timeouts/credentials values are something that aren't necessary for a recipe hash?

    # we exclude the format function and combine dims from ``root`` because they determine the
    # index:filepath pairs yielded by iterating over ``.items()``. if these pairs are generated in
    # a different way in the future, we ultimately don't care.
    root = {
        "fsspec_open_kwargs": pattern.fsspec_open_kwargs,
        "query_string_secrets": pattern.query_string_secrets,
        "file_type": pattern.file_type,
        "nitems_per_file": {
            op.name: op.nitems_per_file  # type: ignore
            for op in pattern.combine_dims
            if op.name in pattern.concat_dims
        },
    }

@cisaacstern
Copy link
Member

Thanks for flagging this, @derekocallaghan. And also for becoming so engaged these last few months! The project is benefiting so much from your contributions.

This issue of not being able to serialize aiohttp objects is important, and we should fix it. Currently it seems like your deep familiarity with this issue appears to be spread across a few different threads (here, and in a couple of staged-recipes discussions you link to above).

Could I ask that you summarize this problem in a new, standalone issue here on pangeo-forge-recipes? Something brief that links to your prior reports will suffice, and if you could also provide the shortest possible MRE that shows how the relevant aiohttp objects throw errors when passed to

def either_encode_or_hash(obj: Any):
"""For objects which are not serializable with ``json.dumps``, this function defines
type-specific handlers which extract either a serializable value or a hash from the object.
:param obj: Any object which is not serializable to ``json``.
"""
if isinstance(obj, Enum): # custom serializer for FileType, CombineOp, etc.
return obj.value
elif hasattr(obj, "sha256"):
return obj.sha256.hex()
elif inspect.isfunction(obj):
return inspect.getsource(obj)
elif isinstance(obj, bytes):
return obj.hex()
raise TypeError(f"object of type {type(obj).__name__} not serializable")

that'd be great.

My guess is that we can resolve this by simply adding an additional conditional block(s) to that function. Open to all other suggestions, though. However we fix it, having a dedicated issue will be helpful to frame the discussion. Thanks so much!

@derekocallaghan
Copy link
Contributor

Thanks @cisaacstern, I'll create a new standalone issue for the aiohttp serialization.

@cisaacstern
Copy link
Member

Thanks @derekocallaghan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generating a recipe's hash fails when using process_chunk and process_input functions
4 participants