ENH: Include df.attrs in to_json output #51012

Open
1 of 3 tasks
janosh opened this issue Jan 27, 2023 · 9 comments
Labels
Enhancement, IO JSON (read_json, to_json, json_normalize)

Comments

@janosh
Contributor

janosh commented Jan 27, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Persist dataset metadata

Feature Description

df.attrs is still experimental, but it would be great if df.to_json wrote it to the JSON output as metadata alongside the dataframe's content.

Alternative Solutions

Slightly clunky: manually writing the metadata on a new line in the same JSON file the df was exported to. This limits the read options (e.g. the file can no longer be imported in NodeJS because it is invalid JSON).

https://stackoverflow.com/a/33113390
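
In the meantime, one possible workaround that keeps the file valid JSON is to wrap the frame and its attrs in a single object. This is only a sketch: the helper names `to_json_with_attrs` and `read_json_with_attrs` and the `"attrs"`/`"data"` keys are made up for illustration and are not pandas API, and it assumes everything in df.attrs is JSON-serializable.

```python
import json
from io import StringIO

import pandas as pd


def to_json_with_attrs(df: pd.DataFrame) -> str:
    """Hypothetical helper: serialize the frame and its attrs as one JSON object."""
    payload = {
        "attrs": df.attrs,  # assumption: attrs holds only JSON-serializable values
        "data": json.loads(df.to_json(orient="split")),
    }
    return json.dumps(payload)


def read_json_with_attrs(text: str) -> pd.DataFrame:
    """Hypothetical helper: inverse of to_json_with_attrs."""
    payload = json.loads(text)
    df = pd.read_json(StringIO(json.dumps(payload["data"])), orient="split")
    df.attrs = payload["attrs"]
    return df


df = pd.DataFrame({"a": [1, 2]})
df.attrs = {"source": "sensor-1"}

text = to_json_with_attrs(df)
restored = read_json_with_attrs(text)
```

Because the result is a single JSON document, other tools can still parse it, at the cost of a file layout that plain pd.read_json does not understand without the wrapper.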

Additional Context

No response

@janosh added the Enhancement and Needs Triage labels Jan 27, 2023
@topper-123
Contributor

topper-123 commented May 5, 2023

I like this idea, though of course it will only work with attrs values that are JSON-serializable. How this will interact with JSON validators should also be considered.
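
To illustrate the serializability concern, here is a rough sketch that checks which attrs values the standard-library `json` module would reject. The function name `non_serializable_keys` is hypothetical, and a real implementation in pandas might use its own JSON encoder with different rules.

```python
import json
from datetime import datetime


def non_serializable_keys(attrs: dict) -> list:
    """Return the keys whose values json.dumps cannot encode (stdlib rules only)."""
    bad = []
    for key, value in attrs.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad.append(key)
    return bad


attrs = {
    "units": "kg",
    "created": datetime(2023, 1, 27),  # datetime is not encodable by stdlib json
    "tags": ["a", "b"],
}
problem_keys = non_serializable_keys(attrs)
```

Here only the `"created"` entry is flagged, since the stdlib encoder raises TypeError for datetime objects.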

As you say, attrs is still experimental. I would like it to be stable before using it in other locations in Pandas. The attrs feature is very simple, so I'd say it should be easy to decide to keep it permanently or not (I'm +1 on keeping it permanently)

@topper-123 added the IO JSON label and removed the Needs Triage label May 5, 2023
@janosh
Contributor Author

janosh commented May 5, 2023

I'm +1 on keeping it permanently

Me too! Seems like a no-brainer. Being able to store metadata directly with a serialized dataframe will be a big deal imo!

@rmhowe425
Contributor

take

@rmhowe425
Contributor

@janosh @topper-123

Just to make sure that I understand the enhancement request: whenever a data frame is written to a .json file using df.to_json(), we're looking to also write df.attrs to the same file?

And I know that we stated that df.attrs is experimental right now, but ideally this should be implemented so that whenever pd.read_json() is called, df.attrs is also read in from the JSON file?

@janosh
Contributor Author

janosh commented Jun 2, 2023

Yes to both questions! 👍

@topper-123
Contributor

xref discussion in #52166.

@rmhowe425
Contributor

rmhowe425 commented Jun 11, 2023

@topper-123

Just to make sure, are we okay with implementing this under the assumption that the path_or_buf param for to_json() will never be a JSON literal?

Referencing PR #53409

@topper-123
Contributor

The exact status of attrs hasn't been decided yet in #52166, so it's not completely decided.

I think we have decided to keep it but drop propagating attrs, but I'm not 100% sure. So if the implementation is simple and you're up for it, you could make a PR just to see the response, IMO.

@rmhowe425 removed their assignment Jun 25, 2023
@tpvasconcelos

It looks like the decision to require attrs to be JSON-serializable was (implicitly) made in PR #54346, which has already shipped as part of v2.1.0.


4 participants