JSON encoding of non-finite floats #996

Open · wants to merge 2 commits into base: spark-branch


LukasBoersma

As described in #983, the JSON output is currently invalid whenever it contains non-finite floats (such as NaN or Infinity).
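For context, the underlying behaviour can be reproduced with the standard library alone. This is a minimal sketch, independent of pandas-profiling; the `payload` dictionary is made up for illustration.

```python
import json
import math

# Illustrative values only; "kurtosis" mirrors the field shown in this PR.
payload = {"kurtosis": float("nan"), "max": math.inf}

# By default, json.dumps emits NaN/Infinity literals, which the JSON
# grammar (RFC 8259) does not allow.
text = json.dumps(payload)
print(text)  # {"kurtosis": NaN, "max": Infinity}

# With allow_nan=False the encoder refuses non-finite floats instead.
try:
    json.dumps(payload, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False
print(strict_ok)  # False
```

Strict JSON parsers in other languages generally reject the first output, which is the problem reported in #983.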

As @sbrugman suggested, I added a config option to switch between three behaviours:

  • PYTHON_NATIVE: Use Python's default behaviour and generate the invalid JSON (as before)
  • NULL: Encode non-finite numbers as null values.
  • STRING: Stringify non-finite numbers (for example, NaN becomes the string "nan")

I made the null output the new default because it seems less surprising to encounter a null value when parsing numbers in the JSON compared to a "nan" string. I see that there is also an argument to make the old behaviour the default, to ensure 100% compatibility with existing code. Please let me know what you think should be the default.

Summary of the changes

  • I added an enum JsonNonFiniteEncoding with the three options listed above
  • I added a json_non_finite_encoding field in the Settings class
  • In the encode_it method that already manages the JSON-encoding of all values, I added special handling for non-finite floats, which encodes the values according to the configuration.
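A rough, standalone sketch of the idea behind these changes is shown below. It is not the code from this PR: `encode_float` is a hypothetical helper standing in for the special case inside encode_it, and only `NULL = 1` is taken from the diff; the other enum values are assumed.

```python
import json
import math
from enum import Enum


class JsonNonFiniteEncoding(Enum):
    # Only NULL = 1 is visible in the diff; the other two values are assumed.
    PYTHON_NATIVE = 0
    NULL = 1
    STRING = 2


def encode_float(value, mode):
    """Hypothetical helper: map one value according to the chosen mode."""
    if not isinstance(value, float) or math.isfinite(value):
        return value  # finite floats and non-floats pass through unchanged
    if mode is JsonNonFiniteEncoding.NULL:
        return None  # serialised as null
    if mode is JsonNonFiniteEncoding.STRING:
        return str(value)  # "nan", "inf" or "-inf"
    return value  # PYTHON_NATIVE: json.dumps will emit NaN/Infinity


# Illustrative statistics, not real profiler output.
stats = {"mean": 1.0, "kurtosis": float("nan")}
for mode in JsonNonFiniteEncoding:
    encoded = {key: encode_float(val, mode) for key, val in stats.items()}
    print(mode.name, json.dumps(encoded))
```

Handling the value before it reaches `json.dumps` keeps the choice in one place, so the serialiser itself needs no custom encoder class.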

The changes should not break anything except that the JSON output will now contain null values instead of the raw "NaN" entries that violate the standard.

Example usage

import pandas as pd
from pandas_profiling import ProfileReport
from pandas_profiling.config import Settings, JsonNonFiniteEncoding

df = pd.DataFrame([1, 1, 1], columns=["a"])

profile = ProfileReport(df, title="Pandas Profiling Report", minimal=True)

profile.config.json_non_finite_encoding = JsonNonFiniteEncoding.STRING

print(profile.to_json())

Gives the following output:

[...]
"kurtosis": "nan",
[...]

When choosing the NULL config option, it would be:

"kurtosis": null,

And with the PYTHON_NATIVE option it would be:

"kurtosis": NaN,

@LukasBoersma changed the title from "JSON encode of non-finite floats" to "JSON encoding of non-finite floats" on Jun 8, 2022
Comment on lines +187 to +189
# Encode non-finite numbers as null values
NULL = 1
# Encode non-finite numbers as null values

Both options have the same comment. Can you update the one for the STRING option?

Contributor

@alexbarros alexbarros left a comment

Other than the commitlint issues, this looks good to me. Can you fix those so we can merge this?
Also, can you add some tests?

@chanedwin
Collaborator

chanedwin commented Dec 11, 2022

@LukasBoersma could you change this PR to merge into develop instead of spark-branch? It isn't a Spark-specific issue.
