ENH: Add use_nullable_dtypes for read_html #50286

phofl · 2022-12-15T20:33:44Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

doc/source/whatsnew/v2.0.0.rst

pandas/_libs/parsers.pyx

mroeschke · 2022-12-22T18:46:15Z

pandas/tests/io/test_html.py

@@ -132,6 +138,64 @@ def test_to_html_compat(self):
        res = self.read_html(out, attrs={"class": "dataframe"}, index_col=0)[0]
        tm.assert_frame_equal(res, df)

+    @pytest.mark.parametrize("nullable_backend", ["pandas", "pyarrow"])


Suggested change:

-    @pytest.mark.parametrize("nullable_backend", ["pandas", "pyarrow"])
+    @pytest.mark.parametrize("dtype_backend", ["pandas", "pyarrow"])

mroeschke · 2022-12-22T18:46:39Z

pandas/tests/io/test_html.py

+
+        out = df.to_html(index=False)
+        with pd.option_context("mode.string_storage", storage):
+            with pd.option_context("mode.nullable_backend", nullable_backend):


Suggested change:

-            with pd.option_context("mode.nullable_backend", nullable_backend):
+            with pd.option_context("mode.dtype_backend", nullable_backend):
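The test nests pd.option_context managers so that each setting applies only inside its with block. The minimal sketch below shows that scoping. It uses the mode.string_storage option (which exists in pandas 1.3 and later) instead of mode.nullable_backend or mode.dtype_backend, because those two may not be available as options in released pandas versions.

```python
import pandas as pd

# Remember the global setting so we can check it is restored afterwards.
before = pd.get_option("mode.string_storage")

# Inside the block, new "string" arrays use Python-object storage.
with pd.option_context("mode.string_storage", "python"):
    s = pd.Series(["x", None], dtype="string")

# Outside the block, the option is back to its previous value.
after = pd.get_option("mode.string_storage")

print(s.dtype.storage)  # python
print(before == after)  # True
```

The dtype is fixed when the Series is created, so s keeps Python storage after the context exits, while the global option is restored.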

mroeschke · 2022-12-22T19:03:26Z

pandas/io/html.py

+    use_nullable_dtypes : bool = False
+        Whether to use nullable dtypes as default when reading data. If
+        set to True, nullable dtypes are used for all dtypes that have a nullable
+        implementation, even if no nulls are present.


Could you add the additional paragraph, present in the other docstrings, noting that mode.dtype_backend is available? (It should start with "The nullable dtype implementation".)
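The docstring under review says nullable dtypes are used even when no nulls are present. Below is a hedged illustration of that behaviour. As far as I know, the released pandas 2.0 exposes this through a dtype_backend keyword rather than use_nullable_dtypes, so the sketch uses dtype_backend="numpy_nullable". It assumes pandas 2.0 or later and an installed HTML parser (lxml, or BeautifulSoup4 with html5lib). The table is made up.

```python
from io import StringIO

import pandas as pd

# A small, hypothetical HTML table with no missing values.
html = """
<table>
  <tr><th>a</th><th>b</th></tr>
  <tr><td>1</td><td>x</td></tr>
  <tr><td>2</td><td>y</td></tr>
</table>
"""

# Default: NumPy-backed dtypes (int64 for column "a").
default = pd.read_html(StringIO(html))[0]

# Nullable: masked "Int64" for column "a" even though it contains no nulls.
nullable = pd.read_html(StringIO(html), dtype_backend="numpy_nullable")[0]

print(default.dtypes["a"])   # int64
print(nullable.dtypes["a"])  # Int64
```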

mroeschke · 2022-12-27T20:38:21Z

Thanks @phofl

DaveGuenther · 2024-10-23T18:08:58Z

Hi folks, I'm not sure if this is the right venue for comments on patches after the fact, but I just updated my codebase from pandas 1.5.3 to the current version (2.2 at the time of this post) and noticed that 2.0 added "None" to the default na_values: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#:~:text=Added%20%22None%22%20to%20default%20na_values%20in%20read_csv()%20(GH%2050286

Parsing "None" as NaN introduced a silent breaking change in my script: it still ran without runtime errors, but it processed the data differently, which caused errors in the output dataset. My CSV file intentionally contained "None" in some columns so that the word would appear on a dashboard. The issue only surfaced when that value reached an np.where() call whose condition checked whether the value was "None"; the row then followed an undesired logic path.
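A minimal reproduction of the behaviour described above, assuming a hypothetical one-column CSV. On pandas 2.0 and later, "None" is in the default NA list, so the cell becomes NaN and the equality check in np.where never matches it.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV where "None" is meant as literal text for a dashboard.
csv = "label\nNone\nfoo\n"

df = pd.read_csv(io.StringIO(csv))

# pandas >= 2.0 parses "None" as missing (NaN); pandas 1.5.x kept the string.
print(df["label"].tolist())

# NaN == "None" is False, so on pandas >= 2.0 row 0 takes the "other" branch.
shown = np.where(df["label"] == "None", "show word", "other")
print(shown.tolist())  # pandas >= 2.0: ['other', 'other']
```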

I addressed this by copying the default na_values list from pandas 1.5.3 and using it to override the pandas 2.2 default (I'd noticed that a number of new values showed up in the default list in addition to "None").
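One way to pin the NA strings, sketched under assumptions: read_csv's keep_default_na=False turns off the built-in list, so only the strings passed in na_values are treated as missing. LEGACY_NA_VALUES is a hypothetical name, and its contents are my assumption of the pandas 1.5.x defaults; verify them against the source of the version you are migrating from.

```python
import io

import pandas as pd

# Hypothetical copy of the pandas 1.5.x default NA strings ("None" absent).
# Verify against the pandas source before relying on it.
LEGACY_NA_VALUES = [
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan",
    "null",
]

csv = "label\nNone\nNA\nfoo\n"

# With keep_default_na=False, only the strings in na_values become NaN.
df = pd.read_csv(
    io.StringIO(csv), keep_default_na=False, na_values=LEGACY_NA_VALUES
)
print(df["label"].tolist())  # ['None', nan, 'foo']
```

Pinning the list keeps parsing stable across upgrades, at the cost of ignoring any NA markers that later pandas versions add on purpose.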

I'm not sure I can recommend a better way to introduce a change like this, or a better way to communicate it to users, and the change was mentioned pretty far down the release notes. You probably don't want to emit a FutureWarning from read_csv() for everyone who uses it, as that would get pretty annoying. At any rate, I wanted to note that adding or removing values from the default na_values list can introduce a "soft" breaking change when moving to a new pandas version.

Cheers,

ENH: Add use_nullable_dtypes for read_html

4812032

phofl added the Enhancement, IO HTML (read_html, to_html, Styler.apply, Styler.applymap), and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) labels Dec 15, 2022

phofl commented Dec 15, 2022


doc/source/whatsnew/v2.0.0.rst (outdated)

Add gh ref

431e6e7

phofl commented Dec 15, 2022


pandas/_libs/parsers.pyx

phofl added 3 commits December 15, 2022 23:31

Fix test

e1c4328

Fix test

a6df2c8

Add whatsnew

3156f9b

mroeschke reviewed Dec 22, 2022


phofl added 3 commits December 23, 2022 18:46

Merge remote-tracking branch 'upstream/main' into use_nullable_html

bdd5652

Address review

c7fb7dc

Add backend

abd64ad

mroeschke approved these changes Dec 27, 2022


mroeschke added this to the 2.0 milestone Dec 27, 2022

mroeschke merged commit b0305f7 into pandas-dev:main Dec 27, 2022

phofl deleted the use_nullable_html branch December 28, 2022 19:15

graingert mentioned this pull request Apr 6, 2023

BUG: pd.read_csv(io.StringIO("a\nNone")).a[0] is 'None' on pandas 1 but NaN on pandas 2 #52493 (open)

jorisvandenbossche mentioned this pull request May 31, 2023

DOC Rework outlier detection estimators example scikit-learn/scikit-learn#25878 (merged)

a4rcvv mentioned this pull request Mar 21, 2024

compilation fails with pandas >=2.0 yasserfarouk/scml-vis#11 (open)
