REGR: Index.map adding back tz to tz-agnostic result #57475

rhshadrach · 2024-02-17T21:48:03Z

closes BUG: pandas 2.2.0 DataFrame.groupby regression when group labels of type pd.Timestamp #57192 (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

It seems to me we should skip casting back to the caller's dtype when the caller dtype is datetime, since these are reliably represented as objects (unlike ints, which loses the size). But datetimes/tzs are definitely not in my wheelhouse.

cc @jbrockmendel

rhshadrach · 2024-02-17T21:48:24Z

pandas/core/indexes/base.py

-            new_values = maybe_cast_pointwise_result(
-                new_values, self.dtype, same_dtype=same_dtype
-            )
+        if self.inferred_type != "datetime64":


not sure what other dtypes might arise here - need to check

jbrockmendel · 2024-02-20T21:42:41Z

I'm pretty sure the problem is in DatetimeArray._from_scalars, where we need to check that dont have tzawareness-mismatch. instead of

if lib.infer_dtype(scalars, skipna=True) not in ["datetime", "datetime64"]:
    raise ValueError

something like

if isinstance(dtype, DatetimeTZDtype) and not lib.is_datetime_with_singletz_array(scalars);
     raise ValueError
elif isinstance(dtype, np.dtype) and not lib.is_naive_datetime_array(scalars):
    raise ValueError

is_naive_datetime_array doesn't actually exist but seems like it should.

rhshadrach · 2024-03-09T12:57:10Z

Thanks @jbrockmendel - this is much better. One thing I'm running into is that is_datetime_with_singletz_array doesn't check if any of the values are actually datetimes (will always return True when there are no datetimes), so I think this still needs to have the line with lib.infer_dtype(scalars, skipna=True) or maybe use lib.is_datetime_or_datetime64_array? Any recommendations?

jbrockmendel · 2024-03-12T02:09:06Z

I'm trying to get rid of uses of lib.infer_dtype as it is very rarely the right tool to reach for.

Seems like is_datetime_with_singletz_array(values) and not isna(values).all() is the simplest way to get what you've described. Could try to combine those into a single function i guess but i wouldn't bother at this point.

# Conflicts: # doc/source/whatsnew/v2.2.1.rst

rhshadrach · 2024-03-31T17:41:59Z

pandas/core/arrays/datetimes.py

-            # TODO: if dtype is passed, check for tzawareness compat?
+        # TODO: require any NAs be valid-for-DTA
+        # TODO: if dtype is passed, check for tzawareness compat?
+        if not lib.is_datetime64_array(scalars):


I believe we need this (or something like it) because e.g. lib.is_datetime_with_singletz_array does not check if there are datetimes:

print(lib.is_datetime_with_singletz_array(np.array([1, 2, 3]))) # True

However, I do not currently understand the behavior differences between this and the previous lib.infer_dtype and need to look into this more. Any guidance here is much appreciated.

Looks like we get that behavior from is_datetime_with_singletz_array bc it implicitly assumes it is called from maybe_convert_objects, where it will only get called if the first non-NaT it sees is a datetime with a tz.

I'm not a fan of infer_dtype bc the string returns are clunky (and hard to reason about, e.g. when do you expect "datetime" vs "datetime64") and fragile when we add new return values (e.g. "integer-na"). Also in cases when you only care about datetimes, it will not short-circuit when it sees a non-datetime.

Thanks - makes perfect sense to move away from infer_dtype. For the old behavior, I'm seeing the use of infer_dtype allow the following to not raise on a non-empty input:

If the scalars has the attribute type, kind, name, or base set to one of "datetime64[ns]", "M", datetime

is_datetime64_array returns True

is_datetime_array returns True

As far as I can tell, the input scalars here must be an ndarray, Index, or ExtensionArray, but I don't feel confident about this from the code path in Index.map.

With this, I'm wondering if the following is a good equivalent:

if ( not is_datetime64_any_dtype(scalars) and not lib.is_datetime64_array(scalars) and not lib.is_datetime_array(scalars) ): raise ValueError

REGR: Index.map adding back tz to tz-agnostic result

ad64b75

rhshadrach added Regression Functionality that used to work in a prior pandas version Apply Apply, Aggregate, Transform, Map Index Related to the Index class or subclasses labels Feb 17, 2024

rhshadrach added this to the 2.2.1 milestone Feb 17, 2024

rhshadrach commented Feb 17, 2024

View reviewed changes

rhshadrach marked this pull request as draft February 19, 2024 23:03

lithomas1 modified the milestones: 2.2.1, 2.2.2 Feb 23, 2024

rhshadrach added 3 commits March 31, 2024 10:23

Merge remote-tracking branch 'upstream/main' into regr_index_map_tz

438d084

# Conflicts: # doc/source/whatsnew/v2.2.1.rst

Rework

ac14e3c

cleanup

effcf58

rhshadrach commented Mar 31, 2024

View reviewed changes

lithomas1 modified the milestones: 2.2.2, 2.2.3 Apr 11, 2024

rhshadrach closed this Jul 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REGR: Index.map adding back tz to tz-agnostic result #57475

REGR: Index.map adding back tz to tz-agnostic result #57475

rhshadrach commented Feb 17, 2024 •

edited

Loading

rhshadrach Feb 17, 2024

jbrockmendel commented Feb 20, 2024

rhshadrach commented Mar 9, 2024 •

edited

Loading

jbrockmendel commented Mar 12, 2024

rhshadrach Mar 31, 2024

jbrockmendel Apr 1, 2024

rhshadrach Apr 14, 2024

REGR: Index.map adding back tz to tz-agnostic result #57475

REGR: Index.map adding back tz to tz-agnostic result #57475

Conversation

rhshadrach commented Feb 17, 2024 • edited Loading

rhshadrach Feb 17, 2024

Choose a reason for hiding this comment

jbrockmendel commented Feb 20, 2024

rhshadrach commented Mar 9, 2024 • edited Loading

jbrockmendel commented Mar 12, 2024

rhshadrach Mar 31, 2024

Choose a reason for hiding this comment

jbrockmendel Apr 1, 2024

Choose a reason for hiding this comment

rhshadrach Apr 14, 2024

Choose a reason for hiding this comment

rhshadrach commented Feb 17, 2024 •

edited

Loading

rhshadrach commented Mar 9, 2024 •

edited

Loading