Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Add tests to ensure proper conversion of datetime64/timedelta64/timestamp/duration to and from pyarrow #54356

Merged
merged 16 commits into from
Aug 9, 2023

Conversation

Julian048
Copy link
Contributor

@Julian048 Julian048 commented Aug 1, 2023

@Julian048
Copy link
Contributor Author

As per this issue from the pyarrow repository there was error in convert_dtypes and pa.table()/pd.to_pandas() in which pyarrow would always return [ns] as unit regardless of associated metadeta. I have added tests to ensure the conversion always preserves associated unit.

Pyarrow has since fixed and closed the issue in their current main branch (dev version 13.0), and so when it is publicly available the if conditionals mentioned in #54191 can be considered for removal with these tests to ensure functionality remains the same after their removal.

@Julian048 Julian048 changed the title TST: Add tests to ensure proper conversion of pd datetime64 and timedelta64 to respective pyarrow type. TST: Add tests to ensure proper conversion of pd datetime64/timedelta64/timestamp/duration to and from pyarrow Aug 1, 2023
@Julian048 Julian048 changed the title TST: Add tests to ensure proper conversion of pd datetime64/timedelta64/timestamp/duration to and from pyarrow TST: Add tests to ensure proper conversion of datetime64/timedelta64/timestamp/duration to and from pyarrow Aug 1, 2023
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you only add 1 test for the example in #54191? The test should go in test_convert_dtypes.py

@Julian048
Copy link
Contributor Author

I've fixed the test to match the issue and put it in the proper location, but I'm still having a failing CI "minimum versions" and I can't pinpoint why if anyone could help with that.

@mroeschke
Copy link
Member

Might help to merge main again

def test_convert_dtypes_timestamp_and_duration(self):
# GH 54191
pytest.importorskip("pyarrow")
timestamp_series = pd.Series(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mirror the exact code snippet in #54191 (comment)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what you're referencing right?
s = pd.Series(pd.date_range("2020-01-01", "2020-01-02", freq="1min"))
s = s.astype("timestamp[ms][pyarrow]")
t = s.convert_dtypes(dtype_backend="pyarrow")
pd.testing.assert_series_equal(s, t)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

@Julian048
Copy link
Contributor Author

I apologize for all the commits, git is new to me

@mroeschke mroeschke added the Testing pandas testing functions or related to the test suite label Aug 9, 2023
@mroeschke mroeschke added this to the 2.1 milestone Aug 9, 2023
@mroeschke mroeschke merged commit c54ceec into pandas-dev:main Aug 9, 2023
36 checks passed
@mroeschke
Copy link
Member

Thanks @Julian048

mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Aug 18, 2023
…timestamp/duration to and from pyarrow (pandas-dev#54356)

* add pyarrow conversion pytests

* make test match GH issue and move test to proper location

* fix naming of expected and result variables for assertion

* optimize test

* make test mirror GH pandas-dev#54191

* fix test logic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: convert_dtypes(dtype_backend="pyarrow") converts "timestamp[ms][pyarrow]" to "timestamp[ns][pyarrow]"
2 participants