clib.conversion: Deal with np.object dtype in vectors_to_arrays and deprecate the array_to_datetime function #3507
@@ -18,7 +18,6 @@
 import pandas as pd
 import xarray as xr
 from pygmt.clib.conversion import (
-    array_to_datetime,
     dataarray_to_matrix,
     sequence_to_ctypes_array,
     strings_to_ctypes_array,
@@ -854,22 +853,13 @@ def _check_dtype_and_dim(self, array, ndim):
         """
         # Check that the array has the given number of dimensions
         if array.ndim != ndim:
-            raise GMTInvalidInput(
-                f"Expected a numpy {ndim}-D array, got {array.ndim}-D."
-            )
+            msg = f"Expected a numpy {ndim}-D array, got {array.ndim}-D."
+            raise GMTInvalidInput(msg)

         # Check that the array has a valid/known data type
         if array.dtype.type not in DTYPES:
-            try:
-                if array.dtype.type is np.object_:
-                    # Try to convert unknown object type to np.datetime64
-                    array = array_to_datetime(array)
-                else:
-                    raise ValueError
-            except ValueError as e:
-                raise GMTInvalidInput(
-                    f"Unsupported numpy data type '{array.dtype.type}'."
-                ) from e
+            msg = f"Unsupported numpy data type '{array.dtype.type}'."
+            raise GMTInvalidInput(msg)
         return self[DTYPES[array.dtype.type]]

     def put_vector(self, dataset, column, vector):
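The simplified check in this hunk can be sketched outside PyGMT as follows. This is a minimal illustration, not the library's code: the `DTYPES` table here is a hypothetical subset, the function is a free function rather than a session method, and `ValueError` stands in for `GMTInvalidInput`.

```python
import numpy as np

# Hypothetical subset of a dtype-to-GMT-type table; the real DTYPES mapping
# is not shown in this diff.
DTYPES = {
    np.int64: "GMT_LONG",
    np.float64: "GMT_DOUBLE",
    np.str_: "GMT_TEXT",
    np.datetime64: "GMT_DATETIME",
}


def check_dtype_and_dim(array, ndim):
    """Return the GMT type name for ``array`` or raise ValueError."""
    # Reject arrays with the wrong number of dimensions.
    if array.ndim != ndim:
        msg = f"Expected a numpy {ndim}-D array, got {array.ndim}-D."
        raise ValueError(msg)
    # Reject dtypes that are not in the table. Object arrays are no longer
    # converted here; they are expected to be converted upstream.
    if array.dtype.type not in DTYPES:
        msg = f"Unsupported numpy data type '{array.dtype.type}'."
        raise ValueError(msg)
    return DTYPES[array.dtype.type]
```

With this shape, an object-dtype array fails immediately with the "Unsupported numpy data type" message instead of going through a datetime conversion attempt first.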
@@ -917,7 +907,7 @@ def put_vector(self, dataset, column, vector):
         gmt_type = self._check_dtype_and_dim(vector, ndim=1)
         if gmt_type in {self["GMT_TEXT"], self["GMT_DATETIME"]}:
             if gmt_type == self["GMT_DATETIME"]:
-                vector = np.datetime_as_string(array_to_datetime(vector))
+                vector = np.datetime_as_string(vector)
             vector_pointer = strings_to_ctypes_array(vector)
         else:
             vector_pointer = vector.ctypes.data_as(ctp.c_void_p)
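For context, the changed line relies on the vector already being a `np.datetime64` array when it reaches `np.datetime_as_string`. A small standalone sketch of that call, with sample dates chosen only for illustration:

```python
import numpy as np

# A vector that already has a datetime64 dtype, as the new code assumes.
dates = np.array(["2010-06-01", "2011-06-01"], dtype="datetime64[D]")

# Convert to ISO 8601 strings; the result is a numpy string array.
as_text = np.datetime_as_string(dates)
print(as_text)
print(as_text.dtype.type is np.str_)
```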
@@ -1388,7 +1378,7 @@ def virtualfile_from_vectors(self, *vectors):
         # Assumes that first 2 columns contains coordinates like longitude
         # latitude, or datetime string types.
         for col, array in enumerate(arrays[2:]):
-            if pd.api.types.is_string_dtype(array.dtype):
+            if array.dtype.type == np.str_:
I think we'll need to check if this can handle
Both can be converted to the numpy string dtype by the
The main idea of this PR is to let
For any special dtypes that we know how to convert to a numpy dtype, we can maintain a mapping dictionary, just as you did to support pyarrow's date32[day] and date64[ms] in #2845: pygmt/pygmt/clib/conversion.py, lines 208 to 211 in c2e429c
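The mapping-dictionary idea mentioned above might look roughly like the sketch below. It is not the code at the referenced lines: the table name `DTYPE_MAP`, the helper `to_numpy`, and the exact key strings are assumptions made for illustration.

```python
import numpy as np

# Hypothetical table keyed by the string form of a non-numpy dtype.
# The key spellings are assumed, not taken from the PyGMT source.
DTYPE_MAP = {
    "date32[day][pyarrow]": np.datetime64,
    "date64[ms][pyarrow]": np.datetime64,
}


def to_numpy(values, dtype_name):
    """Convert ``values`` to a contiguous numpy array (illustrative helper)."""
    # Look up a target numpy dtype; None lets numpy infer the dtype itself.
    target = DTYPE_MAP.get(dtype_name)
    return np.ascontiguousarray(values, dtype=target)
```

A call such as `to_numpy(["2024-01-01", "2024-01-02"], "date32[day][pyarrow]")` would then produce a `datetime64` array, while unknown dtype names fall back to numpy's default inference.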
In 83673cf, I've moved most of the doctests into a separate test file
Based on the tests below, I think we should add the entry
In [1]: import pandas as pd
In [2]: x = pd.Series(["abc", "12345"])
In [3]: x.dtype
Out[3]: dtype('O')
In [4]: str(x.dtype)
Out[4]: 'object'
In [5]: x = pd.Series(["abc", "12345"], dtype="string")
In [6]: x.dtype
Out[6]: string[python]
In [7]: str(x.dtype)
Out[7]: 'string'
In [8]: x = pd.Series(["abc", "12345"], dtype="string[pyarrow]")
In [9]: x.dtype
Out[9]: string[pyarrow]
In [10]: str(x.dtype)
Out[10]: 'string'
In [11]: import pyarrow as pa
In [12]: x = pa.array(["abc", "defghi"])
In [13]: x.type
Out[13]: DataType(string)
In [14]: str(x.type)
Out[14]: 'string'
In PR #2774 and #2774, we only checked whether PyGMT supports pandas with the pyarrow backend, but didn't check whether the original pyarrow arrays work. For example, for a pyarrow
Yes, raw
                 columns = col + 2
                 break
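To illustrate why the stricter `array.dtype.type == np.str_` test depends on strings being converted beforehand, here is a numpy-only sketch. The sample values mirror the session above; the conversion call is one possible approach and not necessarily what the PR does.

```python
import numpy as np

# Strings stored in an object array do not have the numpy string dtype.
obj = np.array(["abc", "12345"], dtype=object)
print(obj.dtype.type is np.object_)

# After an explicit conversion, the dtype type is np.str_ and the new
# check in the hunk above would match.
text = np.asarray(obj, dtype=np.str_)
print(text.dtype.type == np.str_)
```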
After this PR, array_to_datetime is no longer used, but I still want to keep this function so that we know which datetime formats np.asarray(array, dtype=np.datetime64) can support.
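As a rough illustration of the kinds of input that `np.asarray(array, dtype=np.datetime64)` accepts, the sketch below tries a few common cases. The list of cases is chosen for this example and is not exhaustive.

```python
import datetime

import numpy as np

# A few input kinds, each converted with the same call discussed above.
inputs = {
    "ISO date strings": ["2018-01-01", "2018-02-01"],
    "datetime.date": [datetime.date(2018, 1, 1), datetime.date(2018, 2, 1)],
    "datetime.datetime": [datetime.datetime(2018, 1, 1, 12, 0)],
    "np.datetime64": [np.datetime64("2018-01-01"), np.datetime64("2018-02-01")],
}

converted = {
    label: np.asarray(values, dtype=np.datetime64)
    for label, values in inputs.items()
}

# Print the resulting dtype (including its time unit) for each input kind.
for label, array in converted.items():
    print(label, array.dtype, array)
```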