-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Types: Unlock supporting timezone-aware DateTime
fields
#22
Conversation
DateTime
fieldsDateTime
fields
DateTime
fieldsDateTime
fields
738fc7f
to
2eca67a
Compare
4fb90ac
to
347f2a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, thx.
tests/datetime_test.py
Outdated
assert result["date"].year == 2009 | ||
assert result["datetime"].year == 2009 | ||
assert result["datetime"].tzname() is None | ||
assert result["datetime"].timetz() == dt.time(19, 19, 30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any conversion, using the CST timezone, the tz is not set on the result (as it should be) but the value stored shouldn't it be converted to UTC first ? (so hour should be 18?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When inserting or updating records using timezone-aware Python
date
ordatetime
objects, timezone information will not be preserved.
-- https://cratedb.com/docs/sqlalchemy-cratedb/data-types.html
Shouldn't it be converted to UTC first?
I am not sure. We may want to validate how it works with PostgreSQL and its SQLAlchemy dialect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use timestamp WITH time zone
as the column data type to make the conversion, otherwise the time zone information is just stripped:
cr> create table test(id string, tz timestamp with time zone);
CREATE OK, 1 row affected (0.359 sec)
cr> insert into test values('UTC1', '2024-06-24 10:20:30.123UTC');
INSERT OK, 1 row affected (0.017 sec)
cr> insert into test values('UTC2', '2024-06-24 10:20:30.123');
INSERT OK, 1 row affected (0.017 sec)
cr> insert into test values('+2h', '2024-06-24 10:20:30.123+02:00');
INSERT OK, 1 row affected (0.009 sec)
cr> insert into test values('+3h', '2024-06-24 10:20:30.123+03:00');
INSERT OK, 1 row affected (0.018 sec)
cr> select * from test order by id;
+------+---------------+
| id | tz |
+------+---------------+
| +2h | 1719217230123 |
| +3h | 1719213630123 |
| UTC1 | 1719224430123 |
| UTC2 | 1719224430123 |
+------+---------------+
SELECT 4 rows in set (0.004 sec)
cr> drop table test;
DROP OK, 1 row affected (0.045 sec)
cr> create table test(id string, tz timestamp without time zone);
CREATE OK, 1 row affected (0.361 sec)
cr> insert into test values('UTC1', '2024-06-24 10:20:30.123UTC');
INSERT OK, 1 row affected (0.017 sec)
cr> insert into test values('UTC2', '2024-06-24 10:20:30.123');
INSERT OK, 1 row affected (0.017 sec)
cr> insert into test values('+2h', '2024-06-24 10:20:30.123+02:00');
INSERT OK, 1 row affected (0.018 sec)
cr> insert into test values('+3h', '2024-06-24 10:20:30.123+03:00');
INSERT OK, 1 row affected (0.017 sec)
cr> select * from test order by id;
+------+---------------+
| id | tz |
+------+---------------+
| +2h | 1719224430123 |
| +3h | 1719224430123 |
| UTC1 | 1719224430123 |
| UTC2 | 1719224430123 |
+------+---------------+
SELECT 4 rows in set (0.004 sec)
Can we have a test with time zone
for the col type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, we follow the behavior of postgres:
matriv=> create table test(id text, tz timestamp without time zone);
CREATE TABLE
matriv=> insert into test values('UTC1', '2024-06-24 10:20:30.123UTC');
INSERT 0 1
matriv=> insert into test values('UTC2', '2024-06-24 10:20:30.123');
INSERT 0 1
matriv=> insert into test values('+2hr', '2024-06-24 10:20:30.123+02:00');
INSERT 0 1
matriv=> insert into test values('+3hr', '2024-06-24 10:20:30.123+03:00');
INSERT 0 1
matriv=> select * from test;
id | tz
------+-------------------------
UTC1 | 2024-06-24 10:20:30.123
UTC2 | 2024-06-24 10:20:30.123
+2hr | 2024-06-24 10:20:30.123
+3hr | 2024-06-24 10:20:30.123
(4 rows)
matriv=> drop table test;
DROP TABLE
matriv=> create table test(id text, tz timestamp with time zone);
CREATE TABLE
matriv=> insert into test values('UTC1', '2024-06-24 10:20:30.123UTC');
INSERT 0 1
matriv=> insert into test values('UTC2', '2024-06-24 10:20:30.123');
INSERT 0 1
matriv=> insert into test values('+2hr', '2024-06-24 10:20:30.123+02:00');
INSERT 0 1
matriv=> insert into test values('+3hr', '2024-06-24 10:20:30.123+03:00');
INSERT 0 1
matriv=> select * from test;
id | tz
------+----------------------------
UTC1 | 2024-06-24 13:20:30.123+03
UTC2 | 2024-06-24 10:20:30.123+03
+2hr | 2024-06-24 11:20:30.123+03
+3hr | 2024-06-24 10:20:30.123+03
(4 rows)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestions. I will add more test cases, and also validate the implementation using the whole set of possible datetime/timestamp column types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
619c991, added just now, starts using time zone Europe/Kyiv
for testing, but it does not change anything on the test outcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've inspected the wire using sudo ngrep -d lo0 -Wbyline port 4200
, and found that on the JSON level, serialization does not take the time zone into account when invoking the software test test_datetime_tz
using pytest --no-cov -k test_datetime_tz
?
The library just sends 2009-05-13T19:19:30.123456Z
to the wire, effectively ignoring time zone information at data submission time already. It is clear that there is a flaw.
{"stmt": "INSERT INTO foobar (name, date, datetime_notz, datetime_tz) VALUES (?, ?, ?, ?)", "args": ["foo", "2009-05-13", "2009-05-13T19:19:30.123456Z", "2009-05-13T19:19:30.123456Z"]}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contrary to the observation above, like @matriv suggested, it looks like it works well when using direct SQL for querying.
DDL
SHOW CREATE TABLE doc.foobar;
CREATE TABLE IF NOT EXISTS "doc"."foobar" (
"name" TEXT NOT NULL,
"date" TIMESTAMP WITHOUT TIME ZONE,
"datetime_notz" TIMESTAMP WITHOUT TIME ZONE,
"datetime_tz" TIMESTAMP WITH TIME ZONE,
PRIMARY KEY ("name")
)
DML
INSERT INTO foobar (name, date, datetime_notz, datetime_tz) VALUES ('foo', '2009-05-13', '2009-05-13T19:19:30.123456Z', '2009-05-13T19:19:30.123456Z');
INSERT INTO foobar (name, date, datetime_notz, datetime_tz) VALUES ('bar', '2009-05-13', '2009-05-13T19:19:30.123456+03:00', '2009-05-13T19:19:30.123456+03:00');
DQL
SELECT * FROM foobar;
+------+---------------+---------------+---------------+
| name | date | datetime_notz | datetime_tz |
+------+---------------+---------------+---------------+
| foo | 1242172800000 | 1242242370123 | 1242242370123 |
| bar | 1242172800000 | 1242242370123 | 1242231570123 |
+------+---------------+---------------+---------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this spirit, it is clear that there is a flaw, and the relevant spot can be discovered easily. 💥
return value.strftime('%Y-%m-%dT%H:%M:%S.%fZ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
145f701 aims to get the improvement straight, test outcomes now reflect the relevant conversion implicitly performed by CrateDB.
c7212b2
to
833328b
Compare
4482076
to
009d233
Compare
tests/datetime_test.py
Outdated
INPUT_DATETIME_NOTZ = dt.datetime(2009, 5, 13, 19, 19, 30, 123456) | ||
INPUT_DATETIME_TZ = dt.datetime(2009, 5, 13, 19, 19, 30, 123456, tzinfo=CST()) | ||
OUTPUT_DATE = INPUT_DATE | ||
OUTPUT_TIME = dt.time(19, 19, 30, 123000) | ||
OUTPUT_DATETIME_NOTZ = dt.datetime(2009, 5, 13, 19, 19, 30, 123000) | ||
OUTPUT_DATETIME_TZ = dt.datetime(2009, 5, 13, 19, 19, 30, 123000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bit confusing that there isn't a difference when passing in a time zone or not. I guess this relates that when no time zone is given the local one which is also CST
will be used in python? Inside CrateDB, all timestamps without timezone are treat as UTC. Shouldn't we then rather use an exotic one so the outputs aren't the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will be happy to steer this patch into that direction. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this relates that when no time zone is given the local one which is also
CST
will be used in Python?
Co-authored-by: Marios Trivyzas <[email protected]>
... instead of `sa.DateTime` and `sa.TIMESTAMP`. Introduce `visit_TIMESTAMP` from PGTypeCompiler to render SQL DDL clauses like `TIMESTAMP WITH|WITHOUT TIME ZONE`.
009d233
to
619c991
Compare
Effectively, just add `%z` to the format string passed to `strftime`. -- https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
dt = datetime.strptime(args[1], '%Y-%m-%dT%H:%M:%S.%fZ') | ||
dt = datetime.strptime(args[1], '%Y-%m-%dT%H:%M:%S.%f') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can not use the %z
suffix here. Otherwise:
ValueError: time data '2024-06-25T13:48:33.295461' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'
dt = datetime.strptime(args[2], '%Y-%m-%dT%H:%M:%S.%fZ') | ||
dt = datetime.strptime(args[2], '%Y-%m-%dT%H:%M:%S.%f') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dito.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thx!
About
Timezone-aware SQLAlchemy
DateTime
fields apparently are supported already. This patch just removes a corresponding assertion which prevented them from actually being used.References
Originally, this is coming from a monkeypatch to meltano-target-cratedb. Most probably, it is also related to those requests:
Other than this, that other patch is also related:
datetime.date
values onDateTime
fields #25Backlog