Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

materialize-postgres: materialize UUID formatted strings as UUID columns #1993

Merged
merged 2 commits into from
Sep 30, 2024

Conversation

williamhbaker
Copy link
Member

@williamhbaker williamhbaker commented Sep 30, 2024

Description:

Materialize UUID formatted strings as UUID columns, but continue to allow TEXT columns to validate for these fields since there are pre-existing columns and we don't want to force a backfill of them. Transferring the values of the fields is the same if they are UUID columns vs. TEXT columns.

Going forward new or re-backfilled tables will have UUID formatted strings materialized as UUID columns.

Also: Create load tables for materialize-postgres with column DDL based on the key columns of the target tables. This is required for join query comparisons to work correctly for pre-existing tables that have UUID fields materialized as TEXT columns. Most of the diff is a result of this change.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)


This change is Reviewable

Materialize UUID formatted strings as UUID columns, but continue to allow TEXT
columns to validate for these fields since there are pre-existing columns and we
don't want to force a backfill of them. Transferring the values of the fields is
the same if they are UUID columns vs. TEXT columns.
…le key columns

In scenarios where there may be more than one allowed pre-existing column type,
the load table must be created with a column type corresponding to the existing
table column to ensure that join query comparisons work correctly.

This required threading through a hydrated `InfoSchema` to the `NewTransactor`
constructor, and that has been added as a generally available capability. This
simplifies `materialize-redshift` and `materialize-mysql` which were already
making an `InfoSchema` in their own bespoke way, and eventually most other
materializations will probably need to do something like `materialize-postgres`
does with its load table columns.

Currently the only other materialization that matches its load table column
types to the existing table column types is `materialize-bigquery`, and it may
be useful to refactor this handling in terms of the `InfoSchema` at some point
as well.
@williamhbaker williamhbaker added the change:unplanned Unplanned change, useful for things like doc updates label Sep 30, 2024
@williamhbaker williamhbaker requested a review from a team September 30, 2024 17:25
Copy link
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@williamhbaker williamhbaker merged commit c0c172f into main Sep 30, 2024
50 of 53 checks passed
@williamhbaker williamhbaker deleted the wb/postgres-uuid branch September 30, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:unplanned Unplanned change, useful for things like doc updates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants