materialize-sql: migratable columns #1928
Conversation
Force-pushed c7e0f48 to 5503dcb
Force-pushed 2d6619e to 61327f6
Some initial comments are below. I am in favor of moving the resumption out of InfoSchema, and also moving the renaming logic into materialize-sql, and my comments are kind of related to that.
materialize-sql/test_support.go
Outdated
	return out.String()
}

func DumpTestTable(t *testing.T, db *stdsql.DB, qualifiedTableName string, ordering string) (string, error) {
Is there a reason we can't use the StdDumpTable and StdGetSchema in place of this? I see that there is an ordering parameter, but from what I can tell there is only ever a single row of data in the table.
Force-pushed 7f747bd to f748613
Force-pushed 4804927 to bc8954a
Force-pushed dab291a to c427c77
Force-pushed 2b73c48 to 338e9ea
Force-pushed 338e9ea to 47a098b
LGTM % a few final comments
@@ -0,0 +1,37 @@
collections:
I can't comment directly on the file, but there's a materialize-sql/testdata/generated_specs/ and also a materialize-sql/testdata/validate/generated_specs/. We don't need both of those, do we? Also, I don't see any way to regenerate the proto files if the specs are changed.
TBH I'd be in favor of getting rid of TestValidateMigrations and its supporting code entirely, since I don't think it's doing anything that the individual materialization tests aren't also doing?
@williamhbaker these are different generated specs. The way to re-generate these spec files is running go generate in materialize-sql.
I think it's about being able to test materialize-sql changes independently of a connector. Let's keep it for some time, and if we find it to be useless after a while we can remove it.
	return sql.Dialect{
		MigratableTypes: sql.MigrationSpecs{
			"decimal": {sql.NewMigrationSpec([]string{"varchar", "longtext"}, nocast)},
I still don't think this representation is quite right, but it's fine to leave as-is for now.
I think it is getting at the appropriate FlatTypeMappings position in a roundabout way. Really all these are saying is: "if the mapped type of the new projection uses the mapping of sql.STRING -> Fallback, then it can be migrated from these existing column types". I think this configuration could probably be represented in the sql.DDLMapper with a sql.MigrateableFrom or something like that. But I also think this will be evolving as we add more migrations, so it is a good first attempt at representing this.
var (
	tplAll = sql.MustParseTemplate(databricksDialect, "root", `
-- Templated creation of a materialized table definition and comments:
-- delta.columnMapping.mode enables column renaming in Databricks. Column renaming was introduced in Databricks Runtime 10.4 LTS which was released in March 2022.
Does this mean that any previously existing tables won't support column migrations? Just confirming, since I don't really know what we'd do about that, other than notice that the Apply fails in probably a really strange way, and then the tables would need to be manually re-backfilled. Which seems... fine, I guess.
@williamhbaker yes, previously existing tables won't support column migrations.
There's a potential problem I thought of with post-commit apply materializations (Snowflake, Databricks) that we'll want to make sure is addressed, somehow, before merging this also: https://estuaryworkspace.slack.com/archives/C03Q2NRFKDL/p1728002036017909
Force-pushed 5535fd7 to 18f70c5
LGTM
Description:
- For endpoints that support ALTER TABLE ... TYPE ... we directly run these commands to change the column types, but for others, namely Databricks, Redshift, Snowflake and BigQuery, we need to use a trick of renaming columns to migrate a column. I've made it in a way that makes it resumable.
- Endpoints also differ in whether they support ALTER TABLE ... SET DATA TYPE as a standard cast or whether it's one of those that only supports type widening (e.g. going from smallint to bigint).

Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
(anything that might help someone review this PR)