Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

materialize-sql: migratable columns #1928

Merged
merged 23 commits into from
Oct 8, 2024
Merged

materialize-sql: migratable columns #1928

merged 23 commits into from
Oct 8, 2024

Conversation

mdibaiee
Copy link
Member

@mdibaiee mdibaiee commented Sep 12, 2024

Description:

  • The connectors that allow an ALTER TABLE ... TYPE ... directly run these commands to change the column types, but for others, namely Databricks, Redshift, Snowflake and BigQuery we need to use a trick of renaming columns to migrate a column. I've made it in a way that makes it resumable.
  • I've left Starburst out of this for now... Their docs are not very clear on whether they support ALTER TABLE ... SET DATA TYPE as a standard cast or whether it's one of those that only supports type widening (e.g. going from smallint to bigint).

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)


This change is Reviewable

@mdibaiee mdibaiee force-pushed the mahdi/column-type-change branch 2 times, most recently from c7e0f48 to 5503dcb Compare September 12, 2024 15:46
@mdibaiee mdibaiee added the change:planned This is a planned change label Sep 16, 2024
@mdibaiee mdibaiee force-pushed the mahdi/column-type-change branch 9 times, most recently from 2d6619e to 61327f6 Compare September 19, 2024 14:22
@mdibaiee mdibaiee marked this pull request as ready for review September 19, 2024 14:36
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments are below. I am in favor of moving the resumption out of InfoSchema, and also moving the renaming logic into materialize-sql, and my comments are kind of related to that.

materialize-boilerplate/apply.go Outdated Show resolved Hide resolved
materialize-sql/dialect.go Outdated Show resolved Hide resolved
materialize-sql/test_support.go Outdated Show resolved Hide resolved
materialize-sql/testdata/generate-spec-proto.sh Outdated Show resolved Hide resolved
materialize-sql/test_support.go Outdated Show resolved Hide resolved
return out.String()
}

func DumpTestTable(t *testing.T, db *stdsql.DB, qualifiedTableName string, ordering string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't use the StdDumpTable and StdGetSchema in place of this? I see that there is an ordering parameter, but from what I can tell there is only ever a single row of data in the table.

materialize-databricks/sqlgen.go Show resolved Hide resolved
materialize-bigquery/bigquery_test.go Outdated Show resolved Hide resolved
materialize-bigquery/client.go Outdated Show resolved Hide resolved
@mdibaiee mdibaiee force-pushed the mahdi/column-type-change branch 7 times, most recently from 7f747bd to f748613 Compare September 25, 2024 12:25
@mdibaiee mdibaiee force-pushed the mahdi/column-type-change branch 4 times, most recently from 4804927 to bc8954a Compare September 30, 2024 14:35
materialize-sql/std_sql.go Show resolved Hide resolved
materialize-sql/test_support.go Show resolved Hide resolved
materialize-sql/type_mapping.go Outdated Show resolved Hide resolved
materialize-postgres/sqlgen.go Outdated Show resolved Hide resolved
materialize-bigquery/client.go Outdated Show resolved Hide resolved
materialize-bigquery/client.go Outdated Show resolved Hide resolved
materialize-motherduck/client.go Outdated Show resolved Hide resolved
materialize-mysql/client.go Outdated Show resolved Hide resolved
materialize-redshift/client.go Outdated Show resolved Hide resolved
@mdibaiee mdibaiee changed the title materialize-boilerplate: migratable columns in applier materialize-sql: migratable columns Oct 1, 2024
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % a few final comments

@@ -0,0 +1,37 @@
collections:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment directly on the file but there's a materialize-sql/testdata/generated_specs/ and also materialize-sql/testdata/validate/generated_specs/. We don't need both of those do we? Also I don't see any way to regenerate the proto files if the specs are changed.

TBH I'd be in favor of getting rid of the TestValidateMigrations and supporting code entirely since I don't think it's doing anything that the individual materialization tests aren't also doing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williamhbaker these are different generated specs. The way to re-generate these spec files is running go generate in materialize-sql.

I think it's about being able to test materialize-sql changes independent of a connector, let's keep it for some time and if we find it to be useless after some time we can remove it

materialize-mysql/client.go Outdated Show resolved Hide resolved
materialize-sql/migration_mapping.go Outdated Show resolved Hide resolved
materialize-sql/dialect.go Outdated Show resolved Hide resolved
return sql.Dialect{
MigratableTypes: sql.MigrationSpecs{
"decimal": {sql.NewMigrationSpec([]string{"varchar", "longtext"}, nocast)},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think this representation is quite right, but it's fine to leave as-is for now.

I think it is getting at the appropriate FlatTypeMappings position in a roundabout way. Really all these are saying "if the mapped type of the new projection uses the mapping of sql.STRING -> Fallback, then it can be migrated from these existing column types". I think this configuration could probably be represented in the sql.DDLMapper with a sql.MigrateableFrom or something like that. But I also think this will be evolving as we add more migrations, so it is a good first attempt at representing this.

var (
tplAll = sql.MustParseTemplate(databricksDialect, "root", `
-- Templated creation of a materialized table definition and comments:
-- delta.columnMapping.mode enables column renaming in Databricks. Column renaming was introduced in Databricks Runtime 10.4 LTS which was released in March 2022.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that any previously existing tables won't support column migrations? Just confirming since I don't really know what we'd do about that, other than notice that the Apply fails in probably a really strange way, and then the tables would need to be manually re-backfilled. Which seems...fine, I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williamhbaker yes, previously existing tables won't support column migrations

materialize-sqlserver/sqlgen.go Outdated Show resolved Hide resolved
materialize-sqlserver/sqlgen.go Outdated Show resolved Hide resolved
materialize-sqlserver/sqlgen.go Outdated Show resolved Hide resolved
@williamhbaker
Copy link
Member

There's a potential problem I thought of with post-commit apply materializations (Snowflake, Databricks) that we'll want to make sure is addressed, somehow, before merging this also: https://estuaryworkspace.slack.com/archives/C03Q2NRFKDL/p1728002036017909

Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mdibaiee mdibaiee merged commit 91aef85 into main Oct 8, 2024
49 of 52 checks passed
@mdibaiee mdibaiee deleted the mahdi/column-type-change branch October 8, 2024 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:planned This is a planned change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants