materialize-sql: migratable columns #1928

mdibaiee · 2024-09-12T15:08:51Z

Description:

Connectors whose warehouses allow ALTER TABLE ... TYPE ... run that command directly to change column types. For the others, namely Databricks, Redshift, Snowflake and BigQuery, we have to migrate a column with a column-renaming trick instead. I've implemented it so that the migration is resumable.
I've left Starburst out of this for now. Their docs are not clear on whether ALTER TABLE ... SET DATA TYPE works as a standard cast or only supports type widening (e.g. going from smallint to bigint).
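
To make the rename trick and its resumability concrete, here is a minimal sketch, not the code in this PR. It assumes the migration is an add / copy-with-cast / drop / rename sequence and that progress can be inferred from which columns currently exist. The function name, the `_flowtmp` suffix and the generic SQL are all hypothetical; each warehouse has its own ALTER TABLE syntax.

```go
package main

import "fmt"

// tempSuffix is a hypothetical suffix for the in-progress column; the real
// connectors may name it differently.
const tempSuffix = "_flowtmp"

// RenameMigrationSteps returns the statements still needed to change the type
// of column col on table, given the set of columns that exist right now.
// The caller is assumed to invoke it only when col still has the old type.
func RenameMigrationSteps(table, col, newType string, existing map[string]bool) []string {
	tmp := col + tempSuffix

	add := fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, tmp, newType)
	copyCast := fmt.Sprintf("UPDATE %s SET %s = CAST(%s AS %s)", table, tmp, col, newType)
	drop := fmt.Sprintf("ALTER TABLE %s DROP COLUMN %s", table, col)
	rename := fmt.Sprintf("ALTER TABLE %s RENAME COLUMN %s TO %s", table, tmp, col)

	switch {
	case existing[col] && !existing[tmp]:
		// Nothing has run yet: every step is needed.
		return []string{add, copyCast, drop, rename}
	case existing[col] && existing[tmp]:
		// Interrupted after ADD: the copy is idempotent, so repeat it.
		return []string{copyCast, drop, rename}
	case !existing[col] && existing[tmp]:
		// Interrupted after DROP: only the rename is left.
		return []string{rename}
	default:
		// Neither column exists: there is nothing to migrate.
		return nil
	}
}
```

Calling it with only the original column present yields all four statements, while calling it with only the temporary column present yields just the final rename, which is what lets an interrupted Apply pick up where it stopped.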

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)

williamhbaker

Some initial comments are below. I am in favor of moving the resumption out of InfoSchema and also moving the renaming logic into materialize-sql, and most of my comments relate to that.

materialize-boilerplate/apply.go

materialize-sql/dialect.go

materialize-sql/test_support.go

materialize-sql/testdata/generate-spec-proto.sh

materialize-sql/test_support.go

williamhbaker · 2024-09-19T16:21:47Z

materialize-sql/test_support.go

+	return out.String()
+}
+
+func DumpTestTable(t *testing.T, db *stdsql.DB, qualifiedTableName string, ordering string) (string, error) {


Is there a reason we can't use the StdDumpTable and StdGetSchema in place of this? I see that there is an ordering parameter, but from what I can tell there is only ever a single row of data in the table.

materialize-sql/testdata/validate/base.flow.yaml

materialize-databricks/sqlgen.go

materialize-bigquery/bigquery_test.go

materialize-bigquery/client.go

materialize-sql/std_sql.go

materialize-sql/test_support.go

materialize-sql/type_mapping.go

materialize-postgres/sqlgen.go

materialize-bigquery/client.go

materialize-motherduck/client.go

materialize-mysql/client.go

materialize-redshift/client.go

williamhbaker

LGTM % a few final comments

williamhbaker · 2024-10-03T17:06:25Z

materialize-sql/testdata/validate/base.flow.yaml

@@ -0,0 +1,37 @@
+collections:


I can't comment directly on the file, but there's a materialize-sql/testdata/generated_specs/ and also a materialize-sql/testdata/validate/generated_specs/. We don't need both of those, do we? Also, I don't see any way to regenerate the proto files if the specs are changed.

TBH I'd be in favor of getting rid of the TestValidateMigrations and supporting code entirely since I don't think it's doing anything that the individual materialization tests aren't also doing?

mdibaiee · 2024-10-04

@williamhbaker these are different generated specs. The way to regenerate these spec files is to run go generate in materialize-sql.

I think it's about being able to test materialize-sql changes independently of a connector. Let's keep it for now, and if we find it to be useless after some time we can remove it.

materialize-mysql/client.go

materialize-sql/migration_mapping.go

materialize-sql/dialect.go

williamhbaker · 2024-10-03T17:44:56Z

materialize-mysql/sqlgen.go

 	return sql.Dialect{
+		MigratableTypes: sql.MigrationSpecs{
+			"decimal":  {sql.NewMigrationSpec([]string{"varchar", "longtext"}, nocast)},


I still don't think this representation is quite right, but it's fine to leave as-is for now.

I think it gets at the appropriate FlatTypeMappings position in a roundabout way. Really, all of these are saying "if the mapped type of the new projection uses the sql.STRING -> Fallback mapping, then it can be migrated from these existing column types". This configuration could probably be represented in the sql.DDLMapper with a sql.MigrateableFrom or something like that. But this will also keep evolving as we add more migrations, so it is a good first attempt at representing it.
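
For readers following the representation being discussed, here is a standalone sketch of a lookup keyed by the existing column type, in the shape the diff line above suggests. These are not the actual materialize-sql definitions: the struct fields, the `Find` method and the `NoCast` helper are hypothetical stand-ins for whatever `sql.MigrationSpecs`, `sql.NewMigrationSpec` and `nocast` really look like.

```go
package main

import "strings"

// CastFunc renders the SQL expression that converts a column to its new type.
type CastFunc func(column, targetType string) string

// MigrationSpec lists the target types reachable from one source type,
// together with the cast used to get there. (Hypothetical shape.)
type MigrationSpec struct {
	TargetTypes []string
	Cast        CastFunc
}

// MigrationSpecs is keyed by the lower-cased type of the existing column.
type MigrationSpecs map[string][]MigrationSpec

// NoCast is a stand-in for a cast that leaves the column expression unchanged.
func NoCast(column, _ string) string { return column }

// Find reports whether a column of sourceType may be migrated to targetType,
// and returns the matching spec when it can.
func (m MigrationSpecs) Find(sourceType, targetType string) (MigrationSpec, bool) {
	for _, spec := range m[strings.ToLower(sourceType)] {
		for _, t := range spec.TargetTypes {
			if strings.EqualFold(t, targetType) {
				return spec, true
			}
		}
	}
	return MigrationSpec{}, false
}
```

With a table such as `MigrationSpecs{"decimal": {{TargetTypes: []string{"varchar", "longtext"}, Cast: NoCast}}}`, a lookup from decimal to longtext succeeds and a lookup from decimal to bigint does not. Keying on the existing column type is the design choice the comment above questions; the alternative it raises would hang the list of allowed source types off the target mapping instead.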

williamhbaker · 2024-10-03T17:56:24Z

materialize-databricks/sqlgen.go

 var (
 	tplAll = sql.MustParseTemplate(databricksDialect, "root", `
 -- Templated creation of a materialized table definition and comments:
+-- delta.columnMapping.mode enables column renaming in Databricks. Column renaming was introduced in Databricks Runtime 10.4 LTS which was released in March 2022.


Does this mean that any previously existing tables won't support column migrations? Just confirming since I don't really know what we'd do about that, other than notice that the Apply fails in probably a really strange way, and then the tables would need to be manually re-backfilled. Which seems...fine, I guess.

mdibaiee · 2024-10-04

@williamhbaker Yes, previously existing tables won't support column migrations.

materialize-sqlserver/sqlgen.go

williamhbaker · 2024-10-04T02:13:32Z

There's a potential problem I thought of with post-commit apply materializations (Snowflake, Databricks) that we'll want to make sure is addressed, somehow, before merging this also: https://estuaryworkspace.slack.com/archives/C03Q2NRFKDL/p1728002036017909

williamhbaker

LGTM

mdibaiee force-pushed the mahdi/column-type-change branch 2 times, most recently from c7e0f48 to 5503dcb Compare September 12, 2024 15:46

mdibaiee added the change:planned This is a planned change label Sep 16, 2024

mdibaiee force-pushed the mahdi/column-type-change branch 9 times, most recently from 2d6619e to 61327f6 Compare September 19, 2024 14:22

mdibaiee requested a review from williamhbaker September 19, 2024 14:36

mdibaiee marked this pull request as ready for review September 19, 2024 14:36

williamhbaker reviewed Sep 19, 2024

mdibaiee force-pushed the mahdi/column-type-change branch 7 times, most recently from 7f747bd to f748613 Compare September 25, 2024 12:25

mdibaiee requested a review from williamhbaker September 25, 2024 12:37

mdibaiee force-pushed the mahdi/column-type-change branch 4 times, most recently from 4804927 to bc8954a Compare September 30, 2024 14:35

williamhbaker reviewed Sep 30, 2024

mdibaiee changed the title ~~materialize-boilerplate: migratable columns in applier~~ materialize-sql: migratable columns Oct 1, 2024

mdibaiee force-pushed the mahdi/column-type-change branch from dab291a to c427c77 Compare October 2, 2024 17:39

mdibaiee added 19 commits October 3, 2024 15:11

materialize-mysql: implement formatted string migration

07aa4ae

materialize-sqlserver: implement formatted string migration

aa9de93

materialize-redshift: formatted string to string migration

19c6cc5

materialize-sql: update test snapshots

2df6146

materialize-databricks: formatted string to string migration

6b03953

materialize-bigquery: formatted string to string migration

836c832

materialize-snowflake: formatted string to string migration

4a6e73b

materialize-motherduck: formatted string to string migration

9d5611e

materialize-sql: refactor ColumnTypeChangeMigrations

baf1f61

materialize-sql: refactor ValidateMigrations tests

7dfcf88

materialize-sql: detect changes in materialize-sql.UpdateResource

ea1d32f

materialize-sql: fix bug in materialize-sql.Compatible

ec3f06b

materialize-sql: refactor & simplify ColumnChangeMigration

c2d77e9

materialize-sql: more migratable types

31cf45f

materialize-mysql: support string to date-time conversion

69bd828

materialize-sql: update test snapshots

906ac84

materialize-{sql}: use non-transactional Alter Table statements

82393d7

materialize-sql: simplify column migration steps

fba469f

materialize-*: only support migrations when a cast is infallible

ffc904c

mdibaiee force-pushed the mahdi/column-type-change branch from 2b73c48 to 338e9ea Compare October 3, 2024 14:12

materialize-sql: custom CastSQL function and migration mapping spec

47a098b

mdibaiee force-pushed the mahdi/column-type-change branch from 338e9ea to 47a098b Compare October 3, 2024 14:14

williamhbaker approved these changes Oct 3, 2024

materialize-sql: small change requests for migrations

18f70c5

mdibaiee force-pushed the mahdi/column-type-change branch from 5535fd7 to 18f70c5 Compare October 4, 2024 11:40

materialize-snowflake: verify deletion of files succeeds

0e6f78a

williamhbaker approved these changes Oct 8, 2024

mdibaiee merged commit 91aef85 into main Oct 8, 2024
49 of 52 checks passed

mdibaiee deleted the mahdi/column-type-change branch October 8, 2024 14:13
