Releases · snowplow/snowplow-rdb-loader

09 Oct 11:57

spenes

6.1.1

0b447df

6.1.1 Latest

The most important change in this release is the removal of the dependency on the list schemas endpoint of Iglu repos. This change makes it possible to use static Iglu registries with RDB Loader.

Another notable change in this release is that columns are prefixed with an underscore in Databricks Loader if the column name starts with a number, e.g. 1_my_field becomes _1_my_field.
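
The renaming rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the loader's actual code; the function name `databricks_column_name` is hypothetical.

```python
def databricks_column_name(name: str) -> str:
    """Prefix the column name with an underscore if it starts with a digit.

    Hypothetical helper that mirrors the rule described in the release note.
    """
    if name[:1].isdigit():
        return "_" + name
    return name


print(databricks_column_name("1_my_field"))  # _1_my_field
print(databricks_column_name("my_field"))    # my_field
```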

Additionally, we've upgraded some dependencies to fix vulnerabilities and fixed a problem that caused NPEs in the Batch Transformer.

Changelog

Upgrade dependencies to fix some vulnerabilities (#1364)
Fix NPEs in TypesAccumulator (#1363)
Create migration for only max schema key of the same schema model (#1360)
Transformer Stream: don't use hard-coded window timestamps in the tests (#1360)
Use lookupSchemasUntil function instead of listSchemasLike function (#1360)
Upgrade schema-ddl to 0.25.0 (#1362)

10 Jul 11:43

github-actions

6.0.2

618565a

6.0.2

What's Changed

Improve RDB Loader behavior for table without comment
Bump Iglu Scala Client to 3.1.1
Loader should not call the "list" Iglu endpoint for Snowflake/Databricks tables
RDB Redshift Loader: increase atomic field lengths when RDB Loader creates Redshift events table
RDB Databricks Loader: remove atomic field lengths when RDB Loader creates Databricks table

19 Mar 09:28

github-actions

6.0.0

76baa42

6.0.0

[Redshift-only] New migration mechanism & recovery tables

Previously, Redshift loaders would migrate the shredded table to the latest available schema version. This could lead to a race condition between transformer & loader.

As of 6.0.0, loader will migrate the shredded table to the latest schema version discovered in the shredding_complete payload (rather than the latest existing version). Also, thanks to the new file hierarchy described below, the loader is able to issue one COPY statement per schema version. This enables the loader to decide on the exact set of columns.
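
To illustrate the idea of migrating to the latest version seen in a batch, here is a minimal Python sketch. It assumes the schema keys of a batch are available as plain `iglu:vendor/name/format/model-revision-addition` strings; the real shape of the shredding_complete payload is not shown in these notes, and the function name `max_version_per_model` is hypothetical.

```python
import re

# Assumed Iglu schema key shape: iglu:vendor/name/format/model-revision-addition
SCHEMA_KEY = re.compile(r"^iglu:([^/]+)/([^/]+)/([^/]+)/(\d+)-(\d+)-(\d+)$")


def max_version_per_model(schema_keys):
    """For each (vendor, name, model), return the highest version seen in the batch."""
    latest = {}
    for key in schema_keys:
        m = SCHEMA_KEY.match(key)
        if m is None:
            raise ValueError(f"not a valid Iglu schema key: {key}")
        vendor, name, _format, model, revision, addition = m.groups()
        group = (vendor, name, int(model))
        version = (int(model), int(revision), int(addition))
        # Tuples compare element by element, so this orders versions correctly.
        if group not in latest or version > latest[group]:
            latest[group] = version
    return latest


batch = [
    "iglu:com.acme/button_click/jsonschema/1-0-0",
    "iglu:com.acme/button_click/jsonschema/1-1-0",
    "iglu:com.acme/button_click/jsonschema/1-0-2",
]
print(max_version_per_model(batch))
```

In this sketch the migration target for `com.acme/button_click` model 1 would be 1-1-0, even if a newer version exists in the registry but does not appear in the batch.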

Also, we are introducing a new mechanism to prevent the loader from failing when a schema has not evolved correctly. You can find more information about it here.

[Redshift-only] Monitoring recovery tables

Previous versions printed the table name to stdout. As of 6.0.0, when an event is loaded into a recovery table, the name of that recovery table is printed instead.

If a webhook is configured, recent previous versions used the load_succeeded/3-0-0 schema to report information about a successful load.

As of 6.0.0, the loader uses the load_succeeded/3-0-1 schema, which adds a $.recoveryTableNames key to report the names of the recovery tables loaded in the batch (for the breaking schema keys found in the shredding_complete payload).

[Redshift-only] `$.featureFlags.disableMigration` configuration

RDB Loader 6.0.0 introduces a new configuration, $.featureFlags.disableMigration, a list of schema criteria for which migration is disabled.

For the provided schema criteria only, RDB Loader will neither migrate the corresponding shredded table nor create recovery tables for breaking schema versions. The loader will attempt to load to the corresponding shredded table without migrating.

This is useful if you have older schemas with breaking changes and don’t want the loader to apply the new logic to them.
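
As a rough illustration of how a list of schema criteria selects schemas, the following Python sketch matches schema keys against criteria. It assumes criteria are written in the usual Iglu form, e.g. `iglu:com.acme/button_click/jsonschema/1-*-*`, with `*` as a wildcard in the version; the function names are hypothetical and this is not the loader's implementation.

```python
def matches_criterion(schema_key: str, criterion: str) -> bool:
    """True if schema_key is covered by criterion ('*' is a version wildcard)."""
    key_path, _, key_version = schema_key.rpartition("/")
    crit_path, _, crit_version = criterion.rpartition("/")
    if key_path != crit_path:
        return False
    key_parts = key_version.split("-")
    crit_parts = crit_version.split("-")
    if len(key_parts) != 3 or len(crit_parts) != 3:
        return False
    return all(c == "*" or c == k for k, c in zip(key_parts, crit_parts))


def migration_disabled(schema_key: str, disable_migration: list) -> bool:
    """True if any criterion in the (assumed) disableMigration list matches."""
    return any(matches_criterion(schema_key, c) for c in disable_migration)


disable_migration = ["iglu:com.acme/button_click/jsonschema/1-*-*"]
print(migration_disabled("iglu:com.acme/button_click/jsonschema/1-0-3", disable_migration))  # True
print(migration_disabled("iglu:com.acme/button_click/jsonschema/2-0-0", disable_migration))  # False
```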

New file hierarchy for shredded events

Until now, both batch & stream transformers wrote shredded events using the following scheme:

vendor/name/model

As of 6.0.0, all transformers use the following scheme:

vendor/name/model/revision/addition

which increases granularity of the output, enabling higher precision in downstream usage.
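
The difference between the two schemes can be sketched as below. The helper names are hypothetical, and real output paths may carry additional prefixes or key names that are not shown in these notes; only the ordering of the components is taken from the text above.

```python
def old_shredded_path(vendor: str, name: str, model: int) -> str:
    """Pre-6.0.0 scheme: one path per schema model."""
    return f"{vendor}/{name}/{model}"


def new_shredded_path(vendor: str, name: str, model: int,
                      revision: int, addition: int) -> str:
    """6.0.0 scheme: one path per exact schema version."""
    return f"{vendor}/{name}/{model}/{revision}/{addition}"


# Two versions of the same model share a path in the old scheme...
print(old_shredded_path("com.acme", "button_click", 1))        # com.acme/button_click/1
# ...but are written to separate paths in the new one.
print(new_shredded_path("com.acme", "button_click", 1, 0, 0))  # com.acme/button_click/1/0/0
print(new_shredded_path("com.acme", "button_click", 1, 0, 1))  # com.acme/button_click/1/0/1
```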

Removal of padding `\N` char

Transformers write events to S3 to be loaded by Redshift. For the loading command to work, all events at a given path (e.g. com.acme/button_click/1) must follow the same format. A batch, however, may contain events with different versions of a given schema. In particular, events with a newer schema might have new fields not present in the events with an older one.

Previously, transformers solved this problem by formatting all events according to the latest version of the schema and using the \N character in case of missing fields.

As of 6.0.0, this is no longer necessary because, as explained above, events using different versions of a schema are written to different paths.
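
For readers unfamiliar with the old padding behavior, here is a minimal sketch of it. It assumes tab-separated rows and represents an event as a plain dictionary; the function name `pad_row` and the column names are hypothetical.

```python
def pad_row(event: dict, latest_columns: list) -> str:
    """Format an event against the latest schema's columns (pre-6.0.0 behavior).

    Fields missing from the event are written as the literal two characters \\N.
    """
    return "\t".join(
        str(event[col]) if col in event else r"\N" for col in latest_columns
    )


latest_columns = ["id", "label", "color"]   # columns of the newest schema version
older_event = {"id": "e1", "label": "buy"}  # produced with an older version

print(pad_row(older_event, latest_columns))
```

With one path per schema version, every row at a given path already has the same columns, so no padding step of this kind is required.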

New license

Following our recent licensing announcement, RDB Loader is now released under the Snowplow Limited Use License Agreement.

Changelog

Bump AWS SDK to 1.12.677 (#1344)
Bump commons-compress to 1.26.0 (#1344)
Bump nimbus-jose-jwt to 9.37.2 (#1344)
Add mandatory SLULA license acceptance flag (#1344)
Bump schema-ddl to 0.22.1 (#1342)
Bump AWS SDK to 2.23.17 (#1339)
pubsub transformer: increase subscriber's awaitTerminatePeriod (#1328)
pubsub transformer: Increase default value of minDurationPerAckExtension (#1326)
Loader: Fix column names for shredded tables (#1332)
Redshift loader: send statsd metrics for recovery tables (#1331)
Quote column names in Redshift load statements (#1330)
Loader: Report recovery table names in load_succeeded payload (#1318)
Loader: Fix table name in COPY logs (#1316)
Upgrade schema-ddl to 0.20.0 (#1265)
Move to Snowplow Limited Use License (#1345)

14 Mar 11:33

spenes

5.7.5

1a18eac

5.7.5

This is a patch release that bumps dependencies to address potential security vulnerabilities.

Changelog

Bump zookeeper to 3.7.2 (#1325)
Bump aws sdk to 2.21.33 (#1325)
Bump jetty-http to 9.4.53.v20231009 (#1325)
Bump reactor-netty-http to 1.0.39 (#1325)
Use databricks JDBC 2.6.34 (#1325)

10 Oct 22:40

github-actions

5.7.4

05dd5ab

5.7.4

This is a patch release that bumps dependencies to address potential security vulnerabilities.

Changelog

Bump snappy-java to 1.1.10.4 (#1313)

07 Sep 13:31

github-actions

5.7.3

eb4878c

5.7.3

This is a patch release that bumps dependencies to address potential security vulnerabilities.

Changelog

Bump jackson-mapper-asl to 1.9.14-atlassian-6 (#1312)
Loader: exclude unnecessary hadoop dependencies (#1312)
Bump snappy-java to 1.1.10.3 (#1312)
Bump jettison to 1.5.4 (#1312)

08 Aug 13:47

github-actions

5.7.1

af0997d

5.7.1

A patch release to remove unwanted transitive dependencies, improve tests, and fix minor bugs.

Changelog

Lower sensitivity of cats-effect responsiveness warning (#1309)
Reduce log level for test suite (#1307)
Exclude zookeeper transitive dependency from loaders (#1305)
Batch Transformer: make it possible to skip schemas with all transformations (#1300)
Bump Snowplow Events Manifest to 0.4.0 (#1303)
transformer-kafka: add semi-automatic test scenarios using cloud resources (#1302)

04 Aug 16:28

github-actions

5.7.0

741f7f1

5.7.0

Add Azure support

In this release, we introduce the changes and assets needed to run RDB Loader with Azure services. These are the changes:

Introduce a new transformer-kafka asset that reads events from a Kafka topic and writes transformed events to Azure Blob Storage
Update the Loader module to read shredding complete messages from Kafka. The loader also needs to interact with blob storage for the folder monitoring feature, so the Loader module can now interact with Azure Blob Storage as well.

12 Jul 09:14

github-actions

5.6.3

07a2923

5.6.3

Starting with this version, Databricks Loader can work with catalog names that contain non-alphanumeric characters such as hyphens.

Also, we've bumped a few dependencies to address potential security vulnerabilities.

Changelog

Databricks Loader: allow any character in catalog name (#1288)
Bump nimbus-jose-jwt to 9.31 (#1291)
Bump snappy-java to 1.1.10.1 (#1291)
Bump json-smart to 2.4.9 (#1291)

10 Jul 10:05

github-actions

5.6.2

2ddf367

5.6.2

Fixes a regression which under rare circumstances caused exceptions like:

Load failed and will not be retried: [Amazon](500310) Invalid operation: cannot alter column "xyz" of relation "com_example_foo_2", target column size should be different; = SqlState: 0A000: [Amazon](500310) Invalid operation: cannot alter column "xyz" of relation "com_example_foo_2", target column size should be different;

Changelog

Fix pattern matching on known exception for alter table failures (#1283)
