Releases: snowplow/snowplow-rdb-loader
6.1.1
The most important change in this release is the removal of the dependency on the "list schemas" endpoint of Iglu repositories. This makes it possible to use static Iglu registries with RDB Loader.
Another notable change in this release is that Databricks Loader now prefixes a column name with an underscore if it starts with a number, e.g. 1_my_field becomes _1_my_field.
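The renaming rule can be illustrated with a minimal Python sketch. The helper name is hypothetical, and it assumes "starts with a number" means an ASCII digit; it is not the loader's actual implementation.

```python
import string


def databricks_column_name(name: str) -> str:
    """Prefix a column name with "_" when it starts with an ASCII digit."""
    # Hypothetical helper illustrating the rule; not the loader's actual code.
    if name and name[0] in string.digits:
        return "_" + name
    return name


print(databricks_column_name("1_my_field"))  # _1_my_field
print(databricks_column_name("my_field"))    # my_field
```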
Additionally, we've upgraded several dependencies to fix vulnerabilities and fixed a problem that caused NPEs in the Batch Transformer.
Changelog
- Upgrade dependencies to fix some vulnerabilities (#1364)
- Fix NPEs in TypesAccumulator (#1363)
- Create migration for only max schema key of the same schema model (#1360)
- Transformer Stream: don't use hard-coded window timestamps in the tests (#1360)
- Use lookupSchemasUntil function instead of listSchemasLike function (#1360)
- Upgrade schema-ddl to 0.25.0 (#1362)
6.0.2
What's Changed
- Improve RDB Loader behavior for tables without a comment
- Bump Iglu Scala Client to 3.1.1
- Loader should not call the "list" Iglu endpoint for Snowflake/Databricks tables
- RDB Redshift Loader: increase atomic field lengths when RDB Loader creates Redshift events table
- RDB Databricks Loader: remove atomic field lengths when RDB Loader creates Databricks table
6.0.0
[Redshift-only] New migration mechanism & recovery tables
Previously, Redshift loaders would migrate the shredded table to the latest available schema version. This could lead to a race condition between transformer & loader.
As of 6.0.0, the loader migrates the shredded table to the latest schema version discovered in the shredding_complete payload (rather than the latest existing version). Also, thanks to the new file hierarchy described below, the loader is able to issue one COPY statement per schema version, which lets it decide on the exact set of columns to load.
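Choosing the migration target from the versions actually present in a batch, rather than from the registry, can be sketched in Python as follows. `SchemaKey`, `migration_targets`, and the sample keys are hypothetical; the list stands in for whatever the shredding_complete payload carries, and this is not the loader's Scala implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class SchemaKey(NamedTuple):
    """Hypothetical representation of an Iglu schema key (model-revision-addition)."""
    vendor: str
    name: str
    model: int
    revision: int
    addition: int


def migration_targets(keys: list[SchemaKey]) -> dict[tuple[str, str, int], SchemaKey]:
    """For each shredded table (vendor, name, model), pick the highest key seen in the batch."""
    grouped: dict[tuple[str, str, int], list[SchemaKey]] = defaultdict(list)
    for key in keys:
        grouped[(key.vendor, key.name, key.model)].append(key)
    # Within one model, compare revision first, then addition.
    return {
        table: max(group, key=lambda k: (k.revision, k.addition))
        for table, group in grouped.items()
    }


# Keys discovered in one batch (illustrative values).
batch = [
    SchemaKey("com.acme", "button_click", 1, 0, 0),
    SchemaKey("com.acme", "button_click", 1, 1, 0),
    SchemaKey("com.acme", "button_click", 1, 0, 2),
    SchemaKey("com.acme", "button_click", 2, 0, 0),
]
for table, key in migration_targets(batch).items():
    print(table, "->", f"{key.model}-{key.revision}-{key.addition}")
```

A registry may already hold a newer version (say 1-2-0), but because only 1-1-0 appears in the batch, the model-1 table is migrated no further than 1-1-0.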
Also, we are introducing a new mechanism to prevent the loader from failing when a schema is not evolved correctly. You can find more information about it here.
[Redshift-only] Monitoring recovery tables
Previous versions printed the table name to stdout. As of 6.0.0, if an event is loaded into a recovery table, the name of that recovery table is printed instead.
If a webhook is configured, recent previous versions used load_succeeded/3-0-0 to report information about the successful load.
As of 6.0.0, the loader uses the load_succeeded/3-0-1 schema, which adds a $.recoveryTableNames key listing the names of the recovery tables loaded in the batch (i.e. the tables for breaking schema keys from the shredding_complete payload).
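A webhook consumer that has to handle both schema versions might read the new key defensively, as in this minimal Python sketch. It assumes `data` is the already-parsed JSON instance of the load_succeeded schema (so `$.recoveryTableNames` is a top-level key); the function name and the sample table name are hypothetical.

```python
from typing import Any


def recovery_table_names(data: dict[str, Any]) -> list[str]:
    """Return the recovery tables reported in a load_succeeded payload, if any."""
    # load_succeeded/3-0-0 payloads have no recoveryTableNames key; this sketch
    # treats a missing or null value as "no recovery tables were loaded".
    return list(data.get("recoveryTableNames") or [])


print(recovery_table_names({}))                                        # []
print(recovery_table_names({"recoveryTableNames": ["example_table"]}))  # ['example_table']
```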
[Redshift-only] $.featureFlags.disableMigration configuration
RDB Loader 6.0.0 introduces a new configuration option, $.featureFlags.disableMigration, a list of schema criteria to disable migration for.
For the provided schema criteria only, RDB Loader will neither migrate the corresponding shredded table nor create recovery tables for breaking schema versions. Instead, the loader attempts to load into the corresponding shredded table without migrating it.
This is useful if you have older schemas with breaking changes and don’t want the loader to apply the new logic to them.
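Assuming the usual Iglu criterion syntax (`iglu:vendor/name/format/model-revision-addition`, with `*` as a wildcard for version parts), the check that decides whether a schema falls under `disableMigration` might look like this Python sketch. The function names and the example criterion are hypothetical, and the exact set of criterion forms RDB Loader accepts is an assumption.

```python
import re

# Criterion versions may use "*" as a wildcard; concrete schema URIs may not.
_CRITERION = re.compile(r"iglu:([^/]+)/([^/]+)/([^/]+)/(\d+|\*)-(\d+|\*)-(\d+|\*)")
_SCHEMA_URI = re.compile(r"iglu:([^/]+)/([^/]+)/([^/]+)/(\d+)-(\d+)-(\d+)")


def criterion_matches(criterion: str, schema_uri: str) -> bool:
    c = _CRITERION.fullmatch(criterion)
    u = _SCHEMA_URI.fullmatch(schema_uri)
    if c is None or u is None:
        raise ValueError(f"cannot parse {criterion!r} or {schema_uri!r}")
    # Vendor, name and format must be identical.
    if c.groups()[:3] != u.groups()[:3]:
        return False
    # Each of model, revision, addition matches exactly or via "*".
    return all(
        want == "*" or int(want) == int(have)
        for want, have in zip(c.groups()[3:], u.groups()[3:])
    )


def migration_disabled(disable_migration: list[str], schema_uri: str) -> bool:
    """True if any configured criterion covers this schema."""
    return any(criterion_matches(c, schema_uri) for c in disable_migration)


disable_migration = ["iglu:com.acme/button_click/jsonschema/1-*-*"]
print(migration_disabled(disable_migration, "iglu:com.acme/button_click/jsonschema/1-0-2"))  # True
print(migration_disabled(disable_migration, "iglu:com.acme/button_click/jsonschema/2-0-0"))  # False
```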
New file hierarchy for shredded events
Until now, both the batch and stream transformers wrote shredded events using the following scheme:
vendor/name/model
As of 6.0.0, all transformers use the following scheme:
vendor/name/model/revision/addition
which increases the granularity of the output and enables more precise downstream usage.
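The difference between the two layouts can be shown with a small Python sketch. The function names are hypothetical, and real output locations may sit under additional prefixes or use different directory naming; only the relative scheme described above is modelled.

```python
def legacy_path(vendor: str, name: str, model: int) -> str:
    """Pre-6.0.0 layout: vendor/name/model."""
    return f"{vendor}/{name}/{model}"


def v6_path(vendor: str, name: str, model: int, revision: int, addition: int) -> str:
    """6.0.0+ layout: vendor/name/model/revision/addition."""
    return f"{vendor}/{name}/{model}/{revision}/{addition}"


# Two versions of the same schema model (illustrative).
versions = [(1, 0, 0), (1, 0, 2)]
print({legacy_path("com.acme", "button_click", m) for m, _, _ in versions})
print([v6_path("com.acme", "button_click", *v) for v in versions])
```

Under the old scheme both versions collapse into one directory, while the new scheme gives each version its own path.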
Removal of padding \N character
Transformers write events to S3 to be loaded by Redshift. For the loading command to work, all events at a given path (e.g. com.acme/button_click/1) must follow the same format. A batch, however, may contain events with different versions of a given schema. In particular, events with a newer schema might have new fields not present in the events with an older one.
Previously, transformers solved this problem by formatting all events according to the latest version of the schema and using the \N character for missing fields.
As of 6.0.0, this is no longer necessary because, as explained above, events using different versions of a schema are written to different paths.
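The old padding approach and the new per-version layout can be contrasted with a minimal Python sketch. The function names and field names are hypothetical, and it assumes every event carries a value for each field of its own schema version (how optional nulls are encoded is not modelled); it is not the transformers' actual code.

```python
NULL_MARKER = "\\N"  # the two characters backslash and N


def legacy_row(event: dict[str, str], latest_fields: list[str]) -> str:
    """Pre-6.0.0: lay out every event by the latest schema, padding gaps with \\N."""
    return "\t".join(event.get(field, NULL_MARKER) for field in latest_fields)


def v6_row(event: dict[str, str], own_fields: list[str]) -> str:
    """6.0.0+: lay out each event by its own schema version; no padding needed."""
    return "\t".join(event[field] for field in own_fields)


old_event = {"color": "red"}  # written with an older schema version (illustrative)
print(repr(legacy_row(old_event, ["color", "size"])))  # 'red\t\\N'
print(repr(v6_row(old_event, ["color"])))              # 'red'
```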
New license
Following our recent licensing announcement, RDB Loader is now released under the Snowplow Limited Use License Agreement.
Changelog
- Bump AWS SDK to 1.12.677 (#1344)
- Bump commons-compress to 1.26.0 (#1344)
- Bump nimbus-jose-jwt to 9.37.2 (#1344)
- Add mandatory SLULA license acceptance flag (#1344)
- Bump schema-ddl to 0.22.1 (#1342)
- Bump AWS SDK to 2.23.17 (#1339)
- pubsub transformer: increase subscriber's awaitTerminatePeriod (#1328)
- pubsub transformer: Increase default value of minDurationPerAckExtension (#1326)
- Loader: Fix column names for shredded tables (#1332)
- Redshift loader: send statsd metrics for recovery tables (#1331)
- Quote column names in Redshift load statements (#1330)
- Loader: Report recovery table names in load_succeeded payload (#1318)
- Loader: Fix table name in COPY logs (#1316)
- Upgrade schema-ddl to 0.20.0 (#1265)
- Move to Snowplow Limited Use License (#1345)
5.7.5
5.7.4
5.7.3
5.7.1
A patch release to remove unwanted transitive dependencies, improve tests, and fix minor bugs.
Changelog
- Lower sensitivity of cats-effect responsiveness warning (#1309)
- Reduce log level for test suite (#1307)
- Exclude zookeeper transitive dependency from loaders (#1305)
- Batch Transformer: make it possible to skip schemas with all transformations (#1300)
- Bump Snowplow Events Manifest to 0.4.0 (#1303)
- transformer-kafka: add semi-automatic test scenarios using cloud resources (#1302)
5.7.0
Add Azure support
In this release, we introduce the changes and assets needed to run RDB Loader with Azure services. These are the changes:
- Introduce a new transformer-kafka asset that reads events from a Kafka topic and writes transformed events to Azure Blob Storage
- Make the necessary changes to the Loader module so that it can read shredding complete messages from Kafka. The loader also needs to interact with blob storage for the folder monitoring feature, so the Loader module can now interact with Azure Blob Storage as well.
5.6.3
Starting with this version, Databricks Loader is able to work with catalog names that contain non-alphanumeric characters such as hyphens.
Also, we've bumped a few dependencies to address potential security vulnerabilities.
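Databricks SQL allows identifiers containing characters such as hyphens when they are delimited with backticks, with a literal backtick escaped by doubling it. A minimal Python sketch of building a fully qualified table name that way follows; the helper names and the example catalog are hypothetical, and this illustrates the SQL quoting rule rather than how the loader itself generates statements.

```python
def quote_identifier(part: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + part.replace("`", "``") + "`"


def qualified_table(catalog: str, schema: str, table: str) -> str:
    """Join quoted catalog, schema and table names with dots."""
    return ".".join(quote_identifier(p) for p in (catalog, schema, table))


print(qualified_table("my-catalog", "atomic", "events"))  # `my-catalog`.`atomic`.`events`
```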
Changelog
5.6.2
Fixes a regression that, under rare circumstances, caused exceptions like:
Load failed and will not be retried: [Amazon](500310) Invalid operation: cannot alter column "xyz" of relation "com_example_foo_2", target column size should be different; = SqlState: 0A000: [Amazon](500310) Invalid operation: cannot alter column "xyz" of relation "com_example_foo_2", target column size should be different;
Changelog
- Fix pattern matching on known exception for alter table failures (#1283)