
Releases: cloudera-labs/envelope

v0.7.2

06 Dec 21:07

Closes the Spark session when the pipeline finishes.

v0.7.1

30 Apr 18:19

Fixes some concurrency and streaming-related bugs.

v0.7.0

19 Apr 14:34

Notable changes

  • Added hash, latest, translate, parse JSON, and Spark ML derivers
  • Added an Impala DDL task
  • Added a pluggable API for retrieving schemas
  • Added an event framework to plug in custom code at various pipeline lifecycle events
  • Added the capability to load configuration dynamically
  • Translation errors are now redirected to a new step instead of failing the pipeline
  • History tracking planners can assign a surrogate key to new records
  • Loop steps can update multiple parameters per iteration

Breaking changes

  • HBase output ‘mapping.rowkey’ configuration has been renamed to ‘mapping.rowkey.columns’
  • Tasks are now specified by a ‘task’ configuration object in the task step, where previously they were specified at the step level
  • Schemas are now specified by a commonly defined ‘schema’ configuration object, where previously there were different schema configurations for each Envelope component
  • Kafka input ‘encoding’ configuration has been removed. Envelope will now automatically determine the required encoding for the translator
  • Kudu output will now ignore duplicate key errors by default, where previously it would not. This can be overridden with the ‘insert.ignore’ configuration
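For upgrading existing pipelines, the three key-level changes above (HBase row key, Kafka encoding, Kudu duplicate keys) can be applied mechanically. The sketch below is a hypothetical helper, not part of Envelope. It assumes one step's configuration has been flattened into dotted keys relative to the step (for example `output.mapping.rowkey`), that the component `type` values are `hbase`, `kafka`, and `kudu`, and that ‘insert.ignore’ is a boolean, so pinning it to false keeps the pre-0.7.0 Kudu behavior. The task and schema changes restructure whole objects and are not covered.

```python
from typing import Any, Dict


def migrate_step_to_0_7(step: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one step's flattened configuration with three of the
    0.7.0 breaking changes applied. Hypothetical helper, not an Envelope API."""
    migrated = dict(step)  # never mutate the caller's configuration

    # HBase output: 'mapping.rowkey' was renamed to 'mapping.rowkey.columns'.
    if migrated.get("output.type") == "hbase" and "output.mapping.rowkey" in migrated:
        migrated["output.mapping.rowkey.columns"] = migrated.pop("output.mapping.rowkey")

    # Kafka input: 'encoding' was removed; Envelope now derives it from the translator.
    if migrated.get("input.type") == "kafka":
        migrated.pop("input.encoding", None)

    # Kudu output: duplicate key errors are now ignored by default. Assuming
    # 'insert.ignore' is a boolean, pinning it to False keeps the old behavior
    # unless the step already sets it explicitly.
    if migrated.get("output.type") == "kudu":
        migrated.setdefault("output.insert.ignore", False)

    return migrated
```

For example, passing `{"output.type": "hbase", "output.mapping.rowkey": ["id"]}` returns `{"output.type": "hbase", "output.mapping.rowkey.columns": ["id"]}`.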

v0.6.1

15 Nov 21:03

Fix for Kudu integration on unsecured clusters.

v0.6.0

12 Oct 20:08

Envelope 0.6.0

Highlights

  • Envelope now depends entirely on upstream Apache projects
  • New validation framework to eagerly identify invalid configuration files
  • Migration to standalone delegation token management to remove the dependency on unstable Spark APIs
  • New example showing how Cloudera Navigator logs can be ingested into Kudu and Solr
  • Native Kafka offset management for Kafka inputs, enabled by default
  • Kafka input can now read from a list of topics
  • Translator implementation for protobuf
  • Driver memory can now be set in the application configuration
  • New in-list deriver to filter a dataset where a column value matches a list of values
  • New select deriver to include or exclude a list of columns from a dataset
  • New distinct deriver
  • Support for sliding windows in streaming pipelines
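The in-list, select, and distinct derivers correspond to familiar relational operations. As a rough illustration only, the sketch below shows those semantics in plain Python over rows represented as dictionaries. The function names and parameters are hypothetical; this is neither Envelope's Spark-based implementation nor its configuration interface.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def in_list(rows: Iterable[Row], column: str, values: Iterable[Any]) -> List[Row]:
    """Keep rows whose value in `column` is one of `values` (values must be hashable)."""
    allowed = set(values)
    return [row for row in rows if row.get(column) in allowed]


def select(
    rows: Iterable[Row],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[Row]:
    """Keep only the `include` columns, or drop the `exclude` columns."""
    if (include is None) == (exclude is None):
        raise ValueError("pass exactly one of include or exclude")
    if include is not None:
        kept = list(include)  # materialize so it can be reused for every row
        return [{column: row[column] for column in kept} for row in rows]
    dropped = set(exclude)
    return [{column: value for column, value in row.items() if column not in dropped} for row in rows]


def distinct(rows: Iterable[Row]) -> List[Row]:
    """Drop duplicate rows, keeping the first occurrence (values must be hashable)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # column order does not affect equality
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```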

Breaking changes

  • Repartition and coalesce configurations must now be specified at the step level rather than at input or deriver levels
  • By default, Kafka offset management is turned on and uses Kafka to store the offsets. This means that group.id is now required by default
  • KafkaInput ‘topic’ configuration is now ‘topics’ and requires a list of topics
  • Application ‘executors’ configuration is now ‘executor.instances’
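A similar hypothetical migration for the 0.6.0 changes is sketched below; it is not part of Envelope. It assumes the whole application configuration is flattened into dotted keys, that step names contain no dots, that the executor setting lives under `application`, and that a Kafka input's consumer group is set with a `group.id` key directly under `input`. The second function only reports Kafka inputs that would fail the new default requirement; it does not model turning offset management off.

```python
import re
from typing import Any, Dict, List

# Hypothetical flattened layout: "steps.<name>.<component>.<key>", step names without dots.
_KAFKA_TOPIC = re.compile(r"^(steps\.[^.]+\.input)\.topic$")
_PARTITIONING = re.compile(
    r"^(steps\.[^.]+)\.(?:input|deriver)\.((?:repartition|coalesce)(?:\..+)?)$"
)
_INPUT_TYPE = re.compile(r"^steps\.([^.]+)\.input\.type$")


def migrate_to_0_6(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a flattened configuration with the 0.6.0 renames applied."""
    migrated: Dict[str, Any] = {}
    for key, value in config.items():
        topic = _KAFKA_TOPIC.match(key)
        partitioning = _PARTITIONING.match(key)
        if key == "application.executors":
            # 'executors' is now 'executor.instances'.
            migrated["application.executor.instances"] = value
        elif topic and config.get(topic.group(1) + ".type") == "kafka":
            # 'topic' is now 'topics' and takes a list; wrap a single topic name.
            migrated[topic.group(1) + ".topics"] = value if isinstance(value, list) else [value]
        elif partitioning:
            # Repartition/coalesce settings move from input/deriver to the step level.
            migrated[partitioning.group(1) + "." + partitioning.group(2)] = value
        else:
            migrated[key] = value
    return migrated


def kafka_inputs_missing_group_id(config: Dict[str, Any]) -> List[str]:
    """Name the steps whose Kafka input lacks 'group.id', now required by default."""
    missing = []
    for key, value in config.items():
        match = _INPUT_TYPE.match(key)
        if match and value == "kafka" and f"steps.{match.group(1)}.input.group.id" not in config:
            missing.append(match.group(1))
    return missing
```

Running the check on the migrated configuration before deployment would surface any Kafka step that still needs a consumer group.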

v0.5.0

16 Feb 22:14
  • Rebase on Cloudera Spark 2.2 Release 2
  • Added task step type
  • Support for planner timestamps of multiple fields and types
  • Support for secured Kudu
  • Avro serialization option for Kafka output
  • User-provided classes can specify their own alias
  • Many bug fixes

v0.4.0

14 Jul 21:28

Some highlights:

  • Rebase on Spark 2.1, Kudu 1.3, Kafka 0.10
  • Performance improvements for slowly changing dimensions
  • HBase and ZooKeeper outputs
  • Kafka offset management
  • Loop and decision steps
  • CSV, JSON, and text formats for FileSystem input/output
  • Spark SQL UDFs
  • Data quality checks