06 Dec 21:07

jeremybeard

v0.7.2 Latest

Close Spark session on pipeline finish

30 Apr 18:19

jeremybeard

v0.7.1

Fixes some concurrency- and streaming-related bugs.

19 Apr 14:34

jeremybeard

v0.7.0

Notable changes

Added hash, latest, translate, parse JSON, and Spark ML derivers
Added Impala DDL task
Added pluggable API for retrieving schemas
Added event framework to plug in custom code at various pipeline lifecycle events
Added capability to do dynamic configuration loading
Translation errors are now redirected to a new step instead of failing the pipeline
History tracking planners can assign a surrogate key to new records
Loop steps can update multiple parameters per iteration

Breaking changes

HBase output ‘mapping.rowkey’ configuration has been renamed to ‘mapping.rowkey.columns’
Tasks are now specified by a ‘task’ configuration object in the task step, where previously it was at the step level
Schemas are now specified by a commonly defined ‘schema’ configuration object, where previously there were different schema configurations for each Envelope component
Kafka input ‘encoding’ configuration has been removed. Envelope will now automatically determine the required encoding for the translator
Kudu output will now ignore duplicate key errors by default, where previously it would not. This can be overridden with the ‘insert.ignore’ configuration

15 Nov 21:03

jeremybeard

v0.6.1

Fix for Kudu integration on unsecured clusters.

12 Oct 20:08

jeremybeard

v0.6.0

Envelope 0.6.0

Highlights

Envelope now depends entirely on upstream Apache projects
New validation framework to eagerly identify invalid configuration files
Migration to standalone delegation token management to remove the dependency on unstable Spark APIs
New example showing how Cloudera Navigator logs can be ingested into Kudu and Solr
Native Kafka offset management for Kafka inputs, enabled by default
Kafka input can now read from a list of topics
Translator implementation for protobuf
Driver memory can now be set in the application configuration
New in-list deriver to filter a dataset where a column value matches a list of values
New select deriver to include or exclude a list of columns from a dataset
New distinct deriver
Support for sliding windows in streaming pipelines

Breaking Changes

Repartition and coalesce configurations must now be specified at the step level rather than at input or deriver levels
By default, Kafka offset management is turned on and uses Kafka to store the offsets. This means that group.id is now required by default
Kafka input ‘topic’ configuration is now ‘topics’ and requires a list of topics
Application ‘executors’ configuration is now ‘executor.instances’

16 Feb 22:14

jeremybeard

v0.5.0

Rebase on Cloudera Spark 2.2 Release 2
Added task step type
Support for planner timestamps of multiple fields and types
Support for secured Kudu
Avro serialization option for Kafka output
User-provided classes can specify their own alias
Many bug fixes

14 Jul 21:28

jeremybeard

v0.4.0

Some highlights:

Rebase on Spark 2.1, Kudu 1.3, Kafka 0.10
Improved slowly changing dimensions performance
HBase and ZooKeeper outputs
Kafka offset management
Loop and decision steps
CSV, JSON, and text formats for FileSystem input/output
Spark SQL UDFs
Data quality checks
