Releases: cloudera-labs/envelope
v0.7.2
v0.7.1
Fixes some concurrency- and streaming-related bugs.
v0.7.0
Notable changes
- Added hash, latest, translate, parse JSON, and Spark ML derivers
- Added Impala DDL task
- Added pluggable API for retrieving schemas
- Added event framework to plug in custom code at various pipeline lifecycle events
- Added capability to do dynamic configuration loading
- Translation errors are now redirected to a new step instead of failing the pipeline
- History tracking planners can assign a surrogate key to new records
- Loop steps can update multiple parameters per iteration
Breaking changes
- HBase output `mapping.rowkey` configuration has been renamed to `mapping.rowkey.columns`
- Tasks are now specified by a `task` configuration object in the task step, where previously they were specified at the step level
- Schemas are now specified by a common `schema` configuration object, where previously each Envelope component had its own schema configuration
- Kafka input `encoding` configuration has been removed. Envelope now automatically determines the required encoding for the translator
- Kudu output now ignores duplicate key errors by default, where previously it did not. This can be overridden with the `insert.ignore` configuration
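To make the new configuration shapes concrete, here is a hedged HOCON sketch. Only the `task`, `schema`, `mapping.rowkey.columns`, and `insert.ignore` keys come from the notes above; the step names, field names, and type values are illustrative assumptions, not taken from the Envelope documentation.

```hocon
steps {
  load.data {
    input {
      type = filesystem          // hypothetical input for illustration
      // Schemas now use the common 'schema' configuration object
      schema {
        field.names = [id, name]     // illustrative fields
        field.types = [int, string]
      }
    }
  }
  write.hbase {
    dependencies = [load.data]
    output {
      type = hbase
      // Renamed from 'mapping.rowkey'
      mapping.rowkey.columns = [id]
    }
  }
  run.task {
    dependencies = [write.hbase]
    // Tasks now live under a 'task' object instead of at the step level
    task {
      type = impala_ddl          // hypothetical task type name
    }
  }
}
```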
v0.6.1
Fix for Kudu integration on unsecured clusters.
v0.6.0
Highlights
- Envelope now depends entirely on upstream Apache projects
- New validation framework to eagerly identify invalid configuration files
- Migration to standalone delegation token management to remove the dependency on unstable Spark APIs
- New example showing how Cloudera Navigator logs can be ingested into Kudu and Solr
- Native Kafka offset management for Kafka inputs, enabled by default
- Kafka input can now read from a list of topics
- Translator implementation for protobuf
- Driver memory can now be set in the application configuration
- New in-list deriver to filter a dataset where a column value matches a list of values
- New select deriver to include or exclude a list of columns from a dataset
- New distinct deriver
- Support for sliding windows in streaming pipelines
Breaking Changes
- Repartition and coalesce configurations must now be specified at the step level rather than at input or deriver levels
- By default, Kafka offset management is turned on and uses Kafka to store the offsets. This means that `group.id` is now required by default
- Kafka input `topic` configuration is now `topics` and requires a list of topics
- Application `executors` configuration is now `executor.instances`
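A sketch of how an existing pipeline configuration might change under these renames. The `topics`, `group.id`, and `executor.instances` keys come from the notes above; the step name, broker address, and topic names are illustrative assumptions.

```hocon
application {
  // Previously 'executors'
  executor.instances = 4
}

steps {
  read.kafka {
    input {
      type = kafka
      brokers = "broker1:9092"       // illustrative broker address
      // Previously 'topic' with a single value; now a list
      topics = [orders, payments]
      // Required by default now that Kafka offset management is on
      group.id = envelope_orders_pipeline
    }
  }
}
```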
v0.5.0
- Rebase on Cloudera Spark 2.2 Release 2
- Added task step type
- Support for planner timestamps of multiple fields and types
- Support for secured Kudu
- Avro serialization option for Kafka output
- User-provided classes can specify their own alias
- Many bug fixes
v0.4.0
Some highlights:
- Rebase on Spark 2.1, Kudu 1.3, Kafka 0.10
- Performance improvements for slowly changing dimensions
- HBase and ZK outputs
- Kafka offset management
- Loop and decision steps
- CSV, JSON, text for FileSystem input/output
- Spark SQL UDFs
- Data quality checks