
Releases: adidas/lakehouse-engine

1.23.0

30 Oct 09:34
  • Upgrade pyspark from 3.4.1 to 3.5.0
  • Upgrade delta-spark from 2.4.0 to 3.2.0
  • Upgrade python to 3.11
  • Change the handling of the Spark Session to a global variable, due to breaking changes in forEachBatch behavior in newer Spark versions (see the sketch below)
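A minimal sketch of the pattern described above, i.e. keeping the SparkSession on a module-level global so that forEachBatch callbacks always resolve the same session; the names (_GLOBAL_SPARK, process_batch, the table names) are illustrative only, not the engine's actual code:

```python
from pyspark.sql import DataFrame, SparkSession

# Keep the session on a global variable instead of re-deriving it inside
# the foreachBatch callback, whose session handling changed in Spark 3.5.
_GLOBAL_SPARK: SparkSession = SparkSession.builder.getOrCreate()


def process_batch(batch_df: DataFrame, batch_id: int) -> None:
    # Any lookups or side queries use the globally stored session.
    _GLOBAL_SPARK.sql("SELECT 1").collect()
    batch_df.write.format("delta").mode("append").saveAsTable("target_table")


query = (
    _GLOBAL_SPARK.readStream.table("source_table")
    .writeStream.foreachBatch(process_batch)
    .start()
)
```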

1.22.1

06 Sep 08:06
  • [DOCS] Introduction of GAB (Gold Asset Builder) Documentation
  • [DOCS] Introduction of PRISMA Data Quality Documentation
  • Improve deployment strategy to be more accurate & faster

1.22.0

08 Aug 18:44
  • A new Data Quality type, prisma, was launched as part of the DQ offering
    • The main goal of this new DQ type is to offer better observability, with an enhanced output data model and different ways to interact with it (by directly providing DQ functions or by deriving those DQ functions automatically from an auxiliary table)
    • Note: proper documentation with examples will come in newer versions, but for now users can check our local tests for any doubts about using the feature
  • A REST API Writer was introduced and you can check the documentation here.
  • A Custom SQL Transformer was introduced and the respective documentation is available here.
  • Two Great Expectations custom expectations were added (a usage sketch follows this list):
    • expect_column_pair_a_to_be_not_equal_to_b
    • expect_column_pair_date_a_to_be_greater_than_or_equal_to_date_b
  • Upgrade several libraries in the lock files due to the upgrade of:
    • pip-tools to 7.4.1
    • pip-audit to 2.7.3
    • pdoc to 14.5.1
    • twine to 5.1.1
    • and addition of the types-requests library
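As a rough illustration of the new column-pair expectations, this is how such an expectation is typically invoked through a Great Expectations validator once it is registered (importing the module that defines it usually suffices); the column_A/column_B parameter names and the pandas data source are assumptions, not the engine's actual wiring:

```python
import great_expectations as gx

# Assumes the custom expectation classes are already imported/registered.
context = gx.get_context()
validator = context.sources.pandas_default.read_csv("orders.csv")

result = validator.expect_column_pair_a_to_be_not_equal_to_b(
    column_A="billing_country",
    column_B="shipping_country",
)
print(result.success)
```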

1.21.0

17 Jun 17:55

Possible Breaking Change on the Lakehouse Engine Installation

  • A new deployment/installation strategy was applied to make the Lakehouse Engine lighter to install and manage, by breaking it into plugins (DQ, Azure, OS, SFTP).
  • Now, if people install the Lakehouse Engine without specifying optional dependencies, they will simply install the core package, which is much faster and brings far fewer dependencies, but this can break their code if they were using features coming from optional dependencies.
    • Ex: if you were doing pip install lakehouse_engine and you were using Data Quality features, you should now change the installation command to pip install lakehouse_engine[dq]
  • More details on the installation can be found in the README file

1.20.1

17 Jun 13:42
  • Implement an alternative to the toJson usage in the GAB algorithm, as it relies on RDDs, which are not whitelisted on Databricks Unity Shared Clusters (see the sketch below)
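A hedged sketch of the kind of alternative mentioned above: replacing the RDD-backed DataFrame.toJSON() with a pure DataFrame-API expression built from to_json and struct (illustrative only, not the engine's actual GAB code):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import struct, to_json

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# RDD-based variant, blocked on Unity Catalog shared clusters:
# json_rows = df.toJSON().collect()

# Pure DataFrame API alternative:
json_rows = [
    row["json"]
    for row in df.select(to_json(struct(*df.columns)).alias("json")).collect()
]
print(json_rows)  # ['{"id":1,"value":"a"}', '{"id":2,"value":"b"}']
```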

1.20.0

21 May 12:42
  • Introduction of the Gold Assets Builder (GAB) algorithm - an accelerator for the creation of Gold Tables/Materialisations on top of fact tables with different:

    1. aggregations,
    2. dimensions,
    3. metrics,
    4. cadences, and
    5. reconciliation windows
  • Fix DQ custom expectation validation

1.19.0

14 Mar 15:15

1.18.0

20 Jan 19:09
  • Added a feature to collect Lakehouse Engine Usage Statistics
  • Improve Lakehouse Engine tests performance by increasing the Spark driver memory (a small illustration follows this list)
  • Added audit-dep-safety to assess safety of library dependencies
  • Upgrade paramiko library from 2.12.0 to 3.4.0
  • Upgrade transitive dependencies
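For reference, raising the driver memory for a local test session boils down to a single Spark config; the value below ("4g") is an arbitrary example, not the one used by the engine's test suite:

```python
from pyspark.sql import SparkSession

# Set spark.driver.memory before the session (and its JVM) is created.
spark = (
    SparkSession.builder.master("local[2]")
    .appName("lakehouse-engine-tests")
    .config("spark.driver.memory", "4g")
    .getOrCreate()
)
```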

1.17.0

18 Oct 20:07
  • Upgrade pyspark from 3.3.2 to 3.4.1
  • Upgrade delta-spark from 2.2.0 to 2.4.0
  • Upgrade ydata-profiling from 4.5.1 to 4.6.0 (and update all transitive dependencies accordingly)
  • Fix the Hash Masker transformer, which was wrongly dropping the original columns
  • Fix the 1000-object limitation when listing/deleting S3 objects by implementing pagination (see the sketch below)
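A short sketch of the pagination approach for the S3 fix above, using boto3's list_objects_v2 paginator and chunked delete_objects calls (bucket/prefix names are placeholders and this is not the engine's actual implementation):

```python
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# list_objects_v2 returns at most 1000 keys per call, so iterate the pages.
keys = []
for page in paginator.paginate(Bucket="my-bucket", Prefix="some/prefix/"):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

# delete_objects also accepts at most 1000 keys per call, so delete in chunks.
for i in range(0, len(keys), 1000):
    s3.delete_objects(
        Bucket="my-bucket",
        Delete={"Objects": [{"Key": k} for k in keys[i : i + 1000]]},
    )
```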

1.16.1

17 Oct 12:44
  • Allow both batch and streaming sensors for Delta data formats (only streaming was allowed previously)
  • Apply a fix to expect_column_values_to_be_date_not_older_than, which was not dealing properly with timestamps