Releases · adidas/lakehouse-engine
1.23.0
1.22.1
1.22.0
- A new Data Quality type, prisma, was launched as part of the DQ offering (see the sketch after this list)
- The main goal of this new DQ type is to offer better observability, with an enhanced output data model and different ways to interact with it (by directly providing DQ Functions or by deriving those DQ Functions automatically from an auxiliary table)
- Note: proper documentation with examples will come in newer versions, but for now users can check our local tests for any doubts about using the feature
- A REST API Writer was introduced; you can check the documentation here.
- A Custom SQL Transformer was introduced; the respective documentation is available here.
- Two Great Expectations custom expectations were added:
- expect_column_pair_a_to_be_not_equal_to_b
- expect_column_pair_date_a_to_be_greater_than_or_equal_to_date_b
- Upgrade several libraries in the lock files due to the upgrade of:
- pip-tools to 7.4.1
- pip-audit to 2.7.3
- pdoc to 14.5.1
- twine to 5.1.1
- and the addition of the types-requests library
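Pending the official documentation, here is a minimal, hedged sketch of how the new prisma type and one of the new custom expectations might be combined in an ACON, assuming the dq_specs structure the engine documents for its other DQ types; the prisma-specific wiring, argument names, table names, and paths are illustrative assumptions, not the confirmed schema.

```python
# Hedged sketch only: the dq_specs layout follows the engine's documented
# ACON pattern, but the prisma-specific fields below are assumptions.
from lakehouse_engine.engine import load_data

acon = {
    "input_specs": [
        {
            "spec_id": "sales_source",
            "read_type": "batch",
            "data_format": "delta",
            "db_table": "my_database.sales",  # hypothetical table
        }
    ],
    "dq_specs": [
        {
            "spec_id": "sales_dq",
            "input_id": "sales_source",
            "dq_type": "prisma",  # the new DQ type from this release
            # Option 1: provide DQ Functions directly, e.g. one of the
            # new custom expectations added in this release. Argument
            # names are assumed to follow Great Expectations'
            # column-pair convention.
            "dq_functions": [
                {
                    "function": "expect_column_pair_a_to_be_not_equal_to_b",
                    "args": {"column_A": "order_id", "column_B": "parent_order_id"},
                }
            ],
            # Option 2 (alternative): derive the DQ Functions
            # automatically from an auxiliary table instead of
            # listing them here.
        }
    ],
    "output_specs": [
        {
            "spec_id": "sales_target",
            "input_id": "sales_dq",
            "write_type": "overwrite",
            "data_format": "delta",
            "location": "s3://my-bucket/silver/sales",  # hypothetical path
        }
    ],
}

load_data(acon=acon)
```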
1.21.0
Possible Breaking Change in the Lakehouse Engine Installation
- A new deployment/installation strategy was applied to make the Lakehouse Engine lighter to install and manage, by breaking it into plugins (DQ, Azure, OS, SFTP).
- Now, if people install the Lakehouse Engine without specifying optional dependencies, they will simply install the core package, which is much faster to install and brings far fewer dependencies, but this can break their code if they were using features coming from optional dependencies.
- Ex: if you were doing `pip install lakehouse_engine` and you were using Data Quality features, you should now change the installation command to `pip install lakehouse_engine[dq]` (a sketch of a runtime check for this follows this list)
- More details on the installation can be found in the README file
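To make a missed plugin easier to catch after upgrading, here is a minimal, hedged sketch of a runtime check; the assumption that the dq extra provides great_expectations is inferred from the Great Expectations upgrades elsewhere in this changelog, not from a confirmed plugin manifest.

```python
# Hedged sketch: detect a core-only install before DQ features fail later.
# Assumption (not confirmed by these notes): the "dq" extra provides
# great_expectations; swap in whichever optional dependency you rely on.
import importlib.util

if importlib.util.find_spec("great_expectations") is None:
    raise ImportError(
        "Data Quality dependencies are missing. Since 1.21.0 the core "
        "package no longer bundles them; reinstall with "
        "'pip install lakehouse_engine[dq]'."
    )
```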
1.20.1
1.20.0
- Introduction of the Gold Assets Builder (GAB) algorithm, an accelerator for the creation of Gold Tables/Materialisations on top of fact tables with different:
- aggregations,
- dimensions,
- metrics,
- cadences, and
- reconciliation windows (a hypothetical sketch of these concepts follows this list)
- Fix DQ custom expectation validation
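For intuition only, the knobs listed above can be pictured as a configuration along these lines; this is a hypothetical illustration of the concepts, not GAB's actual API or configuration schema (which these notes do not document).

```python
# Purely hypothetical illustration of the GAB concepts listed above;
# not the engine's actual configuration schema.
gold_asset_config = {
    "source_fact_table": "my_database.fact_sales",  # hypothetical name
    "dimensions": ["market", "channel"],
    "metrics": [
        {"name": "net_sales", "aggregation": "sum"},
        {"name": "orders", "aggregation": "count"},
    ],
    "cadences": ["DAY", "WEEK", "MONTH"],
    # How far back each cadence re-computes, to pick up late-arriving facts.
    "reconciliation_window": {"DAY": 7, "WEEK": 4, "MONTH": 2},
}
```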
1.19.0
- Ensure the Lakehouse Engine is ready to work in an environment with Databricks Unity Catalog enabled and DBR 13.3: https://docs.databricks.com/en/compute/access-mode-limitations.html#shared-access-mode-limitations-on-unity-catalog
- Upgrade Great Expectations library from 0.17.11 to 0.18.8
- Added File Manager support for DBFS (Databricks file system utils, adding support to interact directly with Databricks Volumes, for example)
- Apply code changes related to breaking changes imposed by Databricks Unity Catalog and/or DBR 13.3 (e.g. remove RDD and Spark context usages)
- Improve the documentation of the Lakehouse Engine by adding several example usages and context around them
- More than 30 documentation pages were added and you can find them here.
- Upgrade jinja2 library from 3.0.3 to 3.1.3
- Add support for an advanced parser and more flexibility in using different delimiters for splitting and processing SQL commands (see the sketch after this list)
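To illustrate what delimiter-aware splitting involves, here is a minimal sketch; it is not the engine's parser, and the function name and behaviour are illustrative (the real advanced parser handles more cases, such as escaped quotes and multi-character delimiters).

```python
# Illustrative sketch, not the engine's parser: split a SQL script on a
# configurable single-character delimiter while ignoring delimiters that
# appear inside string literals.
def split_sql_commands(sql_text: str, delimiter: str = ";") -> list[str]:
    commands, current, in_string, quote_char = [], [], False, ""
    for char in sql_text:
        if in_string:
            current.append(char)
            if char == quote_char:
                in_string = False
        elif char in ("'", '"'):
            in_string, quote_char = True, char
            current.append(char)
        elif char == delimiter:
            command = "".join(current).strip()
            if command:
                commands.append(command)
            current = []
        else:
            current.append(char)
    tail = "".join(current).strip()
    if tail:
        commands.append(tail)
    return commands


print(split_sql_commands("SELECT 'a;b' AS col; DROP TABLE tmp;"))
# ["SELECT 'a;b' AS col", 'DROP TABLE tmp']
```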
1.18.0
- Added a feature to collect Lakehouse Engine Usage Statistics
- Improve Lakehouse Engine tests performance by increasing the Spark driver memory
- Added audit-dep-safety to assess the safety of library dependencies
- Upgrade paramiko library from 2.12.0 to 3.4.0
- Upgrade transitive dependencies
1.17.0
- Upgrade pyspark from 3.3.2 to 3.4.1
- Upgrade delta-spark from 2.2.0 to 2.4.0
- Upgrade ydata-profiling from 4.5.1 to 4.6.0 (and update all transitive dependencies accordingly)
- Fix the Hash Masker transformer, which was wrongly dropping the original columns
- Fix the 1000-object limitation when listing/deleting S3 objects by implementing pagination (a sketch of the pattern follows)
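As background, the 1000-object cap comes from the S3 API itself: list_objects_v2 returns at most 1000 keys per call and delete_objects accepts at most 1000 keys per request. Below is a minimal sketch of the standard boto3 pattern for this fix (paginate listings, chunk deletes); the bucket and prefix are placeholders, and this is not the engine's exact code.

```python
# Standard boto3 pattern for the fix described above: paginate listings
# and chunk deletes, since S3 caps both operations at 1000 objects.
import boto3

s3 = boto3.client("s3")
bucket, prefix = "my-bucket", "some/prefix/"  # hypothetical values

# List every object under the prefix, not just the first 1000.
paginator = s3.get_paginator("list_objects_v2")
keys = [
    obj["Key"]
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
    for obj in page.get("Contents", [])
]

# delete_objects accepts at most 1000 keys per request, so chunk them.
for i in range(0, len(keys), 1000):
    s3.delete_objects(
        Bucket=bucket,
        Delete={"Objects": [{"Key": k} for k in keys[i : i + 1000]]},
    )
```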