
v0.14.1

@david-leifker released this 17 Sep 21:48
6a165a8

DataHub v0.14.1 Release Notes

User Experience

  • Enhanced Data Propagation UI: New features allow viewing propagated column documentation, source information, and asset-level propagation details. This improves visibility into data lineage and enables better understanding of data flow across the organization. (#11047)

  • Improved Search Result Tracking: Search result click events now include the page number, so search ranking performance can be measured more accurately. (#11151)

  • Fixed Display Issues: Resolved issues with displaying "0" values for last ingested data and improved handling of multilingual characters in descriptions. These fixes ensure more accurate and readable information presentation. (#10840, #10975)

Developer Experience

  • Performance Improvements:

    • Implemented lazy dataLoaders for GraphQL queries, significantly reducing latency for local environments. (#11293)
    • Added option to log slow GraphQL queries, helping identify and address performance bottlenecks. (#11308)
    • Introduced session authorization caching for faster access checks. (#11327)
  • Enhanced Search Capabilities:

    • Added support for custom highlighting fields in GraphQL queries, allowing faster and more customizable data retrieval. (#11339)
    • Implemented new search query functionality to filter by parents/children of Domains or Containers. (#11279)
    • Added support for multiple values in 'CONTAIN', 'START_WITH', and 'END_WITH' operators, enabling more flexible and precise searches. (#11068)
  • API Improvements:

    • Extended throttling to API requests, supporting non-browser ingestion/write requests and manual throttling for better control over system load. (#11325)
    • Added support for 'START_WITH' and 'END_WITH' operators in GraphQL API, enhancing string query capabilities. (#11026)
  • Bug Fixes:

    • Resolved issues with forward slash handling in search queries, empty key-value pairs in Elasticsearch mapping, and support for various data types in object fields. These fixes improve search accuracy and data representation. (#10932, #11004, #11066)
    • Addressed a Postgres regression by upgrading the ebean library from version 12.x to 15.x, resolving a read lock NPE issue. (#11379)
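
As a rough illustration of the multi-value 'START_WITH' filtering described under Enhanced Search Capabilities, the sketch below builds a GraphQL request body in Python. The operation name (searchAcrossEntities), the input shape (orFilters / and / field / condition / values), and the helper build_search_payload are assumptions for illustration; check them against the GraphQL schema served by your DataHub instance before relying on them.

```python
import json

# GraphQL document. The operation and input type names are assumptions;
# verify them against the schema exposed by your DataHub server.
SEARCH_QUERY = """
query search($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    total
    searchResults { entity { urn type } }
  }
}
"""

# Operators named in these release notes, plus plain equality.
ALLOWED_CONDITIONS = {"CONTAIN", "START_WITH", "END_WITH", "EQUAL"}


def build_search_payload(field, condition, values, query="*", count=10):
    """Hypothetical helper: build a request body with a single filter.

    `values` may hold several strings; per the notes above, the string
    operators accept multiple values in one filter.
    """
    if condition not in ALLOWED_CONDITIONS:
        raise ValueError(f"unsupported condition: {condition}")
    if not values:
        raise ValueError("at least one filter value is required")
    return {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {
                "query": query,
                "start": 0,
                "count": count,
                "orFilters": [
                    {
                        "and": [
                            {
                                "field": field,
                                "condition": condition,
                                "values": list(values),
                            }
                        ]
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    payload = build_search_payload("name", "START_WITH", ["sales_", "finance_"])
    print(json.dumps(payload["variables"], indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST to the server's GraphQL endpoint with whatever HTTP client and authentication your deployment uses.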

Metadata Ingestion

  • S3 Integration Enhancements:

    • Enhanced partition support for S3 dataset ingestion, improving metadata representation and enabling advanced partition detection. (#11083)
    • Enhanced S3 ingestion process to support reading specific file types, allowing more granular control over data ingestion. (#11177)
  • BigQuery Improvements:

    • Implemented query log extractor for BigQuery, creating "Query" entities with usage statistics, lineage, and operation details. (#10994)
    • Added support for filtering GCP project ingestion based on project labels, enabling more targeted data collection. (#11169)
    • Implemented query job retries for transient errors, improving system robustness. (#11162)
  • Snowflake Updates:

    • Added support for Iceberg tables in Snowflake access history, enhancing lineage capture capabilities. (#10961)
    • Introduced ability to define clustering key formulas for Snowflake datasets. (#11254)
    • Fixed tag exclusion issues in Snowflake ingestion process. (#11250)
  • New and Updated Connectors:

    • Added ingestion source for SAP Analytics Cloud, expanding DataHub's integration capabilities. (#10958)
    • Enhanced Salesforce connector with customizable API version and improved error messages. (#11145, #11266)
    • Updated Tableau ingestion process with new parameters and improved field type parsing. (#11255, #11202)
  • Other Ingestion Improvements:

    • Added support for MongoDB database ingestion as containers. (#11178)
    • Implemented automatic capturing of Snowflake assets with the Pandas I/O Manager in the Dagster module. (#11189)
    • Enhanced Fivetran ingestion with destination ID filtering capabilities. (#11277)
    • Added support for browse-only tables in Databricks ingestion. (#10766)
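
For readers unfamiliar with the pattern behind "query job retries for transient errors" in the BigQuery items above, here is a minimal, generic sketch of retrying with exponential backoff. It is not the connector's actual implementation: TransientError and retry_transient are hypothetical names, and the real ingestion source may classify errors and schedule retries differently.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. rate limits)."""


def retry_transient(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff.

    Delays grow as base_delay * 2**(attempt - 1). The final failure is
    re-raised so the caller still sees the underlying error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; non-transient exceptions propagate immediately because only TransientError is caught.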

Other Improvements and Fixes

  • Upgraded various dependencies including Kafka, Azure Identity, Acryl-SQLglot, and GraphQL/Spring versions.
  • Improved error handling and logging across multiple components.
  • Enhanced test coverage and reliability.
  • Updated documentation for various features and processes.

Breaking Changes

Notable breaking changes include:

  • Removal of the lower() call from get_db_name in SQLAlchemySource, so database names are no longer lowercased; this changes the URNs of related entities.
  • Changes to default sink mode and aspect handling that require server version 0.14.0+.

See the full details here.
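
To make the URN impact of the first breaking change concrete, the sketch below compares a dataset URN built from a lowercased database name with one built from the name as-is. The platform ("mssql"), the database, schema, and table names, and both helper functions are made up for illustration; which sources are affected and how they assemble names is described in the linked details, not here.

```python
def dataset_urn(platform, name, env="PROD"):
    """Format a dataset URN in DataHub's usual urn:li:dataset:(...) shape."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def qualified_name(db_name, schema, table, lowercase_db):
    """Hypothetical helper: join name parts, optionally lowercasing the db."""
    db = db_name.lower() if lowercase_db else db_name
    return f"{db}.{schema}.{table}"


# Before the change (database name lowercased) vs. after (kept as-is).
before = dataset_urn("mssql", qualified_name("SalesDB", "dbo", "orders", True))
after = dataset_urn("mssql", qualified_name("SalesDB", "dbo", "orders", False))

if __name__ == "__main__":
    print(before)
    print(after)
```

Because the two strings differ only in case, entities ingested before and after the upgrade can end up under different URNs when a database name contains uppercase characters.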

Contributors

We extend our heartfelt thanks to all contributors for their valuable work on this release:

First-Time Contributors

@AaronYang0628, @alexandrebunn, @alisa-aylward-toast, @arpanchakra29, @esselius, @eunseokyang, @ignitz, @milindgupta, @milindgupta9, @Nbagga14, @rohansun, @sakethvarma397, @vignesh-hbk

Repeat Contributors

@deepgarg-visa, @dushayntAW, @feldjay, @filipe-caetano-ovo, @ksrinath, @Masterchen09, @matthew-coudert-cko, @mayurinehate, @nmbryant, @pinakipb2, @prashanthic23, @sagar-salvi-apptware, @siladitya2, @sleeperdeep

DataHub Maintainers

@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @hsheth2, @jjoyce0510, @maggiehays, @pedro93, @RyanHolstien, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin

Your contributions are invaluable in making DataHub better for everyone. Thank you!

Full Changelog: v0.14.0.2...v0.14.1