DataHub v0.8.44
Release Highlights
Known Issues
Standalone Kafka Consumers
Standalone Kafka consumers (for MCP/MCL messages) have been broken since v0.8.44. The root cause is a set of Spring bean dependencies that were not correctly excluded.
This went undetected in our testing infrastructure because, until recently, our tests were not run with standalone consumers.
The underlying issue has been fixed by #5827, and since #5856 we run all our smoke tests with standalone consumers to prevent this from happening again. The fix will be released in v0.8.46.
[Helm] DataHub Actions Container
We recently rolled out support for running ingestion in debug mode. This requires bumping the datahub-actions container to either HEAD (latest) or v0.0.7. The correct version is set as the default in v0.2.103.
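As a rough illustration of the requirement above, the following Python sketch checks whether a datahub-actions image tag is new enough for debug mode. The helper name and the assumed tag format (an optional "v" prefix followed by three numeric parts, or the literal tag "head") are hypothetical and not part of DataHub or its Helm chart.

```python
import re

# Minimum datahub-actions version named in the note above.
MIN_ACTIONS_VERSION = (0, 0, 7)


def actions_tag_supports_debug_mode(tag: str) -> bool:
    """Return True if the tag is 'head' or a vX.Y.Z tag at or above v0.0.7.

    Hypothetical helper: the tag format is an assumption for illustration.
    """
    if tag.lower() == "head":
        return True
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        # Unrecognized tag format: treat it as unsupported rather than guess.
        return False
    return tuple(int(part) for part in match.groups()) >= MIN_ACTIONS_VERSION
```

A check like this could run in a deployment script before enabling debug mode on UI ingestion sources.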
User Experience
- Improvements to UI-based ingestion: view live logs during execution, view ingestion summary (i.e., number of entities ingested), and rollback functionality. Also surfaces CLI-run ingestion jobs.
- New look on Homepage: Domains have been promoted to the top of the fold, so they are listed above Entity cards and Platform cards
- Improvements to searching for Looker resources - when searching for a measure or dimension, we will now surface Looks & Dashboards that reference those fields
- The DataHub Docs Site has a new look! We are reorganizing content to make it easier and more intuitive for DataHub Developers and End-Users alike to navigate our resources.
- Improved Error Handling on the UI - much nicer messaging when exceptions are caught by the frontend application.
- Misc minor bug fixes and improvements
Developer Experience
- Eternal personal access tokens are now supported
- Deprecated support for Python 3.6 (we expect this to have little-to-no impact on the Community based on pip download data)
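For environments that install the ingestion tooling, a small guard can flag an interpreter affected by the Python 3.6 change above. This is a minimal sketch: the function name is hypothetical, and treating 3.7 as the minimum version is an assumption based only on 3.6 no longer being supported.

```python
import sys

# Assumption: 3.7 is the oldest supported version now that 3.6 is not.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is at or above MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit("Python 3.6 is no longer supported; please upgrade.")
```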
Metadata Ingestion
- Improved documentation for Domains transformer
- Stateful Ingestion now supported for Glue
- data-lake source has been deprecated in favor of s3 source
- Chart Entity now supports chartUsageStatistics
- dbt ingestion supports auto-extracting owner from the meta block
- Improved Snowflake Connector is now available; we expect this to provide a reduction in ingestion run-time and lower levels of complexity
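For recipes that still reference the deprecated data-lake source, the rename can be sketched as below. This assumes the recipe has already been loaded into a Python dict with the usual source/type layout; the function name is hypothetical, and the sketch only changes the source type. Any config options that differ between the two sources would still need to be reviewed by hand.

```python
import copy


def migrate_data_lake_recipe(recipe: dict) -> dict:
    """Return a copy of the recipe with source type 'data-lake' renamed to 's3'.

    Hypothetical helper: only the type is changed, config keys are untouched.
    """
    migrated = copy.deepcopy(recipe)  # leave the caller's recipe unmodified
    source = migrated.get("source", {})
    if source.get("type") == "data-lake":
        source["type"] = "s3"
    return migrated


old_recipe = {"source": {"type": "data-lake", "config": {}}}
new_recipe = migrate_data_lake_recipe(old_recipe)
```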
What's Changed
- chore(ingest): remove orderedset dependency by @hsheth2 in #5591
- refactor(ingest): simplify upgrade version stats by @hsheth2 in #5588
- feat(metadata-service-auth): add support for eternal personal access tokens by @ksrinath in #5433
- fix(ci): paths for github workflows by @anshbansal in #5595
- fix(ingest): Fix ingest Clickhouse without password by @liyuhui666 in #5511
- fix(ci): cleanup sleeps to instead use retries by @anshbansal in #5597
- Kafka form Addition and resolved confilict by @Ankit-Keshari-Vituity in #5598
- fix(ingest): Fix minor logging bug in the glue source. by @rslanka in #5605
- fix(ci): use different image for smoke base image by @anshbansal in #5607
- fix(ci): cancel docker-unified workflow only on PRs on new commits by @anshbansal in #5608
- fix(ci): add env variable for creds smoke test by @anshbansal in #5609
- fix(ui) Followups to recent changes to UI ingestion forms by @chriscollins3456 in #5602
- docs(transformers): Add domain transformer documentation in transformers readme by @mohdsiddique in #5606
- feat(model): adding status aspect to assertions by @shirshanka in #5612
- fix(ingest): use default telemetry ID when config is unwritable by @hsheth2 in #5614
- chore(ingest): drop python 3.6 support by @hsheth2 in #5521
- fix(ui): Split based on Data Platform delimiter in Lineage viz by @jjoyce0510 in #5613
- feat(search): Sticky search filters + misc bug fixes & improvements by @jjoyce0510 in #5601
- fix(graphql): handle null source values in ml features & primary keys by @gabe-lyons in #5626
- fix(graph service): only query for entities that should have lineage [Breaking Change] by @gabe-lyons in #5539
- feat(model): Add optional message field to auditstamp by @gabe-lyons in #5611
- fix(ingest): fix indenting issue in azure ad connector by @aditya-radhakrishnan in #5627
- feat(tokens) Create and display non-expiring tokens on the frontend by @chriscollins3456 in #5630
- Schema tab: Fixed the header issue by @Ankit-Keshari-Vituity in #5622
- build(docs-website): only show release notes for recent releases by @hsheth2 in #5621
- docs(README): update links and reorg content by @maggiehays in #5618
- perf(operations): performance improvement to operations tab via reduced fetching by @gabe-lyons in #5632
- feat(ui) Retrieve last ingested timestamp and display on frontend by @chriscollins3456 in #5600
- Update README.md and maintaining consistency by @hemanthkotaprolu in #5623
- fix(ingest): fix delta-lake dict iteration bug by @hsheth2 in #5625
- fix(ingest): okta - make async loop init more robust by @shirshanka in #5640
- fix(ingest): cli - handle exception in upgrade check by @shirshanka in #5641
- build(docs-website): make codegen script idempotent by @hsheth2 in #5620
- docs(airflow): fix formatting by @hsheth2 in #5617
- fix(ui): Fixing minor search redirect filtering issue introduced by sticky filters by @jjoyce0510 in #5643
- fix(ingestion): Update developer docs by @szalai1 in #5644
- feat(ui): Adding slack handle to corp group info by @jjoyce0510 in #5645
- fix(delta-table): allow env, credential file based s3 auth by @MugdhaHardikar-GSLab in #5636
- feat(GraphQL API): Add "browsePaths" field to browsable entity types by @jjoyce0510 in #5646
- feat(ingest): generate a list of aspects in codegen by @hsheth2 in #5633
- feat(ingestion): Glue stateful ingestion by @amanda-her in #5553
- feat(ingest): add snowflake-beta source by @mayurinehate in #5517
- fix(ingest): remove alphabet field from allow/deny config by @hsheth2 in #5629
- feat(mssql): add multi database ingest support by @MugdhaHardikar-GSLab in #5516
- chore(ingest): drop data-lake source in favor of s3 source by @hsheth2 in #5628
- fix(ingest): use mongodb ping command to test connection by @hsheth2 in #5650
- fix(ingest): remove profile_sql_table event by @hsheth2 in #5616
- fix(ci): use graphql instead of restli by @anshbansal in #5610
- feat(ingest): rest_emitter - Adding option to disable ssl by @szalai1 in #5642
- feat(ingest): GE Profile/Action Trino support by @aezomz in #5361
- Stats Tab: Table and column stats hide when there is no data by @Ankit-Keshari-Vituity in #5651
- fix(ingest): redash - fix redash dashboard url bug by @de-kwanyoung-son in #5500
- Glossary: Worked on the refetching data issue by @Ankit-Keshari-Vituity in #5638
- feat(ingestion) Fetch live logs on an ingestion run from UI by @chriscollins3456 in #5653
- fix(spark-lineage): Create application setup on sqlevent start by @MugdhaHardikar-GSLab in #5657
- fix(ui) Remove constraint for searching with less than 3 characters by @chriscollins3456 in #5654
- docs: adds ABLY as DataHub adopter by @de-kwanyoung-son in #5656
- fix(siblings): set sleep after checking if the restore step should run by @gabe-lyons in #5660
- fix(users): add origin aspect to corpuser by @aditya-radhakrishnan in #5662
- feat(domains): highlighting domain recommendation cards on homepage by @gabe-lyons in #5655
- feat(ingestion) Followups to live ingestion logs in UI by @chriscollins3456 in #5676
- feat(test): add option to send to slack thread by @anshbansal in #5673
- chore(ingest): set min stackprinter version by @hsheth2 in #5666
- docs(airflow): fix note formatting by @hsheth2 in #5679
- docs: fixes typos in Business Glossary docs by @topleft in #5615
- fix(docs) Fix link from Business Glossary ingestion page by @chriscollins3456 in #5680
- Worked on the Hive ingestion form by @Ankit-Keshari-Vituity in #5661
- feat(ingestion): Support for displaying history of CLI ingestion runs in the "Manage Ingestion" UI by @rslanka in #5639
- Search Page: Pagination Issue by @Ankit-Keshari-Vituity in #5685
- feat(ingestion-ui) Display CLI-based ingestion sources in UI by @chriscollins3456 in #5681
- fix(schema-history): make latestVersion field on result optional by @aditya-radhakrishnan in #5689
- feat(ingest): file - add support for folders, large files, improve co… by @shirshanka in #5692
- feat(ingest): rest-sink - stability improvements to handle large inpu… by @shirshanka in #5693
- Add UP_FOR_RETRY DPI run result by @divyamanohar-stripe in #5664
- feat(ingest): add support for a event failure log + reporting cancelled runs on cli by @shirshanka in #5694
- fix(doc): Fixing boolean type in datahub rest emitter's json schema by @treff7es in #5695
- fix(ui) Refresh executions on Ingestion page when they are visible by @chriscollins3456 in #5698
- fix(ingest): emit status aspect for entities ingested from okta and azure_ad by @aditya-radhakrishnan in #5700
- feat(kafka-setup): Adds SASL SSL support in kafka setup docker image by @pedro93 in #5697
- fix(ingest): refactor sync-async config, thread-safety for sink repor… by @shirshanka in #5705
- feat(ingest): add enable_owner_extraction option to dbt by @hsheth2 in #5707
- feat(ingestion): add github_info config for dbt by @remisalmon in #5648
- docs(ingest): add info about datahub auth tokens with airflow by @hsheth2 in #5703
- fix(airflow): Stable tag order in DataFlow/DataJobs by @treff7es in #5696
- fix(ingest): add pymongo srv extra by @hsheth2 in #5701
- fix(ui): Long overdue - Fix red error screens during OIDC login, logout exception scenarios by @jjoyce0510 in #5708
- feat(ingest): better reporting for file source, friendlier stats names by @shirshanka in #5710
- Worked on postgres ingestion form integration by @Ankit-Keshari-Vituity in #5671
- feat(ingest): Add mode option to presto-on-hive source by @szalai1 in #5659
- Worked on the alignment of all data in domain list by @Ankit-Keshari-Vituity in #5713
- feat(retention) Enable retention and set max versions for executionRequests by @chriscollins3456 in #5704
- fix(ingestion): Fix nifi integration tests. by @rslanka in #5718
- build(deps): bump nbconvert from 6.5.0 to 6.5.1 in /docker/datahub-ingestion by @dependabot in #5716
- feat(ingest): remove nulls during serialization by @shirshanka in #5719
- feat(looker): index looker charts and dashboards by business term by @gabe-lyons in #5649
- fix(GMS): No such classes directory file:///etc/datahub/plugins/auth/r… by @mohdsiddique in #5720
- fix(ingestion): ingest tables from dba_tables in oracle source by @mohdsiddique in #5592
- fix(ingest): redshift-usage: check full table/schema names with AllowDenyPattern by @hsheth2 in #5702
- Worked on the scroll to top of the page after pagination change by @Ankit-Keshari-Vituity in #5714
- feat(ingest): round time to 2 decimal places by @anshbansal in #5721
- fix(superset): do not crash when display_uri is not set by @daha in #5711
- fix(deps): remove tdigest dependency and associated code by @shirshanka in #5729
- fix(ingest): bigquery - Not setting ge config schema when profiling with temp table by @treff7es in #5737
- feat(ingest): file - allow filter by aspect and get stats by @anshbansal in #5738
- fix(ingest): looker - soft-deleted charts should re-emerge on re-disc… by @shirshanka in #5732
- feat(elasticsearch): Add nested type display by @liyuhui666 in #5524
- fix(docs): fixes issue with auto-generated ingestion doc by @shirshanka in #5733
- feat(mysql): support multiple database in single recipe by @MugdhaHardikar-GSLab in #5684
- fix(ingest): tweak mongodb schema inference to fix test by @hsheth2 in #5744
- fix(bootstrap): Remove malformed test in bootstrap.json by @jjoyce0510 in #5747
- docs(site redesign): Overhaul Docs Site by @maggiehays in #5731
- fix(ingestion): Fix SQL Lineage Parser to handle special tokens with a hyphen in table and column names. by @rslanka in #5748
- Snowflake beta improvements by @mayurinehate in #5736
- chore(ingest): update mixpanel api endpoint by @hsheth2 in #5750
- feat(model): add chartUsageStatistics to the chart entity by @shirshanka in #5753
- fix(ui): Improve Error Messaging on the UI by @jjoyce0510 in #5752
- chore(ingest): add vulture config and remove some dead code by @hsheth2 in #5745
- fix(doc): presto-on-hive - Removing new lines from docs to fix doc generation by @treff7es in #5755
- feat(restore-indices): add multithreading and add aspectName, urn filter by @anshbansal in #5712
- fix(GMS): fix no such classes directory file:///etc/datahub/plugins/auth/resources by @mohdsiddique in #5743
- feat(ingestion) Add ability to rollback ingestion from UI - BE PR by @chriscollins3456 in #5739
- feat(ingestion-ui) Add ability to set debug_mode on UI ingestion sources by @chriscollins3456 in #5762
- fix(search): validate entities exist before returning search results in EntityClient by @aditya-radhakrishnan in #5751
- feat(ingestion-ui) Add ability to rollback ingestion runs from the UI - FE only by @chriscollins3456 in #5740
- fix(ingest): proper null skip logic in serialization by @hsheth2 in #5749
- fix(ingest): snowflake-beta fix missing initialization of variable by @mayurinehate in #5757
- fix(ingest): add databricks dep for hive by @hsheth2 in #5764
- feat(ingest): add config to extractor interface by @hsheth2 in #5761
- chore: update server-side telemetry endpoint by @hsheth2 in #5759
- feat(ingestion): bigquery - Bigquery beta connector - first cut by @treff7es in #5663
- feat(ingestion): looker chart usage statistics by @mohdsiddique in #5652
- feat(restore-indices): add urn like filter by @anshbansal in #5770
- feat(restore-indices): add timing info by @anshbansal in #5773
- feat(simplified homepage): adding option to show limited entity types on homepage by @gabe-lyons in #5678
- fix(ingest): add pydantic version upper bound by @hsheth2 in #5775
- Worked on the Secret Fields in ingestion form by @Ankit-Keshari-Vituity in #5727
- feat(cli): add spinner to indicate progress by @shirshanka in #5769
- feat(roles): add roles feature to DataHub by @aditya-radhakrishnan in #5767
- feat(model): add storage size to dataset profiles by @shirshanka in #5777
- docs(roles): add documentation about roles by @aditya-radhakrishnan in #5778
- fix(ui): Remove add limit on Entity Profile for glossary terms and tags by @jjoyce0510 in #5780
- fix(ci): Attempting to fix failing smoke tests by @jjoyce0510 in #5760
- fix(tags) Add creator of tag as the owner of it by @chriscollins3456 in #5787
- docs(lookml): updating github_info in lookml docs by @gabe-lyons in #5779
- fix(audit logs) Set actor urn on audit stamp through Java Entity Client by @chriscollins3456 in #5788
- feat(ingestion-ui) Add test connection button to Looker form by @chriscollins3456 in #5794
- fix(ingestion): fix looker chart-usage by @mohdsiddique in #5791
- fix(ingest): Fix oauth config validation in snowflake. by @rslanka in #5796
- fix(bootstrap): Creating dedicated thread pool for executing async bootstrap steps + misc fixes by @jjoyce0510 in #5798
- feat(previews): add previews for glossary terms, tags, and domains by @gabe-lyons in #5784
New Contributors
- @hemanthkotaprolu made their first contribution in #5623
- @szalai1 made their first contribution in #5644
- @amanda-her made their first contribution in #5553
- @de-kwanyoung-son made their first contribution in #5500
- @topleft made their first contribution in #5615
- @divyamanohar-stripe made their first contribution in #5664
Full Changelog: v0.8.43...v0.8.44