[Release] 0.4.1

@devinrsmith released this 01 Sep 23:22

New Feature

Kafka Ingestion Support

  • Schema service integration for Confluent/Apicurio.
  • Decode from Apache Avro, JSON, or primitives.
  • Start from beginning, end, current, or an arbitrary offset.
  • Stream tables to the Deephaven query engine for efficient event processing or aggregations.
  • Record full history in-memory to allow richer Deephaven queries.
  • Commingle all partitions, or segregate them into distinct sub-tables within Deephaven TableMaps.
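The difference between commingling and segregating partitions can be pictured with a small, engine-independent sketch. The record layout and variable names below are hypothetical and are not the Deephaven or Kafka API; plain Python lists and dicts stand in for tables and TableMaps.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from a two-partition topic.
records = [
    {"partition": 0, "offset": 0, "value": "a"},
    {"partition": 1, "offset": 0, "value": "b"},
    {"partition": 0, "offset": 1, "value": "c"},
]

# Commingled: one table holding rows from every partition.
commingled = list(records)

# Segregated: one sub-table per partition, keyed by partition id.
by_partition = defaultdict(list)
for rec in records:
    by_partition[rec["partition"]].append(rec)

print(len(commingled))                                      # 3
print({p: len(rows) for p, rows in by_partition.items()})   # {0: 2, 1: 1}
```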

Stream Table Optimizations

  • Developed in support of Kafka ingestion.
  • Allow for very efficient aggregations over the full data set, while only storing “new” rows in their entirety.
  • Can be transformed into full-history in-memory tables.
  • Suitable for any streaming ingestion use case.
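The idea of aggregating over the full data set while storing only the new rows can be illustrated with a minimal sketch. The class and method names (`StreamSum`, `on_batch`) are hypothetical and not part of the Deephaven API. The sketch only shows how running aggregate state can cover every row ever seen while the table itself keeps just the latest batch.

```python
from collections import defaultdict


class StreamSum:
    """Hypothetical illustration: keeps only the latest batch of rows,
    plus a running per-key sum over every row seen so far."""

    def __init__(self):
        self.current_rows = []          # only the "new" rows are stored in full
        self.totals = defaultdict(int)  # aggregate state covers the full history

    def on_batch(self, rows):
        # Rows from the previous cycle are dropped; only the aggregate remembers them.
        self.current_rows = list(rows)
        for key, value in rows:
            self.totals[key] += value


agg = StreamSum()
agg.on_batch([("a", 1), ("b", 2)])
agg.on_batch([("a", 10)])

print(agg.current_rows)   # [('a', 10)]
print(dict(agg.totals))   # {'a': 11, 'b': 2}
```

Memory use in this sketch grows with the number of distinct keys rather than with the number of rows ingested, which is the property the bullet above describes.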

Parquet Support Expansion

  • Use Spark/Dask style _metadata and _common_metadata files when present for faster partition discovery.
  • Support multiple row groups in a single .parquet file.
  • Major scalability enhancements for files with many dictionary pages and/or row groups.
  • Support dictionary size constraints in writing tools.
  • Single entry point for “do the right thing” reading.
  • This expands on existing support for Parquet, which allows for symmetrical writing/reading of Deephaven tables, in addition to ingestion support for single files, flat layouts, and hive-style partitioning with most standard compression options and data types.

Application Mode

  • Loads all .app files from the directory named by the deephaven.application.dir CLI flag.
  • Use ApplicationService to subscribe to field changes.
  • The script type of an application must match deephaven.console.type.
  • Static (script-less, immutable) applications subclass StaticClassApplication.
  • Dynamic (script-less, non-immutable) applications subclass DynamicApplication.
  • Initialize REPL environment with a Script Application (Query Scope changes are available to REPL sessions).

Python Integration

  • New deephaven.Types import defines constants for Deephaven column data types and adds support for table creation from constant data with an API resembling the pandas DataFrame. This builds on the QST support for building tables, which is also mentioned in the Java client section.

Python client

  • Session object connects to the DH server and provides convenience methods for creating empty tables and time tables, importing tables from Arrow data, merging tables, opening tables by name, and running scripts.
  • Table object represents a DH table and supports most table operations, including filtering, selection, snapshotting, aggregation, and joins.
  • Query object defines a set of table operations to be performed in a single batch on the server.
  • Available for download from PyPI.

Java client

  • Arrow Flight integration; doGet (doPut next release).
  • Query syntax tree (QST) support.
  • Serial mode.
  • Batch mode.
  • Console session support.
  • Command line utilities and examples.
  • Graphviz visualization for QST.
  • Jars published to Maven Central; https://search.maven.org/search?q=g:io.deephaven
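As a rough picture of what a query syntax tree and its Graphviz rendering involve, here is a minimal sketch. It is not the Java client's QST API: the `Node` class, its `then` method, and `to_dot` are hypothetical, and the tree is reduced to a linear chain of operations for brevity (a real tree would also have nodes with several inputs, such as joins).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable node: an operation label and its optional input."""
    label: str
    parent: Optional["Node"] = None

    def then(self, label: str) -> "Node":
        # Each operation produces a new node that points back at its input.
        return Node(label, self)


def to_dot(leaf: Node) -> str:
    """Render the chain ending at `leaf` as Graphviz DOT text."""
    chain = []
    node = leaf
    while node is not None:
        chain.append(node)
        node = node.parent
    chain.reverse()  # root first

    lines = ["digraph qst {"]
    for i, n in enumerate(chain):
        lines.append(f'  n{i} [label="{n.label}"];')
    for i in range(1, len(chain)):
        lines.append(f"  n{i - 1} -> n{i};")
    lines.append("}")
    return "\n".join(lines)


query = Node("timeTable").then("where").then("tail")
dot = to_dot(query)
print(dot)
```

Because the nodes are immutable descriptions rather than live tables, the same structure can be printed, visualized, or sent to a server for execution.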

C++ Client

  • Arrow Flight integration (essentially a passthrough to the C++ FlightClient provided by the Arrow libraries). Provides doGet, doPut, and others.
  • Async mode.
  • Unit tests and examples.

UI

  • CSV upload and download are now supported via the web UI.

Enhancement

Improved interactions between two windows communicating with the same worker.

  • The shared query-scope fields show up in the panels drop-down.
  • The UI recognizes the initial change to an existing query-scope field as a new field.

Improved detection of scenarios where the backend is not working as expected.

  • If any of grpc-web, envoy, or the worker exits, the UI detects this and notifies the user.
  • If a Barrage subscription fails to snapshot, all subscribers are notified and closed.

Improvements to Arrow Flight integration

  • Migrated all Barrage metadata into FlightData metadata field.
  • doGet works from the official Java, Python, and C++ Arrow Flight clients.

Improved error messages related to tickets (missing and invalid) to gRPC clients

gRPC API

  • Added FetchTable to TableService; can be used in a batch.
  • Added ExportFromTicket to SessionService; does not validate type of export.
  • Added ExportNotification.RUNNING state to distinguish from QUEUED.
  • Reduced log volume related to UNAUTHENTICATED requests.

JavaScript Client (jsapi)

  • Wired up selectDistinct (re-enables existing UI functionality for filter pop-ups).

Addressed bugs and inefficiencies in autocomplete.

Improved error handling in the JS API.

Bug fix

  • Fixed several issues regarding Liveness tracking in SessionState et al.
  • Fixed SessionState NPE caused by cancel / perform-export-work race.
  • Fixed Barrage subscription update race due to acquiring the wrong lock.
  • Fixed bugs in SortedRanges.SearchIterator for hasNext after exhaustion.

Internal

  • Cleaned up many of the raw types related to ColumnSources.
  • Upgraded JavaParser from 2.0.0 to 3.23.0.
  • AutoComplete is now performed over a bi-directional stream.
  • JavaScript Client
    • Prefers websockets for bi-directional streams when not using SSL.
    • Uses HTTP headers x-deephaven-stream-ticket, x-deephaven-stream-sequence, and x-deephaven-stream-halfclose to simulate client-streams (and bidirectional-streams) when using SSL.
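One way to picture how separate HTTP requests can stand in for a client stream is the sketch below. Only the three header names come from these notes. The semantics assumed here (the ticket identifies the stream, the sequence is a zero-based integer, and the mere presence of the halfclose header marks the last message) and the `SimulatedStream` and `handle` names are illustrative assumptions, not the server's actual implementation.

```python
class SimulatedStream:
    """Hypothetical server-side reassembly of one client stream from unary requests."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}      # out-of-order payloads keyed by sequence number
        self.delivered = []    # payloads released in sequence order
        self.close_seq = None  # sequence number of the half-close message, once known
        self.closed = False

    def on_request(self, headers, payload):
        seq = int(headers["x-deephaven-stream-sequence"])
        self.pending[seq] = payload
        if "x-deephaven-stream-halfclose" in headers:  # assumed: presence means "last message"
            self.close_seq = seq
        # Release every payload that is now contiguous with what was already delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            if self.next_seq == self.close_seq:
                self.closed = True
            self.next_seq += 1


streams = {}


def handle(headers, payload):
    # Requests that share a ticket belong to the same simulated stream.
    stream = streams.setdefault(headers["x-deephaven-stream-ticket"], SimulatedStream())
    stream.on_request(headers, payload)
    return stream


# The second message arrives first; delivery still happens in sequence order.
handle({"x-deephaven-stream-ticket": "t1",
        "x-deephaven-stream-sequence": "1",
        "x-deephaven-stream-halfclose": "1"}, "world")
s = handle({"x-deephaven-stream-ticket": "t1",
            "x-deephaven-stream-sequence": "0"}, "hello")

print(s.delivered, s.closed)   # ['hello', 'world'] True
```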

System Administration

  • Flag io.deephaven.console.type renamed to deephaven.console.type.
  • New flag deephaven.application.dir (see Application Mode in New Feature).
  • Added no-cache headers to nginx_default.conf.
  • Added layouts directory adjacent to notebooks in nginx_default.conf.
  • Disabled websocket idle timeout in envoy via stream_idle_timeout: 0s.

Commits

Raw Git release notes:
3349c42 Cut for 0.4.0
f9fb90b Remove old docker/ contents (#1170)
b7edf2c Deephaven C++ client (#1176)
5681bad Propagate Barrage propagation errors to the client table widget (#1182)
d11aa6c Give table update listeners short version of propagation error (#1179)
7b23050 MVP release, version set to 0.4.0, fixes #1171 (#1175)
d79bed1 Improve Ticket/Descriptor error messages for all gRPC usages (#1174)
b9bbef2 Added python tests for every Kafka spec type. Closes #1123. (#1168)
8361935 Properly ignore non-refreshing dynamic nodes; fix jsapi getObject to wait for Ticket (#1173)
2ff6962 Add execute script support for java client (#1156)
2b11f45 Add publishing for java-client jars and dependencies, fixes #1126 (#1149)
0753e2d Remove files that should have been never added. (#1169)
34cd17a ScriptSession#setVariable require subclasses to notify old vs new Value (#1166)
b181861 Add python documentation for Kafka integration. (#1160)
36b4c2d add compose files for standalone docker containers (#1058)
ff335f4 Use the correct location (Classpaths.groovy) to declare which java parser version (#1165)
e7ca952 Update web to v0.3.1 (#1164)
5cfd357 Improved validations for comboagg grpc request (#1014)
cf19dfa Initial Kafka python tests. (#1155)
657a716 fixed a compatibilty issue in the Python client due to the latest chanage to session proto, fixes #1158 (#1159)
ad6ff1b Update all .env files (#1161)
c2bd84f Updating web UI to v0.3.0 (#1154)
e0d85c4 Application mode for long running workers (#1082)
37e5f32 Adding Javadoc to KafkaTools.java. Closes #1122 (#1148)
6ef97c8 Deephaven Python API MVP (#1094)
404d5fd Upgrade javaparser-core to 3.23.0 (#1106)
a791fd3 Massive RawTypes Cleanup (#1145)
cd8745d Add safety check for .gitignore, fixes #1114 (#1147)
2e36920 Learn library logic, no dependencies (#1140)
51f19a5 Support bidirectional streaming on websockets (#1111)
a45154c Use DBDateTime insteat of qst...instant from python in deephaven.Types for the time being. (#1144)
602dfd9 Disambiguate DynamicTableWriter constructors taking an array from the point of view of jpy. (#1143)
590a746 Disable envoy websocket timeout (#1139)
7b2eacf Parquet: Lazy dictionary loading, shared page caches, and better file channel cache behavior (#1130)
d3d6aa9 QST to graphviz DOT format; and SVG, PNG, and others (#935)
34552e7 Fix CODEOWNERS references (#1137)
0add3f8 Run ./gradlew :Generators:generateAllPython (#1132)
473a40a DynamicTableWriter ctor to support using qst...Types (#1125)
f39d8f2 java-client session and flight (#953)
86319e2 Update code style (#1121)
a7a0454 Downgrade Alpine to avoid a DNS issue on older mac docker installs (#1034)
10fc2f6 Update CONTRIBUTING.md for styleguide (#1118)
9b851e5 Add DynamicTableWriter to deephaven module (#1117)
3d46a74 Fix tests after styling applied
8a5099f Apply style guide
038f8ee Remove style guide ratchet
abcfa45 Add style guide for most projects
14a997c Fix .gitignore rules (#1115)
c646eb2 Rename KafkaIngester module to Kafka. (#1110)
5e135dd Some Avro converter unit tests. (#1108)
8e64658 Adding Types.py for wrapping DH column types and table operations around them. (#1088)
11bda2e Fix a bug for the implementation of simple in the python side of KafkaTools. (#1109)
c932216 Add support for seek to end to Kafka API. (#1103)
440a166 rerun ./gradlew :DB:replicate (#1099)
2074ed2 Force jvm to load class in desired package (#1095)
7fe69b9 CrossJoin cardinality calculation may overflow integer (#1098)
87aa98e Properly use liveness API for dynamic non-refreshing nodes in SessionState exports (#1092)
8addc73 Revert "First iteration of Learn library (#974)" (#1096)
216ca7d First iteration of Learn library (#974)
0c22475 Native arrays, Db Arrays, and types. (#1076)
152b2b8 Add support for converting stream tables to append from python. (#1090)
41cd9a3 JS API Error handling (#1086)
686e5f8 TableMap support in KafkaTools (#1079)
c80f7c8 Move away from using **kwargs in python Kafka API. (#1081)
852878d Fix bugs and missing files preventing Java2NumpyCopyTests from compiling. (#1067)
7e2bb30 Run DB:replicate (#1077)
0ceb3ae Stream sorting optimizations, and improve unsupported aggregation messaging (#1074)
39e337f New KafkaTools.consumeToTable general API. (#1064)
27765e0 docker version (#1075)
0cf3e4b Enhance listener error checking in TestKeyedArrayBackedMutableTable (#1071)
a263cb9 Propagate errors from Kafka to tables. (#1059)
3f02491 expose ABS_SUM aggregation to the web API (#938)
effad18 Fix a bug in the handling of keyword arguments for consumeToTable impacting avro. (#1066)
8de114e Table creator helpers (#1051)
c80cca6 Kafka: Support efficient add-only and stream sorted first/last by (#1061)
7fc16a0 Fix an issue with kafka avro ingestion introduced in the last refactoring. (#1063)
d291498 Remove TableTools.colDef and its need from KafkaTools; fix Boolean cols. (#1055)
e0ba57e remove reference to QueryScope and the alias qs from the deephaven package, fixes #1056 (#1057)
858253d Fixing a bug on handling table_type. (#1053)
8592d66 Migrate all barrage metadata structures to live inside of FlightData (#1020)
4d85eb6 Fix a bug for no-key streaming aggregations; filter no-op updates (#1050)
4db2ab7 Json support on top of stream adapters, plus refactorings towards reuse facilitating that. (#1043)
371b44e JSAPI: wire up selectDistinct; fixes the select values box in advanced filters (#1042)
4ee451b Query Engine: Port fix for DH-11484 (naturalJoin does not detect duplicates when right table is live and left is static) (#1044)
0e5cde7 Add a layouts directory adjacent to the notebooks directory (#1048)
582b04b Correctly fail the build when python needs to be regenerated (#1023)
e83cc93 Generate pydoc. (#1047)
af43a71 SessionState: fix NPE caused by cancel/exec race (#1032)
edf6050 BarrageProducer: the subscription lock was replaced with a class level lock (#1033)
4e1c319 Support a delegating getChunk implementation on SwitchColumnSource, and associated cleanups (#1029)
768fe04 ModifiedColumnSet: utility to set range of columns dirty (#1031)
218c225 Only upload javadoc artifacts for main (#1028)
3ff2e0e samples has been moved to deephaven/examples, deephaven/examples#21 (#1027)
f8cd47d Switch from recursive to iterative post-order implementation. Add rel… (#1010)
367bdd9 Bump java version (#1021)
e643582 Plumb KafkaIngester into StreamToTableAdapter so that we use chunks for ingestion instead of tablewriters. (#998)
ed53a85 Generated pydoc. (#1019)
bef62fb dramatically reduce log spam for unauthenticated requests (#965)
da23f6e Bump setup-java action to v2.2.0 (#1015)
16b39ad Oops, I forgot to reset my shared contexts. (#1018)
4bb1a38 Kafka: Stream firstBy and lastBy chunked operators, address other engine features for stream-tables (#977)
8436631 remove QueyScope from Deephaven Python package, fixes #846 (#989)
04d3fc1 Add QST and/or filters, partial #829 (#1000)
88edf42 Remove allowFailure logic, fixes #598 (#999)
184acfc Parquet: Address regressions and format irregularities discovered during testing (#1006)
865b823 Avoid using "e" as a local variable name, as it could conflict with a (#1004)
8957519 Parquet: improve error messages for unsupported file structure (#1001)
d5c0407 Improve avro kafka ingestion python API. Support NULL UNIONs, and... (#993)
8f08743 Add table-api to javadoc (#987)
969e2b9 Reduce the odds that TestSpecialPrimitives fails (#982)
5d458da Parquet: Support _metadata and _common_metadata, multiple row groups, and a generic readTable; also clean up regioned column sources and fix dictionary writing/symbol tables (#954)
5323022 Add instructions on how to install confluent-kafka. (#975)
261afc4 Add help and discussion links to README resources (#973)
fd0d191 Handle basic single-type deserializers in KafkaTools; add docker-composed apicurio. (#943)
3de986d Fix #950 SortedRanges.SearchIterator hasNext() still returns true after exhaustion via advance (#955)
7a49d50 Clean up proto packages/namespaces, remove flight ticket (#956)
2dfd169 Add spotlessCheck into quick and smoke tasks. Will ensure code style check is run as part of check-ci. (#962)
7d9cff6 Add QST to javadocs (#960)
32e498c Fix TableSpec.snapshot. Update other building patterns to ensure no similar mistakes slip through. (#957)
c7c8c48 Add trigger to slack workflow on nightly CI failure. Fixes #771. (#948)
ebb1c1f Bump next version to 0.4.0 (#942)