This library provides an implementation of the DBSP language for incremental streaming computations. It is a tool primarily meant for research. See it as the PyTorch of streaming.
It has zero dependencies and is written in pure Python.
Here you can find a single-notebook implementation of almost everything in the DBSP paper. It mirrors what is in this library in an accessible way, and with more examples.
DBSP is differential dataflow's less expressive successor. It is a theory and framework that competes with other stream processing systems such as Flink and Spark.
Its value is easiest to see in its ability to transform "batch", possibly iterative, relational queries into "streaming incremental" ones. This, however, conveys only a fraction of the theory's power.
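To make the batch-to-incremental idea concrete, here is a minimal, self-contained sketch of an incrementalized join. It does not use this library's API: the names `zset_add`, `zset_join` and `IncrementalJoin` are hypothetical and exist only for illustration. It assumes relations are represented as Z-sets (plain dicts mapping an element to an integer weight, where a negative weight means deletion) and relies on the join being bilinear, so the change in the output can be computed from the input changes plus the previously accumulated inputs.

```python
from collections import defaultdict


def zset_add(a, b):
    """Pointwise sum of two Z-sets (dicts mapping element -> integer weight)."""
    out = defaultdict(int)
    for zset in (a, b):
        for elem, weight in zset.items():
            out[elem] += weight
    # Elements whose weights cancel out are dropped.
    return {elem: w for elem, w in out.items() if w != 0}


def zset_join(a, b):
    """Join (key, x) with (key, y) on key; weights multiply."""
    out = defaultdict(int)
    for (ka, x), wa in a.items():
        for (kb, y), wb in b.items():
            if ka == kb:
                out[(ka, x, y)] += wa * wb
    return {elem: w for elem, w in out.items() if w != 0}


class IncrementalJoin:
    """Hypothetical helper: consumes input deltas, emits output deltas."""

    def __init__(self):
        self.a = {}  # everything seen so far on the left input
        self.b = {}  # everything seen so far on the right input

    def step(self, da, db):
        # delta(a JOIN b) = da JOIN db + da JOIN b_old + a_old JOIN db
        delta = zset_add(
            zset_join(da, db),
            zset_add(zset_join(da, self.b), zset_join(self.a, db)),
        )
        self.a = zset_add(self.a, da)
        self.b = zset_add(self.b, db)
        return delta


join = IncrementalJoin()
# Step 1: insert one row on each side.
out1 = join.step({(1, "alice"): 1}, {(1, "paris"): 1})
# Step 2: replace "alice" with "bob" on the left; the right side is unchanged.
out2 = join.step({(1, "alice"): -1, (1, "bob"): 1}, {})
```

Summing `out1` and `out2` gives the same Z-set as running the batch join over the accumulated inputs, but each step only touches the rows affected by the change.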
As an extreme example, you can find an incremental interpreter for Datalog under pydbsp.algorithm. Datalog is a query language that is similar to SQL, with a focus on efficiently supporting recursion. By implementing Datalog interpretation with dbsp, we get an interpreter whose queries can both change during runtime and respond to new data being streamed in.
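For readers new to Datalog, the sketch below shows what a recursive query computes, using graph reachability as the example. It is plain Python and not this library's interpreter: the function name `reachability` is hypothetical, and it evaluates a fixed program in batch using semi-naive evaluation, where each round joins only the newly derived facts against the edges.

```python
def reachability(edges):
    """Batch evaluation of the Datalog program:

        reach(x, y) :- edge(x, y).
        reach(x, z) :- reach(x, y), edge(y, z).
    """
    reach = set(edges)
    frontier = set(edges)  # facts derived in the previous round
    while frontier:
        # Extend only the newly found paths by one edge.
        derived = {
            (x, z)
            for (x, y) in frontier
            for (y2, z) in edges
            if y == y2
        }
        frontier = derived - reach  # keep only the facts that are new
        reach |= frontier
    return reach


paths = reachability({(1, 2), (2, 3), (3, 4)})
```

The loop stops once a round derives nothing new, which is the fixpoint that a recursive Datalog query denotes.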
- Graph Reachability
- Datalog Interpretation
- Not-interpreted Datalog
- Streaming Pandas
- Streaming Pandas on the GPU
There are many examples living in each test/test_*.py file.