
feat: async support #44

Draft
wants to merge 2 commits into master

Conversation

@dignifiedquire commented Jan 10, 2023

Took a first stab at async support, just encoding for now.

A big issue I am running into right now is that I want to use the Encoder to write to a network stream without much buffering, but unfortunately that requires Read + Seek, neither of which is available on an outgoing stream connection.

Any thoughts on how/how hard it would be to build an encoder that does not require reading back any content from the underlying target?

Ref #43

@dignifiedquire dignifiedquire changed the title feat: first pass at async encoding feat: async support Jan 10, 2023
@oconnor663 (Owner) commented Jan 10, 2023

Apologies, I probably won't have time to read through all the code today.

> Any thoughts on how/how hard it would be to build an encoder that does not require reading back any content from the underlying target?

The fundamental challenge here is that all the nodes in the Bao stream are laid out in pre-order. So the very first thing that needs to hit the wire is effectively the root hash of the entire input, and that means you need to process the entire input before even a single byte of encoded output is ready. (Well, the first 8 bytes are just the length, so it's actually the ninth byte that requires the entire input.)
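To make that constraint concrete, here is a toy sketch in plain Rust. It uses std's `DefaultHasher` rather than BLAKE3, and `preorder_encode` is a made-up illustration, not the bao crate's API: in a pre-order layout the root node is emitted before any leaf, but its value depends on every leaf, so the encoder must consume the whole input before it can write the first tree byte.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for a chunk/parent hash; real Bao uses BLAKE3.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

// Hypothetical pre-order encoder over a flat list of chunks:
// the output is [root, leaf_0, leaf_1, ...]. The root is computed
// from every leaf hash, so nothing can be emitted until the whole
// input has been processed.
fn preorder_encode(chunks: &[&[u8]]) -> Vec<u64> {
    let leaves: Vec<u64> = chunks.iter().map(|c| toy_hash(c)).collect();
    let mut root_input = Vec::new();
    for leaf in &leaves {
        root_input.extend_from_slice(&leaf.to_le_bytes());
    }
    let root = toy_hash(&root_input);
    // Pre-order: root first, then the leaves.
    let mut out = vec![root];
    out.extend(leaves);
    out
}
```

Changing even the last input chunk changes the first emitted node, which is exactly what rules out a low-buffer, no-seek streaming encoder for this layout.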

The intended way to work around this is bao slice and bao decode-slice. The assumption there is that you run the entire bao encode locally and save the result on disk. If you save the "combined" encoded file, you can just serve that verbatim. If you save the "outboard" encoded file, to avoid the space overhead of an entire copy of the original input, then you can use bao slice together with the original and the outboard encoding to reproduce the "combined" encoding. That does work in a streaming-friendly way, and it doesn't require seek support in the output.

Currently Bao always saves the entire hash tree, which is ~6% the size of the original input. (So a "combined" encoding is 106% of the input size, and an "outboard" encoding is 6%.) But a big TODO for me is to support configurable "chunk group" sizes, where Bao would omit the lower levels of the tree and recompute those on the fly as needed. Then you could tune the size of the outboard tree, saving space on disk in exchange for needing to buffer more input during decoding and seeking.
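As a back-of-the-envelope check on those numbers (a sketch assuming BLAKE3's documented geometry of 1024-byte chunks and 64-byte parent nodes; `outboard_overhead` is a hypothetical helper, not a bao API):

```rust
// Each 1024-byte chunk is covered by roughly one 64-byte parent node
// (two 32-byte child hashes), so the full tree costs about 64/1024 of
// the input size. With chunk groups of `group_chunks` chunks, the
// lowest tree levels are omitted and roughly one parent node survives
// per group.
fn outboard_overhead(group_chunks: u64) -> f64 {
    const CHUNK_SIZE: f64 = 1024.0;
    const PARENT_SIZE: f64 = 64.0;
    PARENT_SIZE / (CHUNK_SIZE * group_chunks as f64)
}
```

With a group size of one chunk this gives 64/1024 = 6.25%, matching the "~6%" above; a group of 16 chunks would shrink the outboard file to under 0.4%, at the cost of rehashing a whole group to verify any chunk in it.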

@dvc94ch (Contributor) commented Jan 10, 2023

So the pre-order/post-order thing is an artifact of doing a two-pass encoding of the bao format using two buffers.

For my purposes I wrote a hasher that produces a Vec<Insertion> and writes the contents atomically into a KV store, so that pre-/post-order traversal can be performed efficiently. The main reason for not using the bao crate was that decoding all slices of the file should be able to reproduce the hash tree without having to rehash the data. This allows peers to provide the slices they have without having to first sync the entire file and rehash its contents.

See https://github.com/dvc94ch/blake-tree/blob/master/core/src/hasher.rs and https://github.com/dvc94ch/blake-tree/blob/master/core/src/tree.rs for the details

@dvc94ch (Contributor) commented Jan 10, 2023

But with regard to this PR, I have two questions:

  1. Does making it async provide a measurable difference in any benchmark?
  2. Is this work intended to be used in iroh? That would be an exciting break from existing IPFS orthodoxy.

@dignifiedquire (Author) commented

> 1. Does making it async provide a measurable difference in any benchmark?

It's less about specific perf improvements and more about easier integration into existing pipelines/code.

> 2. Is this work intended to be used in iroh?

Eventually, yes, that is the goal, or at least something like this.
