feat: async support #44
Apologies, I probably won't have time to read through all the code today.
The fundamental challenge here is that all the nodes in the Bao stream are laid out in pre-order. So the very first thing that needs to hit the wire is effectively the root hash of the entire input, and that means you need to process the entire input before even a single byte of encoded output is ready. (Well, the first 8 bytes are just the length, so it's actually the ninth byte that requires the entire input.) The intended way to work around this is
Currently Bao always saves the entire hash tree, which is ~6% the size of the original input. (So a "combined" encoding is 106% of the input size, and an "outboard" encoding is 6%.) But a big TODO for me is to support configurable "chunk group" sizes, where Bao would omit the lower levels of the tree and recompute those on the fly as needed. Then you could tune the size of the outboard tree, saving space on disk in exchange for needing to buffer more input during decoding and seeking.
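To make the ~6% figure concrete, here is a rough sketch of the size arithmetic. It assumes the layout from the Bao spec (1024-byte chunks, 64-byte parent nodes, an 8-byte length header). The `chunk_group_log` parameter is hypothetical: it models the "chunk group" idea from the TODO above, not an option that exists in Bao today, and the function names are made up for illustration.

```rust
/// Bao chunk size in bytes.
const CHUNK_SIZE: u64 = 1024;
/// A parent node stores two 32-byte child hashes.
const PARENT_SIZE: u64 = 64;
/// The encoding starts with the content length as an 8-byte header.
const HEADER_SIZE: u64 = 8;

/// Size in bytes of an outboard encoding for `content_len` bytes of input.
///
/// `chunk_group_log` is a hypothetical tuning knob: leaves of the stored
/// tree cover `2^chunk_group_log` chunks each, so 0 means "store the full
/// tree", which is what Bao does currently.
pub fn outboard_size(content_len: u64, chunk_group_log: u32) -> u64 {
    let group_size = CHUNK_SIZE << chunk_group_log;
    // Round up to whole groups; an empty input still counts as one leaf.
    let leaves = std::cmp::max(1, (content_len + group_size - 1) / group_size);
    // A binary tree with `leaves` leaves has `leaves - 1` parent nodes.
    HEADER_SIZE + PARENT_SIZE * (leaves - 1)
}

/// Size in bytes of a combined encoding (tree interleaved with content).
pub fn combined_size(content_len: u64, chunk_group_log: u32) -> u64 {
    content_len + outboard_size(content_len, chunk_group_log)
}
```

For a 1 MiB input, `outboard_size(1 << 20, 0)` comes out to 65480 bytes, about 6.2% of the input. With a hypothetical group of 16 chunks, `outboard_size(1 << 20, 4)` drops to 4040 bytes, at the cost of re-hashing up to 16 KiB per group when decoding or seeking.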
So the pre-order/post-order thing is an artifact of doing a two-pass encoding of the Bao format using two buffers. For my purposes I wrote a hasher that produces a
See https://github.com/dvc94ch/blake-tree/blob/master/core/src/hasher.rs and https://github.com/dvc94ch/blake-tree/blob/master/core/src/tree.rs for the details.
But with regard to this PR, I have two questions.
It's less about specific perf improvements, more about easier integration into existing pipelines/code.
Eventually, yes, that is the goal, or at least something like this.
Took a first stab at async support, just encoding for now.
A big issue I am running into right now is that I want to use the Encoder to write to a network stream without buffering much, but unfortunately that requires Read + Seek, both of which are not available on an outgoing stream connection.
Any thoughts on how (and how hard) it would be to build an encoder that does not require reading back any content from the underlying target?
Ref #43