feat(trie): Leveling up our bonsai-trie
#241
Comments
It seems bonsai might not be ideally suited for generating merkle proofs from historical states or efficiently retrieving past key values. This limitation arises because bonsai’s design optimizes for current block execution, leveraging the flat DB cache for performance.
this would invalidate this whole roadmap
So these are thoughts after doing some research internally. The main purpose is to answer the question of whether we should use bonsai or not.

Tradeoffs
This is the tradeoff space from what we understand. Assume that you start from (1) and apply changes incrementally.

L1 Situation
The problems we're trying to answer already exist on the L1.
L2 situation
Do we keep Bonsai or go with MPT?
It seems there are only two main use cases of the getProof endpoint

There are no major use cases for proofs at the application level just yet (and we don't have a working light client implementation either at the moment).

And to be precise the tradeoff to get a fast
A. We store historical tree nodes and increase storage

We think we should go down the path of reth/erigon. We optimise for current (or recent) state access, which means getProof only works for very recent blocks, up to a max block limit. getProof would also be slower by design when compared to something like Pathfinder, because that's not what we're optimising for.
bonsai-trie
The context
In deoxys, we disabled the trie logs from bonsai-trie. This is because disabling them gives a huge speedup during sync as well as a big saving in disk space.
However, everything comes at a cost: there are a few features that we'd like to implement in the near future that require them, such as
We need to be able to query the global tries at a `current_block - N` block, where N is a relatively small constant (more on that later) for:

We also need to be able to revert the state of the global tries to `current_block - N`, where N is very small, in order to handle reorgs correctly.

For context, the p2p spec also has a capability protocol that allows peers to differentiate between "archive nodes" (where N is infinity, every trie log is stored) and normal nodes. The way it does this is by simply saying how large that `N` is at a given time, from what I understand of Shahak's presentation at the Starknet node core dev meetup in Brussels.

Note that, like Ethereum (I believe?), a non-archive node does have every block and can actually bootstrap an archive node. It has all the info needed to do that, as the archive node can simply replay the blocks from genesis to sync.
N should be around 128 by default.
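
To make the role of the trie logs and of `N` concrete, here is a minimal sketch. It is not the bonsai-trie API: the `State` type, its methods, and the string keys are all hypothetical, and the "trie" is reduced to a flat key-value map. Each committed block records the previous value of every key it writes, only the last `N` logs are kept, and a revert replays those logs backwards.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical trie log for one block: for every key written in the block,
/// the value it had before the block (None if the key did not exist).
type TrieLog = HashMap<String, Option<String>>;

/// Toy flat state plus a bounded window of trie logs (illustration only).
/// `max_logs = None` models an archive node, where N is infinity.
pub struct State {
    flat: HashMap<String, String>,
    logs: VecDeque<TrieLog>,
    max_logs: Option<usize>,
    current_block: u64,
}

impl State {
    /// Starts at block 0 with an empty state.
    pub fn new(max_logs: Option<usize>) -> Self {
        State { flat: HashMap::new(), logs: VecDeque::new(), max_logs, current_block: 0 }
    }

    pub fn current_block(&self) -> u64 {
        self.current_block
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.flat.get(key).map(|v| v.as_str())
    }

    /// Applies one block of writes and returns the new block number.
    pub fn commit_block(&mut self, writes: &[(&str, &str)]) -> u64 {
        let mut log = TrieLog::new();
        for &(key, value) in writes {
            // `insert` hands back the old value. In a real database this is
            // the extra read that recording a trie log costs on every write.
            let old = self.flat.insert(key.to_string(), value.to_string());
            // Keep only the value the key had before the block started.
            log.entry(key.to_string()).or_insert(old);
        }
        self.logs.push_back(log);
        if let Some(n) = self.max_logs {
            while self.logs.len() > n {
                self.logs.pop_front(); // prune logs older than current_block - N
            }
        }
        self.current_block += 1;
        self.current_block
    }

    /// A revert to `target` needs one log per block in target+1..=current.
    pub fn can_revert_to(&self, target: u64) -> bool {
        target <= self.current_block
            && self.current_block - target <= self.logs.len() as u64
    }

    /// Undoes blocks one at a time, newest first. Returns false if the
    /// target is outside the retained window.
    pub fn revert_to(&mut self, target: u64) -> bool {
        if !self.can_revert_to(target) {
            return false;
        }
        while self.current_block > target {
            let log = self.logs.pop_back().expect("checked by can_revert_to");
            for (key, old) in log {
                match old {
                    Some(value) => { self.flat.insert(key, value); }
                    None => { self.flat.remove(&key); }
                }
            }
            self.current_block -= 1;
        }
        true
    }
}
```

With `State::new(Some(128))` this models the default window suggested above, and `State::new(None)` models an archive node. The same window bounds both reorg handling and how far back historical queries could go.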
Performance
There are some performance concerns here:
`get`s are necessarily way slower than `put`s. RocksDB needs to get the current value from disk most of the time, whereas `put` really is just a write to memory that will eventually be flushed to disk in batch.

For context, back before the codeswap with deoxys, we were very focused on sync performance as we saw this was one of the best ways to differentiate deoxys from the juno/pathfinder competition. We are of the opinion that their sync speed is slow, and we wanted to see how fast we could get ours.
We are okay with giving up some sync performance now, as the context has changed quite a lot. However, this would make me sad and I'd like to avoid that as much as possible :)
If we want to keep the same performance as before, a simple way to do that would be to disable all trie-log related stuff up until we reach `current_block - N`. In the near future, with the second part of the block-import pipeline refactor, it will be possible to know how far behind we currently are from the tip of the blockchain. But before then I think we'll have to take the performance hit :(

There are probably way better ways to batch the `get`s required to make the trie logs work, too.

Possible future work
There are a few things that I want to throw in there about the bonsai-trie:
`current_block - N` or not store them at all there. We should experiment with that.

Roadmap
Okay, with all of the context out of the way - let's distill that into tasks that we can work on today and prioritize them :)
`mc-db` part of the codebase.
`N` a CLI argument

After that, bonsai-trie will probably get deprioritized for some time, but following from that