feat(trie): Leveling up our `bonsai-trie` #241

cchudant · 2024-09-02T13:36:19Z

The context

In deoxys, we disabled the trie logs from bonsai-trie. This is because it enables a huge speedup during sync as well as a big space win.

However, everything comes at a cost: there are a few features that we'd like to implement in the near future that require them, such as

We need to be able to query the global tries at a current_block - N block, where N is a relatively small constant (more on that later) for:

storage proofs for SNOS input (first priority)
the v0.8 rpc spec now has a get_storage_proof method that we need to support
the snap sync specs are being developed for p2p, and it needs introspection of the global trie in the past as well as storage proofs

We also need to be able to revert the state of the global tries to current_block - N, where N is very small in order to handle reorgs correctly.

For context, the p2p specs also has a capability protocol that allow peers to differentiate between "archive nodes" (where N is infinity, every trie log is stored) and normal nodes. The way it does this is by simply saying how much that N is at a given time, from what I understand of Shahak's presentation at the starknet node core dev meetup at bruxelles.

Note that, like ethereum (I believe?), a non-archive node does have every block and can actually bootstrap an archive node. It has all the info needed to do that, as the archive node can simply replay the blocks from genesis to sync.

N should be around 128 by default.

Performance

There are some performance concerns here:

storing bonsai-trie logs is kind of expensive by itself
When storing bonsai-trie values in the DB, to create trie-log, it needs the current value in the db at the time. I removed that entirely from our BonsaiDb adaptor, and the gain was around 20% iirc.
- that might be surprising, but this is very much expected as gets are necessarily way slower than puts. Rocksdb needs to get the current value from disk most of the time, where put really is just a write to memory that will eventually be flushed to disk in batch.
- we have to put that back :(

For context, back before the codeswap with deoxys, we were very focused on sync performance as we saw this was one of the best way to differentiate deoxys from the juno/pathfinder competition. We are of the opinion that their sync speed is slow, and we wanted to see how fast we could get ours.
We are okay with giving up some sync performance now, as the context has changed quite a lot. However, this would make me sad and I'd like to avoid that as much as possible :)

If we want to keep the same performance as before, a simple way to do that would be to disable all trie-log related stuff up until we reach current_block - N. In the near future, with the second part of the block-import pipeline refactor, it will be possible to know how far behind we currently are from the tip of the blockchain. But before then I think we'll have to take the performance hit :(
There are probably way better ways to batch the gets required to make the trie logs work, too.

Possible future work

There are a few things that I want to throw in there about the bonsai-trie:

The trie log is very redundant with our state-diff column. We may want to remove our state-diff column and add methods to bonsai-trie to query it.
The flat DB in bonsai-trie is useless, as we use our own historical key-value flat storage for that.
- we can just remove the flat db or make it optional in bonsai-trie -- that wouldn't break anything because bonsai-trie stores the hashes of its leafs in the trie column too.
- oh also technically we could delete keys before current_block - N or not store them at all there. we should experiment with that
I started an experiment to parallelize the last remaining thing in bonsai-trie that could give us a perf boost, which is batch inserting. I really want to get around to finishing that someday, ahah

Roadmap

Okay, with all of the context out of the way - let's distill that into tasks that we can work on today and prioritize them :)

First, it's imperative to remove our KasarLabs/bonsai-trie fork. It is synced to this draft pull request that refactors and fixes a big part of the library. It's still in draft though, as we did not have the time to land it properly yet
Re-enable trie-logs in madara. This touches the mc-db part of the codebase.
- This seems simple at a glance but I'm expecting us to find some surprise bugs here and there.
Make that N a CLI argument
Test the trie-logs in depth, as I'm not confident it's free of bugs right now. (we had issues with other parts of the bonsai-trie having surprise bugs in the past)
Test the storage proofs in depth in bonsai-trie, for the same reasons.
- Storage proof verification might need to be fixed as well, as we can't know if our proofs are good in tests if we have no way of verifying them. (there might ways around this though) There is currently already an open issue around storage proof verification.
Hook it up in madara, and make the endpoint for SNOS (top priority)
- The SNOS input looks like this. It needs a storage proof for every key-value used in the block (which we can easily deduce from the state-diff)
- If possible, let's make it the same endpoint as the new v0.8 get_storage_proof endpoint. v0.8 also adds a get_pcs which is also needed for SNOS - and I'm kind of thinking that Ariel made these endpoints to enable our usecase specifically, in fact. neat!

After that, bonsai-trie will probably get deprioritized for some time, but following from that

Document the trie log in bonsai-trie, and the rocksdb snapshot optimization that Massalabs added too.
Handle reorgs
- should be the next priority, in my opinion
- we'll need to make all the other columns in madara revertible, see feat(reorg): Add support for possible l1 and l2 reorgs #202
Remove the flat column in bonsai-trie or make it optional
Remove the state-diff column in madara and get it from trie-log
Performance improvements, as said in the last section.
p2p capability, snap sync and other stuff depends on the future starknet roadmap

The text was updated successfully, but these errors were encountered:

jbcaron · 2024-09-02T14:24:56Z

It seems bonsai might not be ideally suited for generating merkle proofs from historical states or efficiently retrieving past key values. This limitation arises because bonsai’s design optimizes for current block execution, leveraging the flat DB cache for performance.
Further exploration of alternatives or enhancements in this area might be needed.

cchudant · 2024-09-02T14:38:17Z

It seems bonsai might not be ideally suited for generating merkle proofs from historical states or efficiently retrieving past key values. This limitation arises because bonsai’s design optimizes for current block execution, leveraging the flat DB cache for performance. Further exploration of alternatives or enhancements in this area might be needed.

this would invalidate this whole roadmap
after talking privately with jbcaron i realize that we should investigate that question asap, bonsai vs ref-counted mpt - we may be going completely in the wrong direction with bonsai

apoorvsadana · 2024-09-04T12:26:41Z

So these are thoughts after doing some research internally. The main purpose is to answer the question if we should use bonsai or not.

Tradeoffs

This is the tradeoff space from what we understand. Assume that you start from (1) and apply changes incrementally.

Store historical trees i.e. widely used MPT
1. Get fast proofs for older state
2. High storage
3. Get slow reads for leaf data both historical and current
Update tree in place i.e. prune any historical structure
3. Impossible to get proofs of older state
4. Low storage
5. Get slow reads for only current leaf data
Store leafs at hash(key) and not hash(node)
1. Impossible to get proofs of older state
2. Low storage
3. Get fast reads for only current leaf data
Store state diffs at each block
1. Possible to get proofs of older state but would need to apply state diffs and reconstruct tree in memory. So it's slower.
2. Low storage (you're storing all leaves as case (1) but you don't need to store intermediate nodes)
3. Get fast reads for leaf data both historical and current

L1 Situtation

The problems we're trying to answer already exist on the L1.

Geth (1 and some part of 3 with snapshots I guess) => uses MPT which makes historical access easier. I believe infura uses Geth (or at least has some Geth nodes running in archive). Which is why, if you query infura for a proof of a very old block, you get it.
1. cast proof 0xdAC17F958D2ee523a2206206994597C13D831ec7 0x123 --block 1 --rpc-url https://mainnet.infura.io/v3/<api_key>
2. This works :)
Reth/Erigon (4) => they use the flat DB structure similar to Bonsai. As a result, it's really difficult for them to query "proofs" of older state. For example, reth has a default proof window of 0 and max window of 216000. So if you run
1. cast proof 0xdAC17F958D2ee523a2206206994597C13D831ec7 0x123 --block 20676345 --rpc-url https://eth.merkle.io
2. which is block from today afternoon, you get the error distance to target block exceeds maximum proof window

L2 situation

Madara (4) => we're very similar to Reth/Erigon. I am not even sure how Bonsai is different from the flat DB architecture of Erigon. In fact, the consensys team mentions something similar in their meeting notes with Erigon here.
1. Erigon threw out the MPT (merkle patricia trie) completely and computes state root post execution and other than that we have a flat state. Plain state table: value = account, key = account address. We are almost there with bonsai on the flat storage but we should work on simplifying
Juno (4) => they're inspired by the Erigon design as well from what we've understood from the code/talks. They don't store historical tree structure either and face a similar problem like us as mentioned here.
Pathfinder (not sure but it looks like they've the option to store historical trees + state diffs so they've fast reads for leaf data) => Pathfinder should've a lot of storage requirements by default and seems closer to Geth

Do we keep Bonsai or go with MPT?

It seems there are only two main use cases of the getProof endpoint

SNOS input
Storage proofs for applications like Herodotus or light clients

There are no major use cases for proofs at the application level just yet (and we don't have a working light client implementation either at the moment). And to be precise the tradeoff to get a fast proof endpoint is

A. We store historical tree nodes and increase storage
OR
B. we store only current tree nodes and build the remaining on the fly

We think we should go down the path of reth/erigon. We optimise for current (or recent state access) which means getProof only works for very recent blocks with max block limit + getProof by design would be slower when compared to something like Pathfinder because that's not what we're optimising.

antiyro assigned cchudant and jbcaron Sep 6, 2024

antiyro changed the title ~~Leveling up our bonsai-trie~~ feat(trie): Leveling up our bonsai-trie Sep 25, 2024

antiyro mentioned this issue Sep 25, 2024

feat(storage): Retrieve storage proof from the bonsai trie #203

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(trie): Leveling up our `bonsai-trie` #241

feat(trie): Leveling up our `bonsai-trie` #241

cchudant commented Sep 2, 2024 •

edited by HermanObst

Loading

jbcaron commented Sep 2, 2024

cchudant commented Sep 2, 2024 •

edited

Loading

apoorvsadana commented Sep 4, 2024

feat(trie): Leveling up our bonsai-trie #241

feat(trie): Leveling up our bonsai-trie #241

Comments

cchudant commented Sep 2, 2024 • edited by HermanObst Loading

The context

Performance

Possible future work

Roadmap

jbcaron commented Sep 2, 2024

cchudant commented Sep 2, 2024 • edited Loading

apoorvsadana commented Sep 4, 2024

Tradeoffs

L1 Situtation

L2 situation

Do we keep Bonsai or go with MPT?

feat(trie): Leveling up our `bonsai-trie` #241

feat(trie): Leveling up our `bonsai-trie` #241

cchudant commented Sep 2, 2024 •

edited by HermanObst

Loading

cchudant commented Sep 2, 2024 •

edited

Loading