
[config change] use MessageCommitMode when executing future head block messages #2705

Merged 39 commits from fix-run-mode into master on Oct 9, 2024

Conversation

@magicxyyz (Contributor) commented Sep 26, 2024

Fixes NIT-2812
Pulls: OffchainLabs/go-ethereum#362
Includes: #2712

This PR:

  • Fixes the use of MessageRunMode values so that MessageCommitMode is used when, and only when, the message is part of a soon-to-be head block. Previously, newly sequenced / synced messages were executed in MessageReplayMode, so newly activated or set-cached Stylus programs were not added to the long term cache (only to the LRU cache); see the sketch after this list.

  • Improves repopulation of the long term cache after a node restart: if a program is marked onchain as cached and its wasm is found in the LRU cache, it is also added to the long term cache. This can happen, e.g., when an ephemeral call to a cached program precedes its onchain execution.

  • Adds tests for the Stylus long term cache and for repopulating the long term cache from the LRU cache.

  • Adds metrics for the Stylus long term cache (merged from Diego's draft: Stylus cache improvements #2712).

  • Adds a config option to disable collection of Stylus metrics on the Go side (also from Diego's draft).
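For illustration, a minimal Rust sketch of the intended split between the two caches. All type, field, and function names here are placeholders (only the long term / LRU distinction and the long_term_tag == 1 convention discussed later in this thread come from the PR); this is a hedged sketch, not the actual InitCache code.

```rust
use std::collections::HashMap;
use lru::LruCache;

// Placeholder types: the real cache entry holds a compiled module, an engine,
// and a size estimate, but the exact shapes are not shown in this PR.
type CacheKey = [u8; 32];

#[derive(Clone)]
struct CacheItem {
    module: Vec<u8>,
    entry_size_estimate_bytes: usize,
}

struct CacheState {
    long_term: HashMap<CacheKey, CacheItem>,
    lru: LruCache<CacheKey, CacheItem>,
}

impl CacheState {
    // Commit-mode executions (long_term_tag == 1, i.e. the message is part of
    // a soon-to-be head block) keep the program in the long term cache;
    // replay-mode and ephemeral executions only populate the LRU cache.
    fn insert_item(&mut self, key: CacheKey, item: CacheItem, long_term_tag: u32) {
        if long_term_tag == 1 {
            self.long_term.insert(key, item);
        } else {
            self.lru.put(key, item);
        }
    }
}
```

In this scheme, executing a message in commit mode is what keeps a program in the long term cache, which is exactly the behavior the previous run-mode handling missed.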

@cla-bot (bot) added the s label (automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA) Sep 26, 2024
@diegoximenes (Contributor) left a comment


LGTM overall; requesting changes because a test for this fix is missing.

system_tests/state_fuzz_test.go (resolved)
execution/gethexec/executionengine.go (resolved)
@diegoximenes (Contributor) commented:
I created a draft PR that exposes Stylus long term cache metrics, which can be helpful when implementing tests in this PR.

I didn't implement tests in my PR since they require long term caching to work properly, which is not the case on the master branch 😬

If you want to use what I developed, you can merge my PR into your branch and implement the tests on top of it.


// See if the item is in the long term cache
if let Some(item) = cache.long_term.get(&key) {
    return Some(item.data());
}

// See if the item is in the LRU cache, promoting if so
if let Some(item) = cache.lru.get(&key) {
    let data = item.data();
    if let Some(item) = cache.lru.peek(&key).cloned() {
Contributor:


This codepath clones the data twice: once here in the "get", and again when returning item.data().
Cloning entry_size_estimate_bytes is OK, but we don't want to clone module and engine unnecessarily.
This is where Rust gets you :)
There are probably solutions that would avoid cloning the result of the peek, but I think the simplest would be to avoid the clone in item.data(), since item itself is discarded right after.

Contributor Author (magicxyyz):


fixed, thanks!

Contributor Author (magicxyyz):


Tsahi pointed out that there's one more unnecessary clone, so this isn't fixed yet; working on it :)

Contributor Author (magicxyyz):


I passed item.module and item.engine to the returned Option without cloning; let me know if that checks out :)
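For readers following along, a hedged sketch of what returning module and engine by move (instead of cloning them in item.data()) might look like. The CacheItem / CacheState shapes and the promote_and_get name are placeholders inferred from this thread, and keeping one clone for the long term map is an assumption, not necessarily what the PR does.

```rust
use std::collections::HashMap;
use lru::LruCache;

type CacheKey = [u8; 32];

#[derive(Clone)]
struct CacheItem {
    module: Vec<u8>, // stand-in for the compiled module
    engine: Vec<u8>, // stand-in for the engine
}

struct CacheState {
    long_term: HashMap<CacheKey, CacheItem>,
    lru: LruCache<CacheKey, CacheItem>,
}

impl CacheState {
    // Promote an LRU entry into the long term cache and hand its module and
    // engine back to the caller by move: the peeked entry is cloned once, a
    // copy is kept in the map, and no extra clone happens on return.
    fn promote_and_get(&mut self, key: CacheKey) -> Option<(Vec<u8>, Vec<u8>)> {
        let item = self.lru.peek(&key).cloned()?;
        self.long_term.insert(key, item.clone());
        Some((item.module, item.engine))
    }
}
```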

@magicxyyz magicxyyz changed the title use MessageCommitMode when executing future head block messages [config change] use MessageCommitMode when executing future head block messages Oct 3, 2024
@magicxyyz magicxyyz marked this pull request as draft October 4, 2024 10:45
@magicxyyz magicxyyz marked this pull request as ready for review October 4, 2024 11:29
@diegoximenes (Contributor) left a comment


Nice :)

arbos/programs/native.go (outdated, resolved)
system_tests/common_test.go (outdated, resolved)
arbitrator/stylus/src/lib.rs (outdated, resolved)
system_tests/program_test.go (outdated, resolved)
system_tests/program_test.go (resolved)
system_tests/program_test.go (resolved)
@diegoximenes previously approved these changes Oct 8, 2024
@tsahee (Contributor) left a comment


Seems good. I still need to review program_test.go.

I will create an issue to fix the remaining problem in caching.

}
cache.long_term_counters.misses += 1;
Contributor:


I think this should only be increased if long_term_tag is 1, because that would mean "this should be in the long term cache".

Contributor Author (magicxyyz):


Good point, that will filter out the API-call noise and we should be able to observe how many misses there are when a node starts up. Changed :)
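For reference, a minimal sketch of the change described here, assuming the same long_term_tag == 1 convention; the counter struct and function names are illustrative placeholders, not the actual stylus code.

```rust
// Placeholder counter struct; the real metrics live inside the stylus cache.
struct LongTermCounters {
    misses: u64,
}

// Only count a long term cache miss when the lookup was supposed to hit the
// long term cache (long_term_tag == 1), so ephemeral API calls do not add
// noise to the metric.
fn record_long_term_miss(counters: &mut LongTermCounters, long_term_tag: u32) {
    if long_term_tag == 1 {
        counters.misses += 1;
    }
}
```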

pub extern "C" fn stylus_get_lru_cache_metrics() -> LruCacheMetrics {
InitCache::get_lru_metrics()
pub extern "C" fn stylus_get_cache_metrics() -> CacheMetrics {
InitCache::get_metrics()
Contributor:


@diegoximenes
This is a bug in the previous PR as well.
You're allocating memory here in Rust and returning the pointer to Go.
Go discards it, because it's a garbage-collected language, and the memory is never released.
The solution is to allocate CacheMetrics in Go, pass a pointer, and let Rust update the data in the struct. That way Rust doesn't allocate anything new and no memory is lost.
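For context, one hedged way the suggested out-parameter pattern could look. stylus_fill_cache_metrics and the metrics fields shown here are hypothetical placeholders (only CacheMetrics and InitCache::get_metrics appear in the snippet above); this is a sketch of the pattern, not the actual fix.

```rust
// Placeholder metrics struct; the real CacheMetrics is defined in the stylus
// crate and its exact fields are not shown in this thread.
#[repr(C)]
pub struct CacheMetrics {
    pub long_term_hits: u64,
    pub long_term_misses: u64,
}

// Stand-in for InitCache::get_metrics().
fn current_metrics() -> CacheMetrics {
    CacheMetrics { long_term_hits: 0, long_term_misses: 0 }
}

// Out-parameter pattern: Go allocates the CacheMetrics value (e.g. via cgo),
// passes a pointer, and Rust only writes into that caller-owned memory, so no
// Rust-side allocation is leaked across the FFI boundary.
#[no_mangle]
pub unsafe extern "C" fn stylus_fill_cache_metrics(out: *mut CacheMetrics) {
    if out.is_null() {
        return;
    }
    unsafe { *out = current_metrics() };
}
```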

Contributor:


I'll open a separate issue.

@tsahee tsahee enabled auto-merge October 9, 2024 20:57
@tsahee tsahee merged commit 32c3f4b into master Oct 9, 2024
16 checks passed
@tsahee tsahee deleted the fix-run-mode branch October 9, 2024 21:31
Labels: design-approved, s

4 participants