Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Str 304 migrate prover guest code #339

Draft
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

prajwolrg
Copy link
Contributor

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature/Enhancement (non-breaking change which adds functionality or enhances an existing one)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor

Checklist

  • I have performed a self-review of my code.
  • I have commented my code where necessary.
  • I have updated the documentation if needed.
  • My changes do not introduce new warnings.
  • I have added tests that prove my changes are effective or that my feature works.
  • New and existing tests pass with my changes.

Related Issues

Copy link

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 0% with 28 lines in your changes missing coverage. Please review.

Project coverage is 57.51%. Comparing base (b47eb40) to head (5dcc926).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
sequencer/src/rpc_server.rs 0.00% 10 Missing ⚠️
crates/consensus-logic/src/checkpoint.rs 0.00% 6 Missing ⚠️
crates/storage/src/managers/checkpoint.rs 0.00% 6 Missing ⚠️
crates/storage/src/ops/checkpoint.rs 0.00% 4 Missing ⚠️
crates/rpc/types/src/types.rs 0.00% 2 Missing ⚠️
@@            Coverage Diff             @@
##           master     #339      +/-   ##
==========================================
- Coverage   60.69%   57.51%   -3.18%     
==========================================
  Files         221      244      +23     
  Lines       24110    25686    +1576     
==========================================
+ Hits        14633    14774     +141     
- Misses       9477    10912    +1435     
Files with missing lines Coverage Δ
crates/rpc/api/src/lib.rs 0.00% <ø> (ø)
crates/rpc/prover-client-api/src/lib.rs 0.00% <ø> (ø)
crates/rpc/types/src/types.rs 0.00% <0.00%> (ø)
crates/storage/src/ops/checkpoint.rs 0.00% <0.00%> (ø)
crates/consensus-logic/src/checkpoint.rs 0.00% <0.00%> (ø)
crates/storage/src/managers/checkpoint.rs 0.00% <0.00%> (ø)
sequencer/src/rpc_server.rs 0.00% <0.00%> (ø)

... and 29 files with indirect coverage changes

Copy link
Contributor

@MdTeach MdTeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems also the codes/temp data from the Guest CL STF are also included here. We might need to revert/reject those and keep updates done on prover_client only

Comment on lines 12 to 39
alpen-express-common = { workspace = true }
alpen-express-db = { workspace = true }
alpen-express-rocksdb = { workspace = true }
alpen-express-rpc-types = { workspace = true }
alpen-express-state = { workspace = true }
express-proofimpl-evm-ee-stf = { workspace = true }
express-prover-client-rpc-api = { workspace = true }
express-sp1-adapter = { workspace = true, features = ["prover"] }
express-sp1-guest-builder = { path = "../../provers/sp1" }
express-zkvm = { workspace = true }


alpen-express-btcio.workspace = true
alpen-express-primitives.workspace = true
anyhow = { workspace = true }
argh = { workspace = true }
async-trait = { workspace = true }
bincode = { workspace = true }
bitcoin.workspace = true
borsh = { workspace = true }
jsonrpsee = { workspace = true, features = ["http-client"] }
rayon = "1.8.0"
reth-rpc-types = { workspace = true }
rockbound = { workspace = true }
serde = { workspace = true }
strata-tx-parser.workspace = true
thiserror = { workspace = true }
tokio = { workspace = true }
tracing = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use consistent syntax for all the dependencies.
Maybe use {workspace = true} for all.

Copy link
Collaborator

@delbonis delbonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to rework the data/signalling flow here, making it controlled instead of freerunning and desynced.

}

async fn fetch_input(&self, block_num: u64) -> Result<Self::Input, anyhow::Error> {
debug!("Fetching CL block input for block {}", block_num);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use log params. debug!(%block_num, "fetching CL block proof input") or something.

}

async fn fetch_input(&self, block_num: u64) -> Result<Self::Input, anyhow::Error> {
debug!("Fetching EL block input for block {}", block_num);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log params, see other comment.

Comment on lines 21 to 23
/// Start proving the given el block
#[method(name = "getTaskStatus")]
async fn get_task_status(&self, task_id: String) -> RpcResult<Option<String>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the impl fn where this parses as a UUID, this can take a Uuid directly. And all the other ones can return Uuid. Unless we intentionally want to make it so that the caller has to treat the values like a black box?

Comment on lines 9 to 19
/// Start proving the given el block
#[method(name = "proveBtcBlock")]
async fn prove_btc_block(&self, el_block_num: u64) -> RpcResult<String>;

/// Start proving the given el block
#[method(name = "proveELBlock")]
async fn prove_el_block(&self, el_block_num: u64) -> RpcResult<String>;

/// Start proving the given cl block
#[method(name = "proveCLBlock")]
async fn prove_cl_block(&self, cl_block_num: u64) -> RpcResult<String>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think these should take blkids.

Comment on lines 13 to 14
pub type CLProofInput = (ChainState, L2Block);
pub type CLProofPublicParams = ([u8; 32], [u8; 32]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be actual structs unless there's a good reason to right?

Comment on lines 6 to 9
#[cfg_attr(not(feature = "client"), rpc(server, namespace = "dev_alp"))]
#[cfg_attr(feature = "client", rpc(server, client, namespace = "dev_alp"))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using strata as the namespace prefix now, so this would probably make sense to be strataprover.

Comment on lines 24 to 37
alpen-express-btcio.workspace = true
alpen-express-primitives.workspace = true
anyhow = { workspace = true }
argh = { workspace = true }
async-trait = { workspace = true }
bincode = { workspace = true }
bitcoin.workspace = true
borsh = { workspace = true }
jsonrpsee = { workspace = true, features = ["http-client"] }
rayon = "1.8.0"
reth-rpc-types = { workspace = true }
rockbound = { workspace = true }
serde = { workspace = true }
strata-tx-parser.workspace = true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use { workspace = true } syntax.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this isn't correct anymore.

Comment on lines 82 to 140
let rpc_context = RpcContext::new(
btc_dispatcher.clone(),
el_dispatcher.clone(),
cl_dispatcher.clone(),
);

// Run dispatchers in background
tokio::spawn(async move { btc_dispatcher.start().await });
tokio::spawn(async move { el_dispatcher.start().await });
tokio::spawn(async move { cl_dispatcher.start().await });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data flow here is concerning. See comment on dispatcher type.

zkvm_manager.add_vm(ProofVm::CLAggregation, vec![]);
zkvm_manager.add_vm(ProofVm::Checkpoint, GUEST_CHECKPOINT_ELF.into());

Self {
pool: rayon::ThreadPoolBuilder::new()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using threadpool elsewhere. Is there something rayon provides that's needed beyond this?

Comment on lines 46 to 65
let btc_client = Arc::new(
BitcoinClient::new(
args.get_btc_rpc_url(),
args.bitcoind_user.clone(),
args.bitcoind_password.clone(),
)
.unwrap(),
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should create a ticker to fetch block data needed for proving from the rollup client. Ideally we shouldn't have to rely on the Bitcoin client being there since we might be pulling data from multiple places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollup only stores the transactions relevant to us, but we need all the transactions for the completeness proof.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd be storing the full block data and/or fetching it from the Bitcoin client as a proxy. The idea is to abstract out how blocks are actually stored.

@MdTeach MdTeach force-pushed the STR-304-migrate-prover-guest-code branch from 4c4dfad to 045c043 Compare October 2, 2024 08:07
Copy link
Collaborator

@delbonis delbonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way get_btc_params works is really not good. For starters it's just incorrect, it's hardcoded to always return MAINNET. At the very least we should be specifying the network in the RollupParams. It would make some more sense to group together the specific sections needed for specific proofs into dedicated sub-params structs so we don't need to pass around the whole structure. I am not sure how the making it hardcoded vs configurable will interact with the rollup params bookkeeping. This is a longer discussion to have in more detail.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really needs to be made configurable fwiw.

Comment on lines 14 to 24
#[derive(Debug, Clone)]
#[allow(dead_code)]
#[allow(clippy::large_enum_variant)]
pub enum ProverInput {
BtcBlock(Block, Vec<TxFilterRule>),
ElBlock(WitnessData),
ClBlock(WitnessData),
ClBlock(CLProverInput),
L1Batch(L1BatchInput),
L2Batch(L2BatchInput),
Checkpoint(CheckpointInput),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to mingle all these together here or would it make more sense to pass around the serialized representation and then decode it when it comes time to try to use it? That way we can add more types of proofs without going and adding special handling in all this code around the edges. Curious about your thoughts.

pub struct ProvingTask {
pub id: Uuid,
pub prover_input: ProverInput,
pub status: ProvingTaskStatus,
pub dependencies: Vec<Uuid>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we ensure these are sorted? Or do we not anticipate there being very many of these?

Also, should we expose an RPC to query the status of a proving task by its UUID and also return its whole dependency tree?

.build()?;
let (proof, vk) = vm.prove(input)?;
let agg_input = ProofWithVkey::new(proof, vk);
Ok(agg_input)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was referring to in the comment on ProverInput. I would imagine it would be better to split each of these kinds of proofs into their own traits and then have some kind of "proof registry" (which also might contain the ZkVMManager type? not sure how this is used) with nice typing and shims that handle the decoding generically. Instead we have to manually do each of the different cases one by one.

I see the ProvingOperations trait kinda goes in the direction of what I'm describing here. Could we lean into that?

@@ -194,33 +294,37 @@ where
"Witness for task id {:?}, submitted multiple times.",
task_id,
)),
ProvingTaskState::Err(e) => Err(e),
ProvingTaskState::Err(e) => Err(anyhow::anyhow!(e)),
}
}

pub(crate) fn get_proof_submission_status_and_remove_on_success(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rather wordy. Why do we want to remove it on success? What if the caller crashes immediately after calling this function and we lose the proof? These proofs aren't that large, it shouldn't be harmful to keep them around a little bit longer than we strictly need to if it improves resiliency.

Comment on lines +13 to 18
if [ ! -z $PROVER_TEST ]; then
echo "Running on prover mode"
# cargo build -F "prover"
cargo build -F "prover" --release
export PATH=$(realpath ../target/release/):$PATH
else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting a little screwy here. Looks like a tabs vs spaces issue.

@@ -556,6 +556,17 @@ impl<D: Database + Send + Sync + 'static> AlpenApiServer for AlpenRpcImpl<D> {
Ok(batch_comm.map(|bc| bc.checkpoint().clone().into()))
}

async fn get_last_checkpoint_info(&self) -> RpcResult<Option<RpcCheckpointInfo>> {
let latest_check_point = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhhh don't forget this.

Comment on lines 203 to +206
/// L1 height the checkpoint covers
pub l1_height: u64,
pub l1_range: (u64, u64),
/// L2 height the checkpoint covers
pub l2_height: u64,
pub l2_range: (u64, u64),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to define if these are open/closed/half-open. Should add accessors that make it clear what the intention is instead of doing .0/.1.

@MdTeach MdTeach force-pushed the STR-304-migrate-prover-guest-code branch from 7f9e7fe to feae6b4 Compare October 4, 2024 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants