Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: transaction rlp encoding #1536

Closed
wants to merge 2 commits into from

Conversation

morph-dev
Copy link
Collaborator

What was wrong?

Transaction can be rlp encoded/decoded with (less common) or without (most common) additional rlp header.

Currently, we have to deal with less common case manually, which isn't great. As of now, we have two use cases for it: inside BlockBody and as block size inside eth_getBlockBy* calls (which is not implemented).

How was it fixed?

Added const generic argument to the Transaction struct.

The default value represents the most common case, which means that we don't have to specify it in most of the cases.

To-Do

@morph-dev
Copy link
Collaborator Author

I'm not extremely happy with this approach, as it complicates the Transaction object, which is very commonly used (even tho complexity is mostly hidden away).

The only other reasonable approach that I see is to have another type (e.g. TransactionWithRlpHeader, up for a better name suggestion) that would be a wrapper around existing Transaction with its own Encodable/Decodable implementation.

@KolbyML
Copy link
Member

KolbyML commented Oct 18, 2024

I'm not extremely happy with this approach, as it complicates the Transaction object, which is very commonly used (even tho complexity is mostly hidden away).

The only other reasonable approach that I see is to have another type (e.g. TransactionWithRlpHeader, up for a better name suggestion) that would be a wrapper around existing Transaction with its own Encodable/Decodable implementation.

Couldn't we implement alloy_rlp length trait with the pre-existing code as well. If neither of the proposed refactors are a "happy approaches" we can just implement the length trait and call it a day no? which would also be much similar than the proposed changes.

I think the manual rlp calls instead of deriving them is fine because as you said the "manual" path is hardly used. So is it worth the complexity to implement auto derives for a rare usecase in the grand scheme of things.

@carver
Copy link
Collaborator

carver commented Oct 19, 2024

Oh yeah this is starting to come back to me, it's been about 3 years!

Since you're looking for alternatives, I'll share what we did in py-evm. We found it helpful to have a unified pre-RLP-encoded representation. So that means the primitives of byte-strings and lists (er, vectors), which can contain more byte-strings and vectors. When we correctly serialize into this format, then the RLP library should handle the actual encoding for us (by prepending the length of the payload recursively).

So the approach we took in py-evm was to offer two separate methods on the global transaction type:

  • serialize(): generate primitives that are suitable for RLP encoding
  • encode(): generate a self-contained byte-string representing a single transaction, useful for saving to a database, etc

For a legacy transaction, that looks like:

  • serialize: return a vector of byte-strings, with an entry for each field of the transaction
  • encode: rlp.encode(self.serialize())

For a typed transaction, split it into two layers, a wrapping layer that identifies the transaction type and a payload layer that is fully defined by that type ID. The payload layer works effectively the same as a legacy transaction, for serialization.

In the wrapping layer:

  • serialize: return the self.encode() value as a byte-string at the root level, with no vector
  • encode: prepend the type byte to the output of payload.encode()

Then the standard RLP encoding should work on a list of transactions that have each been run through transaction.serialize(). I haven't messed around with the alloy::rlp library much yet, but I expect it to be straightforward once you are supplying RLP-primitive values.

This is a lot, so I'm happy to get on a call, if that would help.

@carver
Copy link
Collaborator

carver commented Oct 19, 2024

nit since this is adding a new ability in the rpc, I think naming commit something like this would be clearer:
feat: add size to block returned in eth_getBlock*

Also, it could make sense to split into a refactor: commit and a feat: commit (or PR), especially if the refactor continues to grow.

@carver
Copy link
Collaborator

carver commented Oct 19, 2024

Since you're looking for alternatives, I'll share what we did in py-evm.

Hm, the longer I look at this, the more I think it's potentially an orthogonal problem. I guess it's possible we never dealt with this funny case just when calculating block size, where just the typed transactions are double-rlp-encoded (it doesn't seem familiar to me). So I guess if I don't have any better ideas. 🤷🏻‍♂️

I guess it's somewhat related how we have to add an extra rlp-encoding to legacy transactions when calculating the transaction root.

@KolbyML

This comment was marked as outdated.

@KolbyML

This comment was marked as outdated.

@KolbyML
Copy link
Member

KolbyML commented Oct 19, 2024

Nvm I think I am wrong about the performance claims I made as after reading the implementation for the RlpEncodable macro it will implement the Encodable traits length() function which avoids any objects being encoded to RLP to get the length

So with this refactor, assuming everything implements length(), whether that is through a macro, or manually, there should be a performance increase. As all primitives should rely on .len() instead of encoding the rlp

@KolbyML
Copy link
Member

KolbyML commented Oct 19, 2024

I gave this PR a brief review. We should implement length() for the header as currently that will still use encode() to get the length as we don't use RlpEncodable macro for header.

I will give a more thorough once all the tests are passing. Sorry about the confusion in my earlier messages.

@@ -123,17 +123,16 @@ impl BlockBody {
/// Returns reference to uncle headers.
///
/// Returns None post Merge fork.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Returns None post Merge fork.
/// Returns None post Merge fork.

this doc is no longer true

Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit: PR looks good. I think this solution is fine as I would want to avoid a wrapper type as it just seems kinda feels off to me.

let size = None;
// Note: transactions are encoded with header
let size = {
let payload_size = header.length()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let payload_size = header.length()
let payload_size = header.length()

Our Header type here doesn't use the macro or implement length in its implementation of Encodable so we are stilling encoding to find the length of Header. Would you want to implement length() for header in this PR or make an issue for it any we can solve it in another PR.

This would be the only place in calculating size where we still are encoding to get the length I believe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do it in a separate PR (no need to open issue about it, I will do it right now).

@morph-dev
Copy link
Collaborator Author

@KolbyML @carver I created another PR (very similar to this one) where I used separate type that is just a wrapper around Transaction with different RLP encoding.

Now that I see both of them, I like that one a bit more because it doesn't complicate the Transaction object that is very frequently used. But I can be convinced otherwise. So let me know what you think about both of them.

@KolbyML
Copy link
Member

KolbyML commented Oct 19, 2024

@KolbyML @carver I created another PR (very similar to this one) where I used separate type that is just a wrapper around Transaction with different RLP encoding.

Now that I see both of them, I like that one a bit more because it doesn't complicate the Transaction object that is very frequently used. But I can be convinced otherwise. So let me know what you think about both of them.

I like it more too. I thought it would look weirder in my head.

Copy link
Collaborator

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer the other approach

@morph-dev
Copy link
Collaborator Author

Closing in favor of #1539

@morph-dev morph-dev closed this Oct 20, 2024
@morph-dev morph-dev deleted the rlp_transaction branch October 20, 2024 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants