
Option for Receipt Chunking #450

Closed
3 tasks
zeeshanlakhani opened this issue Nov 22, 2023 · 2 comments
Labels
enhancement New feature or request workflows Related to IPVM workflows

Comments

@zeeshanlakhani
Contributor

zeeshanlakhani commented Nov 22, 2023

Summary

As @chadkoh re-pointed out, we may need to chunk receipts: outputs can be very large and may be too large to transmit over the network in one piece.

Effect(s) to the Rescue?

As @matheus23 keenly noted, do we need this if we push toward using the state effect (with our block store underneath, e.g. #189, CA (content-addressed) I/O) for outputs of a certain maximum length? The answer is we probably don't need chunking then, at least not right away.

☝🏽 using the state effect opens up the possibility of runners providing the capability to act as a storage provider, i.e. keeping things in the provided block store for some length of time n for reuse, vs. the general rule of cleaning up after a workflow run (or eventually GCing on maxed-out failures). The other option is to use another effect, like an HTTP POST, or some upload to a trusted provider (stored content-addressed) to make the content output available elsewhere for some amount of time.
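To make the storage-provider idea concrete, here's a minimal sketch of a block store with a retention window: blocks stick around for some length of time n for reuse, and a GC pass prunes them at a default clip. All names (`TtlBlockStore`, `put`, `get`, `gc`) are illustrative, not homestar's actual block store API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical sketch of a runner acting as a storage provider: blocks
// are kept for a retention window `ttl` for reuse, instead of being
// cleaned up immediately after the workflow run.
struct TtlBlockStore {
    ttl: Duration,
    blocks: HashMap<String, (Instant, Vec<u8>)>, // cid -> (stored_at, bytes)
}

impl TtlBlockStore {
    fn new(ttl: Duration) -> Self {
        Self { ttl, blocks: HashMap::new() }
    }

    fn put(&mut self, cid: String, bytes: Vec<u8>) {
        self.blocks.insert(cid, (Instant::now(), bytes));
    }

    // Reuse a block only while it is within its retention window.
    fn get(&self, cid: &str) -> Option<&[u8]> {
        self.blocks.get(cid).and_then(|(stored_at, bytes)| {
            (stored_at.elapsed() < self.ttl).then(|| bytes.as_slice())
        })
    }

    // GC pass: prune everything past its retention window.
    fn gc(&mut self) {
        let ttl = self.ttl;
        self.blocks.retain(|_, (stored_at, _)| stored_at.elapsed() < ttl);
    }
}
```

Storage here is a choice, not a promise: anything past its window disappears on the next `gc` pass.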

On the most generalized level, we may want to do both, effect-driven and receipt-only-driven, as the latter can actually be helpful for parallelization tasks. But, per @matheus23's point, this is more of an enhancement at the moment, while we begin work on #189.

Solution (if chunking)?

The initial idea I had was to incorporate monotonic sequence numbers inside a receipt (currently not spec'ed), along with a total number/count. A non-chunked receipt always starts at 0. Chunked ones increment the sequence number.
Upon lookup of the instruction CID, if multiple receipts are read, then the output has to be stitched together (by sequence number) before it can be used as an input to another function. Essentially, HELLO TCP!

Components

  • Check if the receipt's out byte size is greater than a transmit maximum (not generally configurable). We'll probably only do this for byte buffers vs. other types. If it's maxed out, generate multiple receipts covering the spliced points of the output, chunked into even sizes where possible (for reuse).
  • Upon lookup of an instruction CID, if multiple receipts are read, stitch the output together by sequence number, if we have the correct count/total for the sequence. The instruction CIDs and task/invocation ran CIDs should be the same across all these receipts. The receipt CIDs will each be different.
  • This will also affect Data: DB and Blockstore GC / LRU Pruning #264, as we should prune the entire sequence group.
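The two chunking components above can be sketched roughly as follows. `ChunkedReceipt` and the transmit maximum are illustrative stand-ins (CIDs as plain strings), not the spec'ed receipt format:

```rust
// Hypothetical receipt shape for a chunked output; not the spec'ed type.
#[derive(Debug, Clone)]
struct ChunkedReceipt {
    instruction_cid: String, // identical across all receipts in a group
    seq: u32,                // monotonic sequence number; 0 if non-chunked
    count: u32,              // total receipts in the sequence group
    out: Vec<u8>,            // this receipt's slice of the full output
}

// Split an oversized byte-buffer output into evenly sized chunks,
// generating one receipt per chunk.
fn chunk_output(instruction_cid: &str, out: &[u8], transmit_max: usize) -> Vec<ChunkedReceipt> {
    let count = out.chunks(transmit_max).count() as u32;
    out.chunks(transmit_max)
        .enumerate()
        .map(|(seq, slice)| ChunkedReceipt {
            instruction_cid: instruction_cid.to_owned(),
            seq: seq as u32,
            count,
            out: slice.to_vec(),
        })
        .collect()
}

// Stitch the output back together by sequence number, returning `None`
// unless we hold the full, gap-free sequence group. HELLO TCP!
fn stitch(mut receipts: Vec<ChunkedReceipt>) -> Option<Vec<u8>> {
    receipts.sort_by_key(|r| r.seq);
    let count = receipts.first()?.count as usize;
    let complete = receipts.len() == count
        && receipts.iter().enumerate().all(|(i, r)| r.seq == i as u32);
    complete.then(|| receipts.iter().flat_map(|r| r.out.clone()).collect())
}
```

A consumer looking up an instruction CID would call `stitch` on everything it finds and retry (or fail) when the group is incomplete.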
@zeeshanlakhani zeeshanlakhani changed the title Receipt Chunking Option for Receipt Chunking Nov 22, 2023
@zeeshanlakhani zeeshanlakhani added enhancement New feature or request workflows Related to IPVM workflows labels Nov 22, 2023
@expede
Member

expede commented Nov 24, 2023

Just to move some conversation out here from an internal thread: instead of producing a stream with a seq number, using a CID link for outputs over some threshold lets the recipient tune how they stream in the output. Receipts should never be used for storage, as they can be GCed at any time (i.e. storage is not promised), and the outer shell of a receipt should be kept as small as humanly possible so that it can be gossiped.
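A minimal sketch of this alternative: the receipt shell stays small by carrying either the bytes inline or, over some threshold, a CID link into the block store that the recipient resolves however it likes. The `Output` enum, `INLINE_MAX` threshold, and the fake CID are all assumptions for illustration:

```rust
// Hypothetical receipt output: inline for small payloads, a CID link
// into the block store for large ones. Not homestar's actual types.
#[derive(Debug)]
enum Output {
    Inline(Vec<u8>),  // small outputs travel in the receipt itself
    Linked(String),   // large outputs: a CID link the recipient resolves
}

const INLINE_MAX: usize = 1024; // assumed threshold, not spec'ed

fn make_output(
    bytes: Vec<u8>,
    store: &mut std::collections::HashMap<String, Vec<u8>>,
) -> Output {
    if bytes.len() <= INLINE_MAX {
        Output::Inline(bytes)
    } else {
        // A real runtime would content-address the bytes; this is a
        // placeholder string, NOT a real CID.
        let cid = format!("bafy-{:x}", bytes.len());
        store.insert(cid.clone(), bytes);
        Output::Linked(cid)
    }
}
```

The key property is that the linked bytes live in the block store (where GC applies), while the gossiped receipt stays a fixed, small size.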

@zeeshanlakhani
Contributor Author

Going to close this on @expede's comment and open up a new issue around receipt size specifically. The one thing to re-highlight is the key point that storage is not promised, in either the blockstore or receipt context. Storage is essentially a choice, but pruning/GC will happen at a default clip.
