Option for Receipt Chunking #450
Comments
Just to move some conversation out here from an internal discussion: instead of producing a stream with a seq number, using a CID link for outputs over some threshold lets the recipient tune how they stream in the output. Receipts should never be used for storage, as they can be GCed at any time (i.e. storage is not promised), and the outer shell of receipts should be kept as small as humanly possible so that they can be gossiped.
Going to close this on @expede's comment and open up one around receipt size specifically. The only thing to re-highlight is that storage is not promised in the blockstore or receipt context. Storage is essentially a choice, but pruning/GC will happen at a default clip.
Summary
As @chadkoh pointed out again, we may need to chunk receipts, as outputs can be very large and may not be transmittable over the network in one piece.
Effect(s) to the Rescue?
As @matheus23 keenly noted, do we need this if we push toward using the state effect (with our block store underneath, e.g. #189, CA (content-addressed) I/O) for outputs of a certain maximum length? The answer is we probably don't need chunking then, at least not right away.
☝🏽 Using the state effect opens up the possibility of runners providing the capability to act as a storage provider, i.e. keeping things in the provided block store for some length of time n for reuse, vs. the general rule of cleaning up after a workflow run (or eventually GCing on maxed-out failures). The other option is to use another effect, like an HTTP POST, or some upload to a trusted provider (stored content-addressed) to make the output available elsewhere for some amount of time.
On the most generalized level, we may want to do both, effect-driven vs receipt-only-driven, as the latter can actually be helpful for parallelization tasks. But, with @matheus23's point, this is more of an enhancement at the moment, while we begin work on #189.
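To make the threshold idea above concrete, here is a minimal Rust sketch of swapping a large output for a CID link into a block store. The `Output` enum, the `MAX_INLINE_BYTES` threshold, and the toy hash standing in for a real CID are all assumptions for illustration, not anything spec'ed:

```rust
use std::collections::HashMap;

// Hypothetical: above some byte threshold, a receipt carries a CID link
// into the block store instead of the inlined output bytes.
enum Output {
    Inline(Vec<u8>),
    CidLink(String), // content address into the block store
}

const MAX_INLINE_BYTES: usize = 1024; // assumed threshold, not spec'ed

fn wrap_output(bytes: Vec<u8>, store: &mut HashMap<String, Vec<u8>>) -> Output {
    if bytes.len() <= MAX_INLINE_BYTES {
        Output::Inline(bytes)
    } else {
        // Toy stand-in for a real CID: a hash-derived key into the block store.
        let hash = bytes
            .iter()
            .fold(0u64, |h, b| h.wrapping_mul(31).wrapping_add(*b as u64));
        let cid = format!("cid-{:x}", hash);
        store.insert(cid.clone(), bytes);
        Output::CidLink(cid)
    }
}
```

The receipt shell stays small either way: the recipient resolves the link from the block store (or a storage provider) only when it actually needs the bytes.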
Solution (if chunking)?
The initial idea I had was to incorporate monotonic sequence numbers inside a receipt (currently not spec'ed), along with a total chunk count. A non-chunked receipt always starts at 0. Chunked ones increment the sequence number.
Upon lookup of the instruction CID, if multiple receipts are read, then the output has to be stitched together (by sequence number) to be used as an input to another function. Essentially, HELLO TCP!
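A rough Rust sketch of what that stitching could look like. The `ReceiptChunk` shape and the `seq`/`total`/`out` field names are hypothetical, since none of this is spec'ed yet:

```rust
// Hypothetical chunked-receipt shape: `seq` and `total` are not spec'ed fields.
struct ReceiptChunk {
    instruction_cid: String, // same for every chunk of one instruction
    seq: u32,                // monotonic sequence number, 0-based
    total: u32,              // total number of chunks
    out: Vec<u8>,            // this chunk's slice of the output
}

/// Reassemble an output from chunked receipts, TCP-style:
/// sort by `seq`, verify completeness, then concatenate.
fn stitch(mut chunks: Vec<ReceiptChunk>) -> Option<Vec<u8>> {
    chunks.sort_by_key(|c| c.seq);
    let total = chunks.first()?.total as usize;
    if chunks.len() != total {
        return None; // missing or duplicate chunks
    }
    for (i, c) in chunks.iter().enumerate() {
        if c.seq as usize != i {
            return None; // gap or repeat in the sequence
        }
    }
    Some(chunks.into_iter().flat_map(|c| c.out).collect())
}
```

As with TCP reassembly, a consumer would refuse to use a partial output as an input to another function until every sequence number up to `total` has arrived.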
Components
- `out` byte size is greater than a transmit maximum (not generally configurable). We'll probably only do this on byte buffers vs. other types. If it's maxed out, generate multiple receipts covering the spliced points of the output, chunked into even sizes (as far as possible, for reuse).
- `ran` CIDs should be the same for all these receipts. The receipt CIDs will be different, respectively.
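A small Rust sketch of the even-size splitting described above; `split_even` and its `max_chunk` parameter are hypothetical helpers, not part of any spec:

```rust
/// Split an output into roughly even chunks, each at most `max_chunk`
/// bytes, so chunks can be transmitted and reused independently.
fn split_even(bytes: &[u8], max_chunk: usize) -> Vec<Vec<u8>> {
    if bytes.is_empty() {
        return vec![]; // nothing to chunk
    }
    // Minimum number of chunks needed, then spread bytes evenly across them.
    let n = (bytes.len() + max_chunk - 1) / max_chunk;
    let base = bytes.len() / n;
    let rem = bytes.len() % n;
    let mut chunks = Vec::with_capacity(n);
    let mut start = 0;
    for k in 0..n {
        // The first `rem` chunks carry one extra byte each.
        let size = base + if k < rem { 1 } else { 0 };
        chunks.push(bytes[start..start + size].to_vec());
        start += size;
    }
    chunks
}
```

Splitting evenly (rather than greedily filling chunks to `max_chunk`) keeps chunk sizes uniform, which helps when chunks are cached and reused independently.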