Wait to mark reports as aggregated until just before committing #374
Conversation
```rust
    max_time: out_share.time,
    checksum: out_share.checksum,
    data: Some(out_share.data),
})?;
```
We can't aggregate the output shares yet because we may still need to reject them in case of replay.
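For concreteness, here is a minimal Rust sketch of that ordering. The `OutShare` struct, `stage_out_shares` function, and integer report IDs are hypothetical stand-ins, not Daphne's actual API; the point is only that prepared output shares are staged, and replayed reports rejected, before anything is committed as aggregated.

```rust
use std::collections::HashSet;

#[allow(dead_code)]
struct OutShare {
    report_id: u64,
    time: u64,
    checksum: [u8; 32],
    data: Vec<u8>,
}

/// Stage prepared output shares without committing them. Replayed reports
/// are rejected here; only the survivors may later be marked as aggregated.
fn stage_out_shares(
    prepared: Vec<OutShare>,
    seen_report_ids: &HashSet<u64>,
) -> (Vec<OutShare>, Vec<u64>) {
    let (mut accepted, mut rejected) = (Vec::new(), Vec::new());
    for out_share in prepared {
        if seen_report_ids.contains(&out_share.report_id) {
            // Replay detected: reject instead of aggregating.
            rejected.push(out_share.report_id);
        } else {
            accepted.push(out_share);
        }
    }
    (accepted, rejected)
}

fn main() {
    let seen: HashSet<u64> = [42].into_iter().collect();
    let prepared = vec![
        OutShare { report_id: 1, time: 0, checksum: [0; 32], data: vec![1] },
        OutShare { report_id: 42, time: 0, checksum: [0; 32], data: vec![2] },
    ];
    let (accepted, rejected) = stage_out_shares(prepared, &seen);
    println!("accepted: {}, rejected as replays: {:?}", accepted.len(), rejected);
}
```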
```rust
// replayed. This would happen if, for example, a fresh report was marked as
// aggregated in the failed request, then marked as replayed by a retried
// request. In this case we cannot determine definitively if the report was
// replayed, so we drop it to be safe.
```
We no longer mark the reports as aggregated at this point, so this logic for handling failed Durable Object (DO) requests is no longer necessary. Retry should be safe.
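To spell out the failure mode the removed comment describes, here is a toy Rust illustration of the old flow; the function and the bare `HashSet` of report IDs are hypothetical, not the actual Durable Object storage. Marking a report before committing meant a failed-then-retried request would mistake its own report for a replay.

```rust
use std::collections::HashSet;

// Old flow, condensed into one function for illustration: the first attempt
// marks the report as aggregated, then the runtime resets before anything is
// committed. The retry then sees the report as already aggregated.
fn old_flow_retry_rejects(marked: &mut HashSet<u64>, report_id: u64) -> bool {
    marked.insert(report_id); // first attempt marks the report...
    // ...the request fails here, so nothing is committed.

    // Retry: `insert` returns false because the report is already marked,
    // so it is (wrongly) treated as a replay and rejected.
    !marked.insert(report_id)
}

fn main() {
    let mut marked = HashSet::new();
    assert!(old_flow_retry_rejects(&mut marked, 7));
    println!("old flow: retry spuriously rejected report 7 as a replay");
}
```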
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, and I don't see any problems.
Loads that require a lot of CPU time can trigger the Workers runtime to reset itself, causing a 500 error. This is most likely to happen when processing the `AggregationJobInitReq`, as this involves the expensive VDAF prep-initialization step. Some reports may still get marked as aggregated; if the peer retries the request, those reports will get rejected.

This problem is common enough that we need to make this request idempotent. As a first step, wait to mark the reports as aggregated until just before we have committed to aggregating them. In particular, instead of doing this in `DapReportInitializer::initialize_reports()`, wait until `DapAggregator::put_out_shares()`.
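A rough sketch of where the marking now happens, with a hypothetical `Storage` type standing in for the real aggregator state (`put_out_shares` is the method named above; everything else here is assumed). Marking moves into the commit step, so a request that dies during report initialization leaves no reports marked and can be retried cleanly.

```rust
use std::collections::HashSet;

#[derive(Default)]
struct Storage {
    aggregated: HashSet<u64>,       // report IDs marked as aggregated
    committed: Vec<(u64, Vec<u8>)>, // committed output shares
}

impl Storage {
    /// Commit output shares, marking each report as aggregated just before
    /// it is committed rather than earlier, during report initialization.
    fn put_out_shares(&mut self, out_shares: Vec<(u64, Vec<u8>)>) {
        for (report_id, data) in out_shares {
            // Mark and commit together: either both happen or neither does.
            if self.aggregated.insert(report_id) {
                self.committed.push((report_id, data));
            }
        }
    }
}

fn main() {
    let mut storage = Storage::default();
    // Suppose the first attempt resets before reaching this call: no report
    // was marked, so the retried request starts from a clean slate.
    storage.put_out_shares(vec![(1, vec![0xaa]), (2, vec![0xbb])]);
    assert_eq!(storage.committed.len(), 2);
    println!("committed {} reports", storage.committed.len());
}
```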
Force-pushed from 01b1bf9 to 7c08da7.