When offline runs of multiple Entity-related submissions come in out of order, there are conflicts and Entity state can be broken #669

lognaturel · 2024-05-16T04:26:12Z

Problem description

This issue is to add support to Central for offline runs of multiple submissions from the same client changing the same Entity.

Currently clients create and update Entities via form submissions. There is nothing in the OpenRosa spec that mandates a certain order for submissions and submissions can be manually sent out of order. Submission failures can also lead to the server receiving submissions out of order.

Additionally, it's possible for updates to come in from multiple sources while one or more of those are offline. The most likely example is that an entity is created, it's downloaded by a client that goes offline, it's updated on the server manually, and then the client submits its updates.

We'd like Central to hold submissions it knows are out of order and then apply them once their predecessors are successfully processed.

To satisfy this need, we will likely need two Entity version concepts instead of the single one that baseVersion currently represents: the last version received from the server and the last version on the device. We will also likely need a way to identify individual clients. This could be deviceID which is sent with every submission or a separate "branch ID" concept added in the form spec.

TODO (part 1)

Parse and validate branch ID, baseVersion (client version), and trunkVersion (server version)
Store branch ID, trunk version, and branch base version on the entity_defs row. Add a uniqueness constraint on the combination of the two.
Return branch ID, trunk version, and branch base version over the API.
Add a representation to the database of an entity "queue" / purgatory / holding area for entity changes that should not be processed yet.
When a submission includes an entity change from an offline run, and the base version is higher than the trunk version, check whether the previous change in the run has been processed. If not, add the change to the queue.
Whenever an entity change from an offline run is processed, check whether the next change in the run is currently in the queue. If it is, process it.
Mark an entity version as a conflict if the logical base version is not the previous version on the server.

TODO (part 2)

Account for the fact that a submission can be reprocessed if datasetApproval is toggled. If a submission is being processed for entities, and the submission has already had the effect of adding an offline update to the queue, then processing should short-circuit without error. (Check approvalRequired reprocessing with offline entity queue central-backend#1162)
Offline update after offline create (with null trunkVersion) is buggy - (form was buggy, not Central)
~~Reprocessing action in audit log shows a "Comment by X" event in frontend log.~~ --> Exclude submission.reprocess events #686
~~Process submissions for entities where the entities specification version is 2024.1.~~ --> Enforce entities spec version 2024.1 for submissions with offline actions #681
~~Add a task to apply changes that have been in the queue without being processed for a certain amount of time. Run the task on a regular basis.~~ --> Add a task to process offline entity submissions that have been held for a certain amount of time #682
Update frontend to show more information about offline branches, especially when involved in conflicts or when there is no single entity version on the server that exactly represents what the data collector saw (the "author's view"). --> Update frontend to show more information about offline branches #683

High-level goals

Provide Central with enough information to reconstruct the precise history of offline entities. That might be needed in the future for certain use cases or to make the entity history clearer in Frontend.
At the same time, minimize changes for now to the API and to Frontend. That means continuing to represent entity history as flat.

Identifying changes

To uniquely identify each change within a branch of offline changes to the same entity, we need to identify

the last known version from the server (known as trunkVersion)
the branchId to keep updates in the same branch together. this will be a UUID.
the ordering of each change, captured through using baseVersion as before, except now baseVersion is context-specific and will refer to the client's/Collect's version of the entity that may have been updated offline.

These three fields (branchId, trunkVersion, and existing baseVersion) will be enough to process a branch of offline updates (even starting with an offline create) in the correct order.

Additional terms

Trunk version. This is the latest version of the entity from the server that was known to the client before it began its offline run.
Current server version. The latest version of the entity currently on the server.
Prior local version. This is defined for the second or later submission in a branch, where baseVersion > trunkVersion (instead of being equal to it, which it will be at the start). It is the version that corresponds to the previous change in the run.

Processing changes from offline runs

When processing an entity change via submission, Central will check the XML for branchId, trunkVersion and baseVersion:

If the trunkVersion and baseVersion are the same, then Central will immediately apply the change as it does today, because this is an update to a version known to the server.
If baseVersion is higher than trunkVersion, then we know that the change is specifically an update. Central will look for an entity version that is in the same branch and is meant to be the base version, though the baseVersion will be translated to be the correct version within the context of Central, rather than Collect.
- If such an entity version exists, Central will immediately apply the update being processed.
- Otherwise, it will add the update being processed to a queue to be applied later.
  - Whenever an entity change is processed successfully, Central will check this table for a row with the same branch ID and with an incremented base version. If Central finds such a row, it will reprocess the associated submission. (Probably "queue" isn't the right term for this table...)
Central will store the branch ID, trunk version, and branch base version on the entity version (on the entity_defs row).
Central will not wait for all the submissions in an offline run to be received before processing earlier submissions in the run. As soon as a submission is received, it will be processed if it can be, and its change will be applied. That means that if two clients make separate offline runs to the same entity, and their submissions are sent at the same time, their two runs could end up being interwoven in the entity history.

After an entity update has been in the queue for a certain amount of time, it should be applied even if the prior local change has not been processed. For this to work, Collect will need to specify the base server version for every change in an offline run. Otherwise, Central might not know how to set the baseVersion property of updates applied out-of-order from the queue. Central needs to know which version to apply the update against. Collect will not specify a base server version if the entity was created as part of the offline run.

If an entity update is applied from the queue, but the entity has not been created yet, then Central should create the entity. That can happen if the entity was created as part of the offline run, but the first change from the run has not been processed yet. If the entity update does not specify a label, then either the UUID or a fixed label (e.g., "Label unknown") should be used.

Keeping the following notes, but I think by translating baseVersion from Collect's context to Central's context, the conflict detection doesn't have to change.

There is still the case that an offline branch of multiple updates combined with some online updates or another offline branch could still show confusing information, e.g. a diff about a situation that no one ever actually saw, but it should mostly work out okay.

Conflict properties

We need to account for the following properties related to conflicts that are returned over the API: the conflict type (soft/hard/null), the conflictingProperties list (for a hard conflict), baseDiff, and serverDiff.

When Central processes an entity update with a run index > 1, then the conflict type of the resulting entity version is null if the current server version of the entity is the prior local version. Otherwise, the update being processed is a conflict (either soft or hard).
- The conflict is hard if the label or a property specified by the update being processed differs from the current server version, and the label/property of the current server version also differs from the prior local version (+ maybe some details about new properties). I think that's how we define it today (source), so no change to that logic would be needed.
serverDiff would continue to be the diff between the current server version and the update being processed. That's what's shown in Frontend under "server's view".
If run index > 1, the baseVersion returned by the API would be the version property of the prior local version (as set by the server when the prior local change was processed). If the update is processed out-of-order, the baseVersion would be the version property of the latest version from the run that's on the server and has a lower run index. If no version from the run is on the server with a lower run index, baseVersion would be the base server version.

The problem with `baseDiff`

baseDiff is what's shown in Frontend under "author's view". baseDiff poses challenges for run index > 1. We can't simply diff the prior local version and the update being processed, because the prior local version will include updates after the base server version. baseDiff is an array of property names. For example:

Version 1: An entity is created via bulk upload. It has a label of "Elm" and a height property of 10.
An offline run is started. First, a submission is created to update the label to "elm".
A second submission is created to change the label to "ELM" and the height to 11.
Version 2: Before the submissions are sent, the height property is updated to 12.
The submissions are sent.
Version 3: The first update via submission is processed. It is a soft conflict. Now label = "elm" and height = 12.
Version 4: The second update via submission is processed. It is a hard conflict on height. Now label = "ELM" and height = 11.
- The baseVersion returned by the API will be 3.
- The serverDiff will be ['label', 'height'].
- The baseDiff would also be ['label', 'height']. The trouble in this case is that Frontend will show all the old values from the base version (version 3): it will show that in the author's view, version 4 changed the label from "elm" to "ELM" and the height from 12 to 11. However, it actually changed the height from 10 to 11. The old value of the label ("elm") comes from version 3, but the old value of the height (10) comes from version 1. This is a side effect of flattening entity history: there is no single entity version that corresponds to what the author saw. There's no single version that can be diffed against.

To solve this, I think baseDiff would need to be an object of entity properties (names + values), not an array of property names. Backend could probably calculate such an object on the fly. But for now, Backend just shouldn't try to calculate the baseDiff if it identifies a problem case (something like: run index > 1 and any previous version from the same run is a conflict). We should still show the "Author's view" tab in Frontend, but the tab should show a description of what happened.

On the other hand, if there isn't a problem — if there is no entity update between the base server version and the first version from the run, and no change was applied out of order from the queue — then Backend should continue to return baseDiff as it does today.

The text was updated successfully, but these errors were encountered:

ktuite · 2024-07-18T21:24:42Z

I'm marking this issue as done/closed. Remaining tasks are in their own issues.

lognaturel added the needs discussion Discussion needed before work can begin label May 16, 2024

matthew-white added backend Requires a change to the API server entities Multiple Encounter workflows frontend Requires a change to the UI enhancement New feature or behavior and removed needs discussion Discussion needed before work can begin labels May 17, 2024

matthew-white assigned ktuite May 23, 2024

ktuite mentioned this issue May 29, 2024

Processing out of order/offline entities by runId and runIndex getodk/central-backend#1148

Closed

2 tasks

ktuite mentioned this issue Jun 11, 2024

Handle offline entities with baseVersion and trunkVersion getodk/central-backend#1154

Merged

2 tasks

ktuite mentioned this issue Jul 11, 2024

Check approvalRequired reprocessing with offline entity queue getodk/central-backend#1162

Merged

2 tasks

ktuite linked a pull request Jul 18, 2024 that will close this issue

Handle offline entities with baseVersion and trunkVersion getodk/central-backend#1154

Merged

2 tasks

ktuite closed this as completed Jul 18, 2024

matthew-white mentioned this issue Aug 29, 2024

Update conflict logic for offline entities #698

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When offline runs of multiple Entity-related submissions come in out of order, there are conflicts and Entity state can be broken #669

When offline runs of multiple Entity-related submissions come in out of order, there are conflicts and Entity state can be broken #669

lognaturel commented May 16, 2024 •

edited by ktuite

Loading

ktuite commented Jul 18, 2024

When offline runs of multiple Entity-related submissions come in out of order, there are conflicts and Entity state can be broken #669

When offline runs of multiple Entity-related submissions come in out of order, there are conflicts and Entity state can be broken #669

Comments

lognaturel commented May 16, 2024 • edited by ktuite Loading

Problem description

TODO (part 1)

TODO (part 2)

High-level goals

Identifying changes

Additional terms

Processing changes from offline runs

Conflict properties

The problem with baseDiff

ktuite commented Jul 18, 2024

lognaturel commented May 16, 2024 •

edited by ktuite

Loading

The problem with `baseDiff`