Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quarantine forms due to malformed xml as unrecoverable #2859

Merged

Conversation

avazirna
Copy link
Contributor

@avazirna avazirna commented Sep 18, 2024

Summary

Errors that occur when saving the forms to disk are categorized under a single umbrella status SAVE_ERROR, which causes forms to be quarantined due to LOCAL_PROCESSING_ERROR motive, which in turn prevents forms from being sent back to the queue for reprocessing. This PR adds a new SaveStatus, SAVE_UNRECOVERABLE_ERROR, to be used when quarantining forms due to malformed or corrupted form XML, leaving SAVE_ERROR to any other errors that are deemed recoverable. These forms are quarantine as RECORD_ERROR and can be added to the submissions queue to be reprocessed.

This PR was motivated by an NPE that occurred when sending a data update broadcast during the Save to disk process, fixed in this PR, which caused the form to be quarantined in an unrecoverable manner. With this change, and while not applicable anymore due to the PR above, in case the same were to occur, the expectation is that the exception will bubble up as it did before, get caught by the generic catch clause in CommCareTask.doInBackground(), return null and subsequently save the form with a SAVE_ERROR status, the difference is that this status will now cause the form to be quarantined as RECORD_ERROR.

cross-request: dimagi/commcare-core#1437

Safety Assurance

  • If the PR is high risk, "High Risk" label is set
  • I have confidence that this PR will not introduce a regression for the reasons below
  • Do we need to enhance manual QA test coverage ? If yes, "QA Note" label is set correctly

Safety story

Forms can be quarantined for various reasons and in order to test a few of the scenarios covered by this change, IllegalStateException and RuntimeException were purposely thrown at specific areas to simulate common real issues. Those quarantined due to RuntimeException were then sent back to the queue and submitted to HQ while the IllegalStateException ones didn't have the option to do so.

@avazirna
Copy link
Contributor Author

@damagatchi retest this please

Copy link
Contributor

@shubham1g5 shubham1g5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safety story should detail out the testing performed for this change.

@@ -126,7 +127,7 @@ protected ResultAndError<SaveStatus> doTaskBackground(Void... nothing) {
// Likely a user level issue, so send error to HQ as a app build error
XPathErrorLogger.INSTANCE.logErrorToCurrentApp(cleanedMessage);

return new ResultAndError<>(SaveStatus.SAVE_ERROR, cleanedMessage);
return new ResultAndError<>(SaveStatus.SAVE_UNRECOVERABLE_ERROR, cleanedMessage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think all the places where we are throwing SaveStatus.SAVE_ERROR above should be SaveStatus.SAVE_UNRECOVERABLE_ERROR with this change.

@@ -126,7 +127,7 @@ protected ResultAndError<SaveStatus> doTaskBackground(Void... nothing) {
// Likely a user level issue, so send error to HQ as a app build error
XPathErrorLogger.INSTANCE.logErrorToCurrentApp(cleanedMessage);

return new ResultAndError<>(SaveStatus.SAVE_ERROR, cleanedMessage);
return new ResultAndError<>(SaveStatus.SAVE_UNRECOVERABLE_ERROR, cleanedMessage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also would you be able to provide more details around the error pathway you think the crash linked in the PR will surface as in this method.

@avazirna
Copy link
Contributor Author

Safety story should detail out the testing performed for this change.

hi @shubham1g5 I have added more details to the description of the PR, including the way I tested the changes.

@shubham1g5
Copy link
Contributor

Those quarantined due to RuntimeException were then sent back to the queue and submitted to HQ

Where exactly in code are we quarantining the forms erroring due to RuntimeException ?

@avazirna
Copy link
Contributor Author

Those quarantined due to RuntimeException were then sent back to the queue and submitted to HQ

Where exactly in code are we quarantining the forms erroring due to RuntimeException ?

Here, based on the SaveStatus we decide the quarantine reason, QuarantineReason_LOCAL_PROCESSING_ERROR for unrecoverable errors and QuarantineReason_RECORD_ERROR otherwise.

@shubham1g5
Copy link
Contributor

shubham1g5 commented Sep 19, 2024

Here, based on the SaveStatus we decide the quarantine reason, QuarantineReason_LOCAL_PROCESSING_ERROR for unrecoverable errors and QuarantineReason_RECORD_ERROR otherwise.

@avazirna How is that related to RuntimeException or how does RuntimeException gets represented as SAVE_ERROR or SAVE_UNRECOVERABLE_ERROR ?

@avazirna
Copy link
Contributor Author

avazirna commented Sep 19, 2024

@avazirna How is that related to RuntimeException or how does RuntimeException gets represented as SAVE_ERROR or SAVE_UNRECOVERABLE_ERROR ?

Using the same method where the NPE was thrown as example, the exception is caught by this generic catch which then wraps the exception in a RuntimeException that is then caught by this other generic catch, this is then wrapped in a SQLException that bubbles all the way up until CommCareTask.doInBackground(). There is gets caught by this catch clause and returns null. With the result null, onPostExecute saves the form with SaveStatus.SAVE_ERROR.

Copy link
Contributor

@shubham1g5 shubham1g5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation.

if (!CommCareApplication.instance().isConsumerApp()) {
new UserfacingErrorHandling<>().createErrorDialog(this, errorMessage,
Localization.get("notification.formentry.save_error.title"),
FormEntryConstants.EXIT);
}
quarantineRecordOnError(errorMessage);
String reasonType = (saveStatus == SAVE_UNRECOVERABLE_ERROR) ? QuarantineReason_LOCAL_PROCESSING_ERROR
: QuarantineReason_RECORD_ERROR ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:formatting

@avazirna
Copy link
Contributor Author

@damagatchi retest this please

@avazirna avazirna merged commit 503c57c into feature/givewell Sep 20, 2024
1 check failed
@avazirna avazirna deleted the quarantine-malformed-form-xml-as-unrecoverable branch September 20, 2024 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants