Overlapping resolveServer() calls lead to a crash #437

KitsuneRal · 2021-01-07T16:15:32Z

Apparently, the jobs somehow get entangled with each other, with an immediate cause of the crash being the finished() handler finding that d->resolveJob is already null.

The text was updated successfully, but these errors were encountered:

KitsuneRal · 2021-01-07T20:05:54Z

The sequence of events leading to the crash (quite reliably) turned out to be somewhat complicated.

Connection::resolveServer() uses isJobRunning() to find out if the ongoing job should be abandon()ed. However, when a job waits for a retry isJobRunning() returns false because the job is not in Pending state. As a result, the previous job is not cancelled and proceeds to retry.
At this point in time d->baseUrl is already set by the next resolveServer() invocation, and since the job constructs its request at the moment of making it (even if retrying), the new request goes to the new base URL.
As a result, two jobs end up requesting the same server; whichever of them returns first, triggers the lambda handling its finished() signal. If the request was successful or returned 404, the handler calls setHomeserver() which resets d->resolverJob to nullptr.
The other job returns, enters its finished() handler which tries to check the result using d->resolverJob, leading to a segmentation fault.

Fixes #437.

See #437 for the discussion. (cherry picked from commit 6101971)

To be very clear what this function checks. See also #437.

See #437 for the discussion.

KitsuneRal · 2021-01-08T07:11:01Z

The above exposed a few issues:

isJobRunning() shouldn't return false during retry interval. The intention of this facility function is to help with cases where, e.g., the next iteration of a job is invoked from the handler of the previous iteration. A retry means that the previous iteration is not over, and none of its completion handlers (even finished()) was triggered. This is fixed in aa79040 (master, with renaming to isJobPending() in the next commit) and 12e00b2 (0.6.x).
Explicitly resetting a QPointer<> smells of violating the separation of concerns, unless the intention is to explicitly dismiss the ownership/observation for some reason. Abandoning a job leads to resetting the pointer anyway - but only if it still points to that job, which is right. It was enough to not reset d->resolveJob in setHomeserver() for this crash not to happen (and it doesn't break anything, it's not even a pessimisation). This is fixed in 6101971 (master) and 41e57c9 (0.6.x)
The next piece that would render the whole sequence impossible is late binding of base URL in jobs. There's no particular reason for that, except not having to store the network request (or at least the base URL) between retries. On the other hand, late binding in theory may lead to obscure effects around changing the homeserver but that's a corner case: setHomeserver() is only supposed to be called during initial stages of Connection lifecycle, and once the access token is obtained, the homeserver really shouldn't change (see Apply proper RAII to Connection #439). Perhaps base URL should really be expected to be generally invariant, and the server resolution case be carefully special-cased (as it is by now).
As a side note not directly affecting the issue, the handler in resolveServer() doesn't treat abandoning the job as a special case, leading to emission of resolveError() in such cases. This is not exactly correct; 6af9ae2 (master) and be00308 (0.6.x) fix that.

KitsuneRal added the crash A crash occurs in the library code label Jan 7, 2021

KitsuneRal self-assigned this Jan 7, 2021

KitsuneRal changed the title ~~Closely following resolveServer() calls lead to a crash~~ Overlapping resolveServer() calls lead to a crash Jan 7, 2021

KitsuneRal added a commit that referenced this issue Jan 7, 2021

BaseJob: setStatus(Pending) on scheduling a retry

12e00b2

Fixes #437.

KitsuneRal added a commit that referenced this issue Jan 7, 2021

Connection: don't explicitly reset QPointers

41e57c9

See #437 for the discussion. (cherry picked from commit 6101971)

KitsuneRal closed this as completed in aa79040 Jan 8, 2021

KitsuneRal added a commit that referenced this issue Jan 8, 2021

isJobRunning() -> isJobPending()

78a3137

To be very clear what this function checks. See also #437.

KitsuneRal added a commit that referenced this issue Jan 8, 2021

Connection: don't explicitly reset QPointers

6101971

See #437 for the discussion.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overlapping resolveServer() calls lead to a crash #437

Overlapping resolveServer() calls lead to a crash #437

KitsuneRal commented Jan 7, 2021 •

edited

Loading

KitsuneRal commented Jan 7, 2021

KitsuneRal commented Jan 8, 2021 •

edited

Loading

Overlapping resolveServer() calls lead to a crash #437

Overlapping resolveServer() calls lead to a crash #437

Comments

KitsuneRal commented Jan 7, 2021 • edited Loading

KitsuneRal commented Jan 7, 2021

KitsuneRal commented Jan 8, 2021 • edited Loading

KitsuneRal commented Jan 7, 2021 •

edited

Loading

KitsuneRal commented Jan 8, 2021 •

edited

Loading