Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid assertion issues grouping #537

Merged
merged 31 commits into from
Jun 10, 2024
Merged

Conversation

tustanivsky
Copy link
Collaborator

@tustanivsky tustanivsky commented Apr 10, 2024

This PR adds a custom FOutputDeviceError implementation which should allow to hook into Unreal's error handling flow and override assert reporting logic.

By default, an exception is raised whenever assert is fired, which is then captured by crashpad/breakpad backend. The topmost callstack frames for such events are always the same preventing them from being grouped properly and they all appear under a single issue at Sentry.

To avoid this situation instead of crashing the app a corresponding event can be constructed manually and reported via CaptureEvent. Once it's done the app is forced to shut down which somewhat repeats the original flow.

The current problem with the approach suggested here is that the engine's error handling is mixed up with some platform-specific logic (i.e. logging, restoring UI, etc.) which leads to skipping some parts of it when using the new FSentryOutputDeviceError.

Closes #489
Closes #514
Closes #1

@tustanivsky
Copy link
Collaborator Author

On Android assertions are intercepted by FSentryOutputDeviceError which also sets several global tags used for further analysis during the beforeSend on the next application launch. Those tags allow us to identify whether the assertion occurred and how many call stack frames must be cut off. This change also fixes the #1

@Edstub207
Copy link
Contributor

@tustanivsky is there an update on this change?

Copy link
Contributor

@bitsandfoxes bitsandfoxes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the effort put into this! It'll be a huge QOL improvement!
There are just a couple of things that concern me:

  1. Addition to CaptureException/Assertion/Ensure to the public API. As far as I can tell they are getting used by the OutputDevice? But since they are public and user facing they don't make much sense to me - especially with the skip frames in the signature.
  2. I'm not a huge fan of modifying the stacktrace on the client to "clean it up" as this gets messy easily. Updates and changes to the engine might require changes to the SDK which will require users to update their project e.t.c. So ideally, this would be done by the server. Since this works and we don't have an alternative in place I'm fine with adding this to the SDK but it should be done in a way that's easy for us to remove/replace. And it should be done away from anything public.

CHANGELOG.md Outdated Show resolved Hide resolved
@@ -89,6 +94,24 @@ public Double sample(SamplingContext samplingContext) {
});
}

private static void checkForUnrealException(SentryEvent event) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a whole lot more than just check for an Unreal exception as it returns nothing but modifies an event.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the naming could be better here - changed it to a more generic preProcessEvent yet I'm open to considering other options.

@@ -169,6 +169,32 @@ USentryId* SentrySubsystemAndroid::CaptureEventWithScope(USentryEvent* event, co
return SentryConvertorsAndroid::SentryIdToUnreal(*id);
}

USentryId* SentrySubsystemAndroid::CaptureException(const FString& type, const FString& message, int32 framesToSkip)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow this. CaptureException does not actually capture an exception. Instead it adds a whole bunch of markers that are later getting picked up by the bridge?

@@ -153,6 +153,7 @@ void SentrySubsystemDesktop::InitWithSettings(const USentrySettings* settings, U
sentry_options_set_max_breadcrumbs(options, settings->MaxBreadcrumbs);
sentry_options_set_before_send(options, HandleBeforeSend, this);
sentry_options_set_on_crash(options, HandleBeforeCrash, this);
sentry_options_set_shutdown_timeout(options, 3000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this coming from?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, UE caused the app to crash whenever an assertion failed and the Crashpad captured these crashes for us. We've changed how assertions are handled so now when it fails a timeout is used to ensure a corresponding event is sent to Sentry within the same session before shutting down the app manually.

Co-authored-by: Stefan Jandl <[email protected]>
@bitsandfoxes
Copy link
Contributor

One more thing that comes to mind: We can mark those engine frames as InAppExclude so those won't get considered by the grouping algorithm. And those should be able to be propagated to the respective SDKs.

@tustanivsky
Copy link
Collaborator Author

Since there are quite a few questions regarding the assertions/ensures capturing on Android I'll try to address these with the following explanation:

The general idea of this PR is to hook into Unreal's error handling process using a custom FSentryOutputDeviceError so that whenever assertion is fired we capture a manually created exception event with a proper callstack and shut down the application.

The problem with this approach on Android is that Java event's callstack doesn't contain any C++ frames and thus it's not informative at all. There's no straightforward way to convert a native callstack which we can obtain on Unreal's side to a corresponding Java structure to override the default one either. However, instead of reporting assert in a described way within a single session we can set some flags identifying that it happened and force the app to crash. This allows to pass things over to sentry-native which captures a native crash with a "normal" C++ callstack. Now on the next application launch in beforeSend handler we verify if the crash was an Unreal assertion by checking aforementioned flags and cut off the redundant callstack part if there's any. Also, these flags (if any) should be removed and not get sent to Sentry since we need them only for client-side event pre-processing.

The assertion-specific flags are set to an event via tags and not context since sentry-java can't serialize the latter for native crashes and in this case the required values are lost upon the next application launch. I believe this is a known limitation of Java SDK which we've encountered a couple of times in the past.

@bitsandfoxes bitsandfoxes merged commit 51bea9b into main Jun 10, 2024
16 checks passed
@bitsandfoxes bitsandfoxes deleted the fix/desktop-callstack branch June 10, 2024 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants