Invalid grouping of native crashes coming from a dynamic library #1609

Open
Augustyniak opened this issue Mar 2, 2022 · 6 comments

Comments

@Augustyniak

Augustyniak commented Mar 2, 2022

Description

Bugsnag grouping of crashes does not work well for Android native crashes that happen inside of dynamically loaded libraries. Bugsnag seems to be using a whole path for where the error happens when grouping events into errors and the path seems to depend on the device's manufacturer. This leads to a situation in which the same crash is recognized by Bugsnag as a few separate errors which makes following up on the underlying crash hard/confusing.

Example

Let's take a look at one of the native crashes that Bugsnag reports as two separate errors, even though both of them represent exactly the same crash.

The last two stacktrace entries in Bugsnag error 1 (coming from Google pixel phones):

/apex/com.android.runtime/lib64/bionic/libc.so:536992 abort
/data/app/me.lyft.android-e5wo6hyI4Ov_FwVO9wyp8g==/split_config.arm64_v8a.apk!/lib/arm64-v8a/libenvoy_jni.so:17669316 Envoy::Network::DnsResolverImpl::AddrInfoPendingResolution::availableInterfaces()

The last two stacktrace entries in Bugsnag error 2 (coming from Samsung 95%/LG 5%):

/apex/com.android.runtime/lib64/bionic/libc.so:325724 abort
/data/app/~~A9JzPlzk_YgX3GgLEeqI4A==/me.lyft.android-OUs7t61Y4otTWqNTPl8p4A==/split_config.arm64_v8a.apk!/lib/arm64-v8a/libenvoy_jni.so:17669316 Envoy::Network::DnsResolverImpl::AddrInfoPendingResolution::availableInterfaces()

Describe the solution you'd like
Stop using the whole path when grouping Bugsnag events - use the last path component only.
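
A minimal sketch of the normalization proposed here, assuming the grouping input is the raw frame path as shown in the stacktraces above. The helper name `library_name` is hypothetical and is not part of any Bugsnag API; it only illustrates that both device-specific paths reduce to the same library name.

```python
import posixpath


def library_name(frame_path: str) -> str:
    """Return the last path component of a native frame's file path.

    Hypothetical helper for illustration only, not a Bugsnag API.
    """
    # Frames loaded from a split APK look like '<apk path>!/lib/<abi>/<lib>.so';
    # keep only the part after the last '!' (the path inside the APK).
    inner = frame_path.rsplit("!", 1)[-1]
    return posixpath.basename(inner)


# Paths copied from the two stacktraces in this issue.
pixel = (
    "/data/app/me.lyft.android-e5wo6hyI4Ov_FwVO9wyp8g==/"
    "split_config.arm64_v8a.apk!/lib/arm64-v8a/libenvoy_jni.so"
)
samsung = (
    "/data/app/~~A9JzPlzk_YgX3GgLEeqI4A==/me.lyft.android-OUs7t61Y4otTWqNTPl8p4A==/"
    "split_config.arm64_v8a.apk!/lib/arm64-v8a/libenvoy_jni.so"
)

# Both install paths collapse to the same grouping component.
same_library = library_name(pixel) == library_name(samsung)
```

With this normalization the install-specific directory names no longer influence the comparison; only the library file name does.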

Describe alternatives you've considered
N/A

Additional context
The example stacktrace entries come from a crash that happens in an Android app running the Envoy Mobile networking library. It's open source and available at https://github.com/envoyproxy/envoy-mobile in case it's helpful.

@Augustyniak Augustyniak changed the title from "Invalid grouping of crashes coming from a dynamic library" to "Invalid grouping of native crashes coming from a dynamic library" Mar 2, 2022
@yousif-bugsnag
Contributor

Hi @Augustyniak, thanks for the report. Our default grouping algorithm groups events sharing the same error class, file and line number of the top in-project stack frame of the innermost exception.

I think the under-grouping you're seeing may indicate that these errors are actually being grouped based on the error class, file and line number of the first frame, e.g. SIGABRT libc.so:536992 and SIGABRT libc.so:325724 rather than the second frame from libenvoy_jni.so. This can happen if our backend was not able to distinguish between in-project and out-of-project frames for some reason.
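
The behaviour described above can be modelled with a small sketch. This is an illustrative model only, not Bugsnag's actual backend code: the `Frame` type, the `grouping_key` function, and the fallback to the first frame when no frame is marked in-project are all assumptions made for the example. The offsets are taken from the stacktraces in this issue.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    file: str
    line: int
    in_project: bool = False


def grouping_key(error_class: str, frames: List[Frame]) -> Tuple[str, str, int]:
    """Group on the top in-project frame; fall back to the first frame
    when no frame is recognised as in-project (assumed fallback)."""
    top = next((f for f in frames if f.in_project), frames[0])
    return (error_class, top.file, top.line)


def stack(libc_offset: int, envoy_in_project: bool) -> List[Frame]:
    # Top frame is abort() in libc.so, followed by the libenvoy_jni.so frame.
    return [
        Frame("libc.so", libc_offset),
        Frame("libenvoy_jni.so", 17669316, in_project=envoy_in_project),
    ]


# In-project detection failed: keys differ because the libc.so offsets differ.
pixel_key = grouping_key("SIGABRT", stack(536992, envoy_in_project=False))
samsung_key = grouping_key("SIGABRT", stack(325724, envoy_in_project=False))

# In-project detection worked: both events share one key.
fixed_pixel_key = grouping_key("SIGABRT", stack(536992, envoy_in_project=True))
fixed_samsung_key = grouping_key("SIGABRT", stack(325724, envoy_in_project=True))
```

In this model the two events only land in the same group once the `libenvoy_jni.so` frame is recognised as in-project, which matches the under-grouping hypothesis above.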

Please could you write in to [email protected] with some links to the different error groupings in your Dashboard so that we can take a look at how the grouping algorithm was applied in these cases?

@yousif-bugsnag yousif-bugsnag added the awaiting feedback Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks. label Mar 3, 2022
@Augustyniak
Author

@yousif-bugsnag Thank you for a fast response. Per your suggestion I sent an email to [email protected]. 👍

@mattdyoung mattdyoung removed the awaiting feedback Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks. label Mar 4, 2022
@Augustyniak
Author

I wonder whether our issues were caused by some of the bugs that were fixed in https://github.com/bugsnag/bugsnag-android/releases/tag/v5.21.0, since they seem to be related to grouping of native crashes.

We've been talking with your folks from [email protected] but did not make any progress there as of yet. We are going to update the SDK.

Are you aware of any other issues with grouping of crashes in Android Bugsnag SDK that you are currently working on fixing?

@yousif-bugsnag
Contributor

Hi @Augustyniak, hopefully we have answered your question above via the support ticket, but for visibility and future travellers, the fixes shipped in v5.21.0 would not resolve the issue you are seeing here.

The grouping issue you are seeing stems from the fact that the uploaded mapping files are symbol table mapping files. These contain only method names, so Bugsnag is not able to map frames back to a file path and line number.

This in turn means that Bugsnag isn't able to identify which frames are in or out of project (and should therefore be ignored for grouping) based on the stack frame's path and the projectRoot property, which leads to events being incorrectly grouped on frames from NDK system libraries such as libc.so.

As discussed on the support thread, uploading the full symbol mapping files should mean that Bugsnag is able to map frames back to a file path and line number, which should fix the in-project detection and therefore the invalid grouping on libc.so frames.

Augustyniak added a commit to envoyproxy/envoy-mobile that referenced this issue Apr 6, 2022
Signed-off-by: Rafal Augustyniak <[email protected]>

Description: Generate symbol mapping file as opposed to symbol table mapping file as it contains more information. Bugsnag's guide explaining the difference between symbol table mapping files and symbol mapping files and how to generate the latter can be found at https://docs.bugsnag.com/api/ndk-symbol-mapping-upload/#uploading-mapping-files.

See this Bugsnag Android Github issue bugsnag/bugsnag-android#1609.

Risk Level: Low, affects debugging features only, not production code
Testing: N/A
Docs Changes: N/A
Release Notes: N/A
@JeremyGreen-TomTom

libc isn't something that comes from the NDK or app, it's part of the android platform, and each android device generally has a different libc version. This should be handled specially by the bugsnag grouping algorithm - grouping by the application code that calls any libc function - including (but not limited to) abort().
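
A rough sketch of the special handling suggested here: skip frames from platform libraries and group on the first frame that belongs to the application. The prefix list and the function names are assumptions chosen for illustration; they are not an existing Bugsnag setting, and a real implementation would need a more careful definition of "system library".

```python
from typing import List, Optional

# Assumed locations of platform libraries; illustrative, not exhaustive.
SYSTEM_PREFIXES = ("/apex/", "/system/", "/vendor/")


def is_system_library(path: str) -> bool:
    """True when the frame's file lives under an assumed platform directory."""
    return path.startswith(SYSTEM_PREFIXES)


def first_app_frame(paths: List[str]) -> Optional[str]:
    """Return the first frame path that is not a platform library, if any."""
    for path in paths:
        if not is_system_library(path):
            return path
    return None


# Frame paths from the Samsung/LG stacktrace in this issue, top frame first.
frames = [
    "/apex/com.android.runtime/lib64/bionic/libc.so",
    "/data/app/~~A9JzPlzk_YgX3GgLEeqI4A==/me.lyft.android-OUs7t61Y4otTWqNTPl8p4A==/"
    "split_config.arm64_v8a.apk!/lib/arm64-v8a/libenvoy_jni.so",
]
app_frame = first_app_frame(frames)
```

Here the `abort` frame in libc.so is skipped and the `libenvoy_jni.so` frame becomes the grouping candidate, regardless of which libc version the device ships.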

@yousif-bugsnag
Contributor

Hi @JeremyGreen-TomTom, we agree, and are planning to look at how we can improve handling of system libraries such as libc.so in future, so that they are excluded from grouping in cases such as this. I don't have a firm ETA for that at this time but we'll be sure to keep you posted!

jpsim pushed a commit to envoyproxy/envoy that referenced this issue Nov 29, 2022
Signed-off-by: Rafal Augustyniak <[email protected]>

Description: Generate symbol mapping file as opposed to symbol table mapping file as it contains more information. Bugsnag's guide explaining the difference between symbol table mapping files and symbol mapping files and how to generate the latter can be found at https://docs.bugsnag.com/api/ndk-symbol-mapping-upload/#uploading-mapping-files.

See this Bugsnag Android Github issue bugsnag/bugsnag-android#1609.

Risk Level: Low, affects debugging features only, not production code
Testing: N/A
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: JP Simard <[email protected]>