Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Free threaded resources on Android #2761

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

alasram
Copy link
Collaborator

@alasram alasram commented Aug 24, 2024

When tiles are streamed, GPU resources are uploaded to the GPU synchronously in the render thread. A quick change in Zoom or location can block the render thread due to very slow uploads: Many uploads are serialized delaying actual rendering and affecting the user experience. The following Tracy screenshot shows a slow frame. Most of the frame is processing uploads of vertex buffers to the GPU:
slow_frame

Zoomed version of the same screenshot where we see actual rendering at the very end of the frame:
slow_frame_zoom

Pipelining the renderer would alleviate the problem but it won't be enough when many uploads are processed in the same frame. Also, pipelining the renderer requires significant efforts. An easier solution that may scale better is to distribute uploads to as many threads as possible and only wait for them to finish when they are needed in a draw call. This can scale better when many free CPU cores exist.

This PR adds shared EGL contexts to the Android backend to issue buffer uploads in worker threads.
The existing thread pool has been updated to enable persistent EGL contexts (I noticed very poor performance on Qualcomm drivers when migrating contexts between threads hence the thread pool with persistent contexts: no calls to eglMakeCurrent every frame and in different threads). This allows concurrent uploads to the GPU:
slow_frame_with_async_upload

The uploads are synchronized with the draws: the render thread only waits for a buffer when a draw needs it. This allow drawing while uploading data:
parallel_draw_and_upload
The uploads are currently randomly scheduled. Parallelizing drawing and uploading can be improved by first scheduling uploads for resources that get used late in the frame, e.g. uploading resources used by translucent objects before opaque objects since opaques are rendered first.

This PR is specific to Android and is made such as other backends are unaffected and have minimal code change. Once Vulkan is the official backend on Android similar and better handling of resources can be done.

Texture uploads are also slow but less frequent and will be handled in a separate PR:
upload_texture_future_work

There are unused index buffer that we currently wait on before we destroy them and this is slowing down the render thread #2760
unused_index_buffers
We can destroy them in a future frame but we should understand why they are there in the first place.

There are slowdowns in the the render thread that will be investigated separately (mailbox related):
slow_mail_box_push_lock

Copy link

github-actions bot commented Aug 24, 2024

Bloaty Results (iOS) 🐋

Compared to main

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0% +3.17Ki  [ = ]       0    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results-ios/pr-2761-compared-to-main.txt

Copy link

github-actions bot commented Aug 24, 2024

Bloaty Results 🐋

Compared to main

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.2%  +337Ki  +0.3%  +107Ki    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results/pr-2761-compared-to-main.txt

Compared to d387090 (legacy)

    FILE SIZE        VM SIZE    
 --------------  -------------- 
   +28% +32.9Mi  +427% +25.5Mi    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results/pr-2761-compared-to-legacy.txt

Copy link

github-actions bot commented Aug 24, 2024

Benchmark Results ⚡

Benchmark                                                     Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------------
OVERALL_GEOMEAN                                            -0.0155         -0.0154             0             0             0             0

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/benchmark-results/pr-2761-compared-to-main.txt

@louwers louwers added cpp-core OpenGL Issues related to the OpenGL renderer backend ⚡ performance labels Aug 24, 2024
@alasram alasram force-pushed the alasram/free_threaded_resources branch 3 times, most recently from c676779 to aa00054 Compare August 29, 2024 19:16
@alasram alasram force-pushed the alasram/free_threaded_resources branch 4 times, most recently from bb58491 to 8c1f7c7 Compare September 5, 2024 23:15
@alasram alasram force-pushed the alasram/free_threaded_resources branch 3 times, most recently from f314bdf to 0d64d6c Compare September 13, 2024 17:45
@sjg-wdw
Copy link
Collaborator

sjg-wdw commented Sep 17, 2024

Just tagging @mwilsnd and @alexcristici to take a look at this.

@alasram
Copy link
Collaborator Author

alasram commented Sep 17, 2024

I will be making this opt-in because the performance is variable depending on vendors:

  • Emulator. Shared EGL contexts not well supported (known issue)
  • Mali drivers (Google pixel 6 and 7) on Android: uploads are like memcpy so no benefit of using shared contexts
  • Adreno drivers on Android: buffer updates are slow compared to a memcpy so this is useful
  • Adreno drivers on QNX: buffer upload perf is much lower than a regular Android driver and shared contexts scale the perf (screenshots are on QNX)

@alasram alasram force-pushed the alasram/free_threaded_resources branch from 0d64d6c to 22e5b23 Compare September 18, 2024 00:59
Copy link
Collaborator

@louwers louwers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the added complexity worth it if it is opt-in?

Probably a tiny fraction of users has enough OpenGL knowledge and control over their distribution for this to be useful as a feature.

If the performance gains are significant, can we make an informed choice for the user automatically?

@alasram alasram force-pushed the alasram/free_threaded_resources branch 4 times, most recently from 20ba5d1 to 9838211 Compare September 24, 2024 23:14
@alasram alasram force-pushed the alasram/free_threaded_resources branch from 9838211 to 5185804 Compare September 25, 2024 17:55
@alasram
Copy link
Collaborator Author

alasram commented Sep 25, 2024

I changed this to always on by default on emulator and any device if GLES 3 is supported otherwise a GLES 2 context is requested and multi threading is disabled.

@alasram alasram force-pushed the alasram/free_threaded_resources branch from 5185804 to 5865533 Compare October 9, 2024 22:38
@alasram alasram force-pushed the alasram/free_threaded_resources branch from 5865533 to 8e0b4ed Compare October 21, 2024 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp-core OpenGL Issues related to the OpenGL renderer backend ⚡ performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants