-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new NVTX range for task GPU ownership #11596
base: branch-24.12
Are you sure you want to change the base?
Conversation
Signed-off-by: Jihoon Son <[email protected]>
89b6f6c
to
5400dfb
Compare
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSemaphore.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSemaphore.scala
Outdated
Show resolved
Hide resolved
@@ -258,6 +262,10 @@ private final class SemaphoreTaskInfo(val taskAttemptId: Long) extends Logging { | |||
// We now own the semaphore so we need to wake up all of the other tasks that are | |||
// waiting. | |||
hasSemaphore = true | |||
if (trackSemaphore) { | |||
nvtxRange = | |||
Some(new NvtxUniqueRange(s"Sem-${taskAttemptId}", NvtxColor.ORANGE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NvtxColor
is a hex number in argb (alpha, red, green, blue). I think it would make this a lot more useable if task X hashed to a specific color, consistently. That way you can tell them apart while the task is alive. Bonus points to make it color match with the current Task ranges? That might make this even more useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a really good idea. I will try as you suggested. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually changed my mind, and want to try it later. It is a nice idea and I definitely want to try, but don't want to keep this PR open for unnecessarily long. Created #11645.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSemaphore.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to my eyes.
build |
If you have ever been curious about the semaphore-based GPU concurrency control mechanism, you may have wondered at some point exactly what tasks are holding the semaphore at a given point of time during query processing. This PR can help you in that case. This PR adds a new NVTX range that shows what task owns GPU in the nsys profile result. This feature is off by default (since I don't think it's always useful), and can be enabled by setting
spark.rapids.sql.traceTaskGpuOwnership = true
. The screenshot below shows an example nsys result with the new semaphore ranges. The orange boxes withSem-${taskAttemptId}
represent the ranges in which each spark task was holding the semaphore.