i#6471 sched idle: Add idle time #6472
Adds a new STATUS_IDLE return code and a corresponding TRACE_MARKER_TYPE_CORE_IDLE record.
Changes the scheduler behavior to no longer return STATUS_EOF for an output when the ready queue is empty: instead, STATUS_IDLE is returned until every single input is at EOF. This results in a more realistic schedule where other cores can pick up work later rather than disappearing from the system.
Augments the schedule_stats tool to count idle replies and compute a %CPU usage metric. Adds a unit test for counting idles.
Augments the scheduler_launcher to also compute %CPU usage.
Updates all the scheduler tests for the new change.
Adding idle time due to blocking syscalls will be done separately.
Issue: #6471
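To make the new end-of-schedule rule concrete, here is a minimal sketch in a toy model, not the drmemtrace scheduler itself: `toy_scheduler_t`, `toy_status_t`, `next_record()`, `simulate()`, and `cpu_usage_percent()` are all hypothetical names. An output with nothing ready gets IDLE while any input still has records, and EOF only once every input is done. The %CPU figure is assumed to be non-idle replies divided by all replies, which may differ from the exact formula schedule_stats uses.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical status codes modeled on the PR's STATUS_IDLE / STATUS_EOF.
// This is a toy, not the drmemtrace scheduler_t interface.
enum class toy_status_t { OK, IDLE, AT_EOF };

struct toy_input_t {
    int ready_at;  // first time step at which this input may be scheduled
    int remaining; // records left before this input reaches EOF
    bool taken;    // currently assigned to some output
};

class toy_scheduler_t {
public:
    toy_scheduler_t(std::vector<toy_input_t> inputs, int num_outputs)
        : inputs_(std::move(inputs)), current_(num_outputs, -1) {}

    // OK: the output consumed one record at time `now`.
    // IDLE: nothing is ready for this output, but some input is not at EOF.
    // AT_EOF: every input is at EOF (an empty ready queue alone is not enough).
    toy_status_t next_record(int output, int now) {
        int cur = current_[output];
        if (cur < 0 || inputs_[cur].remaining == 0) {
            if (cur >= 0)
                inputs_[cur].taken = false; // release the finished input
            cur = current_[output] = pick_ready(now);
        }
        if (cur < 0)
            return all_at_eof() ? toy_status_t::AT_EOF : toy_status_t::IDLE;
        --inputs_[cur].remaining;
        return toy_status_t::OK;
    }

private:
    int pick_ready(int now) {
        for (std::size_t i = 0; i < inputs_.size(); ++i) {
            toy_input_t &in = inputs_[i];
            if (!in.taken && in.remaining > 0 && in.ready_at <= now) {
                in.taken = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }
    bool all_at_eof() const {
        for (const toy_input_t &in : inputs_) {
            if (in.remaining > 0)
                return false;
        }
        return true;
    }

    std::vector<toy_input_t> inputs_;
    std::vector<int> current_; // input index per output, or -1
};

struct output_stats_t {
    long records = 0;
    long idles = 0;
};

// Drives all outputs in lockstep, one request per output per time step,
// until each output has seen AT_EOF.
std::vector<output_stats_t> simulate(toy_scheduler_t &sched, int num_outputs) {
    std::vector<output_stats_t> stats(num_outputs);
    std::vector<bool> done(num_outputs, false);
    int finished = 0;
    for (int now = 0; finished < num_outputs; ++now) {
        for (int out = 0; out < num_outputs; ++out) {
            if (done[out])
                continue;
            switch (sched.next_record(out, now)) {
            case toy_status_t::OK: ++stats[out].records; break;
            case toy_status_t::IDLE: ++stats[out].idles; break;
            case toy_status_t::AT_EOF: done[out] = true; ++finished; break;
            }
        }
    }
    return stats;
}

// Assumed %CPU definition: non-idle replies as a share of all replies.
double cpu_usage_percent(const output_stats_t &s) {
    long total = s.records + s.idles;
    return total == 0 ? 0.0 : 100.0 * s.records / total;
}
```

With two outputs, one input ready at time 0 with 4 records, and a second input arriving at time 2 with 2 records, output 1 idles twice and then runs the late input in parallel (50% CPU). Under the old rule it would have returned EOF at time 0, and the late input would have run only after the first one, on output 0.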
RISCV is suddenly failing:
x32 are the AMD 32-bit failures #6417
Adds a fallback to return EOF when all outputs are past the last record in replay mode, to handle a schedule where not every input reaches EOF or has regions of interest. Adds a unit test for this case, which hangs without the fix.
Fixes a record bug that appears to cause the final segment entry for an input to have a stop instruction ordinal prior to the input's real endpoint. This is what caused hangs in actual usage of PR #6472 and led to adding the fallback just described (which is useful on its own).
On selecting a new input which turned out to be at EOF, the original code called close_schedule_segment() on *prev_input* for some reason. That replaces the default -1 sentinel (read to EOF) with the current ordinal. The code then tries again for a new input, but if it ends up selecting the prior input again, it keeps executing that input without a new schedule segment entry. If that is the last input on that core, close_schedule_segment() is not called again (the default sentinel is relied upon), so the segment keeps the premature stop ordinal. I observed missing sentinels for one input in some runs, and this is my theory as to what happened; those cases are gone with this fix.
Unfortunately, it is difficult to create a unit test for this, but testing on the original hangs (without the fallback) shows it is fixed.
Issue: #6471
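The replay fallback can be illustrated with another hedged toy model, assuming each output simply replays a fixed count of recorded segments and some inputs are never driven to EOF by the recording. `toy_replay_t`, `replay_status_t`, `next_segment()`, and `rounds_until_eof()` are hypothetical and do not reflect the real record/replay code or the record bug described above. Without the fallback, outputs that have exhausted their recorded segments keep returning IDLE forever; with it, they return EOF once every output is past its last record.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy replay model (hypothetical; not the drmemtrace record/replay code).
// Each output replays a fixed number of recorded schedule segments. Some
// inputs may never reach EOF in the recording, e.g. because the schedule
// stops short of their real endpoint.
enum class replay_status_t { OK, IDLE, AT_EOF };

class toy_replay_t {
public:
    toy_replay_t(std::vector<int> segments_per_output, int unfinished_inputs)
        : remaining_(std::move(segments_per_output)),
          unfinished_inputs_(unfinished_inputs) {}

    replay_status_t next_segment(int output, bool use_fallback) {
        if (remaining_[output] > 0) {
            --remaining_[output];
            return replay_status_t::OK;
        }
        // Normal rule: EOF only once every input is at EOF.
        if (unfinished_inputs_ == 0)
            return replay_status_t::AT_EOF;
        // Fallback: every output has run past its last recorded segment, so
        // nothing will ever make progress; report EOF instead of idling.
        if (use_fallback && all_outputs_past_last_record())
            return replay_status_t::AT_EOF;
        return replay_status_t::IDLE;
    }

private:
    bool all_outputs_past_last_record() const {
        for (int left : remaining_) {
            if (left > 0)
                return false;
        }
        return true;
    }

    std::vector<int> remaining_; // recorded segments left per output
    int unfinished_inputs_;      // inputs the recording never drives to EOF
};

// Number of lockstep rounds until every output sees AT_EOF, or -1 if that
// has not happened within max_rounds (the replay would hang).
int rounds_until_eof(const std::vector<int> &segments, int unfinished_inputs,
                     bool use_fallback, int max_rounds) {
    toy_replay_t replay(segments, unfinished_inputs);
    std::vector<bool> done(segments.size(), false);
    std::size_t finished = 0;
    for (int round = 1; round <= max_rounds; ++round) {
        for (std::size_t out = 0; out < segments.size(); ++out) {
            if (done[out])
                continue;
            if (replay.next_segment(static_cast<int>(out), use_fallback) ==
                replay_status_t::AT_EOF) {
                done[out] = true;
                ++finished;
            }
        }
        if (finished == segments.size())
            return round;
    }
    return -1;
}
```

For two outputs with 2 and 1 recorded segments and one input the recording never finishes, the fallback ends the replay after three rounds, while the version without it never terminates, mirroring the hang the new unit test guards against.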