i#6471 sched idle: Add idle time #6472

Merged 5 commits on Nov 28, 2023

derekbruening (Contributor)

Adds a new STATUS_IDLE return code, and a corresponding TRACE_MARKER_TYPE_CORE_IDLE record.

Changes the scheduler behavior to no longer return STATUS_EOF for an output when the ready queue is empty: instead STATUS_IDLE is returned until every single input is at EOF. This results in a more realistic schedule where other cores can pick up work later rather than disappearing from the system.
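
The idle-versus-EOF decision described above can be sketched as follows. This is a minimal toy model, not the scheduler's actual code: `toy_scheduler_t`, `status_t`, and `pick_next_input()` are hypothetical names invented for illustration, and the real scheduler tracks far more state.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical status codes that only mirror the names used in this PR.
enum class status_t { OK, IDLE, EOF_REACHED };

// Toy model (an assumption for illustration, not the real scheduler).
struct toy_scheduler_t {
    std::deque<int> ready_queue;    // Inputs that are runnable right now.
    std::vector<bool> input_at_eof; // One flag per input.

    bool all_inputs_at_eof() const
    {
        for (bool at_eof : input_at_eof) {
            if (!at_eof)
                return false;
        }
        return true;
    }

    // Picks the next input for one output (core).
    status_t pick_next_input(int &input_out)
    {
        if (!ready_queue.empty()) {
            input_out = ready_queue.front();
            ready_queue.pop_front();
            return status_t::OK;
        }
        // Empty ready queue: previously this would have been EOF for the
        // output. Now the output idles until every input has finished, so
        // it can still pick up work that becomes ready later.
        return all_inputs_at_eof() ? status_t::EOF_REACHED : status_t::IDLE;
    }
};
```

In this model a caller that receives the idle status simply asks again later, and treats only the EOF status as the end of that output's schedule.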

Augments the schedule_stats tool to count idle replies and compute a % cpu usage metric. Adds a unit test for counting idles.
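
One plausible way to turn idle counts into a % cpu usage figure is sketched below. The formula is an assumption (non-idle replies divided by all replies); the description above does not spell out the exact computation, and `core_counters_t` and `cpu_usage_percent()` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-core counters; the real tool's fields may differ.
struct core_counters_t {
    std::uint64_t records = 0;      // Replies that carried a trace record.
    std::uint64_t idle_replies = 0; // Replies that were idle.
};

// Assumed metric: share of replies that were not idle, as a percentage.
inline double
cpu_usage_percent(const core_counters_t &c)
{
    const std::uint64_t total = c.records + c.idle_replies;
    if (total == 0)
        return 0.0; // Avoid dividing by zero for a core that saw nothing.
    return 100.0 * static_cast<double>(c.records) / static_cast<double>(total);
}
```

Under this assumed definition, a core with 75 record replies and 25 idle replies reports 75% usage.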

Augments the scheduler_launcher to also compute %cpu usage.

Updates all the scheduler tests for the new change.

Adding idle time due to blocking syscalls will be done separately.

Issue: #6471

derekbruening (Contributor, Author)

(x32 failure is #6417 and win64 is replaceall #5412)


derekbruening commented Nov 28, 2023

RISCV is suddenly failing: rsync: [sender] change_dir "/home/runner/work/dynamorio/dynamorio/../extract/lib/riscv64-linux-gnu" failed: No such file or directory (2). It looks like the packages we manually extract for the cross-compile have changed and one of the lib dirs is no longer there. This is unrelated to this PR: it happens during setup, before the build even starts, and PR #6475 hits the same issue. I have PR #6476 trying to fix that.

Win64 failures are traceopts #6423 and replaceall #5412.

derekbruening (Contributor, Author)

The x32 failures are the AMD 32-bit failures, #6417.

derekbruening merged commit 2a632a9 into master on Nov 28, 2023
12 of 15 checks passed
derekbruening deleted the i6471-idle-time branch on November 28, 2023 03:30
derekbruening added a commit that referenced this pull request Nov 29, 2023
Adds a fallback to return EOF when all outputs are past the last
record in replay mode, to handle a schedule where not every input
reaches EOF or has regions of interest.  Adds a unit test for this
case, which hangs without the fix.
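
A rough sketch of that fallback, under simplifying assumptions: each output
replays a fixed list of recorded entries, and the names replay_output_t,
replay_status_t, and advance_replay() are hypothetical, not the scheduler's
real interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class replay_status_t { OK, IDLE, EOF_REACHED };

// Hypothetical per-output replay state.
struct replay_output_t {
    std::size_t num_records = 0; // Recorded schedule entries for this output.
    std::size_t next_record = 0; // Index of the next entry to replay.
    bool past_last_record() const { return next_record >= num_records; }
};

// Advances output 'which' by one recorded entry, if it has any left.
inline replay_status_t
advance_replay(std::vector<replay_output_t> &outputs, std::size_t which,
               bool all_inputs_at_eof)
{
    assert(which < outputs.size());
    replay_output_t &out = outputs[which];
    if (!out.past_last_record()) {
        ++out.next_record;
        return replay_status_t::OK;
    }
    if (all_inputs_at_eof)
        return replay_status_t::EOF_REACHED;
    // Fallback: some inputs never reach EOF in this schedule, so waiting
    // for them would idle forever. Finish once every output is past its
    // last recorded entry.
    for (const replay_output_t &other : outputs) {
        if (!other.past_last_record())
            return replay_status_t::IDLE;
    }
    return replay_status_t::EOF_REACHED;
}
```

Without the loop at the end, an output in this model would return idle
forever whenever all_inputs_at_eof never becomes true, which is the shape of
the hang described above.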

Fixes a record-mode bug which seems to cause the final segment entry for an
input to have a stop instruction ordinal prior to the input's real
endpoint.  This is what caused hangs in actual usage of PR #6472 and
led to adding the fallback just described (which is useful on its
own).  On selecting a new input which turned out to be at EOF, the
original code called close_schedule_segment() on *prev_input* for some
reason.  That replaces the default -1 sentinel (read to EOF) with the
current ordinal.  The code then tries again for a new input, but if it
ends up finding the prior one it would keep executing that without a
new schedule segment entry.  If that's the last input on that core,
close_schedule_segment() is not called (the default is relied upon).
I observed missing sentinels for one input in some runs and this is my
theory as to what happened; those cases are gone now with this fix.
Unfortunately it is difficult to create a unit test for this, but
testing on the original hangs (without the fallback) shows it is fixed.
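
The theory above can be illustrated with a toy model. Everything here is
hypothetical (segment_t, toy_recorder_t, failed_switch()); it only mimics the
described effect of closing the previous input's segment when the newly
selected input turns out to be at EOF, and is not the real recording code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Default stop value meaning "read this input until EOF".
constexpr std::int64_t READ_TO_EOF = -1;

struct segment_t {
    int input;
    std::int64_t stop_ordinal;
};

// Hypothetical recorder for the schedule of a single output.
struct toy_recorder_t {
    std::vector<segment_t> segments;

    void open_segment(int input) { segments.push_back({ input, READ_TO_EOF }); }

    void close_segment(std::int64_t ordinal)
    {
        assert(!segments.empty());
        segments.back().stop_ordinal = ordinal; // Overwrites the sentinel.
    }
};

// Models one attempted switch away from the current input, where the newly
// chosen input is already at EOF and the retry lands on the prior input.
inline void
failed_switch(toy_recorder_t &rec, std::int64_t prev_ordinal,
              bool close_prev_segment)
{
    if (close_prev_segment)
        rec.close_segment(prev_ordinal); // The behavior suspected as the bug.
    // The prior input keeps running with no new segment opened, so whatever
    // stop value is stored now is what a later replay will see.
}
```

With close_prev_segment set, the last segment in this model stops at the
switch-attempt ordinal even though the input keeps running, so a replay would
stop that input early; with it cleared, the read-to-EOF sentinel survives.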

Issue: #6471