i#6971: Use instr count instead of wallclock for simulated time #7015

Merged
derekbruening merged 10 commits into master from i6971-count-instead-of-clock
Oct 4, 2024

derekbruening
Contributor

@derekbruening derekbruening commented Oct 1, 2024

When using the drmemtrace scheduler in an analyzer or other tool that does not track simulated time with the default QUANTUM_INSTRUCTIONS, the scheduler used to use wall-clock time to measure blocking-input and idle time. Here we change that to use the instruction count plus the idle count via a new idle counter.
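As a rough illustration of the counter-based clock described above, here is a minimal sketch. The class and method names are hypothetical and are not the scheduler's actual data structures; the only point is that simulated time advances by one unit per instruction or idle record instead of being read from the wall clock.

```python
from dataclasses import dataclass


@dataclass
class OutputClock:
    """Hypothetical per-output counters; not the scheduler's real types."""
    instructions: int = 0
    idles: int = 0

    def record_instruction(self) -> None:
        self.instructions += 1

    def record_idle(self) -> None:
        self.idles += 1

    def now(self) -> int:
        # Simulated time in "time units": instruction count plus idle count.
        return self.instructions + self.idles


clock = OutputClock()
for _ in range(3):
    clock.record_instruction()
clock.record_idle()
print(clock.now())  # 4
```

Because the value depends only on the records seen, two runs over the same trace produce the same clock readings regardless of machine speed.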

The time_units_per_us and sched_time_units_per_us defaults are set to 1000, reflecting a 2GHz machine with IPC=0.5.
The old time_units_per_us=100 for wall clock was too low; to match it with counts, we need a low sched_time_units_per_us: 500 is better than 1000, but that seems unrealistic. Instead we can get the results we want from our large traces by exiting earlier, since most of the unwanted idle is still in seemingly unrepresentative regions at the end. We raise exit_if_fraction_inputs_left from 0.05 to 0.1 here.
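The arithmetic behind the 1000 default can be checked with a short sketch. The helper name is hypothetical, and it assumes one time unit corresponds to one instruction, which is how the count-based clock in this PR is described.

```python
def time_units_per_us(clock_ghz: float, ipc: float) -> float:
    """Instructions retired per microsecond for a given clock rate and IPC."""
    # 1 GHz is 1000 cycles per microsecond.
    cycles_per_us = clock_ghz * 1000
    # Instructions per microsecond = cycles per microsecond * instructions per cycle.
    return cycles_per_us * ipc


print(time_units_per_us(2.0, 0.5))  # 1000.0
print(time_units_per_us(1.0, 0.5))  # 500.0
```

Under the same assumption, a value of 500 would correspond to a 1GHz machine at IPC=0.5 (or a 2GHz machine at IPC=0.25), which is one way to read why it is described as unrealistic.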

Using counters provides a more reproducible result across different runs and machines.

Wall-clock time is still used to measure idle time on replay. Switching to the idle count added here is left as separate work under #7023. (Replay also uses wall-clock time to coordinate concurrent outputs beyond shared input constraints; that will likely always remain.)

The new default values of the options were tested on larger traces and found to produce a representative level of idle time.

This change means that the clock going backward problem (#6966) is no longer seen in default runs. The analyzer still supports wall-clock with the -sched_time option so a check to avoid underflow is added.
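The underflow check itself is not shown in this description. A minimal sketch of the general idea, with a hypothetical function name, is to clamp the elapsed time to zero when the current reading is earlier than the previous one:

```python
def elapsed_units(prev_time: int, cur_time: int) -> int:
    """Return the time elapsed since prev_time, never going negative."""
    # With unsigned arithmetic (as in C++), cur_time - prev_time would wrap
    # around to a huge value if the clock went backward, so clamp to zero.
    if cur_time < prev_time:
        return 0
    return cur_time - prev_time


print(elapsed_units(100, 150))  # 50
print(elapsed_units(150, 100))  # 0
```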

Fixes #6971
Fixes #6966

When using the drmemtrace scheduler in an analyzer or other tool that
does not track simulated time, the scheduler used to use wall-clock
time.  Here we change that to use the instruction count plus a scaled
idle count.  An idle counter is added and a new scale option
scheduler_options_t.time_units_per_idle (and CLI
-sched_time_units_per_idle) defaulting to 5.

The time_units_per_us and sched_time_units_per_us defaults are set to
1000, reflecting a 2GHz machine with IPC=0.5.

Using counters provides a more reproducible result across different
runs and machines.

Adds a test of the new option.

The default values of the options were tested on a large trace and
found to produce a representative level of idle time during the main
execution (and the whole run when combined with the forthcoming
exit-early feature for #6959).

This means that the clock going backward problem (#6966) is no longer
seen in default runs.  The analyzer still supports wall-clock with the
-sched_time option so a check to avoid underflow is added.

Fixes #6971
Fixes #6966

…eft to 0.25

The conclusion is that the old time_units_per_us=100 for wall clock
was too low; to match it with counts, either the time_units_per_idle
scaling is needed or just dropping time_units_per_us from 1000 to 500
but that seems unrealistic.  Instead we can get the results we want
from our large traces by exiting earlier, since most of the unwanted
idle is still in seemingly unrepresentative regions at the end.
@derekbruening
Contributor Author

I abandoned the idle weighting as it did not seem to have a good
justification based on evaluating idle vs instructions in schedule_stats.
I needed it because the old time_units_per_us=100 for wall clock
was too low; to match it with counts, either the time_units_per_idle
scaling is needed or dropping time_units_per_us from 1000 to 500
but that seems unrealistic. Instead we can get the results we want
from our large traces by exiting earlier, since most of the unwanted
idle is still in seemingly unrepresentative regions at the end.
The plan is to raise exit_if_fraction_inputs_left from PR #7018 to 0.25 by default.
I didn't want to delete this branch and remake it on top of that one so
will make that change after PR #7018 goes in.

Contributor

@brettcoon brettcoon left a comment

As I understand it, this change only affects the QUANTUM_INSTRUCTIONS scheduler mode, replacing wall-clock time with instructions + idles (except for schedule replay, which still uses wall clock). Can you add something like that (or a corrected version) to the check-in comment? It wasn't initially clear to me from the existing text exactly which modes were affected.

@derekbruening
Contributor Author

> As I understand it, this change only affects the QUANTUM_INSTRUCTIONS scheduler mode, replacing wall-clock time with instructions + idles (except for schedule replay, which still uses wall clock). Can you add something like that (or a corrected version) to the check-in comment? It wasn't initially clear to me from the existing text exactly which modes were affected.

Updated.

Adds a -verbose 1 dump of the scheduler options at startup. This helps
to record what options were passed in a particular run.

Issue: #6938

Improves diagnostics by augmenting the all-runqueue printing:

+ It now constructs its many-line string in memory and then prints it
all at once, to make it more atomic.

+ It includes the remaining blocked times for blocked inputs.

+ It is moved from pop_from_ready_queue() where the popped input is in
flux to pick_next_input() where the current running input is valid.

+ It is printed more frequently.

Also prints the size of the unscheduled queue when moving it.
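The "construct in memory, then print once" approach can be sketched as follows. The function name, the input layout, and the output format are all hypothetical and only illustrate the technique; they do not reproduce the scheduler's actual diagnostic output.

```python
import io
import sys


def format_runqueues(queues):
    """Build the whole multi-line dump in memory and return it as one string.

    `queues` maps a hypothetical output id to a list of
    (input_name, blocked_time_remaining) pairs.
    """
    buf = io.StringIO()
    for output_id in sorted(queues):
        buf.write(f"output {output_id}:\n")
        for name, blocked_remaining in queues[output_id]:
            buf.write(f"  input {name} blocked_remaining={blocked_remaining}\n")
    return buf.getvalue()


dump = format_runqueues({0: [("A", 0), ("B", 250)], 1: [("C", 10)]})
# One write of the full string keeps the lines together better than one
# write per line when several threads share the same stream.
sys.stderr.write(dump)
```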

Issue: #6938

Adds a new scheduler feature and CLI option exit_if_fraction_inputs_left. This
applies to -core_sharded and -core_serial modes. When an input reaches
EOF, if the number of non-EOF inputs left as a fraction of the original
inputs is equal to or less than this value then the scheduler exits
(sets all outputs to EOF) rather than finishing off the final inputs.
This helps avoid long sequences of idles during staggered endings with
fewer inputs left than cores and only a small fraction of the total
instructions left in those inputs.

The default value in scheduler_options_t and the CLI option is 0.05 (i.e., 5%),
which, when tested on a large internal trace, helps eliminate much of the
final idle time from the cores without losing many instructions.
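The exit condition described above can be sketched like this. The function and parameter names are hypothetical; the real check lives inside the scheduler and is evaluated when an input reaches EOF.

```python
def should_exit_early(live_inputs: int, total_inputs: int, fraction: float) -> bool:
    """Return True if the remaining non-EOF inputs are at or below the threshold."""
    if total_inputs == 0:
        return True
    # Exit when the fraction of inputs still running is equal to or less than
    # the configured value (0.05 by default in this commit).
    return live_inputs / total_inputs <= fraction


# With 100 inputs and the 0.05 default, the scheduler would keep going
# while 6 inputs remain and exit once only 5 are left.
print(should_exit_early(6, 100, 0.05))  # False
print(should_exit_early(5, 100, 0.05))  # True
```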

Compare the numbers below from a 39-core schedule-stats run of a
moderately large trace. With today's default there is a long idle tail,
and so a distinct difference between the "cpu busy by time" and "cpu busy
by time, ignoring idle past last instr" stats. Key stats and the first
two cores (for brevity) are shown here:

```
  1567052521 instructions
   878027975 idles
       64.09% cpu busy by record count
       82.38% cpu busy by time
       96.81% cpu busy by time, ignoring idle past last instr
Core #0 schedule: CccccccOXHhUuuuuAaSEOGOWEWQqqqFffIiTETENWwwOWEeeeeeeACMmTQFfOWLWVvvvvFQqqqqYOWOooOWOYOYQOWO_O_W_O_W_O_W_O_WO_WO_O_O_O_O_O_OR_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_RY_YyyyySUuuOSISO_S_S_SOPpSOKO_KO_KCcDKWDB_B_____________________________________________ 
Core #1 schedule: KkLWSFUQPDddddddddXxSUSVRJWKkRNJBWUWwwTttGgRNKkkRWNTtFRWKkRNWUuuGULRFSRSYKkkkRYAYFffGSRYHRYHNWMDddddddddRYGgggggYHNWK_YAHYNnGYSNHWwwwwSWSNKSYyyWKNNWKNNGAKWGggNnNW_NNWE_E_EF__________________________________________________
```

And now with -exit_if_fraction_inputs_left 0.05, where we lose (1567052521 -
1564522227)/1567052521 = 0.16% of the instructions but drastically
reduce the tail from 14% of the time to about 1% of the time:

```
  1564522227 instructions
   120512812 idles
       92.85% cpu busy by record count
       96.39% cpu busy by time
       97.46% cpu busy by time, ignoring idle past last instr
Core #0 schedule: CccccccOXHKYEGGETRARrrPRTVvvvRrrNWwwOOKWVRRrPBbbXUVvvvvvOWKVLWVvvJjSOWKVUuTIiiiFPpppKAaaMFfffAHOKWAaGNBOWKAPPOABCWKPWOKWPCXxxxZOWKCccJSOSWKJUYRCOWKCcSOSUKkkkOROK_O_O_O_O_O 
Core #1 schedule: KkLWSMmmFLSFffffffJjWBbGBUuuuuuuuuuuBDBJJRJWKkRNJWMBKkkRNWKkRNWKkkkRNWXxxxxxZOooAaUIiTHhhhSDNnnnHZzQNnnRNWXxxxxxRNWUuuRNWKXUuXRNKRWKNXxxRWKONNHRKWONURKWXRKXRKNW_KR_KkRK_KRKR_R_R_R_R_R_R_R_R_R_R_R__R__R__R___R___R___R___R___R
```
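The figures in the two runs can be cross-checked with a few lines of arithmetic. The helper name is hypothetical, and the formula for "cpu busy by record count" is an assumption: instructions divided by instructions plus idles happens to reproduce the reported percentages.

```python
def busy_by_record_count(instructions: int, idles: int) -> float:
    """Percentage of records that are instructions (assumed formula)."""
    return 100.0 * instructions / (instructions + idles)


default_instrs, default_idles = 1567052521, 878027975
early_instrs, early_idles = 1564522227, 120512812

print(f"{busy_by_record_count(default_instrs, default_idles):.2f}")  # 64.09
print(f"{busy_by_record_count(early_instrs, early_idles):.2f}")      # 92.85

# Fraction of instructions dropped by exiting early.
lost_pct = 100.0 * (default_instrs - early_instrs) / default_instrs
print(f"{lost_pct:.2f}")  # 0.16

# Tail idle: gap between "busy by time, ignoring idle past last instr"
# and "busy by time", in percentage points.
print(f"{96.81 - 82.38:.2f}")  # 14.43
print(f"{97.46 - 96.39:.2f}")  # 1.07
```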

Fixes #6959

…05 on internal apps but not as disruptive to unit tests or other types of apps as 0.25 is
@derekbruening derekbruening merged commit dff98b7 into master Oct 4, 2024
17 checks passed
@derekbruening derekbruening deleted the i6971-count-instead-of-clock branch October 4, 2024 18:31