i#6971: Use instr count instead of wallclock for simulated time #7015

derekbruening · 2024-10-01T03:17:57Z

When the drmemtrace scheduler is used with the default QUANTUM_INSTRUCTIONS mode in an analyzer or other tool that does not track simulated time, the scheduler used to rely on wall-clock time to measure blocked-input and idle time. Here we change that to use the instruction count plus the idle count, tracked via a new idle counter.

The time_units_per_us and sched_time_units_per_us defaults are set to 1000, reflecting a 2GHz machine with IPC=0.5.
The old time_units_per_us=100 for wall clock was too low; to match it with counts, we need a low sched_time_units_per_us: 500 is better than 1000, but that seems unrealistic. Instead we can get the results we want from our large traces by exiting earlier, since most of the unwanted idle is still in seemingly unrepresentative regions at the end. We raise exit_if_fraction_inputs_left from 0.05 to 0.1 here.
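The counter-based notion of time described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scheduler's actual code: the names `sim_clock_t` and `units_to_us` are hypothetical, and only the 1000-units-per-microsecond default comes from the text.

```cpp
#include <cstdint>

// Hypothetical illustration only: these are not the scheduler's real
// data structures.
struct sim_clock_t {
    uint64_t instr_count = 0; // Instructions executed on this output.
    uint64_t idle_count = 0;  // Idle records seen on this output.
    // Simulated time is the sum of the two counters, so it only moves
    // forward and is identical across runs and machines.
    uint64_t
    now() const
    {
        return instr_count + idle_count;
    }
};

// Converts a span of simulated time units to microseconds.
// With time_units_per_us = 1000, 1000 units make up 1us, which matches
// 2GHz * 0.5 IPC = 1000 instructions per microsecond.
inline double
units_to_us(uint64_t units, double time_units_per_us = 1000.)
{
    return static_cast<double>(units) / time_units_per_us;
}
```

Under these assumptions, 1500 instructions plus 500 idles correspond to 2us of simulated time.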

Using counters provides a more reproducible result across different runs and machines.

Wall-clock time is still used to measure idle time on replay. Switching to the idle count added here is left as separate work under #7023. (Replay also uses wall-clock time to coordinate concurrent outputs beyond shared input constraints; that will likely always remain.)

The new default values of the options were tested on larger traces and found to produce a representative level of idle time.

This change means that the clock going backward problem (#6966) is no longer seen in default runs. The analyzer still supports wall-clock with the -sched_time option so a check to avoid underflow is added.
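One way such an underflow check could look is sketched below. The helper name `elapsed_time` is hypothetical and the real check in scheduler.cpp may be structured differently; the point is only that subtracting unsigned timestamps needs a guard when the clock can go backward.

```cpp
#include <cstdint>

// Hypothetical helper: returns the time elapsed between two readings,
// clamping to zero if the clock appears to have gone backward.
// Without the guard, the unsigned subtraction would wrap around to a
// huge value and make an input look blocked or idle for far too long.
inline uint64_t
elapsed_time(uint64_t prev_time, uint64_t cur_time)
{
    if (cur_time < prev_time)
        return 0;
    return cur_time - prev_time;
}
```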

Fixes #6971
Fixes #6966

When using the drmemtrace scheduler in an analyzer or other tool that does not track simulated time, the scheduler used to use wall-clock time. Here we change that to use the instruction count plus a scaled idle count. An idle counter is added and a new scale option scheduler_options_t.time_units_per_idle (and CLI -sched_time_units_per_idle) defaulting to 5.

The time_units_per_us and sched_time_units_per_us defaults are set to 1000, reflecting a 2GHz machine with IPC=0.5.

Using counters provides a more reproducible result across different runs and machines.

Adds a test of the new option.

The default values of the options were tested on a large trace and found to produce a representative level of idle time during the main execution (and the whole run when combined with the forthcoming exit-early feature for #6959).

This means that the clock going backward problem (#6966) is no longer seen in default runs. The analyzer still supports wall-clock with the -sched_time option so a check to avoid underflow is added.

Fixes #6971
Fixes #6966


…eft to 0.25 The conclusion is that the old time_units_per_us=100 for wall clock was too low; to match it with counts, either the time_units_per_idle scaling is needed or just dropping time_units_per_us from 1000 to 500 but that seems unrealistic. Instead we can get the results we want from our large traces by exiting earlier, since most of the unwanted idle is still in seemingly unrepresentative regions at the end.

derekbruening · 2024-10-03T20:06:10Z

I abandoned the idle weighting as it did not seem to have a good
justification based on evaluating idle vs instructions in schedule_stats.
I needed it because the old time_units_per_us=100 for wall clock
was too low; to match it with counts, either the time_units_per_idle
scaling is needed or dropping time_units_per_us from 1000 to 500
but that seems unrealistic. Instead we can get the results we want
from our large traces by exiting earlier, since most of the unwanted
idle is still in seemingly unrepresentative regions at the end.
The plan is to raise exit_if_fraction_inputs_left from PR #7018 to 0.25 by default.
I didn't want to delete this branch and remake it on top of that one, so
I will make that change after PR #7018 goes in.

brettcoon

As I understand it, this change only affects the QUANTUM_INSTRUCTIONS scheduler mode, replacing wall-clock time with instructions + idles (except for schedule replay, which still uses wall clock). Can you add something like that (or a corrected version) to the check-in comment? It wasn't initially clear to me from the existing text exactly what modes were affected.

clients/drcachesim/common/options.cpp

clients/drcachesim/scheduler/scheduler.cpp

clients/drcachesim/scheduler/scheduler.h

derekbruening · 2024-10-04T17:50:21Z

> As I understand it, this change only affects the QUANTUM_INSTRUCTIONS scheduler mode, replacing wall-clock time with instructions + idles (except for schedule replay, which still uses wall clock). Can you add something like that (or a corrected version) to the check-in comment? It wasn't initially clear to me from the existing text exactly what modes were affected.

Updated.


Improves diagnostics by augmenting the all-runqueue printing:
+ It now constructs its many-line string in memory and then prints it all at once, to make it more atomic.
+ It includes the remaining blocked times for blocked inputs.
+ It is moved from pop_from_ready_queue() where the popped input is in flux to pick_next_input() where the current running input is valid.
+ It is printed more frequently.

Also prints the size of the unscheduled queue when moving it.

Issue: #6938
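The build-then-print approach in the first item can be sketched as below. This is a hedged illustration, not the scheduler's actual printing code: `queued_input_t`, `format_runqueue`, `print_runqueue`, and the output format are all hypothetical. The idea is that a single write of a pre-built string is much less likely to be interleaved with output from other threads than many small writes.

```cpp
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a queued input; not the scheduler's type.
struct queued_input_t {
    int id;
    uint64_t blocked_time_left; // 0 means the input is not blocked.
};

// Builds the whole multi-line report in memory first.
inline std::string
format_runqueue(const std::vector<queued_input_t> &queue)
{
    std::ostringstream out;
    out << "runqueue (" << queue.size() << " inputs):\n";
    for (const auto &input : queue) {
        out << "  input " << input.id;
        if (input.blocked_time_left > 0)
            out << " blocked for " << input.blocked_time_left;
        out << "\n";
    }
    return out.str();
}

// Emits the report with one write call rather than one call per line.
inline void
print_runqueue(const std::vector<queued_input_t> &queue)
{
    const std::string text = format_runqueue(queue);
    fwrite(text.data(), 1, text.size(), stderr);
}
```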

Adds a new scheduler feature and CLI option exit_if_fraction_inputs_left. This applies to -core_sharded and -core_serial modes. When an input reaches EOF, if the number of non-EOF inputs left as a fraction of the original inputs is equal to or less than this value then the scheduler exits (sets all outputs to EOF) rather than finishing off the final inputs. This helps avoid long sequences of idles during staggered endings with fewer inputs left than cores and only a small fraction of the total instructions left in those inputs.

The default value in scheduler_options_t and the CLI option is 0.05 (i.e., 5%), which when tested on a large internal trace helps eliminate much of the final idle time from the cores without losing many instructions.

Compare the numbers below for today's default, which has a long idle time and so distinct differences between the "cpu busy by time" and "cpu busy by time, ignoring idle past last instr" stats, on a 39-core schedule-stats run of a moderately large trace, with key stats and the 1st 2 cores (for brevity) shown here:

```
1567052521 instructions
878027975 idles
64.09% cpu busy by record count
82.38% cpu busy by time
96.81% cpu busy by time, ignoring idle past last instr
Core #0 schedule: CccccccOXHhUuuuuAaSEOGOWEWQqqqFffIiTETENWwwOWEeeeeeeACMmTQFfOWLWVvvvvFQqqqqYOWOooOWOYOYQOWO_O_W_O_W_O_W_O_WO_WO_O_O_O_O_O_OR_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_RY_YyyyySUuuOSISO_S_S_SOPpSOKO_KO_KCcDKWDB_B_____________________________________________
Core #1 schedule: KkLWSFUQPDddddddddXxSUSVRJWKkRNJBWUWwwTttGgRNKkkRWNTtFRWKkRNWUuuGULRFSRSYKkkkRYAYFffGSRYHRYHNWMDddddddddRYGgggggYHNWK_YAHYNnGYSNHWwwwwSWSNKSYyyWKNNWKNNGAKWGggNnNW_NNWE_E_EF__________________________________________________
```

And now with -exit_if_fraction_inputs_left 0.05, where we lose (1567052521 - 1564522227)/1567052521 = 0.16% of the instructions but drastically reduce the tail from 14% of the time to less than 1% of the time:

```
1564522227 instructions
120512812 idles
92.85% cpu busy by record count
96.39% cpu busy by time
97.46% cpu busy by time, ignoring idle past last instr
Core #0 schedule: CccccccOXHKYEGGETRARrrPRTVvvvRrrNWwwOOKWVRRrPBbbXUVvvvvvOWKVLWVvvJjSOWKVUuTIiiiFPpppKAaaMFfffAHOKWAaGNBOWKAPPOABCWKPWOKWPCXxxxZOWKCccJSOSWKJUYRCOWKCcSOSUKkkkOROK_O_O_O_O_O
Core #1 schedule: KkLWSMmmFLSFffffffJjWBbGBUuuuuuuuuuuBDBJJRJWKkRNJWMBKkkRNWKkRNWKkkkRNWXxxxxxZOooAaUIiTHhhhSDNnnnHZzQNnnRNWXxxxxxRNWUuuRNWKXUuXRNKRWKNXxxRWKONNHRKWONURKWXRKXRKNW_KR_KkRK_KRKR_R_R_R_R_R_R_R_R_R_R_R__R__R__R___R___R___R___R___R
```

Fixes #6959
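The early-exit condition described in this commit message can be sketched as a small predicate. The function name `should_exit_early` and its parameters are assumptions made for illustration; only the comparison itself (remaining inputs as a fraction of the original inputs, equal to or less than the option value) comes from the text.

```cpp
#include <cstddef>

// Hypothetical sketch of the check made when an input reaches EOF.
// live_inputs: number of inputs that have not yet reached EOF.
// total_inputs: number of inputs the scheduler started with.
// Returns true if the scheduler should set all outputs to EOF now
// rather than run the last few inputs to completion.
inline bool
should_exit_early(size_t live_inputs, size_t total_inputs,
                  double exit_if_fraction_inputs_left)
{
    if (total_inputs == 0)
        return false; // Nothing was scheduled; leave it to normal EOF handling.
    const double fraction_left =
        static_cast<double>(live_inputs) / static_cast<double>(total_inputs);
    return fraction_left <= exit_if_fraction_inputs_left;
}
```

For example, with 40 original inputs and a threshold of 0.05, the predicate first becomes true once only 2 inputs remain.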



derekbruening added 3 commits September 30, 2024 22:31

Merge branch 'master' of github.com:DynamoRIO/dynamorio into i6971-co…

d713f19

…unt-instead-of-clock

derekbruening requested a review from brettcoon October 3, 2024 20:39

brettcoon approved these changes Oct 4, 2024


derekbruening added 7 commits October 4, 2024 13:50

i#6938 sched migrate: Print configuration at startup (#7020)

4c3fcb8

Adds a -verbose 1 dump of the scheduler options at startup. This helps to record what options were passed in a particular run. Issue: #6938

Set exit_if_fraction_left to 0.1 as better compromise: better than 0.…

aace7ca

…05 on internal apps but not as disruptive to unit tests or other types of apps as 0.25 is

Review requests: clarify docs for 3 CLI options; add comments to code

29e9f32

Add #7023 references for removing wall clock from replay idle durations

ef42607

Merge branch 'master' of github.com:DynamoRIO/dynamorio into i6971-co…

ac03dc6

…unt-instead-of-clock

derekbruening merged commit dff98b7 into master Oct 4, 2024
17 checks passed

derekbruening deleted the i6971-count-instead-of-clock branch October 4, 2024 18:31
