i#6471 sched idle: Add replay fallback and fix #6481
Merged
Adds a fallback to return EOF when all outputs are past the last record in replay mode, to handle a schedule where not every input reaches EOF or has regions of interest. Adds a unit test for this case, which hangs without the fix.
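The fallback can be pictured with a minimal sketch. This is not the scheduler's actual code: `status_t`, `output_t`, and `pick_next()` are hypothetical stand-ins, and the recorded schedule is reduced to a list of input ordinals per output. An output that has consumed its own recording waits only while some other output still has recorded segments left; once all outputs are past their last record, it returns EOF instead of waiting forever.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the scheduler's real types.
enum class status_t { OK, WAIT, EOF_REACHED };

struct output_t {
    std::size_t record_index; // Index of the next recorded segment to replay.
    std::vector<int> record;  // Recorded segments, reduced here to input ordinals.
    bool past_last_record() const { return record_index >= record.size(); }
};

// Picks the next input for output `me` while replaying a recorded schedule.
status_t
pick_next(std::vector<output_t> &outputs, std::size_t me, int &input_out)
{
    output_t &out = outputs[me];
    if (!out.past_last_record()) {
        input_out = out.record[out.record_index++];
        return status_t::OK;
    }
    // This output has nothing left to replay. Keep waiting only if some
    // other output still has recorded segments to get through.
    for (const output_t &other : outputs) {
        if (!other.past_last_record())
            return status_t::WAIT;
    }
    // Fallback: every output is past its last record, so no further
    // progress is possible. Report EOF rather than waiting indefinitely.
    return status_t::EOF_REACHED;
}
```

Without the final return, an output in this model would report WAIT on every call once the recording is exhausted, which is the shape of the hang the new unit test exercises.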
Fixes a bug when recording a schedule which seems to cause the final segment entry for an input to have a stop instruction ordinal prior to the input's real endpoint. This is what caused hangs in actual usage of PR #6472 and led to adding the fallback just described (which is useful on its own).

On selecting a new input which turned out to be at EOF, the original code called close_schedule_segment() on prev_input for no apparent reason. That call replaces the default -1 sentinel (meaning read to EOF) with the current instruction ordinal. The code then tries again for a new input, but if it ends up picking the prior input again, it keeps executing that input without adding a new schedule segment entry. If that is the last input on that core, close_schedule_segment() is never called for it (the default sentinel is relied upon), so the recorded stop ordinal stays at the earlier, too-small value.

I observed missing sentinels for one input in some runs and this is my theory as to what happened; those cases are gone now with this fix. Unfortunately it is difficult to create a unit test for this, but testing on the original hangs (without the fallback) shows that it is fixed.
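A toy model of the theory above, under stated assumptions: `segment_t`, `recorder_t`, `READ_TO_EOF`, and the ordinal 100 are all hypothetical, and only the -1 sentinel and the idea of closing a segment come from the description. The function replays the suspected sequence of events and returns the stop ordinal left on the input's final segment.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical name for the default -1 sentinel meaning "read to EOF".
constexpr std::int64_t READ_TO_EOF = -1;

// Hypothetical, simplified record of one schedule segment.
struct segment_t {
    int input;
    std::int64_t stop_instruction;
};

struct recorder_t {
    std::vector<segment_t> segments;
    // A new segment starts out with the read-to-EOF sentinel.
    void open_segment(int input) { segments.push_back({ input, READ_TO_EOF }); }
    // Closing overwrites the sentinel with the current instruction ordinal.
    void close_segment(std::int64_t cur_ordinal)
    {
        if (!segments.empty())
            segments.back().stop_instruction = cur_ordinal;
    }
};

// Models the suspected sequence. Passing true mimics the old behavior of
// closing the previous input's segment when the newly picked input is at EOF.
std::int64_t
final_stop_after_eof_retry(bool close_prev_on_eof)
{
    recorder_t rec;
    rec.open_segment(0);                  // Input 0 is running on this core.
    const std::int64_t cur_ordinal = 100; // Arbitrary mid-trace position.
    // The scheduler picks input 1, which turns out to be at EOF already.
    if (close_prev_on_eof)
        rec.close_segment(cur_ordinal);
    // The retry picks input 0 again, so no new segment entry is opened.
    // Input 0 then runs to its real end as the last input on the core,
    // and nothing closes the segment again.
    return rec.segments.back().stop_instruction;
}
```

In this model, `final_stop_after_eof_retry(true)` leaves 100 as the stop ordinal even though the input ran further, while `final_stop_after_eof_retry(false)` preserves the sentinel.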
Issue: #6471