i#6471 sched idle: Add replay fallback and fix #6481

Merged 4 commits on Nov 29, 2023

derekbruening
Contributor

Adds a fallback to return EOF when all outputs are past the last record in replay mode, to handle a schedule where not every input reaches EOF or has regions of interest. Adds a unit test for this case, which hangs without the fix.
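The fallback can be pictured with a small model. This is only a sketch under stated assumptions, not the scheduler's code: `Status`, `OutputReplay`, and `ReplaySketch` are hypothetical names, and each output's recorded schedule is reduced to a count of records. An output that runs out of records idles while any other output still has records left, and returns EOF once every output is past its last record, without requiring every input to have reached EOF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical status codes for this sketch; not the scheduler's real enum.
enum class Status { kOk, kIdle, kEof };

// Hypothetical per-output replay state: how many recorded entries this
// output has and how many it has already consumed.
struct OutputReplay {
    std::size_t record_count = 0;
    std::size_t next_record = 0;
    bool past_last_record() const { return next_record >= record_count; }
};

class ReplaySketch {
public:
    explicit ReplaySketch(const std::vector<std::size_t> &record_counts)
    {
        for (std::size_t count : record_counts) {
            OutputReplay out;
            out.record_count = count;
            outputs_.push_back(out);
        }
    }

    // Asks for the next record on the given output.
    Status next(std::size_t output)
    {
        assert(output < outputs_.size());
        OutputReplay &out = outputs_[output];
        if (!out.past_last_record()) {
            ++out.next_record;
            return Status::kOk;
        }
        // This output is done.  Idle only while some other output still
        // has records to replay.
        for (const OutputReplay &other : outputs_) {
            if (!other.past_last_record())
                return Status::kIdle;
        }
        // Fallback: every output is past its last record, so report EOF
        // instead of idling forever.
        return Status::kEof;
    }

private:
    std::vector<OutputReplay> outputs_;
};
```

Without the final `kEof` branch, an output in this model would keep returning `kIdle` indefinitely, which corresponds to the hang the new unit test exercises.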

Fixes a recording bug that appears to cause the final segment entry for an input to have a stop instruction ordinal prior to the input's real endpoint. This is what caused hangs in actual usage of PR #6472 and led to adding the fallback just described (which is useful on its own).

On selecting a new input that turned out to be at EOF, the original code called close_schedule_segment() on prev_input for some reason. That replaces the default -1 sentinel (read to EOF) with the current ordinal. The code then tries again for a new input, but if it ends up picking the prior one, it keeps executing that input without a new schedule segment entry. If that is the last input on that core, close_schedule_segment() is not called again (the default is relied upon), so the premature stop ordinal stays in the recorded schedule. I observed missing sentinels for one input in some runs and this is my theory as to what happened; those cases are gone now with this fix. Unfortunately it is difficult to create a unit test for this, but testing on the original hangs (without the fallback) shows it fixed.
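The theorized failure can be reproduced in a toy model. This is a hedged sketch, not the recorder's code: `Segment`, `RecorderSketch`, and `record_last_segment` are hypothetical, and only the -1 "read to EOF" sentinel and the idea of closing a segment come from the description above. The exact change made in the PR is not spelled out here, so the "fixed" path simply assumes the previous input's segment is left open when the candidate input turns out to be at EOF.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sentinel meaning "run this input until EOF", per the description above.
constexpr std::int64_t kReadToEof = -1;

// Hypothetical, simplified schedule segment for one input on one output.
struct Segment {
    int input;
    std::int64_t start_instr;
    std::int64_t stop_instr; // Stays kReadToEof until the segment is closed.
};

struct RecorderSketch {
    std::vector<Segment> segments;

    void open_segment(int input, std::int64_t start_instr)
    {
        segments.push_back({ input, start_instr, kReadToEof });
    }

    // Replaces the sentinel on the most recent segment with a stop ordinal.
    void close_segment(std::int64_t cur_instr)
    {
        assert(!segments.empty());
        segments.back().stop_instr = cur_instr;
    }
};

// Models one switch attempt: the output is running input 0 at ordinal
// cur_instr, the candidate it picks is already at EOF, and the retry
// lands back on input 0, which then runs to its real endpoint.
// Passing true reproduces the theorized bug; false models the assumed fix.
Segment
record_last_segment(bool close_prev_on_eof_candidate, std::int64_t cur_instr)
{
    RecorderSketch rec;
    rec.open_segment(/*input=*/0, /*start_instr=*/0);
    if (close_prev_on_eof_candidate)
        rec.close_segment(cur_instr); // Overwrites the sentinel too early.
    // The retry picks input 0 again and no new segment is opened.  Input 0
    // is the last input on this output, so no later close happens either.
    return rec.segments.back();
}
```

In the buggy path the last segment ends at `cur_instr` even though the input keeps running, so a replay of that schedule would stop the input short of EOF; in the other path the sentinel survives.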

Issue: #6471

@derekbruening
Contributor Author

The x32 check failure is the AMD issue, #6417.

@derekbruening derekbruening merged commit 0254f5b into master Nov 29, 2023
14 of 15 checks passed
@derekbruening derekbruening deleted the i6471-output-eof branch November 29, 2023 17:57