
Output vizualizer #2744

Open · wants to merge 54 commits into main
Conversation

@vmcj (Member) commented Oct 14, 2024

I tried to implement #367. This is still a bit rough, but I would rather get some feedback first instead of continuing with this if it is not what @eldering envisioned.

The main problem I currently have is deciding which testcases to visualize: do we only want the failing one (and if so, only the wrong-answer one), or do we try all of them and accept that the script might fail because the team output might not conform to the expected output? (For example, the team might have stopped halfway through the problem or might still have debug output, etc.)

The visualized team output for boolfind is of course not very helpful, but as this is only a demo, this is the best we have for now.

While implementing this I found out that we could always visualize as a lower-priority task and update the visualization task with the judgehost that should have the needed files. To keep this simple I wrongly assume that the file will always be present on the judgehost; that should still be fixed if we want to continue with this.

vmcj and others added 30 commits October 13, 2024 14:21
array_filter would only filter out the other judgehosts but still return an array of judgehost objects. By selecting the first (and only) item we can now get the `enabled` property and properly check it.

```
array(1) {
  [1]=>
  array(5) {
    ["id"]=>
    string(1) "2"
    ["hostname"]=>
    string(8) "judgehost"
    ["enabled"]=>
    bool(true)
    ["polltime"]=>
    string(20) "1728821560.017400000"
    ["hidden"]=>
    bool(false)
  }
}
```
- Such a script will most likely never be a generic script.
- This is untested as yet, since the syntax should be discussed further.
- This assumes a similar invocation as for the output_validator; the alternative is to add this to the domjudge-problem.ini file if it doesn't end up in the ICPC problem spec.
- Getting this directly via submission.problem.output_visualizer_executable (the property) seems to fail: it does show up in the Twig dump, but the translation fails when using it.
- This is needed during testing of the job on the judgehost.
- Debugging via the API is hard, so we hardcode the search and copy it over in a later commit.
```diff
@@ -768,7 +771,7 @@ function fetch_executable_internal(
     $judgehosts = request('judgehosts', 'GET');
     if ($judgehosts !== null) {
         $judgehosts = dj_json_decode($judgehosts);
-        $judgehost = array_filter($judgehosts, fn($j) => $j['hostname'] === $myhost);
+        $judgehost = array_values(array_filter($judgehosts, fn($j) => $j['hostname'] === $myhost))[0];
```
Member: This is part of your other PR
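For illustration, a sketch of the check this change enables; the surrounding code is hypothetical, only the `array_values(...)[0]` pattern comes from the diff above:

```php
// With array_values(...)[0] we hold a single judgehost record rather than
// a filtered array, so its fields can be read directly.
$matches = array_values(array_filter($judgehosts, fn($j) => $j['hostname'] === $myhost));
$judgehost = $matches[0] ?? null;
if ($judgehost !== null && !$judgehost['enabled']) {
    // Hypothetical handling: this judgehost was disabled via the API.
}
```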

```php
        $run_config['hash']
    );
    if (isset($error)) {
        // FIXME
```
Member: Fix me 😛

Member Author: Loosely based on what we do for the debug tasks, I'll see if I can merge the shared parts and possibly fix this one.

```php
    $debug_package = base64_decode($request->request->get('visual_output'));
    file_put_contents($tempFilename, $debug_package);
}
// FIXME: error checking
```
Member: Another fixme
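For reference, a sketch of what that error checking could look like; the exception type is borrowed from elsewhere in this controller, and this is not the author's implementation:

```php
// Hedged sketch of the missing error checking around the snippet above.
$visual_output = $request->request->get('visual_output');
if ($visual_output === null) {
    throw new BadRequestHttpException("Argument 'visual_output' is mandatory");
}
$decoded = base64_decode($visual_output, true); // strict mode rejects malformed input
if ($decoded === false) {
    throw new BadRequestHttpException("Argument 'visual_output' is not valid base64");
}
if (file_put_contents($tempFilename, $decoded) === false) {
    throw new BadRequestHttpException('Could not store the visualization on disk');
}
```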

```diff
@@ -1478,6 +1533,7 @@ public function getJudgeTasksAction(Request $request): array
         throw new BadRequestHttpException('Argument \'hostname\' is mandatory');
     }
     $hostname = $request->request->get('hostname');
+    $hostname = 'Computer';
```
Member: Whoops

```php
    throw new BadRequestHttpException(
        'Inconsistent data, no judging known with judgingid = ' . $judgeTask->getJobId() . '.');
}
if ($tempFilename = tempnam($this->dj->getDomjudgeTmpDir(), "visual-")) {
```
Member: This means it doesn't work with 2 servers in HA mode. We should store this in the DB as a file.

Member Author: This is loosely based on what we do with debug packages, so we have the same problem there.

Member: Ah, I didn't know. Then maybe that's fine for now, but we should fix it at some point?

```diff
@@ -532,6 +532,12 @@
     {{ runs | displayTestcaseResults(judgingDone) }}
 </td>
 <td>
+    {% if hasOutputVisualizer and judgingDone %}
```
Member: Maybe add `and not visualization is defined`?

@meisterT (Member)
Getting this error when clicking the "visualize" button:
[screenshot]

```python
vals = []
for line in lines:
    if 'READ' in line:
        vals.append(int(line.split(' ')[-1]))
```
Member: This breaks if the submission writes bogus output, e.g. `READ domjudge`. How do we handle cases like this?

Member Author:
I would say this is something which is left up to the jury.

I wonder if there is something we can do with the output_validator and the judgemessage for communication: either an extra exit code, or otherwise the jury has to parse the output in this script.

Member:
Yes, it's left to the jury. In this case you are the jury by implementing the visualizer :-)

Since people are going to model after the examples we give them, we should make it robust and not crash.

Member: Adding to this, I think we should return 42 in the script if we were able to visualize the output, and 43 if not.
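To make that convention concrete, here is a sketch of a visualizer that fails gracefully. The demo script in this PR is Python; PHP is used here only for consistency with the other examples on this page (the visualizer is an arbitrary executable), and the argument layout (team output in, image path out) is an assumption:

```php
#!/usr/bin/env php
<?php
// Sketch: exit 42 when a visualization was produced, 43 when the team
// output could not be visualized (per the suggestion above).
$teamOutputFile = $argv[1] ?? null; // assumed invocation: run <team output> <image out>
$imageOutFile   = $argv[2] ?? null;
if ($teamOutputFile === null || ($lines = @file($teamOutputFile)) === false) {
    exit(43);
}
$vals = [];
foreach ($lines as $line) {
    if (str_contains($line, 'READ')) {
        $parts = preg_split('/\s+/', trim($line));
        $last  = end($parts);
        if (!is_numeric($last)) {
            exit(43); // bogus team output such as "READ domjudge": give up gracefully
        }
        $vals[] = (int)$last;
    }
}
if ($vals === []) {
    exit(43); // nothing to draw, e.g. the team stopped halfway through
}
// ... render $vals into $imageOutFile here ...
exit(42);
```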

```diff
@@ -1,3 +1,4 @@
 name: Boolean switch search

 validation: custom interactive
+visualization: default
```
Member: is that already spec'ed out?

Member Author: No, this is not something which exists in the spec.

@meisterT (Member) commented Oct 22, 2024
I was asking because I think this should be `custom` instead of `default`, since there is no default visualizer.

Member Author: Any string will do at the moment; it would be nice to get @eldering's opinion, as I just picked something. Will change it to custom for now.

```diff
@@ -334,7 +334,10 @@ function fetch_executable_internal(
     $execrunpath = $execbuilddir . '/run';
     $execrunjurypath = $execbuilddir . '/runjury';
     if (!is_dir($execdir) || !file_exists($execdeploypath)) {
-        system('rm -rf ' . dj_escapeshellarg($execdir) . ' ' . dj_escapeshellarg($execbuilddir));
+        system('rm -rf ' . dj_escapeshellarg($execdir) . ' ' . dj_escapeshellarg($execbuilddir), $retval);
```
Member: unrelated to this PR

```php
        continue;
    }

    $teamoutput = $workdir . "/testcase" . sprintf('%05d', $judgeTask['testcase_id']) . '/1/program.out';
```
Member: where does the /1/ here come from?

Member Author: This gets the output of the first run; in other words, it does not work for multipass problems.

Member: Ah, right. Then please add a TODO and consider making the `1` a local variable, e.g. `$pass = 1;`, and then using that in the path construction.

Member Author: What would we want to do in this case? I assume we actually want to visualize the last pass?
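The suggested refactor would look something like this (a sketch; which pass to pick for multipass problems is still the open question above):

```php
// TODO: for multipass problems this should probably select the last pass.
$pass = 1;
$teamoutput = $workdir . '/testcase' . sprintf('%05d', $judgeTask['testcase_id']) . "/$pass/program.out";
```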

```php
    [$runpath, $teamoutput, $tmpfile]));
system($visual_cmd, $retval);
if ($retval !== 0) {
    error("Running '$runpath' failed.");
```
Member: let's rather report an internal error like we do in other cases (and not crash judgedaemons)
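A sketch of that suggestion; `logmsg()` is assumed to be the daemon's usual logging helper, and how exactly an internal error is filed here should follow the existing cases, which this sketch only marks with a comment:

```php
if ($retval !== 0) {
    // Don't call error(): that terminates the judgedaemon. Log instead and
    // file an internal error via the API like the other failure paths do.
    logmsg(LOG_ERR, "Running '$runpath' failed with exit code $retval.");
    // ... report internal error and skip this visualization task ...
}
```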

```php
$programStrings = [];
$programStrings['package_dir'] = 'output_validators/';
$programStrings['type'] = 'output validator';
$programStrings['clash'] = 'cmp';
```
Member: what does clash stand for?

Member Author: A naming clash; this is mostly copied from how the output_validator is uploaded.

@meisterT (Member)
Do we limit upload filesize anywhere?

```php
        'Inconsistent data, no judging known with judgingid = ' . $judgeTask->getJobId() . '.');
}
if ($tempFilename = tempnam($this->dj->getDomjudgeTmpDir(), "visual-")) {
    $debug_package = base64_decode($request->request->get('visual_output'));
```
Member: nit: update name (no longer debug_package)

```php
#[OA\Response(response: 200, description: 'When the visual output has been added')]
public function addVisualization(
    Request $request,
    #[OA\PathParameter(description: 'The hostname of the judgehost that wants to add the debug info')]
```
Member: nit: this is not debug info

```php
    'comment' => 'Team output visualization.',
])]
#[ORM\Index(columns: ['judgingid'], name: 'judgingid')]
class Visualization
```
Member: If we allow a visualizer to fail gracefully (see the exit code 43 message above), we need to be able to record this here (and perhaps a message?).

@meisterT (Member)
One implication of the current implementation (which is reasonable) is that if you shut down a judgehost, you are not going to get the visualization. We should probably think about a good way to message this to the user.

@meisterT (Member)
Out of scope for this PR: we could have a toggle at the problem level to visualize the output right after judging a test case (perhaps filtered down by verdict, e.g. only for wrong-answer).

@vmcj (Member Author) commented Oct 22, 2024

> One implication of the current implementation (which is reasonable) is that if you shut down a judgehost, you are not going to get the visualization. We should probably think about a good way to message this to the user.

That problem is also there for the debug package, I think. It wouldn't be that hard to extend this to optionally calculate the result on the testcase, but if a submission is non-deterministic we would get a different visualization from what the original team output had. Your remark #2744 (comment) would prevent the problem, but would cause an issue when visualization takes some time and there is a backlog.

@vmcj (Member Author) commented Oct 22, 2024

> Out of scope for this PR: we could have a toggle at the problem level to visualize the output right after judging a test case (perhaps filtered down by verdict, e.g. only for wrong-answer).

I assume still with the priority of doing the submission work first and only doing the visualization when there is nothing else? Or do you want to extend the queued work to have a {Run+Visual}[x], with x as the length of the worklist?
