Distinguish never-spawned from never-submitted. #6067

hjoliver · 2024-04-13T04:35:44Z

Close #6066

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
Tests are included (or explain why tests are not needed).
CHANGES.md entry included if this is a change that can affect users
Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

hjoliver · 2024-04-15T05:10:28Z

~~Fixing bullet point 2 has generated a DB issue that I don't understand yet.~~

cylc/flow/task_pool.py

hjoliver · 2024-04-18T06:48:42Z

suicide trigger

[scheduling]
    [[graph]]
        R1 = """
            a? & b => c
            a:failed? => !a & !c
        """
[runtime]
    [[a]]
        script = "sleep 3; false"
    [[b,c]]

now works as expected:

...
INFO - [1/c:waiting(runahead)] => waiting
INFO - [1/a/01:failed] removed from active task pool: suicide trigger
INFO - [1/c:waiting] removed from active task pool: suicide trigger
INFO - [1/b/01:running] => succeeded
INFO - Not spawning 1/c: already used in this flow
INFO - [1/b/01:succeeded] removed from active task pool: completed
INFO - Workflow shutting down - AUTOMATIC
INFO - DONE

cylc remove

Since #5643 is not done yet, we can't remove future tasks, so need an example that spawns a waiting task to be removed:

[scheduling]
    [[graph]]
        # b spawns c before a removes c
        R1 = "a & b => c"
[runtime]
    [[a]]
        script = "sleep 5; cylc remove $CYLC_WORKFLOW_ID//1/c"
    [[b,c]]

Result: 1/c does not run (already spawned in this flow; and prior removal was logged).

...
INFO - [1/c:waiting(runahead)] => waiting
INFO - [1/b/01:succeeded] removed from active task pool: completed
INFO - Command "remove_tasks" received. ID=f2834293-11dc-49cd-ae5c-e1d9f5cbdb42
    remove_tasks(tasks=['1/c'])
INFO - [1/c:waiting] removed from active task pool: request
INFO - Command "remove_tasks" actioned. ID=f2834293-11dc-49cd-ae5c-e1d9f5cbdb42
INFO - [1/a/01:running] => succeeded
INFO - Not spawning 1/c: already used in this flow
INFO - [1/a/01:succeeded] removed from active task pool: completed
INFO - Workflow shutting down - AUTOMATIC
INFO - DONE

I think this is what we want, subject to possible revision when #5643 is implemented.

hjoliver · 2024-04-18T23:53:00Z

Can rebase this onto 8.2.x if necessary (it seems we may get another 8.2 release out before 8.3.0?)

oliver-sanders · 2024-04-19T10:12:42Z

Yes, we've got a cylc-rose issue that we would like to fix in the 8.2 series so are planning another release.

hjoliver · 2024-04-21T09:21:46Z

This actually conflicts quite badly with 8.2.x. Mind if we leave it on master 8.3.0?

oliver-sanders · 2024-04-22T09:09:03Z

Shockingly, we've only had one report of this so far (failure cases are presumably less well tested in workflows?) so we can wait a little.

* Ensure removed tasks are not automatically re-spawned by task pool logic. * Ensure removed tasks can be manually re-spawned by request.

oliver-sanders · 2024-04-22T09:52:11Z

Looks good to me, I wrote a couple of extra tests whilst reviewing - hjoliver#49

Can confirm that this fixes the bug where removed tasks could be automatically respawned at a later time by task pool logic. There's no associated issue for this, but might be worth a mention in the changelog entry.

hjoliver · 2024-04-22T10:09:10Z

I wrote a couple of extra tests whilst reviewing - hjoliver#49

Thanks, I'll take a look tomorrow. I presume you saw my new integration tests on this branch (which seem to achieve 100% patch coverage) - do yours go beyond mine?

oliver-sanders · 2024-04-22T10:15:31Z

Yes, I extended your test to ensure that it's possible to get the task back by triggering and added a new one to ensure that removed tasks remain removed. These behaviours follow naturally from the implementation, but it's worth locking them down in the tests to protect against future changes.

changes.d/6067.fix.md

cylc/flow/task_pool.py

MetRonnie

Not related to this PR but while I've noticed it: the "added to active task pool" info messages introduced in #5658 seem a bit excessive. Consider demoting to debug?

MetRonnie · 2024-04-22T10:52:44Z

cylc/flow/task_pool.py

+            submit_num == 0
+        ):
+            # Previous instance removed before completing any outputs.
+            LOG.info(f"Not spawning {point}/{name}: already used in this flow")


Not sure this message is easy to understand

Indeed, but how to describe what's happening concisely?

Basically, the flow (on this branch of the graph) came to a halt at this point because the task was removed earlier by suicide trigger or cylc remove. That definitely warrants some kind of log message.

I'll change it to this (subject to further litigation, of course):

Suggested change

LOG.info(f"Not spawning {point}/{name}: already used in this flow")

LOG.info(f"Flow stopping at {point}/{name} - task previously removed")

Actually, "flow blocked at" might be better.

I don't like "flow blocked" I'm afraid! How about:

Skipping {point}/{name} {stringify_flow_nums(flow_nums)} - task previously removed

Or just

Not spawning {point}/{name} {stringify_flow_nums(flow_nums)} - task previously removed

as it was the "already used in this flow" that I think doesn't make much sense

I think "flow blocked" is better than your two suggestions - it correctly indicates that the flow will not continue here because the task was removed.

"Skipping" suggests the flow will skip over the removed task, which is not the case.

"Spawning" (as you'll have seen in the chat) is I think being expunged from user terminology.

OK, I think options are as follows (or some close variant thereof):

flow blocked here, the task was removed earlier

(note addition of "here" to avoid implicating the entire flow)

not spawning task, it was removed earlier

Personally I think in this instance "flow blocked" is more user-friendly, in that it is more or less self-explanatory, compared to "not spawning".

But given that @oliver-sanders agrees "not spawning" is correct, and (as I've just shown) spawning probably has to remain a user-facing term, perhaps we could just stick with that.

It needs to be logged at INFO level though, to explain why the flow does not continue here, at this time.

Note, the term "spawn" ... was never a user facing term in the first place.

Ah, really?!

That's Cylc 7!

I'm very confused by these comments. It was your own argument that users should not need to understand the management (insertion or removal) of task proxies or task pool logic in general which I got on board with. It was a primary goal of SoD to remove the need for us to communicate this internal implementation detail, and a necessary side effect as SoD would make it harder for users to monitor the task pool itself. That's why spawning was retired right, s/spawn/set-outputs, users shouldn't need to understand spawning, because SoD will ensure the task proxy is always there when needed right? E.G. #2143. I agreed with you on this at the time, it was a good idea!

To quote your own proposal:

Users should not have to know about the "task pool" and the concepts of task "insert" and "remove" anymore.

I appreciate this is going back a few years now, maybe reality crept in and the thinking has changed. Perhaps it was a bit ambitions to presume we could hide the internal logic. Maybe we should write up the SoD model and communicate it to users as essential reading? We could do with writing this up in any case.

It needs to be logged at INFO level though, to explain why the flow does not continue here, at this time.

I'm not convinced it does. We are trying to communicate to users the graph model, not the internal SoD implementation (or at least I thought we were).

Take this graph for example:

a => b => c => d a => x b => x c => x d => x

If the user runs cylc remove //x, they will get a message along the lines of "x removed".

They don't need to see four "not re-spawning x" messages in a row as the graph proceeds, there is no new information in these messages. Moreover, it isn't communicating anything to them about the graph. Of course x isn't be re-spawned, I removed it! On purpose!

My approval still stands of course, I'll leave you two to find something you can agree on.

I was trying to overt a terminology argument not to stoke one, I accepted the original phrasing.

I just think "flow blocked" sounds too negative and some users will see it and think something has gone wrong

OK @MetRonnie, let's try to agree. I don't see "flow blocked" as negative, but perhaps we could add more information to explain the reason for it.

We've established that "skipping" isn't right. Are you happy with either of these, or do you have another suggestion?:

"flow blocked by previous removal of [task]"

"not spawning [task] (previously removed)"

I like "flow blocked" because it automatically conveys the fact that the flow will not continue from this point. But having argued that "spawn" is still needed to explain how flows evolve, I don't mind if we use that here too.

tests/integration/test_task_pool.py

remove: add integration tests

Co-authored-by: Ronnie Dutta <[email protected]>

hjoliver · 2024-04-23T01:40:46Z

All comments addressed. Logging is nice and succinct now:

$ cylc play --no-detach --no-timestamp bug/run17

 ▪ ■  Cylc Workflow Engine 8.3.0.dev
 ██   Copyright (C) 2008-2024 NIWA
▝▘    & British Crown (Met Office) & Contributors

INFO - Extracting job.sh to /home/oliverh/cylc-run/bug/run17/.service/etc/job.sh
INFO - Workflow: bug/run17
INFO - Scheduler: url=tcp://NIWA-1022450.niwa.local:43023 pid=18085
INFO - Workflow publisher: url=tcp://NIWA-1022450.niwa.local:43032
INFO - Run: (re)start number=1, log rollover=1
INFO - Cylc version: 8.3.0.dev
INFO - Run mode: live
INFO - Initial point: 1
INFO - Final point: 1
INFO - Cold start from 1
INFO - New flow: 1 (original flow from 1) 2024-04-23T13:40:13
INFO - [1/a:waiting(runahead)] => waiting
INFO - [1/a:waiting] => waiting(queued)
INFO - [1/a:waiting(queued)] => waiting
INFO - [1/a:waiting] => preparing
INFO - [1/a/01:preparing] submitted to localhost:background[18110]
INFO - [1/a/01:preparing] => submitted
INFO - [1/a/01:submitted] => running
INFO - [1/a/01:running] => succeeded
INFO - Workflow shutting down - AUTOMATIC
INFO - DONE

MetRonnie

Would like to see "flow blocked" changed, but happy enough

hjoliver · 2024-04-28T21:54:19Z

That's Cylc 7!

Sorry, I guess I took "never" a bit too literally!

To quote your own proposal:

Users should not have to know about the "task pool" and the concepts of task "insert" and "remove" anymore.

Ah, OK @oliver-sanders, I do understand your point now.

That statement was idealistic in retrospect, and maybe that's my fault. Early SoD discussions were very scheduler-focused and we didn't know exactly what would be in the task pool (e.g. whether partially satisfied tasks should be "hidden" or not).

Now the GUI matters too because the n=0 window IS the task pool, so we have to explain what tasks are in it and how they get there.

After going through all the options that we've thought of (to my knowledge) it still seems to me that the best terminology for this is "task pool" and "tasks get spawned (into the pool) by completion of upstream outputs". [I have actually commented on this several times over the past couple years, but not surprising if it got lost in the noise.]

I appreciate this is going back a few years now, maybe reality crept in and the thinking has changed. Perhaps it was a bit ambitions to presume we could hide the internal logic. Maybe we should write up the SoD model and communicate it to users as essential reading? We could do with writing this up in any case.

Yeah, more or less what I meant by "idealistic". The basic "SoD for Users" story for users, which can serve as the basis for documenting flows, optional outputs, and interventions, is really quite simple though:

the "task pool" is the set of tasks under active management by the scheduler, as the flow moves along the graph
- (in an infinite graph the scheduler can only manage a subset of all the tasks, in any implementation)
tasks get "spawned" into the pool by upstream (graph) outputs, or automatically if they have no parents
- (tasks have to get into the current pool somehow, in any implementation)
tasks exit the pool if/when they achieve a final status with completed outputs
- (ditto)
in GUI you see a graph-based window around the task pool
flows (and what you see in the GUI) evolve according to the graph

hjoliver · 2024-04-28T22:04:59Z

If the user runs cylc remove //x, they will get a message along the lines of "x removed".

They don't need to see four "not re-spawning x" messages in a row as the graph proceeds, there is no new information in these messages.

Yes, but you've given a simple example where the removal and its effects are contiguous and obvious. In more complex workflows, and especially once we can remove future (non-pool) tasks as planned, that may not be the case.

This is not exposing SoD implementation, it just explains why the flow does not continue at this point the graph: because at some point in the past someone or something "removed" the task there.

Whether this is INFO or DEBUG is a pretty small thing, not worth dying in a ditch over, but I can't see how hurts to explain to users why the flow halts now on this branch when the "remove" operation may have happened some time ago.

hjoliver · 2024-04-29T00:44:09Z

Just noticed you approved this @MetRonnie - I could have saved myself some writing above 🤦‍♂️

OK, I'll merge this after tweaking the log message to give a bit more information as described above, so we can push on to 8.3.0.

Of course x isn't be re-spawned, I removed it! On purpose!

BTW I (unsuccessfully!) made the exact same argument for manual expiry, which has the exact same effect on the workflow: x doesn't spawn, which blocks the flow at that point in the graph, which may result in premature shutdown.

I'll concede and demote the message to DEBUG though, because removal is pretty niche, and if the user did it they don't have too much of a right to complain when the inevitable downstream effects occur.

It's definitely a candidate for our proposed intermediate log level though #3647 (comment)

hjoliver added the bug Something is wrong :( label Apr 13, 2024

hjoliver self-assigned this Apr 13, 2024

hjoliver changed the title ~~Distinguish never spawned from suicided before submit.~~ Distinguish never-spawned from suicided-before-submit. Apr 14, 2024

hjoliver changed the title ~~Distinguish never-spawned from suicided-before-submit.~~ Distinguish never-spawned from never-submitted. Apr 15, 2024

hjoliver force-pushed the tweak-suicide branch from 29277ec to 9d6d768 Compare April 15, 2024 01:34

hjoliver added this to the cylc-8.3.0 milestone Apr 15, 2024

hjoliver force-pushed the tweak-suicide branch from 9d6d768 to b1d95e1 Compare April 15, 2024 01:43

Distinguish never-spawned from never-submitted.

da9ac3a

hjoliver force-pushed the tweak-suicide branch from b1d95e1 to da9ac3a Compare April 15, 2024 12:04

Tweak previous for flow-wait.

b0292db

oliver-sanders reviewed Apr 15, 2024

View reviewed changes

cylc/flow/task_pool.py Outdated Show resolved Hide resolved

hjoliver added 2 commits April 18, 2024 17:00

Improve suicide trigger logging.

d1645a4

New integration test.

d1443ab

hjoliver added 2 commits April 18, 2024 18:50

Fix func test.

a803092

Change log.

3de678f

hjoliver marked this pull request as ready for review April 18, 2024 06:53

Replace a couple of funcional tests.

2d3df62

remove: add integration tests

b32c6e0

* Ensure removed tasks are not automatically re-spawned by task pool logic. * Ensure removed tasks can be manually re-spawned by request.

oliver-sanders approved these changes Apr 22, 2024

View reviewed changes

changes.d/6067.fix.md Outdated Show resolved Hide resolved

cylc/flow/task_pool.py Outdated Show resolved Hide resolved

oliver-sanders requested a review from MetRonnie April 22, 2024 10:20

MetRonnie reviewed Apr 22, 2024

View reviewed changes

hjoliver and others added 4 commits April 23, 2024 10:11

Merge pull request #49 from oliver-sanders/tweak-suicide-extra-tests

7e8a096

remove: add integration tests

Update cylc/flow/task_pool.py

68f8959

Update changes.d/6067.fix.md

6cf018a

Update tests/integration/test_task_pool.py

51aae7c

Co-authored-by: Ronnie Dutta <[email protected]>

hjoliver force-pushed the tweak-suicide branch 2 times, most recently from 0f264f5 to 995734b Compare April 23, 2024 02:15

MetRonnie approved these changes Apr 26, 2024

View reviewed changes

Reduce logging verbosity, tweak tests. [skip ci]

d7305c5

hjoliver force-pushed the tweak-suicide branch from 995734b to d7305c5 Compare April 29, 2024 01:22

hjoliver merged commit 91342c3 into cylc:master Apr 29, 2024

hjoliver deleted the tweak-suicide branch April 29, 2024 01:23

oliver-sanders mentioned this pull request Jun 11, 2024

Some doc updates for 8.3.0.dev changes. cylc/cylc-doc#727

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Distinguish never-spawned from never-submitted. #6067

Distinguish never-spawned from never-submitted. #6067

hjoliver commented Apr 13, 2024 •

edited

Loading

hjoliver commented Apr 15, 2024 •

edited

Loading

hjoliver commented Apr 18, 2024

hjoliver commented Apr 18, 2024

oliver-sanders commented Apr 19, 2024

hjoliver commented Apr 21, 2024

oliver-sanders commented Apr 22, 2024

oliver-sanders commented Apr 22, 2024 •

edited

Loading

hjoliver commented Apr 22, 2024

oliver-sanders commented Apr 22, 2024 •

edited

Loading

MetRonnie left a comment

MetRonnie Apr 22, 2024

hjoliver Apr 22, 2024

hjoliver Apr 22, 2024

MetRonnie Apr 23, 2024 •

edited

Loading

hjoliver Apr 23, 2024 •

edited

Loading

hjoliver Apr 25, 2024 •

edited

Loading

oliver-sanders Apr 26, 2024 •

edited

Loading

oliver-sanders Apr 26, 2024 •

edited

Loading

MetRonnie Apr 26, 2024

hjoliver Apr 28, 2024 •

edited

Loading

hjoliver commented Apr 23, 2024

MetRonnie left a comment

hjoliver commented Apr 28, 2024 •

edited

Loading

hjoliver commented Apr 28, 2024 •

edited

Loading

hjoliver commented Apr 29, 2024 •

edited

Loading

	LOG.info(f"Not spawning {point}/{name}: already used in this flow")
	LOG.info(f"Flow stopping at {point}/{name} - task previously removed")

Distinguish never-spawned from never-submitted. #6067

Distinguish never-spawned from never-submitted. #6067

Conversation

hjoliver commented Apr 13, 2024 • edited Loading

hjoliver commented Apr 15, 2024 • edited Loading

hjoliver commented Apr 18, 2024

suicide trigger

cylc remove

hjoliver commented Apr 18, 2024

oliver-sanders commented Apr 19, 2024

hjoliver commented Apr 21, 2024

oliver-sanders commented Apr 22, 2024

oliver-sanders commented Apr 22, 2024 • edited Loading

hjoliver commented Apr 22, 2024

oliver-sanders commented Apr 22, 2024 • edited Loading

MetRonnie left a comment

Choose a reason for hiding this comment

MetRonnie Apr 22, 2024

Choose a reason for hiding this comment

hjoliver Apr 22, 2024

Choose a reason for hiding this comment

hjoliver Apr 22, 2024

Choose a reason for hiding this comment

MetRonnie Apr 23, 2024 • edited Loading

Choose a reason for hiding this comment

hjoliver Apr 23, 2024 • edited Loading

Choose a reason for hiding this comment

hjoliver Apr 25, 2024 • edited Loading

Choose a reason for hiding this comment

oliver-sanders Apr 26, 2024 • edited Loading

Choose a reason for hiding this comment

oliver-sanders Apr 26, 2024 • edited Loading

Choose a reason for hiding this comment

MetRonnie Apr 26, 2024

Choose a reason for hiding this comment

hjoliver Apr 28, 2024 • edited Loading

Choose a reason for hiding this comment

hjoliver commented Apr 23, 2024

MetRonnie left a comment

Choose a reason for hiding this comment

hjoliver commented Apr 28, 2024 • edited Loading

hjoliver commented Apr 28, 2024 • edited Loading

hjoliver commented Apr 29, 2024 • edited Loading

hjoliver commented Apr 13, 2024 •

edited

Loading

hjoliver commented Apr 15, 2024 •

edited

Loading

oliver-sanders commented Apr 22, 2024 •

edited

Loading

oliver-sanders commented Apr 22, 2024 •

edited

Loading

MetRonnie Apr 23, 2024 •

edited

Loading

hjoliver Apr 23, 2024 •

edited

Loading

hjoliver Apr 25, 2024 •

edited

Loading

oliver-sanders Apr 26, 2024 •

edited

Loading

oliver-sanders Apr 26, 2024 •

edited

Loading

hjoliver Apr 28, 2024 •

edited

Loading

hjoliver commented Apr 28, 2024 •

edited

Loading

hjoliver commented Apr 28, 2024 •

edited

Loading

hjoliver commented Apr 29, 2024 •

edited

Loading