
ENH: Add pydra.tasks.core sequence tasks #434

Open · wants to merge 11 commits into master
Conversation


effigies (Contributor) commented Mar 1, 2021

Acknowledgment

  • I acknowledge that this contribution will be available under the Apache 2 license.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Summary

Starting off #429 with Merge.

Checklist

  • All tests passing
  • I have added tests to cover my changes
  • I have updated documentation (if necessary)
  • My code follows the code style of this project
    (we are using black: you can pip install pre-commit,
    run pre-commit install in the pydra directory
    and black will be run automatically with each commit)

Comment on lines +49 to +55
    self.output_spec = SpecInfo(
        name="Outputs",
        fields=[(f"out{i + 1}", list) for i in range(len(splits))],
        bases=(BaseSpec,),
    )
    super().__init__(*args, **kwargs)
    self.inputs.splits = splits
effigies (Contributor Author):
Right now there is no way to update output_spec if splits is computed and passed via a workflow. I'm not sure we could determine the output names in order to connect the outputs of this task if the number of splits is not known.

We could instead do something like max_splits and create all out{i} fields with the understanding that anything depending on an output that is not generated will receive a None (or attr.NOTHING...)
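
A rough sketch of the max_splits idea, assuming the output spec is still built with SpecInfo/BaseSpec as in the diff above (the max_splits argument and the attr.NOTHING fallback are hypothetical, not part of this PR):

from pydra.engine.specs import BaseSpec, SpecInfo

def make_split_output_spec(max_splits: int) -> SpecInfo:
    # Declare out1..out{max_splits} up front so downstream connections can be
    # made even when the actual number of splits is only computed at runtime.
    return SpecInfo(
        name="Outputs",
        fields=[(f"out{i + 1}", list) for i in range(max_splits)],
        bases=(BaseSpec,),
    )

Outputs beyond the actual number of splits would then resolve to None (or attr.NOTHING) when the task runs.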

Contributor:
i'm not following this i think. a task writer should not concern themselves with splits or combines. that's outside of the task's responsibility. it should simply take whatever input is given and work with it.

effigies (Contributor Author):
The task splits a list. It has nothing to do with splitting/combining.

Contributor:
semantics are hard! :)

djarecka (Collaborator), Mar 2, 2021:
> Right now no way to update output_spec if splits is computed and passed via a workflow. I'm not sure we could determine the output names in order to connect the outputs of this task if the number of splits is not known.

This is not exactly what you want, but we also have the option to pass all outputs by using lzout.all_: https://github.com/nipype/pydra/blob/master/pydra/engine/tests/test_workflow.py#L3662
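
For reference, a minimal sketch of the lzout.all_ pattern (the task and field names here are made up for illustration; only lzout.all_ itself comes from pydra):

import pydra

@pydra.mark.task
def consume(all_outputs: dict) -> list:
    # receives a dict containing every output of the upstream task
    return sorted(all_outputs)

wf = pydra.Workflow(name="wf", input_spec=["x"])
wf.add(split_task(name="split", inlist=wf.lzin.x))               # hypothetical Split task
wf.add(consume(name="consume", all_outputs=wf.split.lzout.all_))
wf.set_output([("out", wf.consume.lzout.out)])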

satra (Contributor), Mar 2, 2021:
in nipype Split creates a named output for each of the splits. an example is the following:

let's say a subject has M T1s and N T2s and i want them to be processed together. i can create a Merge node to aggregate them (merge also produces the relevant output to split them apart), do the processing, and then split them back into 2 components (a list of processed T1s and a list of processed T2s). as a workflow designer i know that there are two categories here; what i don't know ahead of time is the number. hence being able to set split would be required.

M,N can vary per subject, so cannot be determined up front.

another thought here is that the outputs would come from lzout: if we can't find an attribute at construction time we can check that attribute at runtime. it may still error out, but we should be able to evaluate that connection. @effigies - would that address your question, without having to define max_splits?
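
A rough nipype-style sketch of that scenario (node names and wiring are illustrative only):

from nipype import Node
from nipype.interfaces.utility import Merge, Split

merge = Node(Merge(2), name="merge")   # in1: list of M T1s, in2: list of N T2s -> one combined list
# ... processing of the combined list happens here ...
split = Node(Split(), name="split")    # splits=[M, N] yields out1 (processed T1s) and out2 (processed T2s)

Since M and N differ per subject, splits has to be set (or connected) at run time rather than hard-coded.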

effigies (Contributor Author):

Yes, I think if we could have static and dynamic output specs, then that could work. In the static case we would do name checking, and in the dynamic case we leave it to post-run.

In the specific case of Split(), you need to know how many splits you have, even if you don't know the specific number in each split. We could either do that dynamically by connections or we could do it statically by declaring a number of splits.

Collaborator:

so if you know the number of splits in Split() you could do it with a normal python function. Could someone give me a simple workflow where this could be used? I'm probably missing something important, since I still don't see where I'd want to use it.

effigies (Contributor Author):

I don't specifically care about Split(). I've literally never used it. Nonetheless, the point is that there are numerous Nipype interfaces where the full set of inputs or outputs isn't known until the interface is instantiated. The Merge() example in this PR is one such that I have used, and it's easier to write:

merge3 = Merge(n=3)
merge4 = Merge(n=4)

Than:

@pydra.mark.task
def merge3(in1, in2, in3):
    return in1 + in2 + in3

m3 = merge3()

...

Similarly, doing the same thing with Split() would involve repeatedly writing:

@pydra.mark.task
def custom_split(inlist, splits) -> {"out1": list, "out2": list, "out3": list}:
    # ...

Collaborator:

ok, so it is designed to mimic these simple python functions, but we just want to allow people to use it instead of writing the annotations themselves. I'm fine with this, but I'm a bit afraid of using the name Split. Perhaps SplitTask and MergeTask? I know it's longer, but having split and Split - one used as a method, the other as a class - can confuse people IMO

effigies changed the title from "WIP: Add pydra.tasks.core tasks" to "ENH: Add pydra.tasks.core sequence tasks" on Mar 2, 2021

effigies commented Mar 2, 2021

Before moving on to IO tasks, which I suspect will take longer, I think it would be worth reviewing this for whether this is how we want to model writing tasks, as well as how we want to expose them. For instance, do we want to import everything into pydra.tasks.core or keep tasks in their more topical subpackages?

Also, I have no attachment to the name sequence; I'm just avoiding the keyword list.


codecov bot commented Mar 2, 2021

Codecov Report

Merging #434 (7894d0b) into master (40099c7) will increase coverage by 0.12%.
The diff coverage is 89.41%.


@@            Coverage Diff             @@
##           master     #434      +/-   ##
==========================================
+ Coverage   82.40%   82.52%   +0.12%     
==========================================
  Files          20       21       +1     
  Lines        3858     3936      +78     
  Branches     1049     1071      +22     
==========================================
+ Hits         3179     3248      +69     
- Misses        487      491       +4     
- Partials      192      197       +5     
Flag         Coverage Δ
unittests    82.44% <89.41%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                   Coverage Δ
pydra/engine/core.py             87.74% <75.00%> (-0.49%) ⬇️
pydra/tasks/core/sequences.py    92.42% <92.42%> (ø)
pydra/mark/functions.py          100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 40099c7...7894d0b.


satra commented Mar 2, 2021

i have another question. this PR does mimic nipype, but does pydra need to fully mimic nipype for function tasks?

is there a reason why these are not functions with annotations? for pure python i would like to be as close to python as possible. in pydra-ml i took this approach (completely open to discussions).

python functions that can be tested as normal (should be annotated, which i didn't do when i wrote that and forgot). https://github.com/nipype/pydra-ml/blob/master/pydra_ml/tasks.py

and converted to pydra tasks: https://github.com/nipype/pydra-ml/blob/master/pydra_ml/classifier.py#L32
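
The pattern satra is pointing at, sketched (the function and values are illustrative, not the actual pydra-ml code):

import typing as ty
import pydra

# plain, annotated python function that can be unit-tested on its own
def concat_lists(a: ty.List[int], b: ty.List[int]) -> ty.List[int]:
    return a + b

# wrapped into a pydra task only where it is used in a workflow
concat_task = pydra.mark.task(concat_lists)
task = concat_task(a=[1, 2], b=[3])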


effigies commented Mar 2, 2021

> is there a reason why these are not functions with annotations?

Merge() has a variable set of inputs. Split() has a variable set of outputs.


satra commented Mar 2, 2021

i have forgotten a lot of things :)


djarecka commented Mar 2, 2021

I left a comment in the review section, but I'm also not convinced why we want to have a special task Split...
