
ENH: Add pydra.tasks.core sequence tasks #434

Open · wants to merge 11 commits into master
Conversation


effigies (Contributor) commented Mar 1, 2021

Acknowledgment

  • I acknowledge that this contribution will be available under the Apache 2 license.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Summary

Starting off #429 with Merge.

Checklist

  • All tests passing
  • I have added tests to cover my changes
  • I have updated documentation (if necessary)
  • My code follows the code style of this project
    (we are using black: you can pip install pre-commit,
    run pre-commit install in the pydra directory
    and black will be run automatically with each commit)

Comment on lines +49 to +55
    self.output_spec = SpecInfo(
        name="Outputs",
        fields=[(f"out{i + 1}", list) for i in range(len(splits))],
        bases=(BaseSpec,),
    )
    super().__init__(*args, **kwargs)
    self.inputs.splits = splits
effigies (Contributor Author):
Right now there is no way to update output_spec if splits is computed and passed via a workflow. I'm not sure we could determine the output names in order to connect the outputs of this task if the number of splits is not known.

We could instead do something like max_splits and create all out{i} fields with the understanding that anything depending on an output that is not generated will receive a None (or attr.NOTHING...)
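
A rough sketch of the max_splits idea, assuming the output spec is still built with SpecInfo/BaseSpec as in the diff above (the max_splits argument and the attr.NOTHING fallback are hypothetical, not part of this PR):

from pydra.engine.specs import BaseSpec, SpecInfo

def make_split_output_spec(max_splits: int) -> SpecInfo:
    # Declare out1..out{max_splits} up front so downstream connections can be
    # made even when the actual number of splits is only computed at runtime.
    return SpecInfo(
        name="Outputs",
        fields=[(f"out{i + 1}", list) for i in range(max_splits)],
        bases=(BaseSpec,),
    )

Outputs beyond the actual number of splits would then resolve to None (or attr.NOTHING) when the task runs.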

Contributor:
i'm not following this i think. a task writer should not concern themselves with splits or combines. that's outside of the task's responsibility. it should simply take whatever input is given and work with it.

effigies (Contributor Author):
The task splits a list. It has nothing to do with splitting/combining.

Contributor:
semantics are hard! :)

djarecka (Collaborator), Mar 2, 2021:
> Right now no way to update output_spec if splits is computed and passed via a workflow. I'm not sure we could determine the output names in order to connect the outputs of this task if the number of splits is not known.

This is not exactly what you want, but we also have the option to pass all outputs by using lzout.all_: https://github.com/nipype/pydra/blob/master/pydra/engine/tests/test_workflow.py#L3662
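
For reference, a minimal sketch of the lzout.all_ pattern (the task and field names here are made up for illustration; only lzout.all_ itself comes from pydra):

import pydra

@pydra.mark.task
def consume(all_outputs: dict) -> list:
    # receives a dict containing every output of the upstream task
    return sorted(all_outputs)

wf = pydra.Workflow(name="wf", input_spec=["x"])
wf.add(split_task(name="split", inlist=wf.lzin.x))               # hypothetical Split task
wf.add(consume(name="consume", all_outputs=wf.split.lzout.all_))
wf.set_output([("out", wf.consume.lzout.out)])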

satra (Contributor), Mar 2, 2021:
in nipype Split creates a named output for each of the splits. an example is the following:

let's say a subject has M T1s and N T2s and i want them to be processed together. i can create a Merge node to aggregate them (merge also produces the relevant output to split them apart), do the processing, and then split them back into 2 components (a list of processed T1s and a list of processed T2s). as a workflow designer i know that there are two categories here; what i don't know ahead of time is the number. hence being able to set split would be required.

M,N can vary per subject, so cannot be determined up front.

another thought here is that the outputs would come from lzout: if we can't find an attribute at construction time we can check that attribute at runtime. it may still error out, but we should be able to evaluate that connection. @effigies - would that address your question, without having to define max_splits?
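
A rough nipype-style sketch of that scenario (node names and wiring are illustrative only):

from nipype import Node
from nipype.interfaces.utility import Merge, Split

merge = Node(Merge(2), name="merge")   # in1: list of M T1s, in2: list of N T2s -> one combined list
# ... processing of the combined list happens here ...
split = Node(Split(), name="split")    # splits=[M, N] yields out1 (processed T1s) and out2 (processed T2s)

Since M and N differ per subject, splits has to be set (or connected) at run time rather than hard-coded.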

effigies (Contributor Author):

Yes, I think if we could have static and dynamic output specs, then that could work. In the static case we would do name checking, and in the dynamic case we leave it to post-run.

In the specific case of Split(), you need to know how many splits you have, even if you don't know the specific number in each split. We could either do that dynamically by connections or we could do it statically by declaring a number of splits.

Collaborator:

so if you know the number of splits in Split() you could do it with a normal python function. Could someone give me a simple workflow where this could be used? I'm probably missing something important, since I still don't see where I'd want to use it.

effigies (Contributor Author):

I don't specifically care about Split(). I've literally never used it. Nonetheless, the point is that there are numerous Nipype interfaces where the full set of inputs or outputs isn't known until the interface is instantiated. The Merge() example in this PR is one such that I have used, and it's easier to write:

merge3 = Merge(n=3)
merge4 = Merge(n=4)

Than:

@pydra.mark.task
def merge3(in1, in2, in3):
    return in1 + in2 + in3

m3 = merge3()

...

Similarly, doing the same thing with Split() would involve repeatedly writing:

@pydra.mark.task
def custom_split(inlist, splits) -> {"out1": list, "out2": list, "out3": list}:
    # ...

Collaborator:

ok, so it is designed to mimic these simple python functions, but we just want to allow people to use it instead of writing the annotations themselves. I'm fine with this, but I'm a bit afraid of using the name Split. Perhaps SplitTask and MergeTask? I know it's longer, but having split and Split - one used as a method, the other as a class - can confuse people IMO

effigies changed the title from "WIP: Add pydra.tasks.core tasks" to "ENH: Add pydra.tasks.core sequence tasks" on Mar 2, 2021

effigies commented Mar 2, 2021

Before moving on to IO tasks, which I suspect will take longer, I think it would be worth reviewing this for whether this is how we want to model writing tasks, as well as how we want to expose them. For instance, do we want to import everything into pydra.tasks.core or keep tasks in their more topical subpackages?

Also, I have no attachment to the name sequence; I'm just avoiding the keyword list.


codecov bot commented Mar 2, 2021

Codecov Report

Merging #434 (7894d0b) into master (40099c7) will increase coverage by 0.12%.
The diff coverage is 89.41%.


@@            Coverage Diff             @@
##           master     #434      +/-   ##
==========================================
+ Coverage   82.40%   82.52%   +0.12%     
==========================================
  Files          20       21       +1     
  Lines        3858     3936      +78     
  Branches     1049     1071      +22     
==========================================
+ Hits         3179     3248      +69     
- Misses        487      491       +4     
- Partials      192      197       +5     
Flag         Coverage Δ
unittests    82.44% <89.41%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                   Coverage Δ
pydra/engine/core.py             87.74% <75.00%> (-0.49%) ⬇️
pydra/tasks/core/sequences.py    92.42% <92.42%> (ø)
pydra/mark/functions.py          100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 40099c7...7894d0b.


satra commented Mar 2, 2021

i have another question. this PR does mimic nipype, but does pydra need to fully mimic nipype for function tasks?

is there a reason why these are not functions with annotations? for pure python i would like to be as close to python as possible. in pydra-ml i took this approach (completely open to discussions).

python functions that can be tested as normal (should be annotated, which i didn't do when i wrote that and forgot). https://github.com/nipype/pydra-ml/blob/master/pydra_ml/tasks.py

and converted to pydra tasks: https://github.com/nipype/pydra-ml/blob/master/pydra_ml/classifier.py#L32
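
The pattern satra is pointing at, sketched (the function and values are illustrative, not the actual pydra-ml code):

import typing as ty
import pydra

# plain, annotated python function that can be unit-tested on its own
def concat_lists(a: ty.List[int], b: ty.List[int]) -> ty.List[int]:
    return a + b

# wrapped into a pydra task only where it is used in a workflow
concat_task = pydra.mark.task(concat_lists)
task = concat_task(a=[1, 2], b=[3])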


effigies commented Mar 2, 2021

> is there a reason why these are not functions with annotations?

Merge() has a variable set of inputs. Split() has a variable set of outputs.


satra commented Mar 2, 2021

i have forgotten a lot of things :)


djarecka commented Mar 2, 2021

I left a comment in the review section, but I'm also not convinced why we want to have a special task Split...
