
[ENH]: Improve BasePreprocessor #310

Merged · 8 commits from refactor/preprocessor-fit-transform into main · Mar 13, 2024
Conversation

@synchon (Member) commented Mar 7, 2024

  • description of feature/fix
  • tests added/passed
  • add an entry for the latest changes

This PR refactors the preprocess method of preprocessors to not return the "data type" and to return only the preprocessed input data. Having to return the "data type" forced a concrete preprocessor to operate on a single "data type" only (like BOLDWarper, introduced in #267) and did not allow the on parameter to be exposed to the user (as requested in #301). If a concrete preprocessor adds a new "data type" like BOLD_mask to the "junifer data object", as fMRIPrepConfoundRemover (introduced in #111) does, it will be handled as usual with no changes.

A preprocessor should not create new "data types" (which was allowed earlier, hence the restriction) but should only create and add "helper data types" like BOLD_mask, which happens via the extra_input of the preprocess method.
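
A minimal sketch of this contract (illustrative only; the class name, placeholder values, and exact signature are assumptions, not the merged code, and later comments in this thread refine the design so that helper data types are returned explicitly):

```python
from typing import Dict, Optional


class ExamplePreprocessor:
    """Hypothetical concrete preprocessor following the described contract."""

    def preprocess(
        self, input: Dict, extra_input: Optional[Dict] = None
    ) -> Dict:
        # ... preprocess the single "data type" held in `input` (e.g. BOLD) ...
        # "Helper data types" such as BOLD_mask are added via extra_input,
        # not returned as a new "data type".
        if extra_input is not None:
            extra_input["BOLD_mask"] = {
                "data": "mask_img",  # placeholder for the actual mask image
                "space": input.get("space"),
            }
        # Only the preprocessed data is returned; no "data type" key anymore.
        return input
```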

@synchon force-pushed the refactor/preprocessor-fit-transform branch from adc0811 to fee73cf on March 7, 2024 11:53
@synchon requested a review from fraimondo on March 7, 2024 11:53
@synchon self-assigned this on Mar 7, 2024
@synchon added the preprocess (Issues or pull requests related to preprocessors) and enhancement (New feature or request) labels on Mar 7, 2024

@fraimondo (Contributor)

I think I have an issue with this PR.

  1. How is the BOLD_mask kept even before this PR? I checked the fMRIPrepConfoundRemover and all we do is add it to the extra_input. But how is this kept???? Is this actually working???

  2. Not allowing the preprocessor to "create new data types" is a bit strong. Indeed, there's no difference between "new data types" and "helper data types".

@synchon (Member, Author) commented Mar 7, 2024

  1. How is the BOLD_mask kept even before this PR? I checked the fMRIPrepConfoundRemover and all we do is add it to the extra_input. But how is this kept???? Is this actually working???

Since the input is not copied for extra_input, it's kept; but if we copy it like we do for markers, it won't be kept.
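
A tiny pure-Python illustration of this point (the keys and values are made up for demonstration):

```python
from copy import deepcopy

# No copy: extra_input *is* the data object, so anything added to it is kept.
data = {"BOLD": {"data": "bold_img", "space": "MNI"}}
extra_input = data
extra_input["BOLD_mask"] = {"data": "mask_img", "space": "MNI"}
assert "BOLD_mask" in data  # kept

# With a copy (as done for markers): additions only live on the copy.
data = {"BOLD": {"data": "bold_img", "space": "MNI"}}
extra_input = deepcopy(data)
extra_input["BOLD_mask"] = {"data": "mask_img", "space": "MNI"}
assert "BOLD_mask" not in data  # not kept
```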

  2. Not allowing the preprocessor to "create new data types" is a bit strong. Indeed, there's no difference between "new data types" and "helper data types".

The reason is that BOLD or T1w would be a "data type" for me, but BOLD_mask would not. But we are on the same page with our understanding.

@synchon (Member, Author) commented Mar 7, 2024

  1. How is the BOLD_mask kept even before this PR? I checked the fMRIPrepConfoundRemover and all we do is add it to the extra_input. But how is this kept???? Is this actually working???

Since the input is not copied for extra_input, it's kept; but if we copy it like we do for markers, it won't be kept.

We could have the extra_input returned from the preprocess method to make it explicit, though. What do you think?

@fraimondo (Contributor)

Check this out:

This is how we "add" the BOLD_mask:

```python
if extra_input is not None:
    logger.debug("Setting mask_item")
    extra_input["BOLD_mask"] = {
        "data": mask_img,
        "space": input["space"],
    }
```

But this is how we treat the input in the _fit_transform function:

```python
out = input
for type_ in self._on:
    if type_ in input.keys():
        logger.info(f"Preprocessing {type_}")
        t_input = input[type_]
        # Pass the other data types as extra input, removing
        # the current type
        extra_input = input
        extra_input.pop(type_)
        logger.debug(
            f"Extra input for preprocess: {extra_input.keys()}"
        )
        key, t_out = self.preprocess(
            input=t_input, extra_input=extra_input
        )
        # Add the output to the Junifer Data object
        logger.debug(f"Adding {key} to output")
        out[key] = t_out
```

So basically we have out, which is input, which is also extra_input. We then pop the type, modify the extra_input and re-add the type to out.
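
A condensed illustration of that aliasing (made-up keys and values, just to show that the three names point at one dict):

```python
input = {"BOLD": "bold_img", "T1w": "t1w_img"}

out = input          # same dict object
extra_input = input  # same dict object again
extra_input.pop("BOLD")
assert "BOLD" not in out  # popping from extra_input also removed it from out
out["BOLD"] = "preprocessed_bold"  # re-adding the type restores it everywhere
assert out is input is extra_input
```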

This is madness. My madness, but madness still.

I don't know how to tackle this to make it proper.

Problems we currently have:

  1. We are splitting the input, which is also the out variable, in two for no reason.
  2. We are giving the preprocess function all the data (input), even if they do not need it.
  3. We are "adding" new "data types" by modifying the extra_input variable.

A more declarative way would be to:

  1. Only pass the input and the relevant extra_input to the preprocess function (not all the data)
  2. Use the returned values to update the out. Maybe by getting a dictionary of key/value pairs?

@synchon (Member, Author) commented Mar 7, 2024

A more declarative way would be to:

  1. Only pass the input and the relevant extra_input to the preprocess function (not all the data)
  2. Use the returned values to update the out. Maybe by getting a dictionary of key/value pairs?

I agree with the steps and it's similar to what I proposed in the earlier comment.

  • At the beginning of _fit_transform, we could copy the input to out like we do for DataReader so that we don't lose anything.
  • Then from preprocess we return the first value as the input dict (as we do now).
  • The second value from preprocess could be a "helper data type(s)" dict or None, and we check for it in _fit_transform. If it's None, we don't do anything; otherwise we update out (see the sketch after this list).
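
A rough sketch of how _fit_transform could look under this agreed flow (illustrative only, not the merged implementation; logger and self._on follow the snippet quoted earlier, and the preprocess stub stands in for a concrete preprocessor returning a data dict plus helpers or None):

```python
import logging
from copy import deepcopy

logger = logging.getLogger(__name__)


class PreprocessorSketch:
    """Illustrative only: the agreed _fit_transform flow, not the merged code."""

    def __init__(self, on):
        self._on = on  # data types this preprocessor operates on

    def preprocess(self, input, extra_input=None):
        # Stub: a concrete preprocessor would transform `input` and may
        # return helper data types (e.g. {"BOLD_mask": ...}) as the 2nd value.
        return input, None

    def _fit_transform(self, input):
        out = deepcopy(input)  # copy first so nothing is lost (like DataReader)
        for type_ in self._on:
            if type_ not in input:
                continue
            logger.info(f"Preprocessing {type_}")
            # Pass only the other data types as extra input
            # (here simply everything except the current type).
            extra_input = {k: v for k, v in input.items() if k != type_}
            data, helpers = self.preprocess(
                input=input[type_], extra_input=extra_input
            )
            out[type_] = data
            # Helper data types (e.g. BOLD_mask) come back explicitly;
            # merge them into the output only if the preprocessor added any.
            if helpers is not None:
                out.update(helpers)
        return out
```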

@fraimondo (Contributor)

Perfect!

@synchon force-pushed the refactor/preprocessor-fit-transform branch from fee73cf to 5f41a4d on March 8, 2024 10:37
@synchon (Member, Author) commented Mar 8, 2024

A more declarative way would be to:

  1. Only pass the input and the relevant extra_input to the preprocess function (not all the data)
  2. Use the returned values to update the out. Maybe by getting a dictionary of key/value pairs?

I agree with the steps and it's similar to what I proposed in the earlier comment.

  • At the beginning of _fit_transform, we could copy the input to out like we do for DataReader so that we don't lose anything.
  • Then from preprocess we return the first value as the input dict (as we do now).
  • The second value from preprocess could be a "helper data type(s)" dict or None, and we check for it in _fit_transform. If it's None, we don't do anything; otherwise we update out.

@fraimondo The latest commits should implement this.

codecov bot commented Mar 8, 2024

Codecov Report

Attention: Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.95%. Comparing base (6ff3fd3) to head (5668e63).

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #310      +/-   ##
==========================================
+ Coverage   88.90%   88.95%   +0.05%     
==========================================
  Files         103      103              
  Lines        4615     4610       -5     
  Branches      868      868              
==========================================
- Hits         4103     4101       -2     
+ Misses        368      366       -2     
+ Partials      144      143       -1     
Flag      Coverage Δ
junifer   88.95% <96.96%> (+0.05%) ⬆️


Files                                                    Coverage Δ
junifer/preprocess/base.py                               87.50% <100.00%> (+6.25%) ⬆️
.../preprocess/confounds/fmriprep_confound_remover.py    98.74% <100.00%> (-0.04%) ⬇️
junifer/preprocess/bold_warper.py                        38.88% <66.66%> (ø)

@synchon added this to the 0.0.4 (alpha 3) milestone on Mar 8, 2024
@fraimondo (Contributor) left a comment

Tests are not passing yet, but code is OK from my side.

@synchon (Member, Author) commented Mar 11, 2024

Tests are not passing yet, but code is OK from my side.

It was a network issue when downloading assets, nothing on our side. Let's wait and see.

@synchon merged commit 5540b36 into main on Mar 13, 2024 (19 of 20 checks passed)
@synchon deleted the refactor/preprocessor-fit-transform branch on March 13, 2024 08:45