
[ENH]: Improve BasePreprocessor #310

Merged · 8 commits from refactor/preprocessor-fit-transform into main · Mar 13, 2024
Conversation

@synchon (Member) commented Mar 7, 2024

  • description of feature/fix
  • tests added/passed
  • add an entry for the latest changes

This PR refactors the preprocess method of preprocessors to not return the "data type" and to return only the preprocessed input data. Having to return the "data type" forced a concrete preprocessor to operate on a single "data type" only (like BOLDWarper, introduced in #267) and did not allow the on parameter to be exposed to the user (as requested in #301). If a concrete preprocessor adds a new "data type" like BOLD_mask to the "junifer data object", as fMRIPrepConfoundRemover (introduced in #111) does, it will be handled as usual with no changes.

A preprocessor should not create new "data types" (which was allowed earlier, hence the restriction) but should only create and add "helper data types" like BOLD_mask, which happens via the extra_input of the preprocess method.
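
A minimal sketch of this contract (illustrative only; the class name, placeholder values, and exact signature are assumptions, not the merged code, and later comments in this thread refine the design so that helper data types are returned explicitly):

```python
from typing import Dict, Optional


class ExamplePreprocessor:
    """Hypothetical concrete preprocessor following the described contract."""

    def preprocess(
        self, input: Dict, extra_input: Optional[Dict] = None
    ) -> Dict:
        # ... preprocess the single "data type" held in `input` (e.g. BOLD) ...
        # "Helper data types" such as BOLD_mask are added via extra_input,
        # not returned as a new "data type".
        if extra_input is not None:
            extra_input["BOLD_mask"] = {
                "data": "mask_img",  # placeholder for the actual mask image
                "space": input.get("space"),
            }
        # Only the preprocessed data is returned; no "data type" key anymore.
        return input
```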

@synchon force-pushed the refactor/preprocessor-fit-transform branch from adc0811 to fee73cf on March 7, 2024 11:53
@synchon requested a review from fraimondo on March 7, 2024 11:53
@synchon self-assigned this on Mar 7, 2024
@synchon added the preprocess (Issues or pull requests related to preprocessors) and enhancement (New feature or request) labels on Mar 7, 2024

@fraimondo (Contributor)

I think I have an issue with this PR.

  1. How is the BOLD_mask kept even before this PR? I checked the fMRIPrepConfoundRemover and all we do is add it to the extra_input. But how is this kept???? Is this actually working???

  2. Not allowing the preprocessor to "create new data types" is a bit strong. Indeed, there's no difference between "new data types" and "helper data types".

@synchon (Member, Author) commented Mar 7, 2024

  1. How is the BOLD_mask kept even before this PR? I checked the fMRIPrepConfoundRemover and all we do is add it to the extra_input. But how is this kept???? Is this actually working???

Since the input is not copied for extra_input, it's kept; but if we copy it like we do for markers, it won't be kept.
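
A tiny pure-Python illustration of this point (the keys and values are made up for demonstration):

```python
from copy import deepcopy

# No copy: extra_input *is* the data object, so anything added to it is kept.
data = {"BOLD": {"data": "bold_img", "space": "MNI"}}
extra_input = data
extra_input["BOLD_mask"] = {"data": "mask_img", "space": "MNI"}
assert "BOLD_mask" in data  # kept

# With a copy (as done for markers): additions only live on the copy.
data = {"BOLD": {"data": "bold_img", "space": "MNI"}}
extra_input = deepcopy(data)
extra_input["BOLD_mask"] = {"data": "mask_img", "space": "MNI"}
assert "BOLD_mask" not in data  # not kept
```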

  2. Not allowing the preprocessor to "create new data types" is a bit strong. Indeed, there's no difference between "new data types" and "helper data types".

The reason is that BOLD or T1w would be a "data type" for me, but BOLD_mask would not. But we are on the same page with our understanding.

@synchon (Member, Author) commented Mar 7, 2024

  1. How is the BOLD_mask kept even before this PR? I checked the fMRIPrepConfoundRemover and all we do is add it to the extra_input. But how is this kept???? Is this actually working???

Since the input is not copied for extra_input, it's kept; but if we copy it like we do for markers, it won't be kept.

We could have the extra_input returned from the preprocess method to make it explicit, though. What do you think?

@fraimondo (Contributor)

Check this out:

This is how we "add" the BOLD_mask:

```python
if extra_input is not None:
    logger.debug("Setting mask_item")
    extra_input["BOLD_mask"] = {
        "data": mask_img,
        "space": input["space"],
    }
```

But this is how we treat the input in the _fit_transform function:

```python
out = input
for type_ in self._on:
    if type_ in input.keys():
        logger.info(f"Preprocessing {type_}")
        t_input = input[type_]
        # Pass the other data types as extra input, removing
        # the current type
        extra_input = input
        extra_input.pop(type_)
        logger.debug(
            f"Extra input for preprocess: {extra_input.keys()}"
        )
        key, t_out = self.preprocess(
            input=t_input, extra_input=extra_input
        )
        # Add the output to the Junifer Data object
        logger.debug(f"Adding {key} to output")
        out[key] = t_out
```

So basically we have out, which is input, which is also extra_input. We then pop the type, modify the extra_input and re-add the type to out.
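
A condensed illustration of that aliasing (made-up keys and values, just to show that the three names point at one dict):

```python
input = {"BOLD": "bold_img", "T1w": "t1w_img"}

out = input          # same dict object
extra_input = input  # same dict object again
extra_input.pop("BOLD")
assert "BOLD" not in out  # popping from extra_input also removed it from out
out["BOLD"] = "preprocessed_bold"  # re-adding the type restores it everywhere
assert out is input is extra_input
```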

This is madness. My madness, but madness still.

I don't know how to tackle this to make it proper.

Problems we currently have:

  1. We are splitting the input, which is also the out variable, in two for no reason.
  2. We are giving the preprocess function all the data (input), even if they do not need it.
  3. We are "adding" new "data types" by modifying the extra_input variable.

A more declarative way would be to:

  1. Only pass the input and the relevant extra_input to the preprocess function (not all the data)
  2. Use the returned values to update the out. Maybe by getting a dictionary of key/value pairs?

@synchon (Member, Author) commented Mar 7, 2024

A more declarative way would be to:

  1. Only pass the input and the relevant extra_input to the preprocess function (not all the data)
  2. Use the returned values to update the out. Maybe by getting a dictionary of key/value pairs?

I agree with the steps and it's similar to what I proposed in the earlier comment.

  • At the beginning of _fit_transform, we could copy the input to out like we do for DataReader so that we don't lose anything.
  • Then from preprocess we return the first value as the input dict (as we do now).
  • The second value from preprocess could be a "helper data type(s)" dict or None, and we check for it in _fit_transform. If it's None, we don't do anything; otherwise we update out (see the sketch after this list).
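
A rough sketch of how _fit_transform could look under this agreed flow (illustrative only, not the merged implementation; logger and self._on follow the snippet quoted earlier, and the preprocess stub stands in for a concrete preprocessor returning a data dict plus helpers or None):

```python
import logging
from copy import deepcopy

logger = logging.getLogger(__name__)


class PreprocessorSketch:
    """Illustrative only: the agreed _fit_transform flow, not the merged code."""

    def __init__(self, on):
        self._on = on  # data types this preprocessor operates on

    def preprocess(self, input, extra_input=None):
        # Stub: a concrete preprocessor would transform `input` and may
        # return helper data types (e.g. {"BOLD_mask": ...}) as the 2nd value.
        return input, None

    def _fit_transform(self, input):
        out = deepcopy(input)  # copy first so nothing is lost (like DataReader)
        for type_ in self._on:
            if type_ not in input:
                continue
            logger.info(f"Preprocessing {type_}")
            # Pass only the other data types as extra input
            # (here simply everything except the current type).
            extra_input = {k: v for k, v in input.items() if k != type_}
            data, helpers = self.preprocess(
                input=input[type_], extra_input=extra_input
            )
            out[type_] = data
            # Helper data types (e.g. BOLD_mask) come back explicitly;
            # merge them into the output only if the preprocessor added any.
            if helpers is not None:
                out.update(helpers)
        return out
```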

@fraimondo (Contributor)

Perfect!

@synchon force-pushed the refactor/preprocessor-fit-transform branch from fee73cf to 5f41a4d on March 8, 2024 10:37
@synchon (Member, Author) commented Mar 8, 2024

A more declarative way would be to:

  1. Only pass the input and the relevant extra_input to the preprocess function (not all the data)
  2. Use the returned values to update the out. Maybe by getting a dictionary of key/value pairs?

I agree with the steps and it's similar to what I proposed in the earlier comment.

  • At the beginning of _fit_transform, we could copy the input to out like we do for DataReader so that we don't lose anything.
  • Then from preprocess we return the first value as the input dict (as we do now).
  • The second value from preprocess could be a "helper data type(s)" dict or None, and we check for it in _fit_transform. If it's None, we don't do anything; otherwise we update out.

@fraimondo The latest commits should implement this.

codecov bot commented Mar 8, 2024

Codecov Report

Attention: Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.95%. Comparing base (6ff3fd3) to head (5668e63).

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #310      +/-   ##
==========================================
+ Coverage   88.90%   88.95%   +0.05%     
==========================================
  Files         103      103              
  Lines        4615     4610       -5     
  Branches      868      868              
==========================================
- Hits         4103     4101       -2     
+ Misses        368      366       -2     
+ Partials      144      143       -1     
Flag      Coverage Δ
junifer   88.95% <96.96%> (+0.05%) ⬆️


Files                                                    Coverage Δ
junifer/preprocess/base.py                               87.50% <100.00%> (+6.25%) ⬆️
.../preprocess/confounds/fmriprep_confound_remover.py    98.74% <100.00%> (-0.04%) ⬇️
junifer/preprocess/bold_warper.py                        38.88% <66.66%> (ø)

@synchon added this to the 0.0.4 (alpha 3) milestone on Mar 8, 2024
@fraimondo (Contributor) left a comment

Tests are not passing yet, but code is OK from my side.

@synchon (Member, Author) commented Mar 11, 2024

Tests are not passing yet, but code is OK from my side.

It was a network issue when downloading assets, nothing on our side. Let's wait and see.

@synchon merged commit 5540b36 into main on Mar 13, 2024 (19 of 20 checks passed)
@synchon deleted the refactor/preprocessor-fit-transform branch on March 13, 2024 08:45