Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH]: Handling of relative paths for user-defined masks/parcellations #212

Open
1 task done
fraimondo opened this issue Mar 28, 2023 · 8 comments
Open
1 task done
Assignees
Labels
concept Design/Implementation concepts enhancement New feature or request question Further information is requested

Comments

@fraimondo
Copy link
Contributor

Are you requiring a new dataset or marker?

  • I understand this is not a marker or dataset request

Which feature do you want to include?

Following the discussion in #127, we need a way of allowing the user to create an extension that registers a mask/parcel using a relative path to the mask (from the location of the .py extensions file).

Asking for absolute paths directly will not work in cases that extensions are stored in a repository, like juni-farm.

The issues comes with the queue command, as the .py file will be copied, but the data/parcellation/etc will not.

@LeSasse @synchon

How do you imagine this integrated in junifer?

I don't have a clear idea on how to do it, but I can imagine something like this.

  1. the .py file with the extension has a "list of resources" that need to be copied (maybe in a variable, or comment)
  2. the queue command:
    2.1 parses the file
    2.2 finds the list of resources
    2.3 computes the absolute path
    2.4 copies them to a data directory in the junifer_jobs directory.

One problem that this issue has is that the .py file needs to somehow be modified so the path of the resources, after the queue command, are changed to the new data directory in the junifer_jobs directory.

Maybe some function junifer_path('../relative/path/to/data.nii.gz', __FILE__) can take care of solving the relative path. This function will first check in the ./data directory before using Path(__FILE__).parent.

Other more strict but easier to implement option is that each extension has a data directory next to it that can be copied in the junifer jobs. The problem with this approach is that users might end up repeating data in repositories for each extension they create.

Do you have a sample code that implements this outside of junifer?

No response

Anything else to say?

No response

@fraimondo fraimondo added enhancement New feature or request triage New issues waiting to be reviewed labels Mar 28, 2023
@fraimondo fraimondo changed the title [ENH]: [ENH]: Handling of relative paths for user-defined masks/parcellations Mar 28, 2023
@fraimondo fraimondo added question Further information is requested concept Design/Implementation concepts and removed triage New issues waiting to be reviewed labels Mar 28, 2023
@LeSasse
Copy link
Collaborator

LeSasse commented Mar 28, 2023

Actually, maybe it makes sense to provide a YAML interface for the register_* functions, such that one puts in the YAML sth like:

register_parcellations:
  name: CustomParcellation
  path: path/to/parcellation.nii.gz
  parcels_labels: path/to/labels.txt

Then one can treat this as a relative path from YAML similar to other things, but not sure if this will then end up cluttering the YAML and will also not solve the problem for other extensions that still may or may not rely on relative paths to some degree. Perhaps something like allowing user made script-based extensions to use command line arguments. For example one could make a python script to register the parcellation like:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("parc_path")

# just pseudocodeish
register_parcellation(parc_path)

The arguments could then be parametrised in the YAML, so that paths can be treated as relative from there:

with:
  - my_parc_extension.py: [path/to/parc, potential_other_cmd_arg]

Or one could impose some constraint, like, the argument could be a path from yaml to a directory that contains all of the extension data ressources.
Just a thought, but not sure how good this idea is, both from a technical point of view, and from a user-friendliness point of view.

I think the other option, that is also a good option, just stick with the implementation and just document it really well and explictly, to prevent confusion early on.

@fraimondo
Copy link
Contributor Author

These seems nice concepts that I have not considered. I will just think while I type.

The main goals of YAML are:

  1. To provide code-less configurations.
  2. Easy to use
  3. Reproduciblity (the yaml is embedded in the metadata of the feature as a json).

Option 1) from your suggestion (adding a register_parcellations section), for me, goes against 3. Reproduciblity is limited as long as you use core junifer features. The moment that there's a with statement with a non-junifer module (e.g.: external .py or package), we can't guarantee what's in there. This also applies with masks/parcellation. The mask/parcellation is not known to junifer, so we can't reproduce this result: if you send me this yaml file, I can't run it.

Additionally, users that want their specific masks/parcellations should be able to create them. At this point, we consider them power users and assume that they have enough coding knowledge to write the .py file that registers the mask.

Option 2 is a bit more complicated. First, it allows the user to "code" within the yaml. Second, the items in the with statement are imported, not run.

Overall, we need to consider what the user is doing, and not only the feature we want to support.

My take is:

  1. The parcellation is a common parcellation -> create an issue in github and we will add it. Adding common/published parcellations is not complicated.
  2. The parcellation is made by a user -> the user is a power user and can write the extension.

1 and 2 also applies for masks.

Let's also keep in mind that we have pending a nice feature regarding masks: https://www.templateflow.org/

from templateflow import api as tflow
tflow.get('MNI152NLin6Asym', desc=None, resolution=1,
          suffix='T1w', extension='nii.gz')
PosixPath('/templateflow_home/tpl-MNI152NLin6Asym/tpl-MNI152NLin6Asym_res-01_T1w.nii.gz')

@LeSasse
Copy link
Collaborator

LeSasse commented Mar 29, 2023

I agree with all the points above. I think if users that make extensions are considered power users they can be expected to understand the subtleties of the relative paths issue, so my leaning would be towards keeping the implementation as is (paths in the YAML relative from the YAML, paths in extensions relative from the junifer working directory), and simply summarising in the docs quite well how the paths are intended to be used in each situation.

@synchon
Copy link
Member

synchon commented Mar 29, 2023

From another POV:

  • Define a .juniferignore file which lists the file not of concern to junifer.
  • Everything else in the root directory (where we have .yaml and .py) is copied along with the .yaml and .py.
  • Now, the user uses relative paths for masks / parcellations / coordinates in .py i.e., relative to .py.
  • In .yaml, the relative paths there are relative to it so when we run we are sure everything works as expected.
  • In the docs, we explain the idea as "use stuff relative to whatever file you are in".

@fraimondo
Copy link
Contributor Author

I don't completely understand your approach @synchon

What we want to address here is relative paths in .py files that are included in a with statement. So lets forget about the rest.

Imagine this structure of a github repo in which juni-farm is added as a submodule in the external directory.

project_dir/
  junifer_yamls/
    my_yaml1.yaml  // adds ../external/masks/custom_mask_extension.py in the "with" section
    my_yaml2.yaml
  external/
    juni-farm/
      data/
        masks/
          this_custom_mask.nii.gz
      masks/
        custom_mask_extension.py  // registers ../data/masks/this_custom_mask.nii.gz

My take:

  1. The user needs to user relative paths by calling a junifer function: junifer_path('../relative/path/to/data.nii.gz', __FILE__) in custom_mask_extension.py
  2. The queue command then:
    2.1) parses the .py file
    2.2) look for lines with junifer_path
    2.3) computes the absolute path for each resource
    2.4) copies each resource to a data directory inside the corresponding jobdir
  3. the junifer_path function needs to first check inside the data dir relative to the yaml location for the file. Otherwise, looks in Path(__FILE__).parent / '../relative/path/to/data.nii.gz'

@synchon
Copy link
Member

synchon commented Mar 29, 2023

I don't completely understand your approach @synchon

What we want to address here is relative paths in .py files that are included in a with statement. So lets forget about the rest.

Imagine this structure of a github repo in which juni-farm is added as a submodule in the external directory.

project_dir/
  junifer_yamls/
    my_yaml1.yaml  // adds ../external/masks/custom_mask_extension.py in the "with" section
    my_yaml2.yaml
  external/
    juni-farm/
      data/
        masks/
          this_custom_mask.nii.gz
      masks/
        custom_mask_extension.py  // registers ../data/masks/this_custom_mask.nii.gz

My take:

  1. The user needs to user relative paths by calling a junifer function: junifer_path('../relative/path/to/data.nii.gz', __FILE__) in custom_mask_extension.py
  2. The queue command then:
    2.1) parses the .py file
    2.2) look for lines with junifer_path
    2.3) computes the absolute path for each resource
    2.4) copies each resource to a data directory inside the corresponding jobdir
  3. the junifer_path function needs to first check inside the data dir relative to the yaml location for the file. Otherwise, looks in Path(__FILE__).parent / '../relative/path/to/data.nii.gz'

I got it correct the previous time and I see what you mean. My proposal was a bit holistic in a way:

  1. Basic single YAML file:
project_dir/
  my_yaml1.yaml  // adds ./custom_mask_extension.py in the "with" 
  custom_mask_extension.py  // registers ./this_custom_mask.nii.gz
  this_custom_mask.nii.gz
  README.md
  .juniferignore

.juniferignore:

README.md
  • Junifer copies everything in project_dir/ to junfier_jobs/my_job/ except README.md.
  1. Basic multi YAML file:
project_dir/
  my_yaml1.yaml  // adds ./custom_mask_extension.py in the "with"
  my_yaml2.yaml
  custom_mask_extension.py  // registers ./this_custom_mask.nii.gz
  this_custom_mask.nii.gz
  README.md
  .juniferignore

.juniferignore:

README.md
  • Junifer copies everything in project_dir/ to junifer_jobs/my_job/ except README.md and my_yaml2.yaml.
  1. Advanced single YAML file:
project_dir/
  README.md
  junifer_yamls/
    my_yaml1.yaml  // adds ../external/juni-farm/masks/custom_mask_extension.py in the "with" section
  external/
    juni-farm/
      data/
        masks/
          this_custom_mask.nii.gz
      masks/
        custom_mask_extension.py  // registers ../data/masks/this_custom_mask.nii.gz

.juniferignore:

README.md
  • Junifer copies everything in project_dir/ to junifer_jobs/my_job/ except README.md maintaining the path structure.
  1. Advanced multi YAML file:
project_dir/
  README.md
  junifer_yamls/
    my_yaml1.yaml  // adds ../external/juni-farm/masks/custom_mask_extension.py in the "with" section
    my_yaml2.yaml
  external/
    juni-farm/
      data/
        masks/
          this_custom_mask.nii.gz
      masks/
        custom_mask_extension.py  // registers ../data/masks/this_custom_mask.nii.gz

.juniferignore:

README.md
  • Junifer copies everything in project_dir/ to junifer_jobs/my_job/ except README.md and my_yaml2.yaml maintaining the path structure.

@fraimondo
Copy link
Contributor Author

But with your approach a junifer config goes from a yaml file to a directory structure. This will mean that the queue command will end up copying the ENTIRE juni-farm. What if the relative path is even outside project_dir? Like from another project?

For me, it makes things too complicated. We need to consider the logic from our side, modifying the user's behaviour as less as possible.

@synchon
Copy link
Member

synchon commented Mar 29, 2023

But with your approach a junifer config goes from a yaml file to a directory structure. This will mean that the queue command will end up copying the ENTIRE juni-farm. What if the relative path is even outside project_dir? Like from another project?

For me, it makes things too complicated. We need to consider the logic from our side, modifying the user's behaviour as less as possible.

I see your point and have no arguments against it. For me, a documented directory structure brings order to chaos and keeps the user and devs on same page, else I have usually found it to not go down well. Those were my two cents.

@synchon synchon added this to the 0.0.5 (alpha 4) milestone Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
concept Design/Implementation concepts enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants