Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor codebase into a python package #32

Merged
merged 46 commits into from
Aug 19, 2024

Conversation

leifdenby
Copy link
Member

@leifdenby leifdenby commented May 16, 2024

This PR lays the groundwork for being able to install neural-lam as a package, thereby making it possible to run from anywhere once the package has been installed. This means that it would be possible (in theory) to train neural-lam on a .npy-file based dataset (once #31 has been merged so that the training configuration is moved from neural_lam/constants.py to a yaml-config file) with the neural-lam package installed into a user's site-packages (i.e. in their virtualenv).

I appreciate that currently most of us will be checking out the codebase to make modifications, and then train a model with neural-lam and so these changes might seem superfluous. But making these later will be a lot harder than doing them now.

The primary changes are:

  • move all *.py that are currently outside of neural_lam/ into that folder, but keep the files the same
  • change all examples of running the neural-lam "scripts", e.g. python create_mesh.py by python -m neural_lam.create_mesh in the README
  • change all absolute imports to package-relative imports, i.e. from . import utils rather than from neural_lam import utils
  • add tests that all the CLI entrypoints to neural_lam can be imported and add ci/cd action to run these tests

I still need to resolve the issue around depending on cpu or gpu versions of pytorch. I know @sadamov found a work-around where the cpu or gpu package was automatically picked based on whether CUDA is detected. I haven't had time to look into this yet, but once I have done that I will mark this PR as complete from my side and ask for your thoughts on it :)

This PR is definitely a work-in-progress and is meant to serve as a discussion point.

(also this PR includes changes that PR #29 adds, so this PR definitely shouldn't be merged or even reviewed probably before #29 is in)

@leifdenby
Copy link
Member Author

This looks great! Should the changelog also be updated for your changes?

Thanks! Yes, I thought I would complete the changelog once the reviews were done, but maybe I should just do it now :) I will do that

Copy link
Collaborator

@sadamov sadamov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @leifdenby, totally agree that refactoring the code into a Python package is the way to go. I looked at the file changes and tested a full reinstall+training based on the README -> successful ✔️. This PR can be merged as is from my side, I do have some suggestions for this or a future PR:

The package can currently not be installed with pip install -e . because of the folder structure. I suggest to make the following changes:

  • Add these lines to pyproject.toml:
[project]
name = "neural-lam"
version = "0.1.0"
  • Move the neural-lam folder to src/neural-lam
  • Update the default path to the config yaml file in all relevant files to src/neural_lam/data_config.yaml
    • plot_graph.py
    • create_grid_features.py
    • create_mesh.py
    • create_parameter_weights.py
    • train_model.py

Now installing the neural-lam package works:
image

@joeloskarsson
Copy link
Collaborator

Great stuff! Happy to give this a full review later this week, but likely I won't have much to add.

  • I agree that being able to install with pip install -e seems important (necessary for us working with developing the package?). Moving most code to src/neural-lam/... seems like annoyingly many sub-directories. Is there some other way to do achieve this?
  • I have also implicitly viewed this as part of the 0.2.0 release.

@joeloskarsson
Copy link
Collaborator

I think with this we should also adjust the installation instructions in the Readme to install as a package.

Copy link
Collaborator

@joeloskarsson joeloskarsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all changes are good 😄 I would propose to add a couple more things with this (summary of the same points I commented above, but here in a proper review):

  1. Update the install instructions in the README to include installing the actual package. These should change even more once Move dependencies to pyproject toml and create ci/cd install and import test #37 is done, but if this PR makes it into a package it makes sense to add that there now.
  2. Enable installing with pip install -e .. I figured out a better way (than putting everything in a src dir) to achieve this is to add to pyproject.toml
[tool.setuptools]
py-modules = ["neural_lam"]
  1. Add __init__.py files to neural_lam and neural_lam/models that exposes submodules and classes directly in the imported neural_lam module. That way you can do something like
import neural_lam as nl

a = nl.utils.load_graph(...)

and it means that you can e.g. import each model directly from neural_lam.models.

If points 2 and 3 above sound good I put those changes already now on the branch package_inits in this repo, so you can just merge them in (diff with this PR: leifdenby/neural-lam@maint/refactor-as-package...mllam:neural-lam:package_inits)

@joeloskarsson
Copy link
Collaborator

I couldn't get the linting all green on package_inits branch. Looks like flake8 does not use the options we have specified in pyproject.toml (https://flake8.pycqa.org/en/latest/user/configuration.html#configuration-locations). Maybe we can use https://github.com/csachs/pyproject-flake8 to fix that?

@leifdenby
Copy link
Member Author

  • change all absolute imports to package-relative imports, i.e. from . import utils rather than from neural_lam import utils

I learnt yesterday from @khintz that I have for years been wrong about relative vs absolute imports. I have for as long as I can remember I thought that relative imports were recommended over absolute imports. But I realise now this is incorrect. I do still prefer them, but I was wondering what people think. I would be ok with sticking with absolute imports if that is the consensus. Maybe a thumbs-up for relative imports and thumbs-down for relative imports on this comment could serve as a vote? @joeloskarsson, @sadamov, @SimonKamuk, @khintz

@leifdenby
Copy link
Member Author

  • Moving most code to src/neural-lam/... seems like annoyingly many sub-directories. Is there some other way to do achieve this?

We don't have to add the src/ path prefix, but neural-lam/ would be enough. How about we stick with that?

@khintz
Copy link
Contributor

khintz commented Aug 14, 2024

I don't have very strong feelings about relative vs absolute imports, but if I have to choose one, I would go for absolute. That is easier to read (IMO) and there is general consensus towards that.

@leifdenby
Copy link
Member Author

3. Add __init__.py files to neural_lam and neural_lam/models that exposes submodules and classes directly in the imported neural_lam module.

I see the merit of this @joeloskarsson, but it does mean that the time it takes to import neural-lam is increased. I think this why people generally don't do this by default, but I am ok with doing this. I don't see it as necessary though to make the codebase into a package.

@leifdenby
Copy link
Member Author

I couldn't get the linting all green on package_inits branch. Looks like flake8 does not use the options we have specified in pyproject.toml (https://flake8.pycqa.org/en/latest/user/configuration.html#configuration-locations). Maybe we can use https://github.com/csachs/pyproject-flake8 to fix that?

Thanks @joeloskarsson, you're right. I tried adding pyproject-flake8 to the pre-commit config, but that didn't work. On the other hand it looks like https://github.com/john-hen/Flake8-pyproject did 😄

@leifdenby
Copy link
Member Author

I think all changes are good 😄 I would propose to add a couple more things with this (summary of the same points I commented above, but here in a proper review):

Thanks @joeloskarsson! It was really helpful to have this list. I have merged in your changes and found a way around the flake8 issues (as you highlighted too). I have simply changed the install instructions to reflect that once torch is installed then the rest can be installed with pip install -e ., that was what you were intending too, right? I think this ready for a last review from you if you have time.

Copy link
Collaborator

@joeloskarsson joeloskarsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have an opinion on relative vs absolute imports in the package. Any way is fine by me.

(about importing sub-modules in __init__.py)

I see the merit of this @joeloskarsson, but it does mean that the time it takes to import neural-lam is increased. I think this why people generally don't do this by default, but I am ok with doing this. I don't see it as necessary though to make the codebase into a package.

Yes that is a good point. I suppose there is a tradeoff here in terms of speed vs convenience. I think we can go with the convenience route, as the package is still fairly small and there is not a lot to import. I am quite used to this as both torch and pyg does import the submodules in __init__.py, so there could also be a point in being consistent with them.

I have simply changed the install instructions to reflect that once torch is installed then the rest can be installed with pip install -e ., that was what you were intending too, right?

Yes, that seems good. No need to over-complicate the installation setup and instructions here when we have more changes around that coming in #37

Great that you could get the flake8 fix to work!

All looks good to me.

@sadamov sadamov mentioned this pull request Aug 19, 2024
3 tasks
@leifdenby
Copy link
Member Author

All looks good to me.

Great! Thank you. I will update the CHANGELOG and then merge

@leifdenby leifdenby merged commit a54c45f into mllam:main Aug 19, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants