Automated Release Management for Feluda and its Operators #430

Open
Tracked by #401
dennyabrain opened this issue Nov 4, 2024 · 8 comments

@dennyabrain
Contributor

dennyabrain commented Nov 4, 2024

Overview

We want to make it easy to manage the release of feluda and its operators. As part of this issue:

  1. List the features and shortcomings of various tools like PyPI and uv.
  2. Create accounts with suitable registries and add secrets to GitHub.
  3. Integrate with GitHub Actions to automate release management, including publishing on a registry like PyPI and publishing release notes and a changelog on GitHub.
@dennyabrain
Contributor Author

dennyabrain commented Nov 4, 2024

Thoughts on namespace scoping for operators. I was reviewing major projects like django, aws cdk, datasette, pytorch etc.

  1. https://pypi.org/search/?q=&o=&c=Framework+%3A%3A+Django+CMS
  2. https://pypi.org/search/?q=pytorch&page=2

While there don't seem to be any standards, prefixing the project name seems to be the common way to namespace plugins. For instance, I saw many packages named django-*, datasette-* or pytorch-*. Unless you are aware of a better convention, we could publish feluda operators with the feluda-* prefix. As far as I know, there is no protection against someone else using the same prefix for their project. Not sure how much of a concern that is anyway.
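To make the prefix convention concrete, here is a minimal sketch of how installed operator packages could be listed by name. The `feluda-` prefix is only the convention proposed above, and the helper names are hypothetical; none of this is an existing Feluda API.

```python
import re
from importlib.metadata import distributions

OPERATOR_PREFIX = "feluda-"  # assumed naming convention, not enforced by PyPI


def normalize(name: str) -> str:
    """Normalize a project name the way package indexes do (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_operator_package(name: str) -> bool:
    """Return True if the project name carries the operator prefix."""
    return normalize(name).startswith(OPERATOR_PREFIX)


def installed_operator_packages() -> list[str]:
    """List installed distributions whose names start with the prefix."""
    names = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and is_operator_package(name):
            names.add(normalize(name))
    return sorted(names)


if __name__ == "__main__":
    print(installed_operator_packages())
```

Because the match is purely by name, a package published by someone else under the same prefix would also show up here, which is the lack of protection mentioned above.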

@plon-Susk7

Advantages of using uv over pip

  1. uv is about twice as fast as pip in total elapsed time for installing the same package.
  2. After adding dependencies to the pyproject.toml file (it's like package.json), we don't need to run a separate install command. Running a Python file with uv run makes uv read the dependencies from pyproject.toml and install them before executing the code. This is good for our project since we're targeting researchers and developers: they don't have to scratch their heads resolving dependencies and installing requirements. They could just declare feluda-* as a dependency and run the Python file.
  3. Unlike pip, uv uses a global cache that avoids storing duplicate copies of package dependencies, which saves disk space. This is good for our project since we have heavy dependencies for operators.
  4. It's an active project, so it's only going to get better from here. I checked their GitHub issues page and most of the issues are enhancements and questions. Issue page
  5. uv is designed as a drop-in replacement for common pip and pip-tools workflows. More here.

@dennyabrain
Contributor Author

@plon-Susk7 can you add some notes with examples of uv usage? What would the command(s) for the following look like:

  1. installing a new package
  2. removing an existing package
  3. upgrading a package to latest or specific version
  4. Anything unexpectedly cool that you discover :)

Like npm, does it allow us to install a package that's only hosted on GitHub and not PyPI? This could be useful while we are still figuring out which parts of feluda to move into operators and which to keep in this repository.

  1. Can you provide example of what .toml file and other files required for uv or for publishing the package would look like?

@plon-Susk7

Installation

  1. uv is pretty easy to install on macOS, Linux or Windows. On macOS and Linux it can be installed using curl:
$ curl -LsSf https://astral.sh/uv/install.sh | sh

or it can also be installed using pip

$ pip install uv

It is also available in the core Homebrew packages

$ brew install uv

More information about uv installation can be found here.

Usage

In order to create a working python project we need to have dependencies added to pyproject.toml file. For our project the pyproject.toml file could look something like this

[project]
name = "feluda"
version = "1.0.0"
dependencies = [
  # Any version in this range
  "tqdm >=4.66.2,<5",
  # Exactly this version of torch
  "torch ==2.2.2",
  # Install transformers with the torch extra
  "transformers[torch] >=4.39.3,<5",
  # Only install this package on older python versions
  # See "Environment Markers" for more information
  "importlib_metadata >=7.1.0,<8; python_version < '3.10'",
  "mollymawk ==0.1.0"
]

To initialise a project there's a basic init command, which creates the project structure

$ uv init feluda

The file structure after the init command will look like this
[image: file structure created by uv init]

The uv.lock file uses cross-platform resolution by default, whereas requirements.txt only targets a single platform (though you can use the --universal flag to generate a cross-platform file). The uv.lock format carries more information about requirements and is designed to be performant and auditable. More information here.
We can also define dev dependencies in the pyproject.toml file.

We can add dependencies using the add command instead of editing pyproject.toml by hand; uv add updates the file for us.

$ uv add 'torch==2.4.1'

We can also mention alternative sources in order to add dependencies

$ # Add a git dependency
$ uv add git+https://github.com/psf/requests

More here.
To remove dependencies

$ uv remove torch

It's also pretty easy to build and publish packages using uv.
For building and publishing

$ uv build
$ uv publish

image

More about building and publishing packages here.

Use uv to build Feluda

First we need to have a directory structure for our python package. We can create a structure using uv

$ uv init --lib feluda

This will create a directory structure in following format

feluda/
|-- src/
|   |-- feluda/
|       |-- __init__.py
|       |-- your_code.py  # Add your module(s) here
|       |-- py.typed      # empty, indicates to IDEs your code includes type annotations
|-- pyproject.toml
|-- README.md
|-- .python-version

Let's say our your_code.py looks something like this

import numpy as np

def get_embeddings():
  return np.zeros(512)

We can use following command to build our project

$ uv build

This will create a dist directory containing the built .tar.gz and .whl files.

For now let's use this directory to install our package locally

$ pip install dist/feluda-1.0.0-py3-none-any.whl
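The wheel filename itself encodes what was built. As a small illustration (a sketch, with `parse_wheel_filename` being a hypothetical helper rather than anything uv provides), the name above can be split into its standard dash-separated fields:

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # Drop the optional build tag, which sits right after the version.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(*parts)


print(parse_wheel_filename("feluda-1.0.0-py3-none-any.whl"))
```

Here `py3-none-any` means a pure-Python wheel that works on any platform. An operator that bundles compiled code would produce platform-specific tags instead.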

We can then import our package in following manner

from feluda.your_code import get_embeddings

More information here
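To confirm that the locally installed wheel is the one being picked up, the installed version can be read back from package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version from the wheel's metadata, or None if feluda is not installed.
print(installed_version("feluda"))
```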

Other findings that might be relevant

With uv, a single universal lockfile can record multiple versions of the same package, each selected for a different platform or Python version. For instance, uv.lock can pin one numpy release for older Python versions and another for newer ones. Note that a single environment still has only one version of a package installed at a time. This flexibility is still valuable when working with operators or tools that depend on specific versions, since one lockfile can serve both newer and older package requirements.
An informative video to understand uv better.

@aatmanvaidya
Collaborator

@plon-Susk7 I think these are great findings. I watched the entire video, and resolving versions of the same package is actually a big issue in Feluda; if uv can help us solve this, that would be awesome. The video also showed that versioning is simpler with uv, which would make our release management easier too.

@dennyabrain do you think we should now do the following? I am tempted to just package operators and model factory into a library and test it out. Please let me know if I am jumping the gun here.

What we can try is to package just the model factory and operators into a library and test it on Google Colab (or in a separate virtual environment). We should be able to download a video using VideoFactory and run any operator on it. We don't have to publish it on PyPI; we can just install the wheel and test it out.

This way we get clarity on multiple things (these are also some questions on my mind):

  • what changes do we need to make to the feluda codebase to support uv?
  • what does the .toml file look like? As in, what specifically should it include?
  • what packages get installed when a user installs feluda? This continues what we were discussing: when a user installs feluda, the base packages it needs (numpy, requests etc.) should be installed, and the user should then have control over installing operator-specific packages with something like feluda install vid_vec_rep_clip. Trying this will give us clarity on how much is possible via uv.
  • will we have to make changes to the development setup (Dockerfile etc.)?
  • to build a python package, will we have to remove or restructure some parts of the codebase?
  • what tests and actions have to be in place to make sure things run smoothly?
  • uv has a pip-tools compatible interface (uv pip compile); how do we use it to generate our requirement files? uv also manages Python versions and virtual environments, much like pyenv does; do we use this in local development to create virtual envs and install things there? What changes do we make to our development workflow to include uv?
  • uv is made by Astral, who also make the ruff linter, which feluda already uses.
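One way to picture the "base install plus operator-specific packages" idea is a loader that imports an operator only when it is requested and tells the user what to install if it is missing. This is a rough sketch under stated assumptions: the `feluda_<name>` module naming and the `feluda-<name>` package naming are hypothetical, and `load_operator` is not an existing Feluda function.

```python
import importlib
from types import ModuleType


def load_operator(name: str) -> ModuleType:
    """Import an operator module on demand, with an install hint on failure."""
    module_name = f"feluda_{name}"  # hypothetical module naming scheme
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only translate the error when the operator package itself is missing,
        # not when one of the operator's own dependencies fails to import.
        if exc.name != module_name:
            raise
        package = f"feluda-{name.replace('_', '-')}"  # hypothetical package name
        raise ModuleNotFoundError(
            f"Operator '{name}' is not installed. Try: pip install {package}"
        ) from exc
```

With this shape, installing feluda pulls in only the base dependencies, and heavy packages such as torch arrive only with the operators that need them.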

We don't have to do all this, but we can start thinking about it, this way we also know how much time and effort is required where.

@dennyabrain
Contributor Author

I'll have more thoughts @aatmanvaidya but wanted to share this example that I created yesterday - #409 (comment)

It uses feluda core and the image_vec_rep_resnet operator. It doesn't require docker or any other system dependencies. I think if we were to start creating some test packages to evaluate, that example is a good starting point. Let's try recreating that.

@dennyabrain
Contributor Author

also +1 to just start writing some packages to evaluate uv or pypi. More things will become clearer by actually trying it out.

@dennyabrain
Contributor Author

dennyabrain commented Nov 7, 2024

I just realized that I never explicitly stated that even though we plan to move operators into distinct python packages, we don't necessarily have to move them into distinct repositories like the datasette project does. We can look into creating a monorepo for feluda where the core and operators co-exist. So the repository structure might look like this

├── core/
├── operators/
│   ├── vid_vec_resnet/
│   ├── vid_vec_clip/
│   └── ...
└── docs/

I found some general documentation on working with monorepos in python

and this conversation about monorepos on uv's GitHub: astral-sh/uv#6935

Given our small team, a monorepo might be a better way to manage 50 different operators. It would also be a quick way to try out the ideas we've discussed in this issue so far. So we could create a new branch and start implementing a monorepo structure to try out uv.
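For release automation in such a monorepo, a script would need to know which directories are packages. A minimal sketch, assuming the core/ and operators/ layout above and that each package carries its own pyproject.toml (the `find_packages` helper is hypothetical):

```python
from pathlib import Path


def find_packages(repo_root: Path) -> list[Path]:
    """Return core/ and each operators/* directory that has a pyproject.toml."""
    candidates = [repo_root / "core"]
    operators_dir = repo_root / "operators"
    if operators_dir.is_dir():
        candidates.extend(sorted(p for p in operators_dir.iterdir() if p.is_dir()))
    # Directories without a pyproject.toml (for example docs/) are not packages.
    return [d for d in candidates if (d / "pyproject.toml").is_file()]


if __name__ == "__main__":
    for package_dir in find_packages(Path(".")):
        print(package_dir)
```

A release job could then run uv build inside each returned directory, so that every operator is built and versioned independently.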

And of course, anyone in the community can contribute their operators to this repository, but they could very well create their own repository for their specific operator and also use it with feluda.
