Packaging end-game. #27

Closed
hjoliver opened this issue May 22, 2019 · 28 comments

Comments

@hjoliver
Member

For cylc-8, the cylc-flow workflow service can now be installed as a Python package using pip (library plus CLI scripts).

In due course we need to consider, test, and document how to install cylc-8 (all components) by other packaging systems too (noting that pip is only for Python packages):

  • conda
  • Linux system package managers: yum, apt, ...
  • Docker (and HPC containers: singularity, ...)
  • other?
@kinow
Member

kinow commented May 22, 2019

I think conda and containers are the simplest. I created a conda recipe for cylc before. Without components, but I think it had gtk. Will search for it tomorrow.

@kinow
Member

kinow commented May 23, 2019

Couldn't find it, but I remember doing it while working on Cylc packaging issues.

This probably needs to be done after the GraphQL work is merged.

We can use wheel packages (e.g. tensorflow uses a wheel package). But if we pointed to a branch in GitHub, for instance, the build command would fail with:

RuntimeError: Setuptools downloading is disabled in conda build. Be sure to add all dependencies in the meta.yaml url=https://pypi.org/simple/pytest-runner/

This is because conda won't download setuptools dependencies automatically. Instead we need to define which dependencies we want in conda - again. For example, bokeh has everything in a single repo (which makes it easier to write a conda recipe, with other downsides), and its conda recipe calls a function to load data from setup.py, to avoid re-writing all dependencies there.

This is much harder in our case as we have multiple repositories.
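
As a rough illustration of the "define the dependencies again" step, the sketch below turns pip-style requirement strings into the run section of a conda meta.yaml. The helper names (to_conda_spec, render_run_requirements) and the example requirement strings are made up for this illustration. It only handles plain name-plus-version strings (no extras or environment markers), and it assumes a package has the same name on PyPI and in the conda channel, which is not always true.

```python
import re


def to_conda_spec(requirement: str) -> str:
    """Convert a simple pip-style requirement to a conda-style spec.

    Example: 'example-lib>=1.0,<2' becomes 'example-lib >=1.0,<2'.
    Hypothetical helper; extras and environment markers are not handled.
    """
    match = re.match(r"^\s*([A-Za-z0-9_.\-]+)\s*(.*)$", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name = match.group(1).lower()
    # conda separates the name from the version constraint with a space
    version = match.group(2).replace(" ", "")
    return f"{name} {version}".strip()


def render_run_requirements(install_requires):
    """Render the 'requirements: run:' section of a meta.yaml as text."""
    lines = ["requirements:", "  run:"]
    lines += [f"    - {to_conda_spec(req)}" for req in install_requires]
    return "\n".join(lines)


# Illustrative input only; these are not the real Cylc dependencies.
print(render_run_requirements(["example-lib>=1.0,<2", "other-lib"]))
```

With several repositories, something like this would have to be run once per project, which is part of why the multi-repo case is more work.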

There is a general recommendation to upload individual packages to conda, instead of using pip, and then just add the channel used to upload them and list each project as a requirement.

NOTE: It is important to pip install only the one desired package. Whenever possible, install dependencies with conda and not pip.

So we would end up having to upload cylc-flow, and cylc-uiserver to PYPI and use their wheel files, or upload both to a channel like cylc in Anaconda cloud.

Some quick draft of a possible reference work for myself or whoever works on this issue: https://github.com/kinow/conda-recipes/tree/master

In this draft, the source for the cylc Anaconda package is the cylc/cylc-ui... that's because this way we can download from a cylc-ui tag and unzip it, as it is an HTML/JS web application and we don't really have anywhere to upload its final build file.

And cylc-flow and cylc-uiserver would then be dependencies of this package. Only thing missing right now is finding a way to link the decompressed cylc-ui to cylc-uiserver. Then source activate $whatever-env-you-used, conda install cylc -c $some-channel-name, followed by jupyterhub (assuming you have a config file) should start the hub and get everything up and running.

I believe containers will be easier than Conda or RPMs. NB: running conda skeleton shows its default recipe types (cpan, cran, luarocks, pypi, and rpm), so we have the option to use bdist_rpm in setup.py to produce an RPM later, or an Anaconda recipe.

Bruno

@kinow
Member

kinow commented Aug 8, 2019

For the Cylc UI I think we can use GitHub releases instead of NPM. e.g. https://github.com/kinow/cylc-ui/releases/tag/0.6-special

When creating a release in GitHub you can include binary files.

This way our tag in GitHub would include both source code and the result of npm run build.

The issue I have with NPM is that I haven't seen any web application deployed to NPM, only libraries. A quick search found at least one Stack Overflow question about it.

And if Cylc UI was available in NPM, we - or other users - could run npm install cylc-ui and the web application would be downloaded under ./node_modules/cylc..., which doesn't look quite right to me.

With this, Python dependencies go to PYPI. Any JS/TS component we may develop can go to NPM. Our UI goes to GitHub releases. And solutions like Conda (or Docker, Terraform, Ansible, etc) can be used to orchestrate the setup process.

@hjoliver
Member Author

hjoliver commented Aug 8, 2019

Sounds good to me.

@kinow
Member

kinow commented Aug 9, 2019

I had started some work on conda recipes while working on setup.py, and have restarted some of it today: https://github.com/kinow/cylc-conda

The recipes are still work-in-progress. The main recipe is cylc, which will install:

  • cylc-flow (from git as alpha 8.0a0 is too limited)
  • cylc-uiserver (from git, not released to PYPI yet)
  • cylc-xtriggers (ditto above)
  • cylc-ui (from github releases)

Doesn't look too complicated. The main issue I have now is that we cannot use pip for the conda recipe build (i.e. while building our conda package, everything is either conda, or is locally installed/copied).

Which means we need to have the dependencies that we need - in this case, cylc-flow, cylc-uiserver, etc, our stuff - published not only to PYPI, but to Anaconda Cloud as well.

I've published a few to my channel today: https://anaconda.org/kinow/repo

Will finish uploading the remaining packages by Monday, I think. Then the next step will be to do a conda build cylc and see if it downloads all the required dependencies correctly (i.e. if it pulls the conda dependencies and transitive dependencies, as well as the cylc-ui zip), and finally adjust the build.sh script to put everything in place.

@kinow
Member

kinow commented Aug 9, 2019

Not too sure what to do about jupyterhub_config from cylc-uiserver. This file will probably go to somewhere like ~/anaconda3/lib/python or to some other folder if using a conda environment.

Meaning that running jupyterhub elsewhere wouldn't load the right configuration... also asking user to find the file and remember to run jupyterhub -c /location/..../jupyterhub_config.py is too tedious.

For now I think the recipe will install everything, but the configuration file will be an extra and manual step.

@matthewrmshin
Contributor

Your Conda recipes are looking good. Do you want to move them to live under the cylc/ organisation (new repository?) sooner rather than later? We can then work on improving them if necessary.

Do we need a simple wrapper for jupyterhub -c /location/..../jupyterhub_config.py? (It can probably apply other site/user stuffs as well?)

Do we intend to have (our instance of) JupyterHub serve anything else other than Cylc UIS?

@kinow
Member

kinow commented Aug 9, 2019

Your Conda recipes are looking good. Do you want to move them to live under the cylc/ organisation (new repository?) sooner rather than later? We can then work on improving them if necessary.

I wasn't sure if we would keep it under an existing repository in our organisation, or create a new one. My guess is we will need a new one, as it involves multiple projects. If others think this looks to be going in the right direction, +1 from me for a new repo under cylc/ (not sure about name though, I went with cylc-conda for lack of creativity + laziness).

One question for later would be how to maintain these recipes. I'm reverse-engineering them from PYPI. Conda doesn't reverse engineer setuptools (or at least I couldn't find a way of doing it), so for the artefacts not uploaded I have to manually craft the meta.yaml. But after a release, someone would have to remember to go through this process again and re-release the conda artefacts.

Do we need a simple wrapper for jupyterhub -c /location/..../jupyterhub_config.py? (It can probably apply other site/user stuffs as well?)

Could be. Maybe an alias like cylchub=jupyterhub -c .... We could also check the JupyterHub documentation, and see if they don't support something like a default location for jupyterhub_config.py (or maybe env var?).
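
A minimal sketch of what such a wrapper could look like, written in Python so it can ship with the package. Everything named here is an assumption for illustration: the CYLC_HUB_CONFIG environment variable, the etc/cylc/ default location under the environment prefix, and the hub_command helper are all hypothetical. The sketch uses -f, which is the config-file option shown in the JupyterHub documentation.

```python
import os
import sys
from pathlib import Path


def hub_command(extra_args=(), environ=None, prefix=None):
    """Build the argv for starting JupyterHub with the Cylc config file.

    The config path comes from the (hypothetical) CYLC_HUB_CONFIG variable,
    falling back to <prefix>/etc/cylc/jupyterhub_config.py, where <prefix>
    defaults to the prefix of the running Python environment.
    """
    environ = os.environ if environ is None else environ
    prefix = Path(sys.prefix if prefix is None else prefix)
    default_config = prefix / "etc" / "cylc" / "jupyterhub_config.py"
    config = Path(environ.get("CYLC_HUB_CONFIG", default_config))
    return ["jupyterhub", "-f", str(config), *extra_args]


# A real wrapper would hand this list to os.execvp or subprocess.run;
# here it is only printed.
print(" ".join(hub_command()))
```

Because the default is derived from sys.prefix, the same wrapper would find the file whether it was installed into a conda environment or a plain virtual environment.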

Do we intend to have (our instance of) JupyterHub serve anything else other than Cylc UIS?

I don't think so. That's doable I believe. The only scenario where a user could come with such a request, I think, is if they already have JupyterHub in their organisation, and for some reason want to re-use the same instance, but a bit far-fetched I guess.

@matthewrmshin
Contributor

That's good. It gives us more freedom to wrap/alias/etc if we are only going to serve Cylc UIS under our own instance of JupyterHub.

@kinow
Member

kinow commented Aug 10, 2019

It worked! 🎉

Once installed in my conda environment, the commands cylc-uiserver and cylc were available (*), and cylc-ui was unzipped into: /home/kinow/Development/python/anaconda3/envs/cylc1/work/cylc-ui .

The location for Cylc UI is the same as ${CONDA_PREFIX}/work/cylc-ui. In theory this gives us a Conda environment with everything we need for Cylc 8 (**)
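
For the pending "link cylc-ui to cylc-uiserver" step, one option is to derive the UI directory from the active environment rather than hard-coding it. A small sketch, assuming the work/cylc-ui layout described above; the cylc_ui_dir helper is a hypothetical name, not something cylc-uiserver provides.

```python
import os
from pathlib import Path


def cylc_ui_dir(environ=None):
    """Return the expected Cylc UI location inside the active conda env.

    conda sets CONDA_PREFIX when an environment is activated; the
    work/cylc-ui sub-path is the layout produced by the draft recipe.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; is a conda environment active?")
    return Path(prefix) / "work" / "cylc-ui"


# Example with an explicit mapping instead of the real environment.
print(cylc_ui_dir({"CONDA_PREFIX": "/opt/envs/cylc1"}))
```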

@hjoliver if you have time next week for a quick VC, I should be able to send you the list of commands (less than a handful I think?) to create a Conda env and try to install it from my channel.

If that works, the only pending issue would be the jupyterhub config.

Cheers
Bruno

(*): I messed up isodatetime, and created a package for the old module 😭 so will have to re-build the package next week and re-build other packages that depend on it and so it goes (basically, re-do everything)
(**): I postponed adding cylc-xtriggers to save some time, as it takes some 30 minutes to write the recipe/build it/fix any issues/and upload, but will add it next week

@kinow
Member

kinow commented Aug 11, 2019

I tried conda skeleton pypi metomi-isodatetime, but it didn't work, while the same command with isodatetime works.

The reason is that conda skeleton needs a sdist (source distribution) so that it can reverse engineer the script.

@matthewrmshin I think wheel is preferable, as this avoids running the setup/build/test etc on the user environment

wheel is designed primarily as a distribution format, so skipping the installation step also means deliberately avoiding any reliance on features that assume full installation (such as being able to use standard tools like pip and virtualenv to capture and manage dependencies in a way that can be properly tracked for auditing and security update purposes, or integrating fully with the standard build machinery for C extensions by publishing header files in the appropriate place).
https://www.python.org/dev/peps/pep-0427/

Having both, however, I think would be even better, as it gives users the possibility to use the sources if necessary - to generate Conda meta files automatically for example, but there may be other cases. I believe pip defaults to wheel when both are available:

pip can install from either Source Distributions (sdist) or Wheels, but if both are present on PyPI, pip will prefer a compatible wheel.
https://packaging.python.org/tutorials/installing-packages/#source-distributions-vs-wheels

So I think it would be good if we could always release to PYPI with wheel + sources, as you did with isodatetime. What do you think? Not sure what are the release steps necessary for that, but we can possibly try on test.pypi and document that somewhere 👍
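
To make the wheel-versus-sdist preference concrete, here is a deliberately simplified sketch of the selection rule quoted above. It is not pip's implementation: pip also checks wheel compatibility tags, versions, and more, none of which is modelled here. The pick_distribution name and the file names are invented for the example.

```python
def pick_distribution(filenames):
    """Pick one file to install, preferring a wheel over an sdist.

    Simplification: any .whl is treated as compatible, which pip does
    not assume. Returns None when nothing installable is listed.
    """
    wheels = [name for name in filenames if name.endswith(".whl")]
    sdists = [name for name in filenames if name.endswith((".tar.gz", ".zip"))]
    if wheels:
        return wheels[0]
    if sdists:
        return sdists[0]
    return None


# Hypothetical release with both formats uploaded.
files = ["example_pkg-1.0.tar.gz", "example_pkg-1.0-py3-none-any.whl"]
print(pick_distribution(files))
```

A release with only a wheel installs fine with pip, but a tool that needs the sources, such as conda skeleton, has nothing to work from, which is the situation described for metomi-isodatetime.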

@kinow
Member

kinow commented Aug 12, 2019

It worked, but I cheated by passing the spawner and other configuration settings via the command line (just so I didn't have to worry about jupyterhub_config.py for now).

Testing instructions at https://github.com/kinow/cylc-conda#testing. If anyone with Anaconda Python 3 could test it, we should be able to at least confirm it works. Then next steps would be

  • choose where to keep our recipes (even if cylc stays in a separate repo, should we have the cylc-uiserver, cylc-flow, graphene-tornado, metomi-isodatetime in the same repo, or move them to their respective repos?)
  • consider whether we want to automate in Conda the copy of jupyterhub_config.py, or if we prefer to ask users to complete this one step manually
  • investigate what we would need to have a channel like cylc in Anaconda cloud, or if it would be best to use conda-forge

Cheers
Bruno

@matthewrmshin
Contributor

Not sure why we have missed the source dist for metomi-isodatetime. Now added.

Looking at JupyterHub's recipe in Anaconda Cloud... It simply points back to Conda Forge. Not sure whether we need to do the same or not.

@hjoliver
Member Author

@hjoliver if you have time next week for a quick VC, I should be able to send you the list of commands (less than a handful I think?) to create a Conda env and try to install it from my channel.

Yes anytime (if this is still relevant?)

Do you want to go ahead and create the new repo? (cylc-packaging, or cylc-conda? ... could it potentially be used for multiple packaging methods in future, e.g. containers, or does it need to be conda-specific?)

@hjoliver
Member Author

One question for later would be how to maintain these recipes. I'm reverse-engineering them from PYPI. Conda doesn't reverse engineer setuptools (or at least I couldn't find a way of doing it), so for the artefacts not uploaded I have to manually craft the meta.yaml. But after a release, someone would have to remember to go through this process again and re-release the conda artefacts.

See also: #39

@kinow
Member

kinow commented Aug 13, 2019

@hjoliver

Yes anytime (if this is still relevant?)

It probably is. It worked on my machine, but nobody besides myself has tested it :) I think it should work. And the solution I found for Cylc 8 with Conda, was to use everything as a Conda package, except Cylc UI... Cylc UI is the "source" of the Cylc 8 Conda package...

So I just wanted to go over this with somebody else that was interested in packaging to see if everything makes sense.

Do you want to go ahead and create the new repo? (cylc-packaging, or cylc-conda? ... could it potentially be used for multiple packaging methods in future, e.g. containers, or does it need to be conda-specific?)

Sure. I hadn't thought about keeping anything other than Conda recipes in the repository. No strong opinion on this, what do you think would work best here? I think triaging issues might be simple for this new repo, and we probably won't have milestones as this is a multi-project repository anyway.

@kinow
Member

kinow commented Aug 13, 2019

See also: #39

Well remembered! 👍

@hjoliver
Member Author

No strong opinion on this, what do you think would work best here?

I forgot we already have cylc-docker, so let's go with cylc-conda. It won't stop us considering a combined cylc-packaging repo in the future ... but separate is probably best.

@kinow
Member

kinow commented Aug 14, 2019

Almost forgot it, sorry. Done https://github.com/cylc/cylc-conda. It might need that permission fix @hjoliver , so that others can assign tickets, etc.

@hjoliver
Member Author

It might need that permission fix ...

Done

@hjoliver
Member Author

hjoliver commented Sep 24, 2019

update 24 September 2019

We now have:

Conda will probably be our main vector for release distribution

TODO

  • document how task jobs get access to (the right version of) Cylc when the server program is running out of a conda or Python virtual environment
    • do users need to activate the right virtual environment in their login scripts (task job scripts run in bash login shells)?
    • can we co-opt the CYLC_VERSION wrapper script to activate virtual environments somehow?
    • can the deployment process handle this somehow?
  • document how to install cylc-8 where conda and/or internet access is not available
    • just package up a relocatable conda environment where conda and internet are available?
  • investigate (or continue/complete our investigations) into other options, such as containers

@kinow
Member

kinow commented Sep 26, 2019

Just finished reading about Conda Pack, and it's really useful. I think we can give both options, and an internal conda repo, as alternatives when there is no Internet available.

  1. Users can host an internal/private channel (similar to user channels like my kinow on conda) and share it via internal network or NFS etc - https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/create-custom-channels.html
  2. Users can install in one environment with Conda Pack and then share that environment with multiple servers locally
  3. Users can host an internal binary/release server like Artifactory and either use it as a proxy to external artefacts, or publish/fetch artefacts manually.

@kinow
Member

kinow commented Oct 15, 2019

document how task jobs get access to (the right version of) Cylc when the server program is running out of a conda or Python virtual environment
do users need to activate the right virtual environment in their login scripts (task job scripts run in bash login shells)?

I think so. Or manually, as they do on HPC systems with modules (e.g. module load abc-software-etc), but instead running source ~/some/location/cylc8-research/bin/activate, or using Conda, virtualenvwrapper, pyenv, etc.

Having a standard way, and adopting a single solution (probably Conda?), would probably make it simpler for site admins to manage installations and support users.

Unless a) they are able to use the web UI for that, or b) they use something like conda run (more below).

can we co-opt the CYLC_VERSION wrapper script to activate virtual environments somehow?

I missed this one before creating cylc/cylc-flow#3410, sorry, and thanks for pointing it out.

Once we have an environment with multiple versions of Cylc, it might be helpful if the environments are named with some sort of pattern that is easy to understand, like cylc8-ensembles, or cylc-8-research, etc.

If using Conda, users should be able to fire a workflow without activating an environment too, by simply running:

$ conda run -n cylc-8-research cylc run five
$ conda run -n cylc-8-research cylc scan

conda run has limitations, and has slowly been getting better. But things like conda run -n cylc8 cylc run five --no-detach don't work, as the log/output is not written to stdout by conda run, more at conda/conda#2379

Not sure how to do that for venv or virtualenv.

For containers, it's possible for users to create aliases that execute commands against a specific container or image, in which case the wrapper would be redundant I think.

can the deployment process handle this somehow?

I think it depends on the site configuration. If installing via Conda, we can investigate what can be further done in the recipe to customize the environment (I think we can really only alter anything after the environment has been activated). We can't do much with pip.

Otherwise it will be up to site admins to create the extra configuration layer post Cylc installation. Using tools like shell, ansible, puppet, containers, AWS tools, terraform, etc.

@hjoliver
Member Author

Even if conda run did work for us (which you say it doesn't) users - if they haven't installed cylc themselves - should not have to know about conda (even if cylc was installed with conda).

Plus on job hosts, where the UI is not needed, pip install (into a venv) is sufficient.

The more I think about this, the more I like the wrapper. It is simple and generic. So long as admins install cylc versions side-by-side in some sensible pattern, and each version is only installed in one place, the site wrapper could easily be made to activate the right environment for the right version. Then it should just work, and normal users don't need to know that cylc is in a virtual environment or where it is installed.

If there is a way to make it "just work" for multiple cylc versions without even needing to modify a wrapper script, then that would be even better of course - but I'm not seeing it yet. And the wrapper is pretty simple.

(I guess this is quite an unusual or even cylc-specific requirement: a program that submits detached jobs - even on remote hosts - that have to invoke the same program again, at the same version).
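
The wrapper idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Cylc wrapper: it assumes admins install each version under <install_root>/cylc-<version>/ (a hypothetical layout), and instead of activating the environment it resolves that environment's own bin/cylc and calls it directly. The function names are made up for the example.

```python
from pathlib import Path


def cylc_executable(environ, install_root):
    """Locate the cylc executable for the version named in CYLC_VERSION.

    Assumes a side-by-side layout of <install_root>/cylc-<version>/bin/cylc,
    which is a convention invented for this sketch.
    """
    version = environ.get("CYLC_VERSION")
    if not version:
        raise RuntimeError("CYLC_VERSION is not set")
    return Path(install_root) / f"cylc-{version}" / "bin" / "cylc"


def wrapped_argv(args, environ, install_root):
    """Build the argv that a site wrapper would exec for 'cylc <args>'."""
    return [str(cylc_executable(environ, install_root)), *args]


# A real wrapper would pass this list to os.execv; here it is only printed.
print(wrapped_argv(["scan"], {"CYLC_VERSION": "8.0a1"}, "/opt/cylc"))
```

Users keep typing cylc, and only the wrapper needs to know the install pattern, so neither normal users nor task jobs need to know whether a version was installed with conda or with pip into a venv.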

@kinow
Member

kinow commented Jul 17, 2020

@hjoliver should we close this issue and maybe start new issues for things like installation to environments without internet, supporting multiple cylc versions, etc? Even though I think we are pretty settled on using Conda for now when Internet is available, and also for multiple versions of Cylc and/or Python.

@hjoliver
Member Author

Yes we should. Let me just read through above before creating the new issues though...

@hjoliver
Member Author

hjoliver commented Apr 7, 2021

(UPDATE: will close this and create new issues after the 8.0b0 meta-package release)

@oliver-sanders
Member

Superseded by #130


4 participants