diff --git a/.github/workflows/test_and_deploy.yml b/.github/workflows/test_and_deploy.yml
index 5fec3ba4..d9ccc93d 100644
--- a/.github/workflows/test_and_deploy.yml
+++ b/.github/workflows/test_and_deploy.yml
@@ -86,14 +86,22 @@ jobs:
       - uses: actions/setup-python@v5
         with:
           python-version: "3.10"
-      - name: Install dependencies
+      - name: Install asv
         shell: bash
         run: |
           python -mpip install --upgrade pip
-          python -mpip install .[asv_version]
+
+          # We install the project being benchmarked because we run `asv check` with the `existing` environment flag.
+          python -mpip install .
+          python -mpip install asv
       - name: Run asv check
         shell: bash
-        run: asv check -v -E existing
+        run: |
+          cd benchmarks
+
+          # With `existing`, the benchmarked project must already be installed, including all its dependencies.
+          # See https://asv.readthedocs.io/en/v0.6.3/commands.html#asv-check
+          asv check -v -E existing
 
   build_sdist_wheels:
     name: Build source distribution
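For reference, the CI step above can be reproduced locally. A minimal sketch, assuming a Python 3.10 environment is active at the repository root:

```
# Install the project being benchmarked plus asv itself,
# mirroring the "Install asv" step of the workflow
python -mpip install --upgrade pip
python -mpip install .
python -mpip install asv

# `asv check -E existing` runs against the already-installed project,
# so no new virtual environment is created
cd benchmarks
asv check -v -E existing
```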
diff --git a/.gitignore b/.gitignore
index 17f81f32..c0549c7e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -130,6 +130,6 @@ pip-wheel-metadata/
 **/_version.py
 
 # Benchmarking with ASV
-.asv/
+benchmarks/.asv/*
 benchmarks/results/*
 benchmarks/html/*
diff --git a/MANIFEST.in b/MANIFEST.in
index 899e6960..df7e4a55 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -9,20 +9,16 @@ exclude *.ini
 
 recursive-include brainglobe_workflows *.py
 recursive-include brainglobe_workflows/configs *.json
-recursive-include benchmarks *.py
-include asv.conf.json
 
 recursive-exclude * __pycache__
 recursive-exclude * *.py[co]
-recursive-exclude benchmarks/results *
-recursive-exclude benchmarks/html *
 
 global-include *.pxd
 
 prune docs
 prune tests
 prune resources
+prune benchmarks
 prune .github
 prune .tox
-prune .asv
diff --git a/README.md b/README.md
index fa2ac9e8..21712979 100644
--- a/README.md
+++ b/README.md
@@ -73,31 +73,11 @@ These benchmarks are meant to be run regularly, to ensure performance is stable
 
 There are three main ways in which these benchmarks can be useful to developers:
 
 1. Developers can run the available benchmarks locally on a small test dataset.
-
-   To do so:
-   - Install the developer version of the package:
-     ```
-     pip install .[dev]
-     ```
-     This is mostly for convenience: the `[dev]` specification includes `asv` as a dependency, but to run the benchmarks it would be sufficient to use an environment with `asv` only. This is because `asv` creates its own virtual environment for the benchmarks, building and installing the relevant version of the `brainglobe-workflows` package in it. By default, the version at the tip of the currently checked out branch is installed.
-   - Run the benchmarks:
-     ```
-     asv run
-     ```
-     This will run the locally defined benchmarks with the default parameters defined at `brainglobe_workflows/configs/cellfinder.json`, on a small dataset downloaded from [GIN](https://gin.g-node.org/G-Node/info/wiki). See the [asv docs](https://asv.readthedocs.io/en/v0.6.1/using.html#running-benchmarks) for further guidance on how to run benchmarks.
 1. Developers can also run these benchmarks on data they have stored locally.
-
-   To do so:
-   - Define a config file for the workflow to benchmark. You can use the default one at `brainglobe_workflows/configs/cellfinder.json` for reference.
-   - Ensure your config file includes an `input_data_dir` field pointing to the data of interest.
-   - Edit the names of the signal and background directories if required. By default, they are assumed to be in `signal` and `background` subdirectories under `input_data_dir`. However, these defaults can be overwritten with the `signal_subdir` and `background_subdir` fields.
-   - Run the benchmarks, passing the path to your config file as an environment variable `CONFIG_PATH`. In Unix systems:
-     ```
-     CONFIG_PATH=/path/to/your/config/file asv run
-     ```
-
 1. We also plan to run the benchmarks on an internal runner using a larger dataset, of the scale we expect users to be handling. The result of these benchmarks will be made publicly available.
+
+For further details on how to run the benchmarks, see the [benchmarks README](benchmarks/README.md).
 
 Contributions to BrainGlobe are more than welcome. Please see the [developer guide](https://brainglobe.info/developers/index.html).
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 00000000..2d519289
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,71 @@
+# README
+
+## Overview
+We use [`asv`](https://asv.readthedocs.io) to benchmark some representative BrainGlobe workflows. The `asv` workflow is roughly as follows:
+1. `asv` creates a virtual environment in which to run the benchmarks, as defined in the `asv.conf.json` file.
+1. It installs the version of the `brainglobe-workflows` package corresponding to the tip of the locally checked-out branch.
+1. It runs the benchmarks defined (locally) under `benchmarks/benchmarks` and saves the results to `benchmarks/results` as JSON files.
+1. With `asv publish`, the output JSON files are 'published' into an HTML directory (`benchmarks/html`).
+1. With `asv preview`, the HTML directory can be visualised using a local web server.
+
+
+We include code to benchmark the workflows defined under `brainglobe_workflows`. There are three main ways in which these benchmarks can be useful to developers:
+1. Developers can run the available benchmarks locally [on a small test dataset](#running-benchmarks-on-a-default-small-dataset).
+1. Developers can also run these benchmarks on [data they have stored locally](#running-benchmarks-on-custom-data-available-locally).
+1. We also plan to run the benchmarks internally on a large dataset and make the results publicly available.
+
+See the `asv` [reference docs](https://asv.readthedocs.io/en/v0.6.3/reference.html) for further details on the tool, and on [how to run benchmarks](https://asv.readthedocs.io/en/stable/using.html#running-benchmarks).
+
+## Installation
+
+To run the benchmarks, [install asv](https://asv.readthedocs.io/en/stable/installing.html) in your current environment:
+```
+pip install asv
+```
+
+## Running benchmarks on a default small dataset
+
+To run the benchmarks on a default small dataset:
+
+1. Git clone the `brainglobe-workflows` repository:
+   ```
+   git clone https://github.com/brainglobe/brainglobe-workflows.git
+   ```
+1. Run `asv` from the `benchmarks` directory:
+   ```
+   cd brainglobe-workflows/benchmarks
+   asv run
+   ```
+   This will benchmark the workflows defined in `brainglobe_workflows/` using a default set of parameters and a default small dataset. The default parameters are defined in config files under `brainglobe_workflows/configs`. The default dataset is downloaded from [GIN](https://gin.g-node.org/G-Node/info/wiki). The full run-publish-preview cycle from the overview is sketched below.
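+
+As a sketch of the full cycle described in the overview (run from the `benchmarks` directory; `asv preview` prints the address of the local server it starts):
+```
+# Run the benchmarks and save the results to benchmarks/results
+asv run
+
+# 'Publish' the JSON results into a static HTML report (benchmarks/html)
+asv publish
+
+# Serve the HTML report with a local web server for inspection
+asv preview
+```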
+
+## Running benchmarks on custom data available locally
+
+To run the benchmarks on a custom local dataset:
+
+1. Git clone the `brainglobe-workflows` repository:
+   ```
+   git clone https://github.com/brainglobe/brainglobe-workflows.git
+   ```
+1. Define a config file for the workflow to benchmark (a sketch follows this list).
+   - You can use the default config files at `brainglobe_workflows/configs/` as a reference.
+   - You will need to edit/add the fields pointing to the input data.
+   - For example, for the `cellfinder` workflow, the config file needs to include an `input_data_dir` field pointing to the data of interest. The signal and background data are assumed to be in `signal` and `background` directories under the `input_data_dir` directory. If they are under directories with different names, you can specify those names with the `signal_subdir` and `background_subdir` fields.
+1. Benchmark the workflow, passing the path to your custom config file as an environment variable.
+   - For example, to benchmark the `cellfinder` workflow, prepend the environment variable definition to the `asv run` command (valid on Unix systems):
+   ```
+   CELLFINDER_CONFIG_PATH=/path/to/your/config/file asv run
+   ```
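+
+As a minimal sketch (the paths and the `my_cellfinder_config.json` filename are placeholders, and any further fields required by the workflow should be carried over from the default config), such a config file could be created and used like this:
+```
+# Write a minimal config pointing at locally stored data
+cat > my_cellfinder_config.json <<'EOF'
+{
+    "input_data_dir": "/path/to/my/data",
+    "signal_subdir": "signal_channel",
+    "background_subdir": "background_channel"
+}
+EOF
+
+# Benchmark the cellfinder workflow against that data (Unix)
+CELLFINDER_CONFIG_PATH="$(pwd)/my_cellfinder_config.json" asv run
+```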
// "hash_length": 8, diff --git a/benchmarks/__init__.py b/benchmarks/benchmarks/__init__.py similarity index 100% rename from benchmarks/__init__.py rename to benchmarks/benchmarks/__init__.py diff --git a/benchmarks/cellfinder.py b/benchmarks/benchmarks/cellfinder.py similarity index 100% rename from benchmarks/cellfinder.py rename to benchmarks/benchmarks/cellfinder.py diff --git a/brainglobe_workflows/cellfinder/cellfinder.py b/brainglobe_workflows/cellfinder/cellfinder.py index 62852a90..6a0789c5 100644 --- a/brainglobe_workflows/cellfinder/cellfinder.py +++ b/brainglobe_workflows/cellfinder/cellfinder.py @@ -26,7 +26,7 @@ import pooch from brainglobe_utils.IO.cells import save_cells from cellfinder.core.main import main as cellfinder_run -from cellfinder.core.tools.IO import read_z_stack +from cellfinder.core.tools.IO import read_with_dask from cellfinder.core.train.train_yml import depth_type from brainglobe_workflows.utils import ( @@ -352,8 +352,8 @@ def run_workflow_from_cellfinder_run(cfg: CellfinderConfig): The steps are: 1. Read the input signal and background data as two separate - Dask arrays (or in-memory numpy arrays if single file tiff stack). - 2. Run the main cellfinder pipeline on the input arrays, + Dask arrays. + 2. Run the main cellfinder pipeline on the input Dask arrays, with the parameters defined in the input configuration (cfg). 3. Save the detected cells as an xml file to the location specified in the input configuration (cfg). @@ -364,9 +364,9 @@ def run_workflow_from_cellfinder_run(cfg: CellfinderConfig): a class with the required setup methods and parameters for the cellfinder workflow """ - # Read input data as Dask or numpy arrays - signal_array = read_z_stack(str(cfg._signal_dir_path)) - background_array = read_z_stack(str(cfg._background_dir_path)) + # Read input data as Dask arrays + signal_array = read_with_dask(str(cfg._signal_dir_path)) + background_array = read_with_dask(str(cfg._background_dir_path)) # Run main analysis using `cellfinder_run` detected_cells = cellfinder_run( diff --git a/pyproject.toml b/pyproject.toml index 9aa335c9..f828769f 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -58,11 +58,7 @@ dev = [ "setuptools_scm", "asv", ] -# Below, all the dependencies asv needs to run the benchmarks -# (i.e., everything needed to install this package without the CLI tool) -# Once the cellfinder CLI tool is deprecated, these will move to the -# default dependencies. 
-asv_version = ["asv"]
+
 napari = ["napari[pyqt5]", "brainglobe-napari-io", "cellfinder[napari]>=1.0.0"]
 
diff --git a/tests/cellfinder_core/__init__.py b/tests/cellfinder/__init__.py
similarity index 100%
rename from tests/cellfinder_core/__init__.py
rename to tests/cellfinder/__init__.py
diff --git a/tests/cellfinder_core/conftest.py b/tests/cellfinder/conftest.py
similarity index 100%
rename from tests/cellfinder_core/conftest.py
rename to tests/cellfinder/conftest.py
diff --git a/tests/cellfinder_core/test_integration/__init__.py b/tests/cellfinder/test_integration/__init__.py
similarity index 100%
rename from tests/cellfinder_core/test_integration/__init__.py
rename to tests/cellfinder/test_integration/__init__.py
diff --git a/tests/cellfinder_core/test_integration/test_cellfinder.py b/tests/cellfinder/test_integration/test_cellfinder.py
similarity index 100%
rename from tests/cellfinder_core/test_integration/test_cellfinder.py
rename to tests/cellfinder/test_integration/test_cellfinder.py
diff --git a/tests/cellfinder_core/test_unit/__init__.py b/tests/cellfinder/test_unit/__init__.py
similarity index 100%
rename from tests/cellfinder_core/test_unit/__init__.py
rename to tests/cellfinder/test_unit/__init__.py
diff --git a/tests/cellfinder_core/test_unit/conftest.py b/tests/cellfinder/test_unit/conftest.py
similarity index 100%
rename from tests/cellfinder_core/test_unit/conftest.py
rename to tests/cellfinder/test_unit/conftest.py
diff --git a/tests/cellfinder_core/test_unit/test_cellfinder.py b/tests/cellfinder/test_unit/test_cellfinder.py
similarity index 100%
rename from tests/cellfinder_core/test_unit/test_cellfinder.py
rename to tests/cellfinder/test_unit/test_cellfinder.py
diff --git a/tests/cellfinder_core/test_unit/test_utils.py b/tests/cellfinder/test_unit/test_utils.py
similarity index 100%
rename from tests/cellfinder_core/test_unit/test_utils.py
rename to tests/cellfinder/test_unit/test_utils.py
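The test-suite renames above mean that any local command pointing at `tests/cellfinder_core` needs updating. A minimal sketch, assuming `pytest` is available in your environment:

```
# The suite now lives under tests/cellfinder
python -m pytest tests/cellfinder
```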