Follow numpy structure and add asv-specific README (#104)
* replicate structure of numpy

* start an asv specific readme

* update gitignore and manifest

* small additions to readme

* Fix project name in asv config

* Rename cellfinder_core to cellfinder

* Rename cellfinder_core in tests

* Some more cellfinder_core changes

* Fix rebase mess

* Fix benchmark check in CI

* Update readme

* Edit manifest

* Fix link to benchmarks README file in README.md

* Add link to asv

* Add review suggestions

* Remove asv extra dependency

* Update README

* Remove [asv] extra from workflow

* Clarify existing flag in workflow

* Cosmetic changes
sfmig authored May 13, 2024
1 parent 8d5a0f5 commit 31b0594
Showing 18 changed files with 98 additions and 47 deletions.
14 changes: 11 additions & 3 deletions .github/workflows/test_and_deploy.yml
Original file line number Diff line number Diff line change
@@ -86,14 +86,22 @@ jobs:
- uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install dependencies
- name: Install asv
shell: bash
run: |
python -mpip install --upgrade pip
python -mpip install .[asv_version]
# We install the project to benchmark because we run `asv check` with the `existing` flag.
python -mpip install .
python -mpip install asv
- name: Run asv check
shell: bash
run: asv check -v -E existing
run: |
cd benchmarks
# With `existing`, the benchmarked project must be already installed, including all dependencies.
# see https://asv.readthedocs.io/en/v0.6.3/commands.html#asv-check
asv check -v -E existing
build_sdist_wheels:
name: Build source distribution
2 changes: 1 addition & 1 deletion .gitignore
@@ -130,6 +130,6 @@ pip-wheel-metadata/
**/_version.py

# Benchmarking with ASV
.asv/
benchmarks/.asv/*
benchmarks/results/*
benchmarks/html/*
6 changes: 1 addition & 5 deletions MANIFEST.in
@@ -9,20 +9,16 @@ exclude *.ini

recursive-include brainglobe_workflows *.py
recursive-include brainglobe_workflows/configs *.json
recursive-include benchmarks *.py
include asv.conf.json

recursive-exclude * __pycache__
recursive-exclude * *.py[co]
recursive-exclude benchmarks/results *
recursive-exclude benchmarks/html *

global-include *.pxd

prune docs
prune tests
prune resources
prune benchmarks

prune .github
prune .tox
prune .asv
24 changes: 2 additions & 22 deletions README.md
@@ -73,31 +73,11 @@ These benchmarks are meant to be run regularly, to ensure performance is stable

There are three main ways in which these benchmarks can be useful to developers:
1. Developers can run the available benchmarks locally on a small test dataset.

To do so:
- Install the developer version of the package:
```
pip install .[dev]
```
This is mostly for convenience: the `[dev]` specification includes `asv` as a dependency, but to run the benchmarks it would be sufficient to use an environment with `asv` only. This is because `asv` creates its own virtual environment for the benchmarks, building and installing the relevant version of the `brainglobe-workflows` package in it. By default, the version at the tip of the currently checked out branch is installed.
- Run the benchmarks:
```
asv run
```
This will run the locally defined benchmarks with the default parameters defined at `brainglobe_workflows/configs/cellfinder.json`, on a small dataset downloaded from [GIN](https://gin.g-node.org/G-Node/info/wiki). See the [asv docs](https://asv.readthedocs.io/en/v0.6.1/using.html#running-benchmarks) for further guidance on how to run benchmarks.
1. Developers can also run these benchmarks on data they have stored locally.

To do so:
- Define a config file for the workflow to benchmark. You can use the default one at `brainglobe_workflows/configs/cellfinder.json` for reference.
- Ensure your config file includes an `input_data_dir` field pointing to the data of interest.
- Edit the names of the signal and background directories if required. By default, they are assumed to be in `signal` and `background` subdirectories under `input_data_dir`. However, these defaults can be overwritten with the `signal_subdir` and `background_subdir` fields.
- Run the benchmarks, passing the path to your config file as an environment variable `CONFIG_PATH`. In Unix systems:
```
CONFIG_PATH=/path/to/your/config/file asv run
```

1. We also plan to run the benchmarks on an internal runner using a larger dataset, of the scale we expect users to be handling. The result of these benchmarks will be made publicly available.

For further details on how to run the benchmarks, see the [benchmarks README](benchmarks/README.md).

Contributions to BrainGlobe are more than welcome.
Please see the [developer guide](https://brainglobe.info/developers/index.html).

71 changes: 71 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,71 @@
# README

## Overview
We use [`asv`](https://asv.readthedocs.io) to benchmark some representative BrainGlobe workflows. The `asv` workflow is roughly as follows:
1. `asv` creates a virtual environment in which to run the benchmarks, as defined in the `asv.conf.json` file.
1. It installs the version of the `brainglobe-workflows` package corresponding to the tip of the locally checked-out branch.
1. It runs the benchmarks as defined (locally) under `benchmarks/benchmarks` and saves the results to `benchmarks/results` as JSON files.
1. With `asv publish`, the output JSON files are 'published' into an HTML directory (`benchmarks/html`).
1. With `asv preview`, the HTML directory can be visualised using a local web server.


We include code to benchmark the workflows defined under `brainglobe_workflows`. There are three main ways in which these benchmarks can be useful to developers:
1. Developers can run the available benchmarks locally [on a small test dataset](#running-benchmarks-locally-on-default-small-dataset).
1. Developers can run these benchmarks locally on [data they have stored locally](#running-benchmarks-locally-on-custom-data).
1. We also plan to run the benchmarks internally on a large dataset, and make the results publicly available.

See the `asv` [reference docs](https://asv.readthedocs.io/en/v0.6.3/reference.html) for further details on the tool, and on [how to run benchmarks](https://asv.readthedocs.io/en/stable/using.html#running-benchmarks).

## Installation

To run the benchmarks, [install asv](https://asv.readthedocs.io/en/stable/installing.html) in your current environment:
```
pip install asv
```

## Running benchmarks on a default small dataset

To run the benchmarks on a default small dataset:

1. Git clone the `brainglobe-workflows` repository:
```
git clone https://github.com/brainglobe/brainglobe-workflows.git
```
1. Run `asv` from the `benchmarks` directory:
```
cd brainglobe-workflows/benchmarks
asv run
```
This will benchmark the workflows defined in `brainglobe_workflows/` using a default set of parameters and a default small dataset. The default parameters are defined as config files under `brainglobe_workflows/configs`. The default dataset is downloaded from [GIN](https://gin.g-node.org/G-Node/info/wiki).

## Running benchmarks on custom data available locally

To run the benchmarks on a custom local dataset:

1. Git clone the `brainglobe-workflows` repository:
```
git clone https://github.com/brainglobe/brainglobe-workflows.git
```
1. Define a config file for the workflow to benchmark.
- You can use the default config files at `brainglobe_workflows/configs/` as reference.
- You will need to edit/add the fields pointing to the input data.
- For example, for the `cellfinder` workflow, the config file will need to include an `input_data_dir` field pointing to the data of interest. The signal and background data are assumed to be in `signal` and `background` directories, under the `input_data_dir` directory. If they are under directories with a different name, you can specify their names with the `signal_subdir` and `background_subdir` fields.

1. Benchmark the workflow, passing the path to your custom config file as an environment variable.
- For example, to benchmark the `cellfinder` workflow, you will need to prepend the environment variable definition to the `asv run` command (valid for Unix systems):
```
CELLFINDER_CONFIG_PATH=/path/to/your/config/file asv run
```
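
As an illustration, a config for a locally stored dataset might combine the fields described above (the paths are placeholders, and only the fields relevant to input data are shown; the full set of fields is defined by the default configs under `brainglobe_workflows/configs/`):
```json
{
    "input_data_dir": "/path/to/my/dataset",
    "signal_subdir": "signal",
    "background_subdir": "background"
}
```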

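A minimal sketch of how a workflow entry point might pick up such a config (the function name and fallback behaviour here are assumptions for illustration, not the package's actual API):

```python
import json
import os


def load_config(env_var="CELLFINDER_CONFIG_PATH"):
    """Load a JSON config from the path stored in `env_var`.

    Returns an empty dict when the variable is unset, so the caller
    can fall back to the package's default configuration.
    """
    config_path = os.environ.get(env_var)
    if config_path is None:
        return {}
    with open(config_path) as f:
        return json.load(f)
```
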
## Running benchmarks in development
The following flags to `asv run` are often useful in development:
- `--quick`: runs only one repetition per benchmark, and writes no results to disk.
- `--verbose`: provides further information on intermediate steps.
- `--show-stderr`: prints out stderr.
- `--dry-run`: does not write results to disk.
- `--bench`: specifies a subset of benchmarks to run (e.g., `TimeFullWorkflow`); regular expressions can be used.
- `--python=same`: runs the benchmarks in the same environment that `asv` was launched from.

Example:
```
asv run --bench TimeFullWorkflow --dry-run --show-stderr --quick
```
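
For reference, `asv` collects benchmarks from plain Python classes: any method whose name starts with `time_` is timed, and `setup` runs before each measurement. A minimal sketch of that shape (the class and workflow below are illustrative stand-ins, not the actual benchmarks under `benchmarks/benchmarks`):

```python
def run_toy_workflow(n_items):
    # Stand-in for a real workflow step.
    return sum(range(n_items))


class TimeToyWorkflow:
    """Shape of a benchmark class as asv discovers it."""

    def setup(self):
        # Runs before each measurement; prepare inputs here so that
        # setup cost is excluded from the timing.
        self.n_items = 10_000

    def time_full_workflow(self):
        # asv times the body of each `time_*` method.
        run_toy_workflow(self.n_items)
```
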
10 changes: 5 additions & 5 deletions asv.conf.json → benchmarks/asv.conf.json
@@ -4,14 +4,14 @@
"version": 1,

// The name of the project being benchmarked
"project": "../brainglobe-workflows",
"project": "brainglobe-workflows",

// The project's homepage
"project_url": "https://github.com/brainglobe/brainglobe-workflows",

// The URL or local path of the source code repository for the
// project being benchmarked
"repo": ".",
"repo": "..",
// "repo": "https://github.com/brainglobe/brainglobe-workflows.git",

// The Python project's subdirectory in your repo. If missing or
@@ -35,7 +35,7 @@

// Customizable commands for installing and uninstalling the project.
// See asv.conf.json documentation.
"install_command": ["in-dir={env_dir} python -mpip install --force-reinstall '{wheel_file}[asv_version]'"],
"install_command": ["in-dir={env_dir} python -mpip install --force-reinstall '{wheel_file}'"],
"uninstall_command": ["return-code=any python -mpip uninstall -y {project}"],

// List of branches to benchmark. If not provided, defaults to "master"
@@ -155,11 +155,11 @@

// The directory (relative to the current directory) that raw benchmark
// results are stored in. If not provided, defaults to "results".
"results_dir": "benchmarks/results",
"results_dir": "results",

// The directory (relative to the current directory) that the html tree
// should be written to. If not provided, defaults to "html".
"html_dir": "benchmarks/html",
"html_dir": "html",

// The number of characters to retain in the commit hashes.
// "hash_length": 8,
File renamed without changes.
File renamed without changes.
12 changes: 6 additions & 6 deletions brainglobe_workflows/cellfinder/cellfinder.py
@@ -26,7 +26,7 @@
import pooch
from brainglobe_utils.IO.cells import save_cells
from cellfinder.core.main import main as cellfinder_run
from cellfinder.core.tools.IO import read_z_stack
from cellfinder.core.tools.IO import read_with_dask
from cellfinder.core.train.train_yml import depth_type

from brainglobe_workflows.utils import (
@@ -352,8 +352,8 @@ def run_workflow_from_cellfinder_run(cfg: CellfinderConfig):
The steps are:
1. Read the input signal and background data as two separate
Dask arrays (or in-memory numpy arrays if single file tiff stack).
2. Run the main cellfinder pipeline on the input arrays,
Dask arrays.
2. Run the main cellfinder pipeline on the input Dask arrays,
with the parameters defined in the input configuration (cfg).
3. Save the detected cells as an xml file to the location specified in
the input configuration (cfg).
@@ -364,9 +364,9 @@ def run_workflow_from_cellfinder_run(cfg: CellfinderConfig):
a class with the required setup methods and parameters for
the cellfinder workflow
"""
# Read input data as Dask or numpy arrays
signal_array = read_z_stack(str(cfg._signal_dir_path))
background_array = read_z_stack(str(cfg._background_dir_path))
# Read input data as Dask arrays
signal_array = read_with_dask(str(cfg._signal_dir_path))
background_array = read_with_dask(str(cfg._background_dir_path))

# Run main analysis using `cellfinder_run`
detected_cells = cellfinder_run(
6 changes: 1 addition & 5 deletions pyproject.toml
@@ -58,11 +58,7 @@ dev = [
"setuptools_scm",
"asv",
]
# Below, all the dependencies asv needs to run the benchmarks
# (i.e., everything needed to install this package without the CLI tool)
# Once the cellfinder CLI tool is deprecated, these will move to the
# default dependencies.
asv_version = ["asv"]


napari = ["napari[pyqt5]", "brainglobe-napari-io", "cellfinder[napari]>=1.0.0"]

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
