Follow numpy structure and add asv-specific README (#104)
* replicate structure of numpy

* start an asv specific readme

* update gitignore and manifest

* small additions to readme

* Fix project name in asv config

* Rename cellfinder_core to cellfinder

* Rename cellfinder_core in tests

* Some more cellfinder_core changes

* Fix rebase mess

* Fix benchmark check in CI

* Update readme

* Edit manifest

* Fix link to benchmarks README file in README.md

* Add link to asv

* Add review suggestions

* Remove asv extra dependency

* Update README

* Remove [asv] extra from workflow

* Clarify existing flag in workflow

* Cosmetic changes
sfmig authored May 13, 2024
1 parent 8d5a0f5 commit 31b0594
Showing 18 changed files with 98 additions and 47 deletions.
14 changes: 11 additions & 3 deletions .github/workflows/test_and_deploy.yml
Original file line number Diff line number Diff line change
@@ -86,14 +86,22 @@ jobs:
- uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install dependencies
- name: Install asv
shell: bash
run: |
python -mpip install --upgrade pip
python -mpip install .[asv_version]
# We install the project to benchmark because we run `asv check` with the `existing` flag.
python -mpip install .
python -mpip install asv
- name: Run asv check
shell: bash
run: asv check -v -E existing
run: |
cd benchmarks
# With `existing`, the benchmarked project must be already installed, including all dependencies.
# see https://asv.readthedocs.io/en/v0.6.3/commands.html#asv-check
asv check -v -E existing
build_sdist_wheels:
name: Build source distribution
2 changes: 1 addition & 1 deletion .gitignore
@@ -130,6 +130,6 @@ pip-wheel-metadata/
**/_version.py

# Benchmarking with ASV
.asv/
benchmarks/.asv/*
benchmarks/results/*
benchmarks/html/*
6 changes: 1 addition & 5 deletions MANIFEST.in
@@ -9,20 +9,16 @@ exclude *.ini

recursive-include brainglobe_workflows *.py
recursive-include brainglobe_workflows/configs *.json
recursive-include benchmarks *.py
include asv.conf.json

recursive-exclude * __pycache__
recursive-exclude * *.py[co]
recursive-exclude benchmarks/results *
recursive-exclude benchmarks/html *

global-include *.pxd

prune docs
prune tests
prune resources
prune benchmarks

prune .github
prune .tox
prune .asv
24 changes: 2 additions & 22 deletions README.md
@@ -73,31 +73,11 @@ These benchmarks are meant to be run regularly, to ensure performance is stable

There are three main ways in which these benchmarks can be useful to developers:
1. Developers can run the available benchmarks locally on a small test dataset.

To do so:
- Install the developer version of the package:
```
pip install .[dev]
```
This is mostly for convenience: the `[dev]` specification includes `asv` as a dependency, but to run the benchmarks it would be sufficient to use an environment with `asv` only. This is because `asv` creates its own virtual environment for the benchmarks, building and installing the relevant version of the `brainglobe-workflows` package in it. By default, the version at the tip of the currently checked out branch is installed.
- Run the benchmarks:
```
asv run
```
This will run the locally defined benchmarks with the default parameters defined at `brainglobe_workflows/configs/cellfinder.json`, on a small dataset downloaded from [GIN](https://gin.g-node.org/G-Node/info/wiki). See the [asv docs](https://asv.readthedocs.io/en/v0.6.1/using.html#running-benchmarks) for further guidance on how to run benchmarks.
1. Developers can also run these benchmarks on data they have stored locally.

To do so:
- Define a config file for the workflow to benchmark. You can use the default one at `brainglobe_workflows/configs/cellfinder.json` for reference.
- Ensure your config file includes an `input_data_dir` field pointing to the data of interest.
- Edit the names of the signal and background directories if required. By default, they are assumed to be in `signal` and `background` subdirectories under `input_data_dir`. However, these defaults can be overwritten with the `signal_subdir` and `background_subdir` fields.
- Run the benchmarks, passing the path to your config file as an environment variable `CONFIG_PATH`. In Unix systems:
```
CONFIG_PATH=/path/to/your/config/file asv run
```

1. We also plan to run the benchmarks on an internal runner using a larger dataset, of the scale we expect users to be handling. The result of these benchmarks will be made publicly available.

For further details on how to run the benchmarks, see the [benchmarks README](benchmarks/README.md).

Contributions to BrainGlobe are more than welcome.
Please see the [developer guide](https://brainglobe.info/developers/index.html).

71 changes: 71 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,71 @@
# README

## Overview
We use [`asv`](https://asv.readthedocs.io) to benchmark some representative BrainGlobe workflows. The `asv` workflow is roughly as follows:
1. `asv` creates a virtual environment in which to run the benchmarks, as defined in the `asv.conf.json` file.
1. It installs the version of the `brainglobe-workflows` package corresponding to the tip of the locally checked-out branch.
1. It runs the benchmarks as defined (locally) under `benchmarks/benchmarks` and saves the results to `benchmarks/results` as JSON files.
1. With `asv publish`, the output JSON files are 'published' into an HTML directory (`benchmarks/html`).
1. With `asv preview`, the HTML directory can be visualised using a local web server.


We include code to benchmark the workflows defined under `brainglobe_workflows`. There are three main ways in which these benchmarks can be useful to developers:
1. Developers can run the available benchmarks locally [on a small test dataset](#running-benchmarks-locally-on-default-small-dataset).
1. Developers can run these benchmarks locally on [data they have stored locally](#running-benchmarks-locally-on-custom-data).
1. We also plan to run the benchmarks internally on a large dataset, and make the results publicly available.

See the `asv` [reference docs](https://asv.readthedocs.io/en/v0.6.3/reference.html) for further details on the tool, and on [how to run benchmarks](https://asv.readthedocs.io/en/stable/using.html#running-benchmarks).

## Installation

To run the benchmarks, [install asv](https://asv.readthedocs.io/en/stable/installing.html) in your current environment:
```
pip install asv
```

## Running benchmarks on a default small dataset

To run the benchmarks on a default small dataset:

1. Git clone the `brainglobe-workflows` repository:
```
git clone https://github.com/brainglobe/brainglobe-workflows.git
```
1. Run `asv` from the `benchmarks` directory:
```
cd brainglobe-workflows/benchmarks
asv run
```
This will benchmark the workflows defined in `brainglobe_workflows/` using a default set of parameters and a default small dataset. The default parameters are defined as config files under `brainglobe_workflows/configs`. The default dataset is downloaded from [GIN](https://gin.g-node.org/G-Node/info/wiki).

## Running benchmarks on custom data available locally

To run the benchmarks on a custom local dataset:

1. Git clone the `brainglobe-workflows` repository:
```
git clone https://github.com/brainglobe/brainglobe-workflows.git
```
1. Define a config file for the workflow to benchmark.
- You can use the default config files at `brainglobe_workflows/configs/` as reference.
- You will need to edit/add the fields pointing to the input data.
- For example, for the `cellfinder` workflow, the config file will need to include an `input_data_dir` field pointing to the data of interest. The signal and background data are assumed to be in `signal` and `background` directories, under the `input_data_dir` directory. If they are under directories with a different name, you can specify their names with the `signal_subdir` and `background_subdir` fields.

1. Benchmark the workflow, passing the path to your custom config file as an environment variable.
- For example, to benchmark the `cellfinder` workflow, you will need to prepend the environment variable definition to the `asv run` command (valid for Unix systems):
```
CELLFINDER_CONFIG_PATH=/path/to/your/config/file asv run
```
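
As an illustration, a config for a locally stored dataset might combine the fields described above (the paths are placeholders, and only the fields relevant to input data are shown; the full set of fields is defined by the default configs under `brainglobe_workflows/configs/`):
```json
{
    "input_data_dir": "/path/to/my/dataset",
    "signal_subdir": "signal",
    "background_subdir": "background"
}
```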

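A minimal sketch of how a workflow entry point might pick up such a config (the function name and fallback behaviour here are assumptions for illustration, not the package's actual API):

```python
import json
import os


def load_config(env_var="CELLFINDER_CONFIG_PATH"):
    """Load a JSON config from the path stored in `env_var`.

    Returns an empty dict when the variable is unset, so the caller
    can fall back to the package's default configuration.
    """
    config_path = os.environ.get(env_var)
    if config_path is None:
        return {}
    with open(config_path) as f:
        return json.load(f)
```
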
## Running benchmarks in development
The following flags to `asv run` are often useful in development:
- `--quick`: runs only one repetition per benchmark, and writes no results to disk.
- `--verbose`: provides further information on intermediate steps.
- `--show-stderr`: prints out stderr.
- `--dry-run`: does not write results to disk.
- `--bench`: specifies a subset of benchmarks to run (e.g., `TimeFullWorkflow`); regular expressions can be used.
- `--python=same`: runs the benchmarks in the same environment that `asv` was launched from.

Example:
```
asv run --bench TimeFullWorkflow --dry-run --show-stderr --quick
```
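
For reference, `asv` collects benchmarks from plain Python classes: any method whose name starts with `time_` is timed, and `setup` runs before each measurement. A minimal sketch of that shape (the class and workflow below are illustrative stand-ins, not the actual benchmarks under `benchmarks/benchmarks`):

```python
def run_toy_workflow(n_items):
    # Stand-in for a real workflow step.
    return sum(range(n_items))


class TimeToyWorkflow:
    """Shape of a benchmark class as asv discovers it."""

    def setup(self):
        # Runs before each measurement; prepare inputs here so that
        # setup cost is excluded from the timing.
        self.n_items = 10_000

    def time_full_workflow(self):
        # asv times the body of each `time_*` method.
        run_toy_workflow(self.n_items)
```
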
10 changes: 5 additions & 5 deletions asv.conf.json → benchmarks/asv.conf.json
@@ -4,14 +4,14 @@
"version": 1,

// The name of the project being benchmarked
"project": "../brainglobe-workflows",
"project": "brainglobe-workflows",

// The project's homepage
"project_url": "https://github.com/brainglobe/brainglobe-workflows",

// The URL or local path of the source code repository for the
// project being benchmarked
"repo": ".",
"repo": "..",
// "repo": "https://github.com/brainglobe/brainglobe-workflows.git",

// The Python project's subdirectory in your repo. If missing or
@@ -35,7 +35,7 @@

// Customizable commands for installing and uninstalling the project.
// See asv.conf.json documentation.
"install_command": ["in-dir={env_dir} python -mpip install --force-reinstall '{wheel_file}[asv_version]'"],
"install_command": ["in-dir={env_dir} python -mpip install --force-reinstall '{wheel_file}'"],
"uninstall_command": ["return-code=any python -mpip uninstall -y {project}"],

// List of branches to benchmark. If not provided, defaults to "master"
@@ -155,11 +155,11 @@

// The directory (relative to the current directory) that raw benchmark
// results are stored in. If not provided, defaults to "results".
"results_dir": "benchmarks/results",
"results_dir": "results",

// The directory (relative to the current directory) that the html tree
// should be written to. If not provided, defaults to "html".
"html_dir": "benchmarks/html",
"html_dir": "html",

// The number of characters to retain in the commit hashes.
// "hash_length": 8,
File renamed without changes.
File renamed without changes.
12 changes: 6 additions & 6 deletions brainglobe_workflows/cellfinder/cellfinder.py
@@ -26,7 +26,7 @@
import pooch
from brainglobe_utils.IO.cells import save_cells
from cellfinder.core.main import main as cellfinder_run
from cellfinder.core.tools.IO import read_z_stack
from cellfinder.core.tools.IO import read_with_dask
from cellfinder.core.train.train_yml import depth_type

from brainglobe_workflows.utils import (
@@ -352,8 +352,8 @@ def run_workflow_from_cellfinder_run(cfg: CellfinderConfig):
The steps are:
1. Read the input signal and background data as two separate
Dask arrays (or in-memory numpy arrays if single file tiff stack).
2. Run the main cellfinder pipeline on the input arrays,
Dask arrays.
2. Run the main cellfinder pipeline on the input Dask arrays,
with the parameters defined in the input configuration (cfg).
3. Save the detected cells as an xml file to the location specified in
the input configuration (cfg).
@@ -364,9 +364,9 @@ def run_workflow_from_cellfinder_run(cfg: CellfinderConfig):
a class with the required setup methods and parameters for
the cellfinder workflow
"""
# Read input data as Dask or numpy arrays
signal_array = read_z_stack(str(cfg._signal_dir_path))
background_array = read_z_stack(str(cfg._background_dir_path))
# Read input data as Dask arrays
signal_array = read_with_dask(str(cfg._signal_dir_path))
background_array = read_with_dask(str(cfg._background_dir_path))

# Run main analysis using `cellfinder_run`
detected_cells = cellfinder_run(
6 changes: 1 addition & 5 deletions pyproject.toml
@@ -58,11 +58,7 @@ dev = [
"setuptools_scm",
"asv",
]
# Below, all the dependencies asv needs to run the benchmarks
# (i.e., everything needed to install this package without the CLI tool)
# Once the cellfinder CLI tool is deprecated, these will move to the
# default dependencies.
asv_version = ["asv"]


napari = ["napari[pyqt5]", "brainglobe-napari-io", "cellfinder[napari]>=1.0.0"]

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
