These files can be used to run detailed benchmarks of Brian2's C++ standalone mode and of Brian2GeNN. They are meant to run with Brian 2.2, Brian2GeNN 1.2, and GeNN 3.2.
If you have nvidia-docker installed, a simple way to run the benchmarks is via Docker:
$ bash prepare-docker.sh
$ bash run-benchmarks-with-docker.sh
Currently, there are two benchmarks:
- COBAHH.py, a recurrent network of Hodgkin-Huxley-type neurons with conductance-based synapses, essentially the same model as the example in the Brian 2 documentation. This example has been slightly changed to make scaling its size easier: each neuron connects on average with 1000 synapses to other neurons (so the number of synapses scales linearly with the number of neurons), but the synaptic weights are set to 0. The neurons still spike because of their constant input currents, and the simulation still calculates synaptic updates, but we do not have to worry about a change in firing rates when scaling the size of the network (see the scaling sketch below).
- Mbody_example, a model of the locust olfactory system (Nowotny et al. 2005), with several different synapse types, including synapses with spike-timing-dependent plasticity. This model scales the number of neurons in the mushroom body only and keeps the number of plastic synapses in the mushroom body at 10000 per neuron (for networks with > 10000 neurons).
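As a rough illustration of the scaling scheme for COBAHH (not the actual benchmark code; the base network size of 4000 neurons is an assumption), the following sketch keeps the expected number of synapses per neuron constant while the network grows:

# Illustrative scaling sketch: the neuron count grows with the scale factor,
# while the connection probability shrinks so that each neuron keeps
# ~1000 synapses on average; the total number of synapses therefore grows
# linearly with the number of neurons.
scale = 2                          # the [scale] command-line argument
N = int(4000 * scale)              # total number of neurons (base size assumed)
p_connect = min(1.0, 1000.0 / N)   # expected synapses per neuron stays ~1000
# In the benchmark, the synapses would then be created with something like
#   synapses.connect(p=p_connect)
# and the synaptic weights set to 0, so firing rates do not change with size.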
Each of these examples can also be run manually with a set of command-line arguments:
$ python [benchmark] [scale] [device] [n_threads] [runtime] [monitor] [float_dtype] [label]
where:
- benchmark: the name of the benchmark file
- scale: the scaling factor to use for the simulation (1 means the "usual" number of neurons, 2 twice this number, etc.)
- device: either genn or cpp_standalone
- n_threads: the number of threads to use in C++ standalone mode. For Brian2GeNN, either -1 to run the simulation on the CPU, or 0 to run it on the GPU
- runtime: the biological runtime of the simulation in seconds
- monitor: whether to use monitors in the simulation (spike monitors for both benchmarks, plus a monitor recording the membrane potential of 1% of the neurons in the COBAHH benchmark)
- float_dtype: the data type for floating-point variables, either float32 for single precision or float64 for double precision
- label: a label that decides in which directory to put the benchmark results (i.e., use the same label for benchmarks run on the same machine)
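For example, the following call (with illustrative values; in particular, the exact format expected for the monitor flag depends on the script) runs the COBAHH benchmark at its standard size in C++ standalone mode with 4 threads, for 10 s of biological time, without monitors, in double precision, storing the results under the label my_machine:
$ python COBAHH.py 1 cpp_standalone 4 10 false float64 my_machine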
The provided bash files can be used to run benchmarks for combinations of these arguments (these files are meant for use on Linux or OS X, but should be easily adaptable for Windows).
Benchmarking is performed by using Brian2's insert_code mechanism. The injected code uses a C++ high_resolution_clock and writes the time elapsed since the start at various points to a results/benchmark.time file in the model's code directory (output for C++ standalone, GeNNworkspace for Brian2GeNN). To calculate the time spent in Brian2's Python code (code generation etc.) as well as for the model compilation, the total time is also measured in the benchmark script itself via Python -- the difference between this measured time and the time spent executing the generated code gives the time spent on all preparatory work.
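As a minimal sketch (not the actual benchmark code), the following shows how such a timing measurement could be injected from Python, assuming the 'main' code slot of the C++ standalone device and that the <chrono> and <fstream> headers are available in the generated file; the real scripts insert similar snippets at all the time points listed in the sketches below.

from brian2 import set_device, get_device

set_device('cpp_standalone', directory='output')

# ... build the model (NeuronGroup, Synapses, run(...)) as usual ...

# Record a reference time point in the generated main()
get_device().insert_code('main', '''
    const auto _benchmark_start = std::chrono::high_resolution_clock::now();
''')

# ... later in the script: append the elapsed wall-clock time (in seconds)
# to results/benchmark.time in the code directory
get_device().insert_code('main', '''
    {
        const auto _now = std::chrono::high_resolution_clock::now();
        std::ofstream _timefile("results/benchmark.time", std::ios_base::app);
        _timefile << std::chrono::duration<double>(_now - _benchmark_start).count() << std::endl;
    }
''')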
Brian2's C++ standalone code and the code generated by Brian2GeNN differ slightly in how they sequence the various operations; the benchmarking script takes care of creating comparable time points.
The following sketch of a simulation's main function shows the time points at which measurements are taken, together with the names used for these points in the analysis script (plot_benchmarks.py):
int main(int argc, char **argv)
{
// <-- *** Start timer for benchmarking
brian_start(); // reserve memory for arrays and initialize them to 0
// load values provided in Python code from disk
// <-- *** time point: t_after_load
// housekeeping code: set number of threads, initialize some variables to
// scalar values != 0
// <-- *** time point: t_before_synapses
// create synapses
_run_synapses_synapses_create_generator_codeobject();
// ...
// <-- *** time point: t_after_synapses
// initialize state variables with user-provided expressions (e.g. randomly)
_run_neurongroup_group_variable_set_conditional_codeobject();
// ...
// <-- *** time point: t_after_init
// more housekeeping, copying over previously loaded arrays
// <-- *** time point: t_before_run
// Setting up simulation (scheduling, etc.)
// Run the network:
magicnetwork.run(1.0, report_progress, 10.0);
// <-- *** time point: t_after_run
// <-- *** time point: t_before_end
brian_end(); // Write results to disk, free memory
// <-- *** time point: t_after_end
}
The generated code for Brian2GeNN is very similar:
int main(int argc, char **argv)
{
// <-- *** Start timer for benchmarking
_init_arrays(); // reserve memory for arrays and initialize them to 0
_load_arrays(); // load values provided in Python code from disk
// <-- *** time point: t_after_load
// initialize some variables to scalar values != 0
// <-- *** time point: t_before_synapses
// create synapses
_run_synapses_synapses_create_generator_codeobject();
// ...
// <-- *** time point: t_after_synapses
// initialize state variables with user-provided expressions (e.g. randomly)
_run_neurongroup_group_variable_set_conditional_codeobject();
// ...
// <-- *** time point: t_after_init
// housekeeping: copying over previously loaded arrays
// convert variables from Brian 2 to GeNN format
// copy variables to GPU
// <-- *** time point: t_before_run
// run the simulation:
eng.run(totalTime, which);
// <-- *** time point: t_after_run
// housekeeping: copy over data from GPU
// convert variables from GeNN back to Brian 2 format
// <-- *** time point: t_before_end
_write_arrays(); // Write results to disk
_dealloc_arrays(); // free memory
// <-- *** time point: t_after_end
}
The main difference between the two versions is that Brian2GeNN does more "housekeeping"; in particular, it has to convert array data structures between Brian2's and GeNN's formats and copy data back and forth between CPU and GPU.
The data from these time points is summarized in the following measurements, again using the names from plot_benchmarks.py:
- duration_before: general preparation time, excluding synapse creation and variable initialization (time between the start and t_after_load, plus time between t_after_init and t_before_run)
- duration_synapses: synapse creation time (time between t_before_synapses and t_after_synapses)
- duration_init: variable initialization time (time between t_after_synapses and t_after_init)
- duration_run: simulation time (time between t_before_run and t_after_run)
- duration_after: cleanup time (time between t_after_run and t_after_end)
- duration_compile: code generation and compilation time (the difference between the total time measured in Python and the total time measured within the generated code, i.e. t_after_end)
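For concreteness, the following sketch shows how these durations could be computed from the recorded time points (all relative to the start of main()) and the total time measured in Python; the actual computation lives in plot_benchmarks.py and may differ in detail.

# t: dict of time points read from results/benchmark.time (seconds since the
# start of main()); total_python: total wall-clock time measured in Python
def compute_durations(t, total_python):
    return {
        # preparation, excluding synapse creation and variable initialization
        'duration_before': t['t_after_load'] + (t['t_before_run'] - t['t_after_init']),
        'duration_synapses': t['t_after_synapses'] - t['t_before_synapses'],
        'duration_init': t['t_after_init'] - t['t_after_synapses'],
        'duration_run': t['t_after_run'] - t['t_before_run'],
        'duration_after': t['t_after_end'] - t['t_after_run'],
        # everything not spent in the generated code: code generation,
        # compilation and other work on the Python side
        'duration_compile': total_python - t['t_after_end'],
    }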
In the plots, duration_before and duration_after are summed and called "overhead"; in the same way, duration_synapses and duration_init are summed to give the total time for "synapse creation & initialization". "Simulation" corresponds directly to duration_run, and "code gen & compile" corresponds to duration_compile.
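A corresponding sketch of how these plot categories could be built from the durations computed above (again, the actual code in plot_benchmarks.py may differ):

# d: the dictionary returned by compute_durations() above
def plot_categories(d):
    return {
        'overhead': d['duration_before'] + d['duration_after'],
        'synapse creation & initialization': d['duration_synapses'] + d['duration_init'],
        'simulation': d['duration_run'],
        'code gen & compile': d['duration_compile'],
    }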
All measured times are the minimum times across repeats (but variation between runs is very small).