This code is an implementation of Parallel RRT*--an asymptoptically optimal robotic motion planning algorithm. It utilizes multi-core/multi-processor SMP threading and atomic operations to implement a RRT* that scales linearly with the number of cores in use.
To enable super-linear speedup, each thread may be given a partition of the configuation space to sample. This approach keeps the working-set of each thread smaller than the non-partitioned approach, allowing for more effective use of CPU caches. It's not without its downside however, it is unknown how partitioning will effect the speedup when given certain edge-case robotic systems, such as ones that would have partitions in a completed obstructed portion of configuration space.
Note on Turbo Boost: On some SMP configurations, having fewer cores in use allows the hardware to run at faster clock speeds (see Intel 'Turbo Boost'). In this case, linear and super-linear speedup may not be seen, however there will still be an overall speedup benefit.
On some SMP configurations, logical processor threads share computational cores (e.g. Intel Hyper-Threading), which allows more threads to run simultenously at the expense of stalling for competing access to shared computational units. The net effect is usually that 2 threads on 1 core do not run as fast on 2 threads on 2 cores. PRRT* has been tested to work well in the presense of SMT, but the (super-)linear-speedup is not easily observed or for that matter computed. As an example, a 2-core processor might experience a 2.1x speedup on one simulation with SMT disabled. Enabling SMT (for a total of 4 logical threads in this example) and running 4-threads in parallel might see a 3.2x speedup. In terms of threads the speedup is sub-linear, but in terms of cores, we'll... we're doing pretty well.
As a cautionary note however, partitioned sampling may run unevenly with SMT enabled, since threads will run slower or faster depending on what other thread is scheduled on the same core. In our experience it is best to run with all available threads (e.g. 4-core w/ 2x SMT should use 8 threads), in the presense of SMT when using partitioned sampling.
Tested implementation:
This implementation has been tested on a Intel and AMD x86_64-based SMP machines in various configurations. The memory model of the these machines is strong enough to enable certain assumptions about the memory access patterns. This implementation may not run well on other SMP architectures. If you are interested in testing/porting the code to run on another architecture, please contact the authors.
Note: this build requires GNU make. It make be available as "gmake" on your system, if "make" below does not work, then try "gmake" where it says "make".
% ./configure
% make
Optionally you can specify after "./configure":
--disable-openmp to use pthreads directly
--enable-crc to enable runtime crc checking of data-structures
If changing options, you should "make clean" before running make again.
For example:
% ./configure --disable-openmp --enable-crc
% make clean
% make
To run the example program:
% src/prrts -t4 -n20000 -r
Runs (-n=)20000 samples with 4(-t)hreads with partitioned (-r)egional sampling.