simpleCache is an R package providing functions for caching R objects. Its
purpose is to encourage writing reusable, restartable, and reproducible analysis
pipelines for projects with massive data and computational requirements.
As its name indicates, simpleCache is intended to be simple. You choose a
location to store your caches, and then provide the function with nothing more
than a cache name and instructions (R code) for how to produce the R object.
While simple, simpleCache also provides some advanced options like environment
assignments, recreating caches, reloading caches, and even cluster compute
bindings (using the batchtools package), making it flexible enough for use in
large-scale data analysis projects.
simpleCache is on CRAN and can be installed as usual:
install.packages("simpleCache")
simpleCache comes with a single primary function (simpleCache()) that will do
almost everything you need. In short, you run it with a few lines like this:
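library(simpleCache)
# Store caches in a temporary directory for this session
setCacheDir(tempdir())
# recreate=TRUE forces the cache to be built even if one already exists
simpleCache("normSample", { rnorm(1e7, 0, 1) }, recreate=TRUE)
# Without recreate, the existing cache is reused instead of being recomputed
simpleCache("normSample", { rnorm(1e7, 0, 1) })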
simpleCache also interfaces with the batchtools package to let you build
caches on any cluster resource manager.
simpleCache(): Creates and caches or reloads cached results of provided R instruction code
listCaches(): Lists all of the caches available in the cacheDir
deleteCaches(): Deletes cache(s) from the cacheDir
setCacheDir(): Sets a global option for a cache directory so you don't have to specify one in each simpleCache call
simpleCacheOptions(): Views all of the simpleCache global options that have been set
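As a rough illustration of how the functions listed above fit together, here is a minimal sketch. The directory name `simpleCacheDemo` and the cache name `smallSample` are made up for the example, and the `force = TRUE` argument to `deleteCaches()` (used to skip the confirmation prompt) is assumed to be available in the installed version of the package.

```r
library(simpleCache)

# Use a dedicated, throwaway cache directory for this demo (hypothetical name)
cacheDir <- file.path(tempdir(), "simpleCacheDemo")
dir.create(cacheDir, showWarnings = FALSE)
setCacheDir(cacheDir)

# Build a small cache; the result is assigned to a variable named after the cache
simpleCache("smallSample", { rnorm(10) }, recreate = TRUE)

# View the global options, then list the caches stored in the cache directory
simpleCacheOptions()
before <- listCaches()
print(before)

# Delete the cache file again (force = TRUE is assumed to skip the prompt)
deleteCaches("smallSample", force = TRUE)
after <- listCaches()
print(after)
```

Setting the cache directory once with setCacheDir() keeps the later calls short, since none of them needs to repeat the location.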
The use case I had in mind for simpleCache is that you find yourself
constantly recalculating the same R object in several different scripts, or
repeatedly in the same script, every time you open it and want to continue that
project. simpleCache is well-suited for interactive analysis, allowing you to
pick up right where you left off in a new R session, without having to
recalculate everything. It is equally useful in automatic pipelines, where
separate scripts may benefit from loading, instead of recalculating, the same R
objects produced by other scripts.
R provides some base functions (save, serialize, and load) to let you save
and reload such objects, but these low-level functions are a bit cumbersome.
simpleCache simply provides a convenient, user-friendly interface to these
functions, streamlining the process. For example, a single simpleCache call
will check for a cache and load it if it exists, or create it if it does not.
With the base R save and load functions, you can't just write a single
function call and then run the same thing every time you start the script --
even this simple use case requires additional logic to check for an existing
cache. simpleCache just does all this for you.
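To make that "additional logic" concrete, here is a minimal base R sketch of the check-then-load pattern. The helper `loadOrBuild()` and the directory `baseCacheDemo` are hypothetical names invented for this example; this is not how simpleCache is implemented, only an illustration of the boilerplate it saves you from writing.

```r
# Hypothetical helper (not part of simpleCache): load a cached object if its
# file exists; otherwise build it, save it under its name, and return it.
loadOrBuild <- function(name, build, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".RData"))
  env <- new.env()
  if (file.exists(path)) {
    # load() restores the object under the name it was saved with
    load(path, envir = env)
  } else {
    # Run the (possibly expensive) computation and store it by name
    assign(name, build(), envir = env)
    save(list = name, file = path, envir = env)
  }
  get(name, envir = env)
}

demoDir <- file.path(tempdir(), "baseCacheDemo")
dir.create(demoDir, showWarnings = FALSE)

# The first call builds and saves; the second call finds the file and loads it
first <- loadOrBuild("normSample", function() rnorm(5), dir = demoDir)
second <- loadOrBuild("normSample", function() rnorm(5), dir = demoDir)
print(identical(first, second))
```

Even this small version has to handle file paths, environments, and the existence check, which is the kind of detail a single simpleCache call hides.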
The thing to keep in mind with simpleCache is that the cache name is
paramount. simpleCache assumes that your name for an object is a perfect
identifier for that object; in other words, don't cache things that you plan to
change.
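The sketch below illustrates why the name matters. It assumes, in line with the description above, that simpleCache looks a cache up by name only and does not compare the instruction code. The cache name `threshold`, the directory `simpleCacheNames`, and the variables `reused` and `rebuilt` are made up for the example.

```r
library(simpleCache)

# Throwaway cache directory for this demo (hypothetical name)
cacheDir <- file.path(tempdir(), "simpleCacheNames")
dir.create(cacheDir, showWarnings = FALSE)
setCacheDir(cacheDir)

# Build the cache from one instruction
simpleCache("threshold", { 0.05 }, recreate = TRUE)

# Same name, different instruction: the existing cache is expected to be
# reused, so the new instruction is not evaluated
simpleCache("threshold", { 0.01 })
reused <- threshold
print(reused)

# If the definition really has changed, rebuild explicitly with recreate=TRUE
simpleCache("threshold", { 0.01 }, recreate = TRUE)
rebuilt <- threshold
print(rebuilt)
```

In practice this means either choosing a new cache name when the definition of an object changes, or passing recreate=TRUE once to overwrite the old cache.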
simpleCache is licensed under the 2-Clause BSD License. Questions, feature
requests, and bug reports are welcome via the issue queue. The maintainer will
review pull requests and incorporate contributions at his discretion.
For more information, refer to the contributing document and the pull request /
issue templates in the .github folder of this repository.