While renv can help capture the state of your R library at some point in time, there are still other aspects of the system that can influence the runtime behavior of your R application. In particular, the same R code can produce different results depending on:
And so on. Docker is a tool that helps solve this problem through the use of containers. Very roughly speaking, one can think of a container as a small, self-contained system within which different applications can be run. Using Docker, one can declaratively state how a container should be built (what operating system it should use, and what system software should be installed within), and use that system to run applications. (For more details, please see https://environments.rstudio.com/docker.)
Using Docker and renv together, one can then ensure that both the underlying system, alongside the required R packages, are fixed and constant for a particular application.
The main challenges in using Docker with renv are:
renv cache is visible to Docker containers, andrenv restores the R packages as required when the container is run.This vignette will assume you are already familiar with Docker; if you are not yet familiar with Docker, the Docker Documentation provides a thorough introduction. To learn more about using Docker to manage R environments, visit environments.rstudio.com. We’ll discuss two strategies for using renv with Docker:
renv to install packages when the Docker image is generated;renv to install packages when Docker containers are run.We’ll explore the pros and cons of each strategy.
With Docker, Dockerfiles are used to define new images. Dockerfiles can be used to declaratively specify how a Docker image should be created. A Docker image captures the state of a machine at some point in time – e.g., an Ubuntu operating system after downloading and installing R 3.5. Docker containers can be created using that image as a base, allowing isolated applications to run using the same pre-defined machine state.
First, you’ll need to get renv installed on your Docker image. The easiest way to accomplish this is with the remotes package. For example:
ENV RENV_VERSION 0.9.3
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"Now, renv can be used to install packages on the image. If you’d like the renv.lock lockfile to be used to install R packages when the Docker image is built, you can include something of the form:
WORKDIR /project
COPY renv.lock renv.lock
RUN R -e 'renv::restore()'With this, renv will download and install packages from CRAN and other external sources as appropriate when the image is created.
There are two main downsides to this approach:
The set of R packages used is pre-baked into the image, so different applications or containers built from this image will either have to re-use the aforementioned set of packages, or reinstall the packages they need to update as required.
With this approach, the renv package cache will not be used. This implies that package installation through renv::restore() may be very slow, as all packages will have to be installed.
Both of these issues can be solved if package installation can be deferred to container runtime.
If you’d like to leverage the renv package cache alongside Docker, then you’ll need to alter how your containers are created so that renv can ensure the project library is initialized before your application is run.
One can control the renv cache directory with the environment variable RENV_PATHS_CACHE. For example:
Sys.setenv(RENV_PATHS_CACHE = "~/path/to/cache")
renv:::renv_paths_cache()
#> [1] "~/path/to/cache/v5/R-3.6/x86_64-apple-darwin15.6.0"Note that the platform and R version in use are appended to the requested cache directory. This ensures that a single directory can act a base of cached packages for multiple different platforms and R versions.
Next, we need to figure out how to tell the Docker containers we create to use this cache. The most common option here is to mount a directory in the container that maps to persistent storage on the host system, and then set the aforementioned RENV_PATHS_CACHE environment variable to point to this mount. You can specify this when the container is launched. For example, if you had a container running a Shiny application:
RENV_PATHS_CACHE_HOST=/opt/local/renv/cache
RENV_PATHS_CACHE_CONTAINER=/renv/cache
docker run --rm \
    -e "RENV_PATHS_CACHE=${RENV_PATHS_CACHE_CONTAINER}" \
    -v "${RENV_PATHS_CACHE_HOST}:${RENV_PATHS_CACHE_CONTAINER}" \
    -p 14618:14618 \
    R --slave -e 'renv::restore(); shiny::runApp(host = "0.0.0.0", port = 14618)'With this, any calls to renv APIs within the created docker container will have access to the mounted cache. The first time you run a container, renv will likely need to populate the cache, and so some time will be spent downloading and installing the required packages. Subsequent runs should be much faster, as renv will be able to reuse the global package cache.
The primary downside with this approach compared to the image-based approach is that it requires you to modify how containers are created, and requires a bit of extra orchestration in how containers are launched. However, once the renv cache is active, newly-created containers will launch very quickly, and a single image can then be used as a base for a myriad of different containers and applications, each with their own private R library.