Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Even with solid software engineering expertise, the ever-growing complexity and continuous development of new, potentially disruptive features make it challenging to keep code working over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it’s likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive to set up for the average R developer.
Are you ready to test your custom R functions and system commands in a different environment with isolated software builds that are pure at both build time and runtime, without leaving the R console?
Let’s introduce with_nix(). with_nix() will
evaluate custom R code or shell commands with command line interfaces
provided by Nixpkgs in a Nix environment, and thereby bring the
read-eval-print-loop feeling. Not only can you evaluate custom R
functions or shell commands in Nix environments, but you can also bring
the results back to your current R session as R objects.
We aim to accommodate various use cases, considering a gradient of declarativity in individual or sets of software environments based on personal preferences. There are two main modes for defining and comparing code running through R and system commands (command line interfaces; CLIs): ‘System-to-Nix’, where you launch with_nix() from an R session on your host system, and ‘Nix-to-Nix’, where the source environment you launch with_nix() from is a Nix environment, too. In the latter case, you are probably on your way to becoming a passionate Nix user.
Carefully curated software improves over time, and so does R. We pick an example from the R changelog, the following literal entry in R 4.2.0:
“as.vector() gains a data.frame method
which returns a simple named list, also clearing a long standing ‘FIXME’
to enable as.vector(<data.frame>, mode = "list"). This
breaks code relying on as.vector(<data.frame>) to
return the unchanged data frame.”
The goal is to illustrate this change in behavior from R versions 4.1.3 and before to R versions 4.2.0 and later.
First, we write a default.nix file containing Nix
expressions that pin R version 4.1.3 from Nixpkgs.
library("rix")
path_env_1 <- file.path(".", "_env_1_R-4-1-3")
rix(
  r_ver = "4.1.3",
  overwrite = TRUE,
  project_path = path_env_1
)
#> 
#> ### Bootstrapping isolated, project-specific, and runtime-pure R setup via Nix ###
#> 
#> ==> Existing isolated nix-R project folder:
#>  /tmp/RtmppQ9Eym/Rbuild1d5c63c838c8/rix/vignettes/_env_1_R-4-1-3 
#> 
#> * current R session running inside Nix environment and not from RStudio
#> 
#> * Keep existing `.Rprofile`. in `project_path`:
#>  /tmp/RtmppQ9Eym/Rbuild1d5c63c838c8/rix/vignettes/_env_1_R-4-1-3/ 
#> 
#> 
#> ==> Also adjusting `PATH` via `Sys.setenv()`, so that system commands can invoke key Nix commands like `nix-build` in this RStudio session outside Nix
#> 
#> 
#> ### Successfully generated `default.nix` and `.Rprofile` ###
The following expression is written to default.nix in the subfolder
./_env_1_R-4-1-3/.
#> # This file was generated by the {rix} R package v0.12.1 on 2024-09-24
#> # with following call:
#> # >rix(r_ver = "6e3a86f2f73a466656a401302d3ece26fba401d9",
#> #  > project_path = path_env_1,
#> #  > overwrite = TRUE)
#> # It uses nixpkgs' revision 6e3a86f2f73a466656a401302d3ece26fba401d9 for reproducibility purposes
#> # which will install R version 4.1.3.
#> # Report any issues to https://github.com/ropensci/rix
#> let
#>  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/6e3a86f2f73a466656a401302d3ece26fba401d9.tar.gz") {};
#>      
#>   system_packages = builtins.attrValues {
#>     inherit (pkgs) 
#>       glibcLocales
#>       nix
#>       R;
#>   };
#>   
#> in
#> 
#> pkgs.mkShell {
#>   LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
#>   LANG = "en_US.UTF-8";
#>    LC_ALL = "en_US.UTF-8";
#>    LC_TIME = "en_US.UTF-8";
#>    LC_MONETARY = "en_US.UTF-8";
#>    LC_PAPER = "en_US.UTF-8";
#>    LC_MEASUREMENT = "en_US.UTF-8";
#> 
#>   buildInputs = [    system_packages   ];
#>   
#> }
The setup also includes a custom .Rprofile file, which ensures
that this subshell will not load any packages installed to the user’s
library of packages.
We have now set up the configuration for R 4.1.3 in a
default.nix file in the folder
./_env_1_R-4-1-3. Provided that the R version available on your
system is 4.2.0 or higher, you can check that the
as.vector.data.frame() S3 method returns a list.
df <- data.frame(a = 1:3, b = 4:6)
as.vector(x = df, mode = "list")
#> $a
#> [1] 1 2 3
#> 
#> $b
#> [1] 4 5 6
This is different for R versions 4.1.3 and below, where you should get an identical data frame back.
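The version dependence can also be checked programmatically. The following is a minimal sketch using only base R; the variable name `returns_df` is introduced here for illustration and is not part of the original example.

```r
# Example data frame, as used above
df <- data.frame(a = 1:3, b = 4:6)

out <- as.vector(x = df, mode = "list")

# Before R 4.2.0 the data frame comes back unchanged;
# from R 4.2.0 on, a plain named list is returned.
returns_df <- is.data.frame(out)

if (getRversion() >= "4.2.0") {
  message("R >= 4.2.0: data frame returned? ", returns_df)
} else {
  message("R < 4.2.0: data frame returned? ", returns_df)
}
```

Comparing `getRversion()` against a version string works because the comparison is done on parsed version numbers, not on plain text.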
To formally validate, in a ‘System-to-Nix’ approach, what
as.vector.data.frame() returned before R 4.2.0,
we define a function that runs the
computation above.
df_as_vector <- function(x) {
  out <- as.vector(x = x, mode = "list")
  return(out)
}
(out_system_1 <- df_as_vector(x = df))
#> $a
#> [1] 1 2 3
#> 
#> $b
#> [1] 4 5 6
Then, we will evaluate this test code through a
nix-shell R session. This adds both build-time and run-time
purity with the declarative Nix software configuration we made
earlier. with_nix() leverages the following principles
under the hood:
Computing on the Language: Manipulating language objects using code.
Static Code Analysis: Detecting global objects and package environments in the function call stack of ‘expr’. This involves utilizing essential functionality from the ‘codetools’ package, which is recursively iterated.
Serialization of Dependent R objects: Saving
them to disk and deserializing them back into the R session’s RAM via a
temporary folder. This process establishes isolation between two
distinct computational environments, accommodating both ‘System-to-Nix’
and ‘Nix-to-Nix’ computational modes. Simultaneously, it facilitates the
transfer of input arguments, dependencies across the call stack, and
outputs of expr between the Nix-R and the system’s R
sessions.
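The second and third principles can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the actual implementation of with_nix(): it uses `codetools::findGlobals()` for the static analysis and `saveRDS()`/`readRDS()` for the round trip through a temporary folder. The folder name `with_nix_sketch` and the object `restored` are hypothetical.

```r
library(codetools)

df <- data.frame(a = 1:3, b = 4:6)
df_as_vector <- function(x) {
  as.vector(x = x, mode = "list")
}
expr <- function() df_as_vector(x = df)

# Static code analysis: names of global functions and variables used by `expr`
globals <- findGlobals(expr, merge = TRUE)

# Serialization: write each detected global to a temporary folder ...
tmp_dir <- file.path(tempdir(), "with_nix_sketch") # hypothetical folder name
dir.create(tmp_dir, showWarnings = FALSE)
for (nm in globals) {
  saveRDS(get(nm), file = file.path(tmp_dir, paste0(nm, ".Rds")))
}

# ... and read them back, as a second R session would do
restored <- lapply(
  setNames(globals, globals),
  function(nm) readRDS(file.path(tmp_dir, paste0(nm, ".Rds")))
)
```

In this sketch only the direct globals of `expr` are detected; a recursive version would repeat the analysis for every function it finds.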
This approach guarantees reproducible side effects and effectively
streams messages and errors into the R session. To this end, the {sys}
package captures standard output and standard error as text
messages. Please be aware that with_nix() will invoke
nix-shell, which will itself run nix-build in
case the Nix derivation (package) for R version 4.1.3 is not yet in your
Nix store. Downloading it from the cache will take a bit of time. You will see
in your current R console the specific Nix paths that will be downloaded
and copied into your Nix store automatically.
# now run it in `nix-shell`; `with_nix()` takes care
# of exporting global objects of `df_as_vector` recursively
out_nix_1 <- with_nix(
  expr = function() df_as_vector(x = df), # wrap to avoid evaluation
  program = "R",
  project_path = path_env_1,
  message_type = "simple" # you can do `"verbose"`, too
)
# compare results of custom codebase with identical
# inputs and different software environments
identical(out_system_1, out_nix_1)
# should return `FALSE` if the R version of your
# current interactive R session is R >= 4.2.0
expr argument of with_nix()
In the previous code snippet we wrapped the top-level
expr function with function() or
function(){}. As an alternative, you can also provide
default arguments when assigning the function used as expr
input like this:
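A minimal sketch of such a definition, assuming the data frame `df` from the earlier example is used as the default value of `x`:

```r
df <- data.frame(a = 1:3, b = 4:6)

# `x` defaults to the global object `df`, so the function
# can be called without arguments
df_as_vector <- function(x = df) {
  out <- as.vector(x = x, mode = "list")
  return(out)
}

df_as_vector()
```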
Then, you just supply the name of the function to evaluate with default arguments.
out_nix_1_b <- with_nix(
  expr = df_as_vector, # provide name of function
  program = "R",
  project_path = path_env_1,
  message_type = "simple" # you can do `"verbose"`, too
)
It yields the same results.
as.vector.data.frame() for both R versions 4.1.3 and 4.2.0 from Nixpkgs
Here follows an example of a Nix-to-Nix solution, with two
subshells to track the evolution of base R in this specific case. We can
verify the breaking changes in case study 1 in a more declarative manner
when we use both R 4.1.3 and R 4.2.0 from Nixpkgs. Since we have already
defined R 4.1.3 in the _env_1_R-4-1-3
subshell, we can use it as the source environment that with_nix() is
launched from. Accordingly, we define the R 4.2.0 environment in
_env_1_2_R-4-2-0 using Nix via
rix::rix(). The latter environment will be the target
environment in which df_as_vector() is evaluated.
library("rix")
path_env_1_2 <- file.path(".", "_env_1_2_R-4-2-0")
rix(
  r_ver = "4.2.0",
  overwrite = TRUE,
  project_path = path_env_1_2,
  shell_hook = "R"
)
list.files(path_env_1_2)
#> [1] "default.nix"
Now, initiate a new R session as a development environment using
nix-shell. Open a new terminal at the current working
directory of your R session. The provided expression
default.nix defines R 4.1.3 in a “subfolder per subshell”
approach. nix-shell will use the expression in
default.nix and prefer it over any other .nix
files, except when you put a shell.nix file in that folder,
which takes precedence.
After some time downloading caches and doing builds, you will enter
an R console session with R 4.1.3. You did not need to type in R first,
because we set up an R shell hook via rix::rix(). Next, we
again define the target function, which we will also test in R 4.2.0.
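# current Nix-R session with R 4.1.3
# recreate the example data frame, since this is a fresh R session
df <- data.frame(a = 1:3, b = 4:6)
df_as_vector <- function(x) {
  out <- as.vector(x = x, mode = "list")
  return(out)
}
(out_nix_1 <- df_as_vector(x = df))
out_nix_1_2 <- with_nix(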
  expr = function() df_as_vector(x = df),
  program = "R",
  project_path = path_env_1_2,
  message_type = "simple" # you can do `"verbose"`, too
)
You can now formally compare the outputs of the computation of the same code in R 4.1.3 vs. R 4.2.0 environments controlled by Nix.
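Such a comparison can be sketched with identical(). The two objects below are stand-ins that mimic the behavior described in the changelog entry, so that the snippet runs on its own; in a real session they would come from df_as_vector() and with_nix().

```r
# stand-in: R 4.1.3 returns the data frame unchanged
out_nix_1 <- data.frame(a = 1:3, b = 4:6)
# stand-in: R 4.2.0 returns a simple named list
out_nix_1_2 <- list(a = 1:3, b = 4:6)

# the two results differ ...
identical(out_nix_1, out_nix_1_2)

# ... because only the first one is still a data frame
is.data.frame(out_nix_1)
is.data.frame(out_nix_1_2)
```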
We add one more layer to the reproducibility of the R ecosystem: user libraries from CRAN or GitHub. One thing that makes R shine is the huge collection of software packages available from the community.
A change was introduced in {stringr} 1.5.0. In earlier versions, this line of code:
stringr::str_subset(c("", "a"), "")
would return the character "a". However, this behaviour
is unexpected: it really should return an error. This was addressed
as of version 1.5.0:
out_system_stringr <- tryCatch(
  expr = {
    stringr::str_subset(c("", "a"), "")
  },
  error = function(e) NULL
)
Since the code throws an error, we wrap it inside
tryCatch() and return NULL instead of an error
(if we didn’t do that, this vignette could not compile!).
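The pattern can be tried in isolation with base R alone. In the sketch below, `risky()` is a hypothetical stand-in for any call that signals an error; it is not part of {stringr}.

```r
# `risky()` is a hypothetical stand-in for any call that signals an error
risky <- function() stop("illustrative error")

res <- tryCatch(
  expr = risky(),
  error = function(e) NULL # swallow the error and return NULL instead
)

is.null(res)
```

Returning NULL keeps the script running, at the cost of hiding the error message; storing `conditionMessage(e)` instead would preserve it.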
Let’s build a subshell with the latest version of R, but an older
version of {stringr}:
library("rix")
path_env_stringr <- file.path(".", "_env_stringr_1.4.1")
rix(
  r_ver = "4.3.1",
  r_pkgs = "stringr@1.4.1",
  overwrite = TRUE,
  project_path = path_env_stringr
)
list.files(path = path_env_stringr, all.files = TRUE)
#> [1] "."           ".."          ".Rprofile"   "default.nix"
We can now run the code in the subshell:
out_nix_stringr <- with_nix(
  expr = function() stringr::str_subset(c("", "a"), ""),
  program = "R",
  project_path = path_env_stringr,
  message_type = "simple"
)
Here are the last few lines printed on screen:
==> `expr` succeeded!
### Finished code evaluation in `nix-shell` ###
* Evaluating `expr` in `nix-shell` returns:
[1] "a"
Not only do we see the result of evaluating the code in the subshell,
we also have access to it: out_nix_stringr holds this
result.
We can now compare the two: the result of the code running in our
main session with the latest version of {stringr} and the
result of the code running in the subshell with the old version of
{stringr}:
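The comparison itself is a single call to identical(). In the sketch below, the two objects are stand-ins for the results described above, so that the snippet runs on its own; in the real workflow they are already defined by the previous steps.

```r
# stand-in: tryCatch() returned NULL because of the error
out_system_stringr <- NULL
# stand-in: value returned from the subshell
out_nix_stringr <- "a"

identical(out_system_stringr, out_nix_stringr)
```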
As expected, the result is FALSE.
Nix subshells are quite useful in cases where you need to use a
package that might be difficult to install, such as
{arrow}, or other packages that must be compiled. Depending
on your operating system, you may need to compile {arrow} from
source, which can be a frustrating experience, especially if you only
need it to load data and bring it down to a manageable size (using
select() and filter() for instance). This use
case illustrates how to achieve this.
Let’s start by building a subshell that is based on a distinct
revision of nixpkgs, for which we know that {arrow} compiles
on both Linux and macOS (darwin).
library("rix")
path_env_arrow <- file.path("env_arrow")
rix(
  r_ver = "4.1.1",
  r_pkgs = c("dplyr", "arrow"),
  overwrite = TRUE,
  project_path = path_env_arrow
)
This specific revision of nixpkgs contains {arrow} 13. Let’s
now suppose that you already have a script with some code to load and
transform some data using {arrow}. It may look something
like this:
library(arrow)
library(dplyr)
arrow_cars <- arrow_table(cars)
arrow_cars %>%
  filter(speed > 10) %>%
  as.data.frame()
To run this code in a subshell, we recommend wrapping it inside a function:
arrow_script <- function() {
  library(arrow)
  library(dplyr)
  arrow_cars <- arrow_table(cars)
  arrow_cars %>%
    filter(speed > 10) %>%
    as.data.frame()
}
We can then run it in the subshell:
out_nix_arrow <- with_nix(
  expr = function() arrow_script(),
  program = "R",
  project_path = path_env_arrow,
  message_type = "simple"
)
This will run the function in the subshell, and its output will be
saved in the out_nix_arrow variable, for further
manipulation in your main shell/session.