toolero grew out of a recurring observation made while
teaching and supporting researchers at UW-Madison: the habits that make
a project reproducible, shareable, and maintainable are easiest to adopt
at the very beginning — and hardest to retrofit once a project is
already underway.
The package is heavily influenced by the workflows taught in
workshops run by The Carpentries
and the UW-Madison
Libraries. Those workshops emphasize consistent project
organization, version control, and reproducible data practices as
foundational skills — not advanced topics. toolero tries to
operationalize those principles into a small set of functions that
reduce the friction of doing the right thing from the start.
The theming and branding support in toolero is
specifically tailored to UW-Madison’s Research Computing and
Instrumentation (RCI) unit, whose Quarto-based reporting templates are
baked into the package as defaults. If you are not at UW-Madison, the
branding files are optional — the rest of the package works
independently of them.
toolero is designed for researchers and analysts
who:
The package is intentionally small. It does not try to be comprehensive. It tries to make the right defaults easy to reach for from the first line of code.
You can install toolero from CRAN:
Or install the development version from GitHub:
init_project() and create_qmd()
These two functions are designed to be used together, in order.
init_project() creates the scaffold;
create_qmd() populates it with a working Quarto
document.
init_project()
Starting a new R project usually means the same manual steps every
time: create a folder, set up an RStudio project, create subdirectories
for data and scripts, initialize renv, initialize
git. None of these steps is hard on its own, but skipping
any of them — especially early on — tends to create friction later.
init_project() handles all of this in a single call:
This creates a new RStudio project at the specified path with the following folder structure already in place:
my-project/
├── data/ # input data
├── data-raw/ # original, unprocessed data
├── R/ # reusable functions
├── scripts/ # analysis scripts
├── plots/ # generated visualizations
├── images/ # static images and assets
├── results/ # processed outputs and tables
└── docs/ # notes, manuscripts, Quarto documents
Why this structure? The folder layout is opinionated but not arbitrary. Separating
data/ from data-raw/ makes it clear which files are original and which have been processed. Keeping R/ distinct from scripts/ encourages moving reusable logic into functions over time, which is a natural step toward more maintainable code.
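To make the layout concrete, here is a minimal base R sketch that creates the same eight folders by hand. It is not how init_project() is implemented; the helper name scaffold_folders() is hypothetical, and the sketch leaves out the RStudio project file, renv, and git.

```r
# Hypothetical helper, not part of toolero: create the default folders
# under a project root using base R only.
scaffold_folders <- function(path,
                             folders = c("data", "data-raw", "R", "scripts",
                                         "plots", "images", "results", "docs")) {
  targets <- file.path(path, folders)
  for (target in targets) {
    # recursive = TRUE also creates the project root if it does not exist yet
    dir.create(target, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(targets)
}

# Try it in a temporary directory so nothing is written to a real project.
project_root <- file.path(tempdir(), "my-project")
scaffold_folders(project_root)
print(list.dirs(project_root, full.names = FALSE, recursive = FALSE))
```

Doing this by hand for every project is exactly the repetition that a single scaffolding call is meant to remove.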
By default, init_project() also initializes
renv and git. This means the project is
reproducible and version-controlled from the first commit.
Why renv and git by default? renv ensures that the packages your project depends on are recorded and reproducible. git provides a full history of changes. Both are much easier to set up at the start than to retrofit later.
If your project needs folders beyond the defaults:
To apply UW-Madison RCI branding assets to the project:
This creates an assets/ folder and populates it with
styles.css, header.html, and
rci-banner.png — the same assets used in the Quarto
template scaffolded by create_qmd().
create_qmd()
Once the project exists, create_qmd() adds a working
Quarto document to it:
This creates:
- analysis.qmd: a Quarto document with a fully populated YAML header, three-context input resolution via detect_execution_context(), and a sample analysis using the Palmer Penguins dataset
- data/sample.csv: sample data to develop against immediately
- assets/styles.css and assets/header.html: UW-Madison RCI branding
- _quarto.yml: a project file with a post-render hook that runs purl.R
- purl.R: extracts R code from the rendered document into a companion .R file automatically on every render
Why the purl hook? Having a plain .R companion to your .qmd is useful for sharing the analysis as a script, running it on a remote cluster, or archiving the code independently of the document. The hook runs automatically, so you never have to remember to extract the code manually.
To pre-populate the YAML header with your own metadata:
create_qmd(
  path = "~/Documents/my-project",
  filename = "analysis.qmd",
  yaml_data = "~/my-metadata.yml"
)
Where my-metadata.yml might look like:
Any keys present in the YAML file overwrite the corresponding placeholders in the template. Keys not present are left as-is.
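The overwrite rule can be illustrated with base R lists. This is only a sketch of the semantics described above, not the code create_qmd() runs; the placeholder values and the user-supplied metadata are made up for illustration.

```r
# Placeholder values standing in for the template's YAML header
# (hypothetical; the real template may use different keys and defaults).
template_header <- list(
  title = "Untitled analysis",
  description = "Add a one-sentence summary",
  categories = c("analysis")
)

# Metadata a user might supply; `description` is deliberately absent.
user_metadata <- list(
  title = "Penguin bill measurements",
  categories = c("penguins", "morphology")
)

# modifyList() replaces matching top-level keys and keeps all others.
merged_header <- utils::modifyList(template_header, user_metadata)
str(merged_header)
```

Here title and categories take the user's values, while description keeps its placeholder because the user's metadata does not mention it.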
read_clean_csv() and write_by_group()
These two functions address common friction points in day-to-day data
work. They are general-purpose utilities, useful in any R project, not
just ones set up with toolero.
read_clean_csv()
read_clean_csv() combines readr::read_csv()
and janitor::clean_names() into a single call:
Column names are automatically converted to lowercase with
underscores — consistent, predictable, and tidyverse-friendly. A column
called First Name becomes first_name.
Q1 Revenue ($) becomes q1_revenue.
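For a rough sense of what the name cleaning does, the sketch below approximates it with base R regular expressions. It is not janitor's actual algorithm (which handles many more edge cases), and sketch_clean_names() is a hypothetical helper used only for this illustration.

```r
# Hypothetical approximation of snake_case cleaning, base R only.
sketch_clean_names <- function(x) {
  x <- tolower(x)
  # collapse every run of non-alphanumeric characters into one underscore
  x <- gsub("[^a-z0-9]+", "_", x)
  # drop underscores left at the start or end
  gsub("^_+|_+$", "", x)
}

# Write a tiny CSV with awkward headers, then read it back.
csv_path <- tempfile(fileext = ".csv")
writeLines(c('"First Name","Q1 Revenue ($)"', '"Ada",1200'), csv_path)

# check.names = FALSE keeps the headers exactly as written in the file
raw <- read.csv(csv_path, check.names = FALSE)
names(raw) <- sketch_clean_names(names(raw))
print(names(raw))  # "first_name" "q1_revenue"
```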
By default, column type messages from readr are
suppressed. Set verbose = TRUE to see them:
write_by_group()
When a data frame contains multiple groups that need to be written to
separate files, write_by_group() handles the split and the
write in a single call:
Output filenames are derived from the group values and sanitized for
use as file names — converted to lowercase with spaces and special
characters replaced by dashes. A group called Chinstrap
becomes chinstrap.csv. Palmer Penguins would
become palmer-penguins.csv.
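The split-and-write idea can be sketched in base R as follows. This is an illustration under stated assumptions, not write_by_group() itself: both helper names are hypothetical, the built-in iris data stands in for penguins, and the sketch does not guard against two group values that sanitize to the same file name.

```r
# Hypothetical helper: turn a group value into a safe, lowercase file stem.
sketch_file_stem <- function(x) {
  x <- tolower(as.character(x))
  # replace every run of characters other than a-z and 0-9 with one dash
  x <- gsub("[^a-z0-9]+", "-", x)
  gsub("^-+|-+$", "", x)
}

# Hypothetical helper: write one CSV per value of `group_col`.
sketch_write_by_group <- function(data, group_col, output_dir) {
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  pieces <- split(data, data[[group_col]], drop = TRUE)
  paths <- file.path(output_dir, paste0(sketch_file_stem(names(pieces)), ".csv"))
  for (i in seq_along(pieces)) {
    write.csv(pieces[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

out_dir <- file.path(tempdir(), "by-species")
written <- sketch_write_by_group(iris, "Species", out_dir)
print(basename(written))  # "setosa.csv" "versicolor.csv" "virginica.csv"
```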
To also write a manifest listing the output files, group values, and row counts:
write_by_group(
  data = penguins,
  group_col = "species",
  output_dir = "results/by-species",
  manifest = TRUE
)
detect_execution_context()
R code often needs to behave differently depending on where it is
running — interactively in RStudio, during a quarto render,
or as a batch Rscript job on a remote cluster.
detect_execution_context() identifies which of these three
environments is active and returns one of "interactive",
"quarto", or "rscript".
The canonical use case is resolving input file paths portably:
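context <- detect_execution_context()
# Pick the input file according to where the code is running
input_file <- switch(context,
  interactive = "data/sample.csv",                # fixed sample file while developing
  quarto = params$input_file,                     # document parameter during a render
  rscript = commandArgs(trailingOnly = TRUE)[1]   # first command-line argument
)
This pattern is built into the template scaffolded by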
create_qmd(), so you get it for free without having to
write it yourself.
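One plausible way to tell the three contexts apart is sketched below. This is an assumption about how such a check could work, not toolero's implementation: it relies on the knitr.in.progress option that knitr sets while a document is being knit, so it presumes Quarto is using the knitr engine and would also report "quarto" for a plain R Markdown render. The function name is hypothetical.

```r
# Hypothetical stand-in for detect_execution_context(), base R only.
sketch_execution_context <- function() {
  if (isTRUE(getOption("knitr.in.progress"))) {
    # knitr sets this option while it is evaluating document chunks
    "quarto"
  } else if (interactive()) {
    # a console session, for example in RStudio
    "interactive"
  } else {
    # anything else is treated as a batch run, such as Rscript
    "rscript"
  }
}

context <- sketch_execution_context()
print(context)
```

The rendering check comes first because interactive() can also be TRUE when a document is rendered from within a live session.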
generate_kb_xml()
This section is relevant only if you publish content to the UW-Madison Knowledge Base. If you do not, you can safely skip it.
The UW-Madison Knowledge Base requires content to be submitted as XML
with all visual assets embedded in the HTML body.
generate_kb_xml() automates this process entirely.
The function:
1. Locates the source .qmd from the HTML path (or accepts it explicitly via qmd_path)
2. Renders the document with embed-resources: true so all CSS, images, and JavaScript are self-contained
3. Reads metadata from the .qmd YAML header: title → kb_title, description → kb_summary, categories → kb_keywords
4. Writes an .xml file ready for direct KB import
This is why the description and categories
fields in the create_qmd() template matter — they flow
through automatically into the KB article metadata without any extra
work.
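The field mapping can be pictured as a simple rename of header keys. The sketch below shows only that step, with a made-up header; it does not produce KB XML, and sketch_kb_fields() is a hypothetical helper rather than part of toolero.

```r
# Mapping from YAML header keys to KB field names, as listed above.
kb_field_map <- c(
  title = "kb_title",
  description = "kb_summary",
  categories = "kb_keywords"
)

# Hypothetical helper: keep only the mapped keys and rename them.
sketch_kb_fields <- function(header) {
  present <- intersect(names(kb_field_map), names(header))
  fields <- header[present]
  names(fields) <- unname(kb_field_map[present])
  fields
}

# A made-up header; `author` has no KB counterpart and is dropped.
header <- list(
  title = "Penguin bill measurements",
  author = "A. Researcher",
  description = "Summary of bill length by species",
  categories = c("penguins", "morphology")
)

kb_fields <- sketch_kb_fields(header)
str(kb_fields)
```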
When importing into the KB, check the Decode HTML entity in body content option.