Pipeline integration (targets / drake)

vennDiagramLab is library-first and tidyverse-friendly. The broom-compatible S3 methods on RegionResult make it straightforward to plug into targets / drake workflows, or into any other pipeline that expects tidy data.

library(vennDiagramLab)
result <- analyze(load_sample("dataset_real_cancer_drivers_4"))

broom methods

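Three methods convert a RegionResult to a tibble, each at a different level of aggregation. Following the usual broom convention, glance() gives a one-row summary of the whole analysis, tidy() gives one row per pair of sets, and augment() gives one row per item: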

broom::glance(result)
head(broom::tidy(result))
head(broom::augment(result))
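
For readers new to broom, the same three levels exist for ordinary model objects. The sketch below uses a base-R linear model on the built-in mtcars data rather than a RegionResult, so it illustrates the convention only and says nothing about the columns vennDiagramLab returns:

```r
library(broom)

# Illustration with a linear model, not a RegionResult.
fit <- lm(mpg ~ wt, data = mtcars)

nrow(glance(fit))   # 1 row: summary of the whole fit
nrow(tidy(fit))     # 2 rows: one per coefficient (intercept, wt)
nrow(augment(fit))  # 32 rows: one per observation in mtcars
```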

Combining with dplyr

To keep only the highly significant pairs, ordered by descending Jaccard index:

broom::tidy(result) |>
    dplyr::filter(highly_significant) |>
    dplyr::arrange(dplyr::desc(jaccard)) |>
    dplyr::select(set_a, set_b, intersection, jaccard, p_adjusted)

Or count items per region:

broom::augment(result) |>
    dplyr::count(region_label, sort = TRUE)

targets pipeline (sketch)

A simple _targets.R file:

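library(targets)

# tar_make() runs the pipeline in a fresh R session, so declare the package
# here to make load_sample(), analyze(), and the broom methods available.
tar_option_set(packages = "vennDiagramLab")

list(
    tar_target(ds,        load_sample("dataset_real_cancer_drivers_4")),
    tar_target(result,    analyze(ds)),
    tar_target(stats_df,  broom::tidy(result)),
    tar_target(genes_df,  broom::augment(result)),
    tar_target(venn_svg,  render_venn_svg(result)),
    # Write the SVG to disk and return its path so that targets tracks the file.
    tar_target(venn_path,
               { writeLines(venn_svg, "venn.svg"); "venn.svg" },
               format = "file")
)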

Run with targets::tar_make(). Each target is cached independently, so if you change only the sort order in a downstream report and re-run, the analysis itself is not recomputed.
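
After a run, the cache can be inspected from an interactive session. This is a short sketch using the target names from the _targets.R file above. It assumes that file is in the working directory and that the optional visNetwork package, which tar_visnetwork() needs, is installed:

```r
library(targets)

tar_make()                      # build each target, skipping any that are up to date
tar_outdated()                  # names of targets that would rerun on the next tar_make()
stats_df <- tar_read(stats_df)  # read a cached target's value into the session
tar_visnetwork()                # interactive dependency graph of the pipeline
```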

Caching tip

statistics(result) recomputes on every call, because S4 has no equivalent of a lazy (cached) property. If you need the result more than once, compute it once and reuse the object:

stats <- statistics(result)
str(stats@jaccard, max.level = 1)

Inside a targets pipeline, this is a non-issue because tar_target(stats, statistics(result)) caches it for you.
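
Outside a pipeline, the same compute-once behaviour can be had with a small closure. In the sketch below, cache_once() is a hypothetical helper and not part of vennDiagramLab, and slow_stats() is a stand-in for an expensive call such as statistics(result):

```r
# Hypothetical helper (not part of vennDiagramLab): wrap a zero-argument
# computation so it runs at most once; later calls return the stored value.
cache_once <- function(compute) {
    value <- NULL
    done <- FALSE
    function() {
        if (!done) {
            value <<- compute()
            done <<- TRUE
        }
        value
    }
}

# Stand-in for an expensive call; counts how often it actually runs.
n_calls <- 0
slow_stats <- function() {
    n_calls <<- n_calls + 1
    list(jaccard = c(0.25, 0.5))
}

get_stats <- cache_once(slow_stats)
first  <- get_stats()
second <- get_stats()
n_calls  # 1: the underlying computation ran only once
```

With the real package, the equivalent would be get_stats <- cache_once(function() statistics(result)).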

What’s next