This vignette uses the bundled dataset_real_cancer_drivers_4 dataset to illustrate a real biological analysis: how do four canonical cancer driver catalogs overlap?
The four sources are Vogelstein, COSMIC, OncoKB, and IntOGen.
library(vennDiagramLab)
ds <- load_sample("dataset_real_cancer_drivers_4")
ds@set_names
sapply(ds@items, length)
The lists differ widely in size: Vogelstein is the smallest curated set, and OncoKB is the most permissive at this annotation tier.
The dataset was built from a 20,000-gene background (universe_size):
ds@universe_size
This is the population N used in the hypergeometric over-representation tests (see vignette("v05_statistics_deep_dive")).
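To make the role of N concrete, here is a minimal base-R sketch of a hypergeometric over-representation test for a single pair of gene lists. The list sizes and the overlap are made-up illustrative numbers, not values from the bundled dataset, and this is the textbook upper-tail test; the package's exact computation may differ.

```r
# Hypothetical sizes for illustration only (not taken from the dataset).
universe_size <- 20000  # N: background genes
size_a <- 500           # genes in list A
size_b <- 300           # genes in list B
overlap <- 60           # genes shared by A and B

# Overlap expected if list B were drawn at random from the universe.
expected_overlap <- size_a * size_b / universe_size

# Upper tail P(X >= overlap). phyper() arguments, in order:
# observed successes, successes in the population, failures in the
# population, and number of draws.
p_value <- phyper(overlap - 1, size_a, universe_size - size_a, size_b,
                  lower.tail = FALSE)

expected_overlap
p_value
```

A larger universe lowers the expected overlap, so the same observed overlap yields a smaller p-value. This is why the choice of background matters.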
result <- analyze(ds)
result@model
length(result@regions)
The default model for 4 sets is venn-4-set (Edwards-style).
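As a rough guide to what a 4-set model has to draw, the sketch below enumerates every non-empty in/out membership pattern of four sets in base R. The letter labels and the concatenated region names are assumptions for illustration. The package's own region labels, and the count it reports, may differ (for example if it also tracks an outside region).

```r
set_letters <- c("A", "B", "C", "D")

# All in/out combinations of four sets: 2^4 = 16 rows.
combos <- expand.grid(rep(list(c(FALSE, TRUE)), length(set_letters)))
names(combos) <- set_letters

# Drop the all-FALSE row: items in none of the sets lie outside the diagram.
combos <- combos[rowSums(combos) > 0, ]

# Label each pattern by the letters of the sets it includes.
region_labels <- apply(combos, 1, function(x) paste(set_letters[x], collapse = ""))

length(region_labels)
region_labels
```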
result@set_sizes
broom::glance() returns a one-row tibble with the headline numbers:
broom::glance(result)
The default render uses the dataset’s set names as labels. To shorten them for the diagram, pass a per-letter override:
svg <- render_venn_svg(
  result,
  set_names = c(A = "Vogelstein", B = "COSMIC", C = "OncoKB", D = "IntOGen"),
  title = "Cancer driver overlap (4 sources)"
)
nchar(svg)
(See vignette("v08_custom_styling_and_export") for color overrides and post-render SVG manipulation.)
For 4+ sets, an UpSet plot is often easier to read than the Venn diagram — each intersection size is a bar, sorted by cardinality.
upset_plot <- render_upset(result, sort_by = "size")
upset_plot
(The chunk above is gated on R >= 4.6 because the CRAN release of ComplexUpset (1.3.3) is incompatible with ggplot2 >= 4.0 on older R; see ?vennDiagramLab::render_upset for context.)
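If the UpSet chunk cannot run on your R version, the bar heights can still be approximated by hand. The following base-R sketch counts exclusive intersections on toy gene lists and sorts them by size. The gene identifiers and the "&"-joined labels are hypothetical, and this is not how render_upset() computes its bars.

```r
# Toy gene lists with placeholder identifiers (not the bundled dataset).
sets <- list(
  A = c("g1", "g2", "g3", "g4"),
  B = c("g1", "g2", "g5", "g4"),
  C = c("g1", "g6", "g4"),
  D = c("g1", "g3", "g6", "g4", "g7")
)

genes <- unique(unlist(sets))

# Logical membership matrix: rows are genes, columns are sets.
membership <- sapply(sets, function(s) genes %in% s)

# Label each gene by the exact combination of sets that contain it.
region <- apply(membership, 1, function(x) paste(names(sets)[x], collapse = "&"))

# One count per exclusive intersection, largest first.
intersection_sizes <- sort(table(region), decreasing = TRUE)
intersection_sizes
```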
broom::tidy() returns one row per set pair, with all five pairwise metrics plus the BH-FDR-adjusted hypergeometric p-value:
top_pairs <- broom::tidy(result)
top_pairs[order(top_pairs$p_adjusted),
          c("set_a", "set_b", "intersection", "jaccard", "p_adjusted", "significant")]
Every pair is significant at FDR < 0.05. This is expected, since these catalogs are designed to capture overlapping biology.
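For readers who want to see how such a pairwise table could be assembled, here is a minimal base-R sketch that computes the intersection size, Jaccard index, Dice coefficient, and a BH-adjusted hypergeometric p-value for every pair of toy sets. The gene lists and universe size are hypothetical, and the column names simply mirror the ones shown above. It is not the package's implementation and covers only some of its metrics.

```r
# Toy gene lists and a toy universe size (hypothetical values).
sets <- list(
  A = paste0("g", 1:40),
  B = paste0("g", 21:70),
  C = paste0("g", 31:60),
  D = paste0("g", 35:90)
)
universe_size <- 1000

# All unordered pairs of set names: choose(4, 2) = 6 pairs.
pairs <- combn(names(sets), 2, simplify = FALSE)

pair_stats <- do.call(rbind, lapply(pairs, function(p) {
  a <- sets[[p[1]]]
  b <- sets[[p[2]]]
  k <- length(intersect(a, b))
  data.frame(
    set_a = p[1],
    set_b = p[2],
    intersection = k,
    jaccard = k / length(union(a, b)),
    dice = 2 * k / (length(a) + length(b)),
    # Upper-tail hypergeometric p-value, P(X >= k).
    p_value = phyper(k - 1, length(a), universe_size - length(a),
                     length(b), lower.tail = FALSE)
  )
}))

# Benjamini-Hochberg adjustment across all pairs.
pair_stats$p_adjusted <- p.adjust(pair_stats$p_value, method = "BH")
pair_stats$significant <- pair_stats$p_adjusted < 0.05

pair_stats[order(pair_stats$p_adjusted), ]
```

Adjusting across all six pairs at once keeps the expected share of false discoveries among the pairs flagged significant at or below 5%.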
broom::augment() returns one row per gene with set-membership flags and the region label.
gene_table <- broom::augment(result)
head(gene_table)
nrow(gene_table) # total unique genes across all four sets
table(gene_table$region_label) # how many genes in each region
to_region_summary_tsv(result, "cancer_drivers_regions.tsv")
vignette("v05_statistics_deep_dive"): interpret the Jaccard / Dice / hypergeometric numbers in detail.
vignette("v07_pdf_reports"): turn this analysis into a multi-page PDF.
vignette("v08_custom_styling_and_export"): customize colors, embed in a ggplot, export to PDF/PNG.