Real cancer drivers walkthrough

This vignette uses the bundled dataset_real_cancer_drivers_4 dataset to illustrate a real biological analysis: how do four canonical cancer driver catalogs overlap?

The names of the four sources are stored in the dataset's set_names slot:

library(vennDiagramLab)
ds <- load_sample("dataset_real_cancer_drivers_4")
ds@set_names

Set sizes

sapply(ds@items, length)

The lists differ widely in size: Vogelstein is the smallest curated set, while OncoKB is the most permissive at this annotation tier.

Universe

The dataset was built from a 20,000-gene background (universe_size):

ds@universe_size

This is the population N used in the hypergeometric over-representation tests (see vignette("v05_statistics_deep_dive")).
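To make the role of N concrete, here is a minimal base-R sketch of a pairwise over-representation test. The set sizes and overlap below are hypothetical numbers chosen for illustration, not values from the dataset, and the package's own implementation may differ in detail.

```r
# Hypothetical inputs for illustration only; not taken from the bundled dataset.
N   <- 20000  # universe size (background genes)
n_a <- 125    # size of set A
n_b <- 700    # size of set B
k   <- 90     # observed overlap between A and B

# Overlap expected by chance if B were a random draw from the universe
expected <- n_a * n_b / N

# Upper-tail hypergeometric probability P(X >= k):
# phyper(q, m, n, k) uses m "successes", n "failures", and k draws.
p_value <- phyper(k - 1, n_a, N - n_a, n_b, lower.tail = FALSE)

expected
p_value
```

A larger universe lowers the expected chance overlap, so the same observed overlap yields a smaller p-value; this is why the choice of background matters.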

Analyze

result <- analyze(ds)
result@model
length(result@regions)

The default model for 4 sets is venn-4-set (Edwards-style).
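For intuition about what a region is, the sketch below assigns each item of four toy sets to its exclusive membership pattern using base R only. The gene lists are hypothetical, and the letter-based labels are an assumption made for this example; the package may label regions differently. With four sets there are 2^4 - 1 = 15 possible non-empty patterns.

```r
# Toy sets with hypothetical contents, for illustration only.
sets <- list(
  A = c("TP53", "KRAS", "PTEN"),
  B = c("TP53", "KRAS", "BRAF"),
  C = c("TP53", "EGFR"),
  D = c("TP53", "PTEN", "EGFR")
)

# Number of possible non-empty membership patterns for n sets
n_regions <- 2^length(sets) - 1

# Logical membership matrix: one row per item, one column per set
all_items <- sort(unique(unlist(sets)))
membership <- sapply(sets, function(s) all_items %in% s)
rownames(membership) <- all_items

# Exclusive region label: letters of exactly the sets containing the item
region <- apply(membership, 1, function(row) {
  paste(names(sets)[row], collapse = "")
})

table(region)
```

Each item lands in exactly one region, so the region counts sum to the number of unique items, whereas the inclusive set sizes count shared items once per set.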

Set sizes (inclusive) and intersection layout

result@set_sizes

A summary at a glance

broom::glance() returns a one-row tibble with the headline numbers:

broom::glance(result)

Render the venn diagram

The default render uses the dataset’s set names as labels. To shorten them for the diagram, pass a per-letter override:

svg <- render_venn_svg(
    result,
    set_names = c(A = "Vogelstein", B = "COSMIC", C = "OncoKB", D = "IntOGen"),
    title = "Cancer driver overlap (4 sources)"
)
nchar(svg)

(See vignette("v08_custom_styling_and_export") for color overrides and post-render SVG manipulation.)

UpSet view

For 4+ sets, an UpSet plot is often easier to read than the Venn diagram: each intersection is drawn as a bar, and the bars are sorted by intersection size.

upset_plot <- render_upset(result, sort_by = "size")
upset_plot

(The chunk above is gated on R >= 4.6 because the CRAN release of ComplexUpset (1.3.3) is incompatible with ggplot2 >= 4.0 on older R — see ?vennDiagramLab::render_upset for context.)

Top significant intersections

broom::tidy() returns one row per set pair, with all five pairwise metrics plus the BH-FDR-adjusted hypergeometric p-value:

top_pairs <- broom::tidy(result)
top_pairs[order(top_pairs$p_adjusted), c("set_a", "set_b", "intersection",
                                          "jaccard", "p_adjusted",
                                          "significant")]

Every pair is significant at FDR < 0.05. This is expected, since all four catalogs target the same underlying biology and are designed to overlap.
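Two of the quantities in that table can be sketched in base R: the Jaccard index of a pair and the Benjamini-Hochberg adjustment across the six pairs that four sets produce. The gene lists and raw p-values here are made up for illustration, and the helper jaccard() is a hypothetical name, not a function exported by the package.

```r
# Hypothetical helper: Jaccard index = |A intersect B| / |A union B|
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Toy gene lists, for illustration only
a <- c("TP53", "KRAS", "PTEN", "BRAF")
b <- c("TP53", "KRAS", "EGFR")
j <- jaccard(a, b)  # 2 shared genes out of 5 distinct genes

# Four sets give choose(4, 2) = 6 pairwise tests.
# Made-up raw p-values, one per pair.
raw_p <- c(0.001, 0.02, 0.03, 0.04, 0.2, 0.5)

# Benjamini-Hochberg adjustment controls the false discovery rate
p_adj <- p.adjust(raw_p, method = "BH")
significant <- p_adj < 0.05

data.frame(raw_p, p_adj, significant)
```

In this toy example several raw p-values fall below 0.05 but only the smallest survives adjustment, which is why the p_adjusted column is the one to rank and threshold on.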

Item-level annotation

broom::augment() returns one row per gene with set-membership flags and the region label.

gene_table <- broom::augment(result)
head(gene_table)
nrow(gene_table)        # total unique genes across all four sets
table(gene_table$region_label)   # how many genes in each region

Save the region summary

to_region_summary_tsv(result, "cancer_drivers_regions.tsv")

What’s next