OptSurvCutR: Validated Cut-point Selection for Survival Analysis


OptSurvCutR (Optimal Survival Cut-points R) provides a rigorous, reproducible, rOpenSci-compliant framework for selecting the optimal number and location of patient stratification thresholds in time-to-event (survival) data. It is designed for continuous predictors such as gene expression measurements, microbiome abundances, or clinical biomarkers, and moves beyond arbitrary median splits to deliver data-driven, covariate-adjusted stratification.
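To make "data-driven" concrete for the single-cut case, here is a minimal sketch of a maximally selected log-rank search written with base R and the survival package. It is not the OptSurvCutR implementation: `best_logrank_cut()` is a hypothetical helper, and treating `nmin` as a minimum group proportion is an assumption made by analogy with the package argument of the same name.

```r
library(survival)

# Hypothetical helper: single cut-point maximising the log-rank chi-square,
# with each group holding at least `nmin` (a proportion) of the samples.
best_logrank_cut <- function(time, event, x, nmin = 0.25) {
  n <- length(x)
  best <- list(cut = NA_real_, chisq = -Inf)
  for (cut in sort(unique(x))) {
    grp <- factor(x > cut, levels = c(FALSE, TRUE))
    if (min(table(grp)) < nmin * n) next  # skip splits that leave a small group
    chisq <- survdiff(Surv(time, event) ~ grp)$chisq
    if (chisq > best$chisq) best <- list(cut = cut, chisq = chisq)
  }
  best
}

# Simulated data: the hazard jumps for x above 5.5
set.seed(1)
n <- 200
x <- rnorm(n, mean = 5, sd = 2)
time <- rexp(n, rate = 0.05 * exp(1.0 * (x > 5.5)))
event <- rbinom(n, 1, 0.8)

res <- best_logrank_cut(time, event, x)
median_chisq <- survdiff(Surv(time, event) ~ factor(x > median(x)))$chisq
round(c(cut = res$cut, chisq = res$chisq, median_split_chisq = median_chisq), 2)
```

Because the median split is one of the candidate partitions, the searched statistic can never be smaller than the median-split statistic. The flip side is that a maximised statistic is optimistically biased, which is why the covariate adjustment and bootstrap validation steps matter.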

What’s New in Version 0.10.0

We have significantly overhauled the validation and diagnostic engines so that the package is rOpenSci-compliant and the thresholds it selects are statistically stable and ready for publication.

Why OptSurvCutR?

Feature                          Benefit
Optimal number of cuts           Selects between 0 and k cut-points using AIC, AICc, or BIC.
Covariate adjustment             Demonstrates independent prognostic value by adjusting for clinical confounders.
Four-tier bootstrap validation   Reports 95% confidence intervals and an automated stability grade for each threshold.
Schoenfeld diagnostics           Applies a two-tier warning system to check the proportional hazards assumption.
Flexible search engines          Offers a deterministic systematic grid search or a multithreaded genetic algorithm (rgenoud).
Publication-ready plots          Renders Kaplan–Meier curves, distribution splits, forest plots, and 2D stability topology surfaces.
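The idea behind choosing the number of cuts with an information criterion, while adjusting for a confounder, can be sketched as follows. This is a simplified illustration and not the package's algorithm: `fit_k_cuts()` is hypothetical, the search is a coarse quantile grid rather than a systematic or genetic search, and the parameter count in the penalty (one coefficient plus one location per cut, plus one for age) is an assumption.

```r
library(survival)

# Hypothetical helper: best age-adjusted Cox log-likelihood for k cut-points,
# searching all combinations of k values from a quantile grid of x.
fit_k_cuts <- function(d, k, probs = seq(0.2, 0.8, by = 0.05), nmin = 0.2) {
  if (k == 0) {
    fit <- coxph(Surv(time, event) ~ age, data = d)
    return(list(loglik = fit$loglik[2], cuts = numeric(0)))
  }
  grid <- unique(quantile(d$x, probs, names = FALSE))
  combos <- if (k == 1) matrix(grid, ncol = 1) else t(combn(grid, k))
  best <- list(loglik = -Inf, cuts = NULL)
  for (i in seq_len(nrow(combos))) {
    cuts <- combos[i, ]
    d$grp <- cut(d$x, breaks = c(-Inf, cuts, Inf))
    # Enforce a minimum group size of nmin * n
    if (min(table(d$grp)) < nmin * nrow(d)) next
    fit <- coxph(Surv(time, event) ~ grp + age, data = d)
    if (fit$loglik[2] > best$loglik) best <- list(loglik = fit$loglik[2], cuts = cuts)
  }
  best
}

# Simulated cohort: one true threshold at 5.5 plus an age effect
set.seed(42)
n <- 300
d <- data.frame(x = rnorm(n, mean = 5, sd = 2), age = rnorm(n, mean = 60, sd = 10))
d$time <- rexp(n, rate = 0.02 * exp(1.0 * (d$x > 5.5) + 0.02 * (d$age - 60)))
d$event <- rbinom(n, 1, 0.8)

n_events <- sum(d$event)
bic <- sapply(0:2, function(k) {
  f <- fit_k_cuts(d, k)
  # Assumed parameter count: k group coefficients + k cut locations + 1 for age
  -2 * f$loglik + (2 * k + 1) * log(n_events)
})
names(bic) <- paste(0:2, "cuts")
print(bic)
best_k <- (0:2)[which.min(bic)]
```

Using the number of events rather than the number of patients in the BIC penalty is a common convention for Cox models; swapping the penalty term gives AIC (2 per parameter) instead.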

Installation

You can install the development version of OptSurvCutR from GitHub. The genetic algorithm search (method = "genetic") requires the rgenoud package; install it separately from CRAN if you plan to search for multiple cut-points.

# Core dependencies, plus dplyr for the example workflow
install.packages(c("remotes", "survival", "dplyr"))

# Optional but highly recommended for multi-cut genetic optimisation
install.packages("rgenoud")

# Install the package from GitHub
remotes::install_github("paytonyau/OptSurvCutR")
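Because the genetic search depends on rgenoud, a script can check for it at runtime and fall back to the grid search. This is a hypothetical convenience helper, not part of OptSurvCutR, and the method value "systematic" is an assumption based on the package's description of its deterministic grid search.

```r
# Hypothetical helper (not part of OptSurvCutR): use the genetic search only
# when rgenoud is installed, otherwise fall back to the systematic grid.
# The value "systematic" is an assumed method name.
pick_search_method <- function(has_rgenoud = requireNamespace("rgenoud", quietly = TRUE)) {
  if (isTRUE(has_rgenoud)) "genetic" else "systematic"
}

method <- pick_search_method()
message("Cut-point search method: ", method)
```

The result could then be passed as `method = method` in the workflow calls.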

Example: Quick Workflow with Simulated Cohort Data

library(OptSurvCutR)
library(survival)
library(dplyr)

# Generate a reproducible, synthetic survival dataset
set.seed(123)
n <- 200
crc <- tibble(
  abundance = rnorm(n, mean = 5, sd = 2),
  age = rnorm(n, mean = 60, sd = 10),
  # Generate survival times influenced by biomarker abundance (threshold 5.5) and age
  time = rexp(n, rate = 0.05 * exp(0.3 * (abundance > 5.5) + 0.02 * age)),
  event = sample(c(0, 1), n, replace = TRUE, prob = c(0.3, 0.7))
) %>% filter(time > 0)

# Step 1: Determine the optimal number of cut-points
num_res <- find_cutpoint_number(
  data = crc, predictor = "abundance",
  outcome_time = "time", outcome_event = "event",
  method = "genetic", criterion = "BIC",
  max_cuts = 3, nmin = 0.25, seed = 123
)
summary(num_res)

# Step 2: Find the precise cut-point locations
cut_res <- find_cutpoint(
  data = crc, predictor = "abundance",
  outcome_time = "time", outcome_event = "event",
  method = "genetic", criterion = "logrank",
  num_cuts = num_res$optimal_num_cuts,  # Dynamically pass the result from Step 1
  nmin = 0.275,                         # Minimum group proportion, tuned for stability
  n_perm = 50, n_cores = 2, seed = 123
)
summary(cut_res)  # Reports hazard ratios and Schoenfeld diagnostics

# Step 3: Validate threshold stability with bootstrapping
val_res <- validate_cutpoint(
  cutpoint_result = cut_res,
  num_replicates = 150, n_cores = 2, seed = 123
)
summary(val_res)  # Reports the automated stability grade (Tiers 1-4)

# Step 4: Visualise results with the S3 plot methods
plot(cut_res, type = "distribution")   # Predictor distribution with cut-points marked
plot(cut_res, type = "outcome")        # Kaplan-Meier survival curves by group
plot(cut_res, type = "forest")         # Hazard ratio forest plot with group sample sizes
plot(cut_res, type = "diagnostic")     # Schoenfeld residuals (proportional hazards check)
plot_validation(val_res, focus_cuts = c(1, 2)) # 2D contour map of bootstrap stability for cuts 1 and 2

Workflow Summary

OptSurvCutR follows a structured, three-step workflow for cut-point analysis:

  1. find_cutpoint_number(): Identifies the statistically optimal number of thresholds using information criteria.
  2. find_cutpoint(): Locates the exact cut-point values via systematic or genetic search, and reports Schoenfeld diagnostics.
  3. validate_cutpoint(): Evaluates threshold stability using bootstrap resampling and assigns an automated four-tier stability grade.
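The bootstrap idea behind step 3 can be sketched for a single cut-point: resample patients with replacement, rerun the search on each resample, and summarise the spread of the selected values with a percentile 95% interval. This is a simplified illustration under stated assumptions: `best_cut()` is a hypothetical log-rank search over a quantile grid, and the package's four-tier grading rules are not reproduced.

```r
library(survival)

# Hypothetical helper: log-rank-maximising cut over a quantile grid of x,
# requiring each group to hold at least `nmin` of the samples.
best_cut <- function(time, event, x, nmin = 0.25, probs = seq(0.25, 0.75, by = 0.01)) {
  n <- length(x)
  cut_at <- NA_real_
  best_chisq <- -Inf
  for (cut in unique(quantile(x, probs, names = FALSE))) {
    grp <- factor(x > cut, levels = c(FALSE, TRUE))
    if (min(table(grp)) < nmin * n) next
    chisq <- survdiff(Surv(time, event) ~ grp)$chisq
    if (chisq > best_chisq) {
      best_chisq <- chisq
      cut_at <- cut
    }
  }
  cut_at
}

set.seed(123)
n <- 150
x <- rnorm(n, mean = 5, sd = 2)
time <- rexp(n, rate = 0.05 * exp(1.0 * (x > 5.5)))
event <- rbinom(n, 1, 0.8)

# Non-parametric bootstrap: resample patients and rerun the search each time
B <- 100
boot_cuts <- replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  best_cut(time[idx], event[idx], x[idx])
})
point <- best_cut(time, event, x)
ci <- quantile(boot_cuts, c(0.025, 0.975), na.rm = TRUE)
c(estimate = point, ci)
```

A narrow interval suggests the threshold is reproducible under resampling; a wide or multimodal spread of `boot_cuts` suggests the selected value depends heavily on the particular sample.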


Citation

@article{yau2025optsurvcutr,
  author    = {Yau, Payton T. O.},
  title     = {OptSurvCutR: Validated Cut-point Selection for Survival Analysis},
  year      = {2025},
  doi       = {10.1101/2025.10.08.681246},
  publisher = {Cold Spring Harbor Laboratory},
  journal   = {bioRxiv},
  url       = {https://www.biorxiv.org/content/10.1101/2025.10.08.681246}
}

License

OptSurvCutR is licensed under the GNU General Public License v3.0 (GPL-3).

Contact

For questions, feature suggestions, or bug reports, please open an issue on GitHub: https://github.com/paytonyau/OptSurvCutR/issues