The survsup package provides straightforward R functions for plotting Kaplan-Meier curves with the inclusion of numbers at risk tables. Although other methods exist, they either do not fit my workflow well or produce quite ugly plots.
This package is based upon ggplot2, which comes in handy in case you’d wish to tweak anything in the plot, since a ggplot object is returned.
Furthermore, this package is very easy to use using the pipe operator, see examples below.
We will here give some short introduction on how to use this package.
Note that this package is under active development; additional features may be added. Check at the project’s GitHub site for the latest version (http://github.com/dlindholm/survsup/).
Let’s start with the most basic thing, plotting a plain old Kaplan-Meier curve:
library(survsup)
library(ggplot2)
library(survival)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
fit <- survfit(Surv(time, status) ~ 1, data = lung)
plot_survfit(fit)As you will see later, it is even more convenient to use the pipe operator (%>%):
lung %>%
  survfit(Surv(time, status) ~ 1, data = .) %>%
  plot_survfit()See that the plot by default plots the cumulative incidence? (this is the most common practice in my area of work, hence it being the default) Let’s try to plot survival instead:
lung %>%
  survfit(Surv(time, status) ~ 1, data = .) %>%
  plot_survfit(cuminc = FALSE)Now let’s see if the survival differs between men and women:
lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE)Well, maybe, but let’s add the 95% confidence intervals!
lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE, ci = TRUE) Now let’s change the axis labels:
lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE, ci = TRUE) + # <--- NOTE!
  labs(x = "Time (days)", y = "Survival (%)")What happened here? Well, as the plotting function actually returns a ggplot object, then it’s easy to just add some ggplot stuff to the plot, in this case using the function labs to add some labels. Note the plus sign! (this is common practice in ggplot2)
If we wish to add a number at risk table, we just put the function nar into the pipeline:
lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE) %>%
  nar()Perhaps we want to change the font size in the numbers at risk table? Then we just provide the size argument:
lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE) %>%
  nar(size = 3)Since we’re dealing with ggplot objects it is very straightforward to change the color scale using standard ggplot2 syntax. In this case we’re looking at another example (namely effects of different treatment strategies for colon cancer):
colon %>%
  survfit(Surv(time, status) ~ rx, data = .) %>%
  plot_survfit() %>%
  nar() +
  scale_color_manual(values = c("darkorange", "steelblue", "darkred"))We’ve also included a few convenience functions to get commonly used presets of color schemes. For some reason, the color guides used in skiing slopes are popular also in Kaplan-Meier curves (i.e. green, blue, red, black):
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  skislopes()You could, if you wanted to, reverse the order of the colors thus:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  skislopes(reverse = TRUE)Another common color scheme is similar to the skislopes but with orange/yellow instead of black, called cat4 in this package:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  cat4()However, you’d probably don’t want any color stick out too much. The colorspace package (https://CRAN.r-project.org/package=colorspace) by Zeileis, Hornik, and Murrell provides a nice qualitative color palettes based on the HCL colors. We provide a convenience function, that provides colors for the number of strata based on the rainbow_hcl function. You have the options to tweak parameters such as chroma and luminance to taste. The convenience function is called hcl_rainbow:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  hcl_rainbow()Like in the previous convenience functions, you could also easily reverse the order of the colors:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  hcl_rainbow(reverse = TRUE)You could also tweak the width of the lines easily using the lwd argument:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(lwd = 2)colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(lwd = 0.5)If you want to change the legend title to something more informative, use the legend.title argument:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(legend.title = "Extent of disease")By default, axis and axis breaks are determined by ggplot2 to represent the data optimally. However, you may have other demands, such that for instance locking the Y axis so that multiple plots will have the same range to allow for comparison, or you might be interested only in a shorter period of time than the full period under study. Furthermore, you might wish more or fewer timepoints at which the number at risk figures are shown. In the following passage, we will go through these tweaks.
Say that you want your Y scale to go all the way from 0 to 100. This is easily achieved by the ylim argument:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(ylim = c(0, 100)) %>%
  nar()In this example, say that you’re interested only up to the timepoint 2000:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(xmax = 2000) %>%
  nar()Notice how the number at risk table was expanded to include numbers at risk at the new X axis breaks selected by ggplot. Perhaps you only want numbers at risk at, say, 0, 1000, 2000. Then we can use the xbreaks argument:
colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(xmax = 2000, xbreaks = c(0, 1000, 2000)) %>%
  nar()Consider this figure:
colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar()Here you see that the red line is (ever so slightly) above the green line, whereas the green row in the numbers at risk table is above the red row. If you want to fix this, you can use the flip argument as follows:
colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(flip = TRUE)Voila!
Use the size argument to change font size:
colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(size = 5, flip = TRUE)colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(size = 2, flip = TRUE)The argument y_offset determines how far below the main plot the numbers at risk table will be. By default, this takes the value 0.05; but could be changed to taste:
colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(y_offset = 0.1, flip = TRUE)colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(y_offset = 0.03, flip = TRUE)By default, a thin horizontal line separates the main plotting area from the numbers at risk table. This could be turned off:
colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(separator = FALSE)You could also change some characteristics of the line, such as color and line width:
colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(sep_color = "grey90", sep_lwd = 1.5)Perhaps you’d want to combine multiple plots. Enter gridExtra (https://CRAN.r-project.org/package=gridExtra)!
library(gridExtra)
#> 
#> Attaching package: 'gridExtra'
#> The following object is masked from 'package:dplyr':
#> 
#>     combineLet’s combine two plots:
p <- list(
  p1 = colon %>%
    survfit(Surv(time, status) ~ sex, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar() +
    labs(tag = "A"),
  p2 = colon %>%
    survfit(Surv(time, status) ~ node4, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar() +
    labs(tag = "B")
)
grid.arrange(grobs = p, ncol = 2)gridExtra provides ways to create even more complex plots:
# Store plots in a list
p <- list(
  p1 = colon %>%
    survfit(Surv(time, status) ~ 1, data = .) %>%
    plot_survfit(ylim = c(0, 100)) +
    labs(tag = "A"),
  p2 = colon %>%
    survfit(Surv(time, status) ~ rx, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "B"),
  p3 = colon %>%
    survfit(Surv(time, status) ~ extent, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "C"),
  p4 = colon %>%
    survfit(Surv(time, status) ~ sex, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "D"),
  p5 = colon %>%
    survfit(Surv(time, status) ~ node4, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "E"),
  p6 = colon %>%
    survfit(Surv(time, status) ~ surg, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "F")
)
#Define layout matrix
lay <- rbind(c(1,1,2),
             c(1,1,3),
             c(4,5,6))
#Plot it all!
grid.arrange(grobs = p, layout_matrix = lay)And, of course, if you wanted to extract only one of your plots, you can easily do so:
p[["p3"]]