Estimating ITS parameters from pre-intervention data

PITS package

2026-06-23

Overview

Before running a power simulation, you need three nuisance parameters estimated from your pre-intervention time-series data:

Parameter Symbol Meaning
Baseline \(\beta_0\) Mean outcome at the start of the pre-intervention period
Residual SD \(\sigma\) Outcome variability from one time point to the next
Autocorrelation \(\rho\) AR(1) correlation between consecutive observations

A fourth quantity, trend_pre, checks whether the outcome was stable before the intervention. It should be close to zero.

These parameters are not the intervention effect — that is your clinical hypothesis (level_change), which must be set separately based on clinical judgement or published evidence.


Using individual estimation functions

PITS provides individual functions for each parameter, useful when you want fine-grained control or wish to estimate only one quantity.

data("example_cfr_data")
outcome <- example_cfr_data$outcome

# Baseline: intercept of a linear trend fit at t = 1
b <- estimate_baseline(outcome)
cat("Baseline:", round(b, 3), "\n")
#> Baseline: 14.696

# Sigma: residual SD after detrending
s <- estimate_sigma(outcome)
cat("Sigma:   ", round(s, 3), "\n")
#> Sigma:    1.514

# Rho: lag-1 autocorrelation of residuals
r <- estimate_rho(outcome)
cat("Rho:     ", round(r, 3), "\n")
#> Rho:      0.371

# Pre-trend: slope per time unit (should be near zero)
t_pre <- estimate_trend(outcome)
cat("Trend:   ", round(t_pre, 4), "per month\n")
#> Trend:    0.0019 per month

When to use individual functions


Using the all-in-one wrapper

For most use cases, estimate_its_params() is the most convenient entry point. It accepts either a data frame or a plain numeric vector.

# From a data frame with 'time' and 'outcome' columns
params <- estimate_its_params(example_cfr_data)
#> 
#> ============================================================
#>   PITS - ITS Parameter Estimation
#> ============================================================ 
#> 
#>   Estimation method:      GLS-ML
#>   Observations (n_pre):   24
#>   Baseline:               14.5853
#>   Sigma (residual SD):    1.4876  (10.2% of baseline)
#>   Rho (AR1):              0.3832
#>   Pre-trend (per unit):   0.0039
#> 
#>  ------------------------------------------------------------ 
#>   Copy these into calculate_power():
#> 
#>     n_pre        = 24
#>     baseline     = 14.5853
#>     sigma        = 1.4876
#>     rho          = 0.3832
#>     level_change = ???  # YOUR clinical hypothesis
#>     slope_change = 0    # set if testing slope
#> 
#> ============================================================
# Or from a plain numeric vector (time index auto-generated)
params2 <- estimate_its_params(example_cfr_data$outcome, verbose = FALSE)
identical(params$sigma, params2$sigma)
#> [1] TRUE

The returned list plugs directly into calculate_power():

result <- calculate_power(
  n_pre        = params$n_pre,
  n_post       = 30,
  baseline     = params$baseline,
  level_change = -3,          # <-- your clinical hypothesis
  sigma        = params$sigma,
  rho          = params$rho
)

Custom column names

If your data uses different column names, pass them explicitly:

# Rename for illustration
alt_data <- example_cfr_data
names(alt_data) <- c("month", "cfr_pct")

params_alt <- estimate_its_params(
  alt_data,
  outcome_col = "cfr_pct",
  time_col    = "month",
  verbose     = FALSE
)
cat("Sigma:", round(params_alt$sigma, 3), "\n")
#> Sigma: 1.488

Diagnostic plots

Always inspect the pre-intervention data before trusting parameter estimates. diagnose_params() produces a 2×2 panel covering the four key checks:

{r diag-full, fig.width = 8, fig.height = 6, fig.cap = "Diagnostic panel: (1) observed series with fitted trend — checks for outliers and non-linearity; (2) residuals over time — checks for heteroscedasticity; (3) Q-Q plot — checks for normality of residuals; (4) residual ACF — quantifies autocorrelation structure."} out <- diagnose_params(example_cfr_data)

What to look for:

  1. Observed + trend: a roughly flat or very gently sloped trend line. A steep pre-trend (> 5% of baseline per period) may indicate confounding.
  2. Residuals over time: random scatter around zero, no funnelling or systematic patterns.
  3. Q-Q plot: points close to the diagonal. Heavy tails are acceptable for monthly rates; severe departures suggest the Gaussian model may not hold and counts-based modelling (Poisson/Negative Binomial) should be considered.
  4. Residual ACF: a single large bar at lag 1 followed by rapid decay is consistent with AR(1). Multiple large bars suggest higher-order autocorrelation; consider a longer pre-period or seasonal adjustment.

What if I have no pre-intervention data?

If historical data are not available, use simulate_predata() to generate a synthetic pre-intervention series and explore how different parameter assumptions affect power. Then use the literature or expert elicitation to justify your choices.

# Simulate pre-intervention data with assumed parameters
pre_synthetic <- simulate_predata(
  n        = 24,
  baseline = 15,
  sigma    = 2.0,
  rho      = 0.40,
  seed     = 42
)
head(pre_synthetic)
#>   time  outcome
#> 1    1 17.51301
#> 2    2 14.97009
#> 3    3 15.65366
#> 4    4 16.42152
#> 5    5 16.30964
#> 6    6 15.32933
plot(pre_synthetic$time, pre_synthetic$outcome,
     type = "o", pch = 16, col = "steelblue",
     xlab = "Time", ylab = "Outcome",
     main = "Synthetic pre-intervention data",
     las = 1, bty = "l")
abline(h = 15, lty = 2, col = "grey50")
grid(NULL, NULL, lwd = 0.6, col = rgb(0, 0, 0, 0.1))
Synthetic pre-intervention series (baseline = 15, sigma = 2, rho = 0.40).
Synthetic pre-intervention series (baseline = 15, sigma = 2, rho = 0.40).

You can then recover the estimated parameters from this synthetic series to verify your simulation is self-consistent:

recovered <- estimate_its_params(pre_synthetic, verbose = FALSE)
cat(sprintf("Target  → baseline: 15.00  sigma: 2.000  rho: 0.400\n"))
#> Target  → baseline: 15.00  sigma: 2.000  rho: 0.400
cat(sprintf("Recovered → baseline: %.2f  sigma: %.3f  rho: %.3f\n",
            recovered$baseline, recovered$sigma, recovered$rho))
#> Recovered → baseline: 17.52  sigma: 2.246  rho: 0.279

Small discrepancies are expected from a single 24-point realisation. Longer pre-periods (n = 48–60) yield more reliable estimates.


Handling non-standard situations

Date-indexed time column

estimate_its_params() accepts date columns and converts them to a sequential integer index automatically:

# If your data has a 'date' column of class Date:
params <- estimate_its_params(
  my_data,
  outcome_col = "cfr",
  time_col    = "date"     # Date objects are handled automatically
)

Missing values

Missing values in the outcome are removed with a warning before fitting:

data_with_na <- example_cfr_data
data_with_na$outcome[c(5, 12)] <- NA

params_na <- suppressWarnings(
  estimate_its_params(data_with_na, verbose = FALSE)
)
cat("n_pre after removing NAs:", params_na$n_pre, "\n")
#> n_pre after removing NAs: 22

Very short pre-periods

Fewer than 12 observations produces a warning. Estimates become increasingly unreliable as the pre-period shrinks, and are essentially meaningless below n = 8.

params_short <- estimate_its_params(
  example_cfr_data$outcome[1:9],
  verbose = FALSE
)
#> Warning in estimate_its_params(example_cfr_data$outcome[1:9], verbose = FALSE):
#> Only 9 observations available. Estimates will be unreliable; aim for >= 24.

If you genuinely have fewer than 12 pre-intervention observations, consider whether the ITS design is appropriate, or whether a controlled ITS (with a comparison series) would be more robust.


Typical parameter ranges for monthly hospital data

As a reference when pre-intervention data are unavailable:

Parameter Typical range Notes
sigma 10–20% of baseline Higher for small hospitals or rare outcomes
rho (monthly) 0.30–0.50 Use 0.40 as a conservative default
rho (weekly) 0.50–0.80 Consider aggregating to monthly
rho (daily) 0.70–0.90 Strong case for monthly aggregation

References