This vignette covers the three conformal-prediction interfaces
exported by MetaHunt. The validity of all three rests on the
exchangeability assumption A3 in
vignette("metahunt-intro", package = "MetaHunt") §“Key
assumptions”; we do not re-derive it here.
Conformal prediction wraps any black-box prediction rule and produces
a band around its forecast that, averaged over the random draw of a new
study, contains the truth with probability at least 1 - alpha. The key
word is marginal: the guarantee holds on average over new studies, not
conditional on a specific covariate value. All you need
is for the calibration data to be exchangeable with the new study
(assumption A3) — no distributional assumptions on the noise or on the
weight model.
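
As a rough illustration of the mechanics behind this guarantee, the sketch below runs a generic split conformal procedure in base R on simulated scalar data. It is not MetaHunt's implementation: the toy data, the cubic lm() fit, and the absolute-residual score are assumptions chosen for brevity.

```r
# Generic split-conformal sketch in base R (illustrative; not MetaHunt internals).
set.seed(1)
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)

# 1. Split the data into a training half and a calibration half.
idx_train <- sample(n, n / 2)
train <- data.frame(x = x[idx_train], y = y[idx_train])
cal   <- data.frame(x = x[-idx_train], y = y[-idx_train])

# 2. Fit any prediction rule on the training half (a cubic polynomial here).
fit <- lm(y ~ poly(x, 3), data = train)

# 3. Score the calibration half with absolute residuals.
scores <- abs(cal$y - predict(fit, newdata = cal))

# 4. Use the ceiling((1 - alpha) * (n_cal + 1))-th smallest score as the quantile.
alpha <- 0.1
n_cal <- length(scores)
k     <- ceiling((1 - alpha) * (n_cal + 1))
q_hat <- if (k <= n_cal) sort(scores)[k] else Inf

# 5. The band for each new point is its prediction plus or minus q_hat.
x_new <- data.frame(x = c(0.25, 0.5, 0.75))
pred  <- predict(fit, newdata = x_new)
band  <- data.frame(prediction = pred, lower = pred - q_hat, upper = pred + q_hat)
band
```

The band has the same half-width at every new point, which is the sense in which the guarantee is marginal rather than conditional on the covariate value.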
# m = 80 is large enough that with cal_frac = 0.5 and alpha = 0.05 the conformal quantile is finite.
m <- 80; G <- 20; K_true <- 3
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
beta <- cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
pi_true <- exp(as.matrix(W) %*% beta); pi_true <- pi_true / rowSums(pi_true)
F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)

All three functions return an object you can plot directly with
plot().
| Function | When to use |
|---|---|
| split_conformal() | Default. One train/calibration split. Fastest; some variance from the random split. |
| cross_conformal() | Many studies, want lower split-induced variance. Refits the pipeline n_folds + 1 times. |
| conformal_from_fit() | You’ve already fit a pipeline (e.g. after tuning K) and want intervals without refitting. See ?conformal_from_fit. |

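split_conformal() does a single train / calibration
split. With small m, set cal_frac = 0.5 so the
calibration set is large enough for the chosen alpha. In the calls
below, W_new holds the covariates of the target studies (three of them,
per the outputs shown).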
res_pw <- split_conformal(F_hat, W, W_new, K = K_true, alpha = 0.05,
cal_frac = 0.5, seed = 1,
dfspa_args = list(denoise = FALSE))
plot(res_pw, target_idx = 1, x_axis = x)

The shaded region is the pointwise 95% conformal band; it has finite
width because n_cal = 40 is large enough for
α = 0.05.
res_scalar <- split_conformal(F_hat, W, W_new, K = K_true,
wrapper = mean, alpha = 0.05,
cal_frac = 0.5, seed = 1,
dfspa_args = list(denoise = FALSE))
data.frame(prediction = res_scalar$prediction,
lower = res_scalar$lower,
upper = res_scalar$upper)
#> prediction lower upper
#> 1 0.33312276 0.27629349 0.3899520
#> 2 0.50923298 0.45240371 0.5660622
#> 3 0.08545421 0.02862494 0.1422835

cross_conformal() reduces the variance of the band that
comes from the random split, at the cost of refitting
n_folds + 1 times.
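
To make the n_folds + 1 refits concrete, here is a generic cross-conformal sketch in base R. It assumes that out-of-fold absolute residuals are pooled into a single calibration set and that one extra fit on all studies supplies the point predictions. Whether cross_conformal() works exactly this way is an assumption; the toy data and the lm() model are purely illustrative.

```r
# Generic cross-conformal sketch (illustrative; not MetaHunt internals).
set.seed(1)
n   <- 80
dat <- data.frame(x = runif(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.1)

n_folds <- 4
fold <- sample(rep(seq_len(n_folds), length.out = n))  # random fold labels

# n_folds refits: each fold is scored by a model that never saw it.
scores <- numeric(n)
for (j in seq_len(n_folds)) {
  held_out <- fold == j
  fit_j <- lm(y ~ x, data = dat[!held_out, ])
  scores[held_out] <- abs(dat$y[held_out] -
                          predict(fit_j, newdata = dat[held_out, ]))
}

# One more refit on all studies gives the point predictions.
fit_full <- lm(y ~ x, data = dat)

# Pooled scores play the role of the calibration set (n_cal = n).
alpha <- 0.1
k     <- ceiling((1 - alpha) * (n + 1))
q_hat <- if (k <= n) sort(scores)[k] else Inf

pred <- predict(fit_full, newdata = data.frame(x = c(0.2, 0.8)))
data.frame(prediction = pred, lower = pred - q_hat, upper = pred + q_hat)
```

Because every study contributes one score, the quantile no longer depends on which half happened to land in the calibration set, which is where the variance reduction comes from.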
res_cross <- cross_conformal(F_hat, W, W_new, K = K_true, n_folds = 4,
wrapper = mean, alpha = 0.1, seed = 1,
dfspa_args = list(denoise = FALSE))
res_cross
#> MetaHunt conformal prediction
#> method: cross
#> alpha: 0.1
#> n calibration: 80
#> mode: scalar (via wrapper)
#> n targets: 3
#> quantile: 0.04072

If you have already run metahunt() (for instance after
tuning K) and do not want to refit,
conformal_from_fit() recycles the existing fit to produce
calibrated intervals. The example below re-uses the training data as the
calibration set for demonstration only; in real use, hold out a
separate calibration set so the exchangeability argument applies to
genuinely unseen studies.
fit <- metahunt(F_hat, W, K = K_true, dfspa_args = list(denoise = FALSE))
pi_hat <- project_to_simplex(F_hat, fit$dfspa_fit$bases)
res_pre <- conformal_from_fit(
dfspa_fit = fit$dfspa_fit, weight_model = fit$weight_model,
F_cal = F_hat, W_cal = W, W_new = W_new,
wrapper = mean, alpha = 0.1
)
res_pre
#> MetaHunt conformal prediction
#> method: split
#> alpha: 0.1
#> n calibration: 80
#> mode: scalar (via wrapper)
#> n targets: 3
#> quantile: 0.033

A pointwise band returns a (1 - alpha) interval at each
grid point but does not give a joint guarantee across grid points: the
probability that the truth lies inside the entire band simultaneously is
generally lower than 1 - alpha. A scalar wrapper
(e.g. wrapper = mean) collapses the function to a single
number and gives one calibrated interval, which is the right tool for
joint inferential claims. If you need a joint coverage statement across
the grid, either apply a multiple-testing correction (e.g. divide α by
G) or replace the pointwise band with a scalar wrapper —
see vignette('wrapper-scalar', package = 'MetaHunt').
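
The price of the multiple-testing route can be worked out directly. The sketch below uses two hypothetical helpers, conformal_rank() and min_n_cal(), which are not part of MetaHunt, to find the smallest calibration size that yields a finite conformal quantile, first at alpha and then at the Bonferroni level alpha / G.

```r
# Rank of the calibration score used as the conformal quantile.
# The 1e-9 tolerance is an addition to guard against floating-point noise
# when (1 - alpha) * (n_cal + 1) is mathematically an integer.
conformal_rank <- function(n_cal, alpha) {
  ceiling((1 - alpha) * (n_cal + 1) - 1e-9)
}

# Smallest calibration size whose rank does not exceed n_cal,
# i.e. the smallest size with a finite conformal quantile.
min_n_cal <- function(alpha, max_n = 1e5) {
  n <- seq_len(max_n)
  n[which(conformal_rank(n, alpha) <= n)[1]]
}

G     <- 20
alpha <- 0.05

n_pointwise  <- min_n_cal(alpha)      # pointwise band
n_bonferroni <- min_n_cal(alpha / G)  # Bonferroni-corrected band
c(pointwise = n_pointwise, bonferroni = n_bonferroni)
```

With G = 20 and alpha = 0.05 this gives 19 calibration studies for the pointwise band and 399 for the Bonferroni-corrected one, far more than the 80 studies simulated here, so the scalar wrapper is usually the more practical route to a joint statement.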
Small-m warning on cal_frac

With too few calibration studies for the chosen alpha, the conformal
quantile is Inf and intervals are unbounded. The finite-sample formula
needs n_cal >= ceiling((1 - alpha)(n_cal + 1)) calibration studies,
which is equivalent to n_cal >= 1/alpha - 1 (19 studies for
alpha = 0.05); below that threshold the package warns and the bands
degenerate. Either raise cal_frac, raise alpha, or switch to
cross_conformal().
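
The threshold is easy to see in a few lines of base R. The helper conformal_quantile() below is a hypothetical illustration of the finite-sample rule, not a MetaHunt function, and the scores are made-up numbers rather than residuals from a fitted model.

```r
# Finite-sample conformal quantile: the k-th smallest score with
# k = ceiling((1 - alpha) * (n_cal + 1)), or Inf when k exceeds n_cal.
# The 1e-9 tolerance is an addition to guard against floating-point noise.
conformal_quantile <- function(scores, alpha) {
  n_cal <- length(scores)
  k <- ceiling((1 - alpha) * (n_cal + 1) - 1e-9)
  if (k > n_cal) Inf else sort(scores)[k]
}

scores_40 <- (1:40) / 100  # 40 made-up calibration scores
scores_15 <- (1:15) / 100  # 15 made-up calibration scores

q_40       <- conformal_quantile(scores_40, alpha = 0.05)  # k = 39 <= 40: finite
q_15       <- conformal_quantile(scores_15, alpha = 0.05)  # k = 16 >  15: Inf
q_15_loose <- conformal_quantile(scores_15, alpha = 0.10)  # k = 15 <= 15: finite
c(q_40 = q_40, q_15 = q_15, q_15_loose = q_15_loose)
```

With 15 scores the required rank at alpha = 0.05 is 16, one more than is available, so the quantile is Inf; raising alpha to 0.10 lowers the rank to 15 and the quantile becomes finite again.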
Below we deliberately keep only the first 30 of our 80 studies, so
with cal_frac = 0.5 the calibration set has n_cal = 15 studies, too few
for α = 0.05. The package issues a warning and returns Inf quantiles;
the corresponding intervals are unbounded. The fix is to supply more
studies, raise α, or raise cal_frac.
m_small <- 30 # too small for alpha = 0.05 with cal_frac = 0.5
F_small <- F_hat[1:m_small, , drop = FALSE]
W_small <- W[1:m_small, , drop = FALSE]
res_inf <- split_conformal(F_small, W_small, W_new, K = K_true,
alpha = 0.05, cal_frac = 0.5, seed = 1,
dfspa_args = list(denoise = FALSE))
#> Warning in .build_conformal_output(obs_cal = F_cal, pred_cal = pred_cal, : With
#> n_cal = 15 and alpha = 0.05, the conformal quantile is infinite at 20 of 20
#> grid points; intervals are unbounded there. Increase calibration size (raise
#> `cal_frac` or supply more studies) or use a larger `alpha`.
res_inf$quantile # Inf — quantile is unbounded
#> [1] Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf
#> [20] Inf
range(res_inf$lower) # -Inf
#> [1] -Inf -Inf
range(res_inf$upper) # Inf
#> [1] Inf Inf

- vignette("metahunt-intro", package = "MetaHunt"): pipeline context and the A3 exchangeability assumption.
- ?split_conformal: single-split conformal calibration.
- ?cross_conformal: cross-fitting conformal calibration.
- ?conformal_from_fit: calibration using an existing fit.
- ?coverage: empirical coverage diagnostics for conformal bands.
- ?plot.metahunt_conformal: plotting method for the returned objects.