Theoretical Background

This article provides the mathematical foundation for the bias-bound approach implemented in rbbnp, based on Schennach (2020).

The Bias-Variance Tradeoff

The Challenge

In nonparametric estimation, we face a fundamental tradeoff:

Large bandwidth: Low variance but high bias
Small bandwidth: Low bias but high variance
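
A small Monte Carlo experiment makes the tradeoff concrete. The sketch below is illustrative only and is not part of rbbnp: it assumes a Gaussian kernel and standard normal data, uses base R, and the helper names `kde_at` and `sim_bias_var` are hypothetical.

```r
# Illustrative sketch (base R only, not rbbnp code).
# Assumptions: Gaussian kernel, N(0, 1) data, evaluation point x0 = 0.
set.seed(1)

# Kernel density estimate at a single point x0
kde_at <- function(x0, X, h) mean(dnorm((x0 - X) / h)) / h

# Monte Carlo estimate of bias and variance of the estimator at x0
sim_bias_var <- function(h, n = 200, reps = 2000, x0 = 0) {
  est <- replicate(reps, kde_at(x0, rnorm(n), h))
  c(h = h, bias = mean(est) - dnorm(x0), variance = var(est))
}

res <- rbind(sim_bias_var(0.1), sim_bias_var(0.8))
print(res)
```

In this setup, the row for the small bandwidth should show a bias close to zero with a comparatively large variance, and the row for the large bandwidth should show the reverse.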

Traditional approaches either:

Undersmooth: Use smaller bandwidths to reduce bias, but this inflates variance and produces inefficient confidence intervals
Ignore bias: Use optimal MSE bandwidths but produce invalid confidence intervals

The Solution

The bias-bound approach takes a different path: instead of eliminating or ignoring bias, we bound it. This allows us to:

Use optimal (MSE-minimizing) bandwidths
Construct valid confidence intervals that explicitly account for potential bias
Achieve better coverage without sacrificing efficiency

Mathematical Framework

Kernel Density Estimation

For a sample \(X_1, \ldots, X_n\) from density \(f\), the kernel density estimator is:

\[\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)\]

where \(K\) is the kernel function and \(h\) is the bandwidth.
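
The estimator can be written in a few lines of base R. This is a minimal sketch for intuition, not the rbbnp implementation: the function name `kde` is hypothetical, and the Gaussian kernel (`dnorm`) is only an assumed default.

```r
# Illustrative sketch (base R only, not rbbnp code).
# kde() evaluates (1 / (n h)) * sum_i K((x - X_i) / h) on a vector of points x.
kde <- function(x, X, h, K = dnorm) {
  vapply(x, function(x0) sum(K((x0 - X) / h)) / (length(X) * h), numeric(1))
}

set.seed(42)
X_demo <- rnorm(300)                      # assumed example data
grid   <- seq(-4, 4, length.out = 201)
f_hat  <- kde(grid, X_demo, h = 0.4)

# A density estimate should integrate to roughly 1 (Riemann sum on the grid)
area <- sum(f_hat) * diff(grid)[1]
print(area)
```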

Decomposing the Error

The estimation error decomposes as:

\[\hat{f}_h(x) - f(x) = \underbrace{[\hat{f}_h(x) - E[\hat{f}_h(x)]]}_{\text{variance term}} + \underbrace{[E[\hat{f}_h(x)] - f(x)]}_{\text{bias term}}\]

The variance term is random, with a distribution that is known asymptotically (approximately normal in large samples). The bias term is deterministic but unknown.

Fourier Representation

Key Insight

The bias-bound approach exploits the Fourier representation of the bias. For kernel estimators:

\[E[\hat{f}_h(x)] - f(x) = \int_{-\infty}^{\infty} [K^{FT}(h\xi) - 1] f^{FT}(\xi) e^{i\xi x} d\xi\]

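where \(K^{FT}\) and \(f^{FT}\) are the Fourier transforms of \(K\) and \(f\). Constant factors such as \(1/(2\pi\)) depend on the Fourier convention in use and are suppressed here.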
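
The representation can be checked numerically in a case where everything is known in closed form. The sketch below assumes a standard normal density and a Gaussian kernel, so that \(E[\hat{f}_h(x)]\) is the \(N(0, 1 + h^2)\) density. It also assumes the convention \(g^{FT}(\xi) = \int g(u) e^{-i\xi u} du\), under which the inversion formula carries an explicit factor \(1/(2\pi)\) that the display above leaves implicit. The variable names are illustrative.

```r
# Illustrative check (base R only, not rbbnp code).
# Assumptions: f = N(0, 1) density, Gaussian kernel, so
#   f^FT(xi) = exp(-xi^2 / 2) and K^FT(h xi) = exp(-(h xi)^2 / 2).
h  <- 0.5
x0 <- 0.3

# The integrand is even in xi, so exp(i xi x0) can be replaced by cos(xi x0)
integrand <- function(xi) {
  (exp(-(h * xi)^2 / 2) - 1) * exp(-xi^2 / 2) * cos(xi * x0)
}
bias_fourier <- integrate(integrand, -Inf, Inf)$value / (2 * pi)

# Direct computation: E[f_hat_h(x0)] is the N(0, 1 + h^2) density at x0
bias_direct <- dnorm(x0, sd = sqrt(1 + h^2)) - dnorm(x0)

print(c(fourier = bias_fourier, direct = bias_direct))
```

The two numbers should agree up to numerical integration error.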

Smoothness Detection

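The smoothness of \(f\) is reflected in how quickly its Fourier transform decays. The approach assumes that, at high frequencies, this decay is bounded by a power law: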

\[|f^{FT}(\xi)| \leq A |\xi|^{-r}\]

where:

- \(A\) is an amplitude constant
- \(r\) measures the smoothness (larger = smoother)

The package automatically detects \((A, r)\) from the data by fitting the empirical Fourier transform.

library(rbbnp)

# Generate sample data
X <- gen_sample_data(size = 500, dgp = "2_fold_uniform", seed = 42)

# Estimate density
fit <- biasBound_density(X, h = 0.08, kernel.fun = "Schennach2004")

# View detected smoothness parameters
coef(fit)
#>        A        r        h 
#> 5.946712 2.271750 0.080000

# Visualize Fourier transform fit
plot(fit, type = "ft")

The plot shows (the legend labels each line):

- Empirical |phi|: the empirical Fourier transform magnitude
- Fitted envelope: the fitted envelope \(A|\xi|^{-r}\), drawn over the selected window
- The shaded band: the frequency range used for fitting
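
To build intuition for what fitting \((A, r)\) involves, the following sketch estimates a power-law decay from the empirical Fourier transform by ordinary least squares on a log-log scale. This is a deliberately simplified stand-in and not the procedure rbbnp uses: a least-squares line is not an upper envelope, and the frequency window is chosen by hand. The data are assumed to be Laplace distributed, whose Fourier transform magnitude is \(1/(1 + \xi^2)\), so the decay exponent approaches 2 at high frequencies.

```r
# Illustrative sketch (base R only, not the rbbnp algorithm).
# Assumption: Laplace(0, 1) data, generated as a difference of two Exp(1) draws.
set.seed(123)
n     <- 20000
X_lap <- rexp(n) - rexp(n)

# Magnitude of the empirical Fourier transform on a hand-picked frequency window
xi      <- seq(1.5, 5, by = 0.25)
ecf_mod <- vapply(xi, function(s) Mod(mean(exp(1i * s * X_lap))), numeric(1))

# Fit log|phi(xi)| = log(A) - r * log(xi) by least squares
fit_ll <- lm(log(ecf_mod) ~ log(xi))
r_hat  <- -unname(coef(fit_ll)[2])
A_hat  <- exp(unname(coef(fit_ll)[1]))

print(c(A = A_hat, r = r_hat))
```

On a finite window the fitted exponent falls somewhat below the asymptotic value of 2, because \(1/(1 + \xi^2)\) has not yet reached its limiting slope there.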

Constructing Bias Bounds

The Bias Bound Formula

Given the smoothness envelope, the maximum possible bias is:

\[\bar{b}(x) = \int_{-\infty}^{\infty} |K^{FT}(h\xi) - 1| \cdot A |\xi|^{-r} d\xi\]

As written, the integral does not depend on \(x\), and it can be computed analytically for many kernel functions.
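
For the sinc kernel the closed form is easy to derive: \(K^{FT}(h\xi) = 1\) for \(|\xi| \leq 1/h\) and 0 otherwise, so the integrand reduces to \(A|\xi|^{-r}\) on \(|\xi| > 1/h\) and the bound equals \(2A h^{r-1}/(r-1)\) for \(r > 1\). The sketch below compares this expression with numerical integration. It takes the formula exactly as displayed, so any normalization constant from the Fourier convention is left out, and the helper name `bias_bound_sinc` is hypothetical. The inputs reuse the \((A, r, h)\) values printed earlier purely as example numbers; since that fit used a different kernel, the result is not expected to match the package's bound.

```r
# Illustrative sketch (base R only, not rbbnp code).
# Closed-form bias bound for the sinc kernel, taken from the displayed formula:
#   2 * integral_{1/h}^{Inf} A * xi^(-r) d xi = 2 * A * h^(r - 1) / (r - 1)
bias_bound_sinc <- function(A, r, h) {
  stopifnot(r > 1)                       # the integral diverges otherwise
  2 * A * h^(r - 1) / (r - 1)
}

A <- 5.946712; r <- 2.271750; h <- 0.08  # example inputs only

b_closed  <- bias_bound_sinc(A, r, h)
b_numeric <- 2 * integrate(function(xi) A * xi^(-r),
                           lower = 1 / h, upper = Inf)$value

print(c(closed_form = b_closed, numerical = b_numeric))
```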

Interpretation

The bias bound \(\bar{b}\) represents the worst-case bias consistent with the detected smoothness. The true bias satisfies:

\[|E[\hat{f}_h(x)] - f(x)| \leq \bar{b}(x)\]

Confidence Interval Construction

Standard CI (Ignoring Bias)

Traditional confidence intervals:

\[CI_{\text{naive}} = \hat{f}(x) \pm z_{\alpha/2} \hat{\sigma}(x)\]

These have incorrect coverage when bias is non-negligible.

Bias-Bound CI

The bias-bound approach constructs:

\[CI_{\text{bias-bound}} = [\hat{f}(x) - \bar{b}(x) - z_{\alpha/2}\hat{\sigma}(x), \quad \hat{f}(x) + \bar{b}(x) + z_{\alpha/2}\hat{\sigma}(x)]\]

This accounts for the worst-case bias in both directions.
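
Both intervals are simple to compute once \(\hat{f}(x)\), \(\hat{\sigma}(x)\), and \(\bar{b}(x)\) are available. The helper below is a hypothetical illustration of the two formulas, not an rbbnp function, and the input numbers are made up for the example.

```r
# Illustrative sketch (base R only, not rbbnp code).
# Computes the naive and bias-bound intervals from the formulas above.
bias_bound_ci <- function(f_hat, sigma_hat, b_bar, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)              # two-sided normal critical value
  list(
    naive      = c(lower = f_hat - z * sigma_hat,
                   upper = f_hat + z * sigma_hat),
    bias_bound = c(lower = f_hat - b_bar - z * sigma_hat,
                   upper = f_hat + b_bar + z * sigma_hat)
  )
}

# Made-up inputs for illustration
ci <- bias_bound_ci(f_hat = 0.40, sigma_hat = 0.02, b_bar = 0.03)
print(ci)
```

The bias-bound interval is wider than the naive one by exactly \(2\bar{b}(x)\).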

Visualization

# The plot shows both bands
plot(fit)

In the plot (labeled in the legend):

- Bias bound: the bias range \([\hat{f} - \bar{b}, \hat{f} + \bar{b}]\)
- 95% CI: the full confidence interval including sampling uncertainty

Kernel Functions

Infinite-Order Kernels

For the bias-bound approach, infinite-order kernels are recommended because they satisfy:

\[K^{FT}(\xi) = 1 \text{ for } |\xi| \leq 1\]

Since the bias integrand contains the factor \(K^{FT}(h\xi) - 1\), frequencies with \(|\xi| \leq 1/h\) contribute no bias, which simplifies the bias bound calculation.
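
The effect of the flat-top property can be seen by evaluating \(|K^{FT}(h\xi) - 1|\) at a few frequencies. The sketch below compares the sinc kernel with the Gaussian kernel, using their standard Fourier transforms (an indicator on \([-1, 1]\) and \(e^{-\xi^2/2}\)). The function names are hypothetical, and the Schennach2004 kernel is omitted because its exact form is not given here.

```r
# Illustrative sketch (base R only, not rbbnp code).
ft_sinc  <- function(xi) as.numeric(abs(xi) <= 1)   # flat top, sharp cutoff
ft_gauss <- function(xi) exp(-xi^2 / 2)             # second-order kernel

h  <- 0.1                                # cutoff frequency is 1 / h = 10
xi <- c(1, 5, 9, 11, 20)

bias_factor <- data.frame(
  xi       = xi,
  sinc     = abs(ft_sinc(h * xi) - 1),
  gaussian = abs(ft_gauss(h * xi) - 1)
)
print(bias_factor)
```

Below the cutoff the sinc column is exactly zero, while the Gaussian column is positive at every nonzero frequency.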

Available Kernels

Kernel	Order	Fourier Transform
Schennach2004	\(\infty\)	Smooth transition at \(|\xi|=1\)
sinc	\(\infty\)	Sharp cutoff at \(|\xi|=1\)
normal	2	Gaussian decay
epanechnikov	2	Finite support

library(ggplot2)    # provides ggtitle()
library(gridExtra)  # provides grid.arrange()

fit_sch <- biasBound_density(X, kernel.fun = "Schennach2004")
fit_sinc <- biasBound_density(X, kernel.fun = "sinc")

grid.arrange(
  plot(fit_sch) + ggtitle("Schennach2004 (recommended)"),
  plot(fit_sinc) + ggtitle("Sinc kernel"),
  ncol = 1
)

Extension to Regression

Conditional Expectation

For regression \(E[Y|X=x]\), the same principles apply. The Nadaraya-Watson estimator:

\[\hat{m}(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i) Y_i}{\sum_{i=1}^{n} K_h(x - X_i)}\]

with \(K_h(u) = K(u/h)/h\), has bias that can be bounded using the Fourier representation of the conditional expectation function.
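
As a point of reference, the Nadaraya-Watson estimator itself can be sketched in base R. This is not the rbbnp implementation: the function name `nw`, the Gaussian kernel default, and the simulated data are all assumptions made for illustration.

```r
# Illustrative sketch (base R only, not rbbnp code).
# The 1/h factor in K_h cancels between numerator and denominator.
nw <- function(x, X, Y, h, K = dnorm) {
  vapply(x, function(x0) {
    w <- K((x0 - X) / h)                 # kernel weights around x0
    sum(w * Y) / sum(w)                  # locally weighted average of Y
  }, numeric(1))
}

set.seed(7)
X_reg <- runif(500)                                   # assumed design
Y_reg <- sin(2 * pi * X_reg) + rnorm(500, sd = 0.3)   # assumed regression model

# True values are sin(pi / 2) = 1 and sin(3 * pi / 2) = -1
m_hat <- nw(c(0.25, 0.75), X_reg, Y_reg, h = 0.05)
print(m_hat)
```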

Implementation

# Generate regression data
Y <- sin(2 * pi * X) + rnorm(500, sd = 0.3)

# Estimate with bias bounds
fit_reg <- biasBound_condExpectation(Y, X, h = 0.1)

# View smoothness parameters
coef(fit_reg)
#>          A          r          B          h 
#> 22.9169202  2.0000000  0.6374611  0.1000000

Bandwidth Selection

Cross-Validation

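The package uses leave-one-out cross-validation to select the MSE-optimal bandwidth. In its standard least-squares form, the criterion is:

\[h_{CV} = \arg\min_h \left\{ \int \hat{f}_h(x)^2 \, dx - \frac{2}{n} \sum_{i=1}^{n} \hat{f}_{-i,h}(X_i) \right\}\]

where \(\hat{f}_{-i,h}\) is the estimator computed without observation \(X_i\).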

h_cv <- select_bandwidth(X, method = "cv", kernel.fun = "Schennach2004")
h_silv <- select_bandwidth(X, method = "silverman", kernel.fun = "normal")

cat("CV bandwidth:", round(h_cv, 4), "\n")
#> CV bandwidth: 0.2508
cat("Silverman bandwidth:", round(h_silv, 4))
#> Silverman bandwidth: 0.1045
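
For intuition about what cross-validation computes, the sketch below implements the standard least-squares criterion \(\int \hat{f}_h(x)^2 dx - \frac{2}{n}\sum_i \hat{f}_{-i,h}(X_i)\) for a Gaussian kernel, where both terms have closed forms. It is a base R illustration under that assumption and is not how `select_bandwidth` is implemented; the names `lscv` and `h_cv_demo` are hypothetical, and the search interval is chosen by hand.

```r
# Illustrative sketch (base R only, not rbbnp code).
# Least-squares cross-validation criterion for a Gaussian kernel.
lscv <- function(h, X) {
  n <- length(X)
  d <- outer(X, X, "-")                             # all pairwise differences
  # Integral of f_hat^2: the convolution of two N(0, h^2) kernels is N(0, 2 h^2)
  int_f2 <- sum(dnorm(d, sd = sqrt(2) * h)) / n^2
  # Leave-one-out estimates f_hat_{-i}(X_i): drop the diagonal (j == i) terms
  K   <- dnorm(d, sd = h)
  loo <- (rowSums(K) - diag(K)) / (n - 1)
  int_f2 - 2 * mean(loo)
}

set.seed(99)
X_demo    <- rnorm(300)                             # assumed example data
h_cv_demo <- optimize(lscv, interval = c(0.05, 1.5), X = X_demo)$minimum
print(h_cv_demo)
```

Because `optimize` performs a one-dimensional local search, a grid evaluation of `lscv` is a sensible cross-check when the criterion has several local minima.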

Optimal vs. Undersmoothing

Unlike traditional methods, the bias-bound approach uses optimal bandwidths without sacrificing valid inference:

result_opt <- biasBound_density(X, h = h_cv, kernel.fun = "Schennach2004")
result_under <- biasBound_density(X, h = h_cv * 0.5, kernel.fun = "Schennach2004")

grid.arrange(
  plot(result_opt) + ggtitle(paste0("Optimal bandwidth (h = ", round(h_cv, 3), ")")),
  plot(result_under) + ggtitle(paste0("Undersmoothed (h = ", round(h_cv/2, 3), ")")),
  ncol = 1
)

The optimal bandwidth produces narrower confidence intervals while maintaining valid coverage.