This article provides the mathematical foundation for the bias-bound approach implemented in rbbnp, based on Schennach (2020).
In nonparametric estimation, we face a fundamental tradeoff:
Traditional approaches either:
The bias-bound approach takes a different path: instead of eliminating or ignoring bias, we bound it. This allows us to:
For a sample \(X_1, \ldots, X_n\) from density \(f\), the kernel density estimator is:
\[\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)\]
where \(K\) is the kernel function and \(h\) is the bandwidth.
The estimation error decomposes as:
\[\hat{f}_h(x) - f(x) = \underbrace{[\hat{f}_h(x) - E[\hat{f}_h(x)]]}_{\text{variance term}} + \underbrace{[E[\hat{f}_h(x)] - f(x)]}_{\text{bias term}}\]
The variance term is random with known distribution. The bias term is deterministic but unknown.
The bias-bound approach exploits the Fourier representation of the bias. For kernel estimators:
\[E[\hat{f}_h(x)] - f(x) = \int_{-\infty}^{\infty} [K^{FT}(h\xi) - 1] f^{FT}(\xi) e^{i\xi x} d\xi\]
where \(K^{FT}\) and \(f^{FT}\) are Fourier transforms.
The Fourier transform of a smooth function decays polynomially:
\[|f^{FT}(\xi)| \leq A |\xi|^{-r}\]
where: - \(A\) is an amplitude constant - \(r\) measures the smoothness (larger = smoother)
The package automatically detects \((A, r)\) from the data by fitting the empirical Fourier transform.
# Generate sample data
X <- gen_sample_data(size = 500, dgp = "2_fold_uniform", seed = 42)
# Estimate density
fit <- biasBound_density(X, h = 0.08, kernel.fun = "Schennach2004")
# View detected smoothness parameters
coef(fit)
#> A r h
#> 5.946712 2.271750 0.080000The plot shows (the legend labels each line): - Empirical |phi|: the empirical Fourier transform magnitude - Fitted envelope: the fitted envelope \(A|\xi|^{-r}\), drawn over the selected window - The shaded band: the frequency range used for fitting
Given the smoothness envelope, the maximum possible bias is:
\[\bar{b}(x) = \int_{-\infty}^{\infty} |K^{FT}(h\xi) - 1| \cdot A |\xi|^{-r} d\xi\]
This integral can be computed analytically for many kernel functions.
The bias bound \(\bar{b}\) represents the worst-case bias consistent with the detected smoothness. The true bias satisfies:
\[|E[\hat{f}_h(x)] - f(x)| \leq \bar{b}(x)\]
Traditional confidence intervals:
\[CI_{\text{naive}} = \hat{f}(x) \pm z_{\alpha/2} \hat{\sigma}(x)\]
These have incorrect coverage when bias is non-negligible.
The bias-bound approach constructs:
\[CI_{\text{bias-bound}} = [\hat{f}(x) - \bar{b}(x) - z_{\alpha/2}\hat{\sigma}(x), \quad \hat{f}(x) + \bar{b}(x) + z_{\alpha/2}\hat{\sigma}(x)]\]
This accounts for the worst-case bias in both directions.
For the bias-bound approach, infinite-order kernels are recommended because they satisfy:
\[K^{FT}(\xi) = 1 \text{ for } |\xi| \leq 1\]
This means no bias from frequencies below \(1/h\), simplifying the bias bound calculation.
| Kernel | Order | Fourier Transform |
|---|---|---|
| Schennach2004 | \(\infty\) | Smooth transition at \(|\xi|=1\) |
| sinc | \(\infty\) | Sharp cutoff at \(|\xi|=1\) |
| normal | 2 | Gaussian decay |
| epanechnikov | 2 | Finite support |
library(gridExtra)
fit_sch <- biasBound_density(X, kernel.fun = "Schennach2004")
fit_sinc <- biasBound_density(X, kernel.fun = "sinc")
grid.arrange(
plot(fit_sch) + ggtitle("Schennach2004 (recommended)"),
plot(fit_sinc) + ggtitle("Sinc kernel"),
ncol = 1
)For regression \(E[Y|X=x]\), the same principles apply. The Nadaraya-Watson estimator:
\[\hat{m}(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i) Y_i}{\sum_{i=1}^{n} K_h(x - X_i)}\]
has bias that can be bounded using the Fourier representation of the conditional expectation function.
The package uses leave-one-out cross-validation to select the MSE-optimal bandwidth:
\[h_{CV} = \arg\min_h \sum_{i=1}^{n} (\hat{f}_{-i,h}(X_i))^2 - 2\hat{f}_h(X_i)\]
Unlike traditional methods, the bias-bound approach uses optimal bandwidths without sacrificing valid inference:
result_opt <- biasBound_density(X, h = h_cv, kernel.fun = "Schennach2004")
result_under <- biasBound_density(X, h = h_cv * 0.5, kernel.fun = "Schennach2004")
grid.arrange(
plot(result_opt) + ggtitle(paste0("Optimal bandwidth (h = ", round(h_cv, 3), ")")),
plot(result_under) + ggtitle(paste0("Undersmoothed (h = ", round(h_cv/2, 3), ")")),
ncol = 1
)The optimal bandwidth produces narrower confidence intervals while maintaining valid coverage.
The bias-bound approach provides:
Schennach, S. M. (2020). A Bias Bound Approach to Non-parametric Inference. The Review of Economic Studies, 87(5), 2439-2472. doi:10.1093/restud/rdz065