---
title: "Theoretical Background"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Theoretical Background}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center",
  out.width = "80%",
  warning = FALSE,
  message = FALSE
)
library(rbbnp)
library(ggplot2)
```

This article provides the mathematical foundation for the bias-bound approach implemented in **rbbnp**, based on [Schennach (2020)](https://doi.org/10.1093/restud/rdz065).

## The Bias-Variance Tradeoff

### The Challenge

In nonparametric estimation, we face a fundamental tradeoff:

- **Large bandwidth**: Low variance but high bias
- **Small bandwidth**: Low bias but high variance

Traditional approaches either:

1. **Undersmooth**: Use smaller bandwidths to reduce bias, but this inflates variance and produces inefficient confidence intervals
2. **Ignore bias**: Use optimal MSE bandwidths but produce invalid confidence intervals

### The Solution

The bias-bound approach takes a different path: instead of eliminating or ignoring bias, we **bound** it. This allows us to:

- Use optimal (MSE-minimizing) bandwidths
- Construct valid confidence intervals that explicitly account for potential bias
- Achieve better coverage without sacrificing efficiency

## Mathematical Framework

### Kernel Density Estimation

For a sample $X_1, \ldots, X_n$ from density $f$, the kernel density estimator is:

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$

where $K$ is the kernel function and $h$ is the bandwidth.

### Decomposing the Error

The estimation error decomposes as:

$$\hat{f}_h(x) - f(x) = \underbrace{[\hat{f}_h(x) - E[\hat{f}_h(x)]]}_{\text{variance term}} + \underbrace{[E[\hat{f}_h(x)] - f(x)]}_{\text{bias term}}$$

The variance term is random with known distribution. The bias term is deterministic but unknown.

## Fourier Representation

### Key Insight

The bias-bound approach exploits the **Fourier representation** of the bias. For kernel estimators:

$$E[\hat{f}_h(x)] - f(x) = \int_{-\infty}^{\infty} [K^{FT}(h\xi) - 1] f^{FT}(\xi) e^{i\xi x} d\xi$$

where $K^{FT}$ and $f^{FT}$ are Fourier transforms.

### Smoothness Detection

The Fourier transform of a smooth function decays polynomially:

$$|f^{FT}(\xi)| \leq A |\xi|^{-r}$$

where:
- $A$ is an amplitude constant
- $r$ measures the smoothness (larger = smoother)

The package **automatically detects** $(A, r)$ from the data by fitting the empirical Fourier transform.

```{r fourier-demo}
# Generate sample data
X <- gen_sample_data(size = 500, dgp = "2_fold_uniform", seed = 42)

# Estimate density
fit <- biasBound_density(X, h = 0.08, kernel.fun = "Schennach2004")

# View detected smoothness parameters
coef(fit)
```

```{r ft-plot}
# Visualize Fourier transform fit
plot(fit, type = "ft")
```

The plot shows (the legend labels each line):
- **Empirical |phi|**: the empirical Fourier transform magnitude
- **Fitted envelope**: the fitted envelope $A|\xi|^{-r}$, drawn over the selected window
- The **shaded band**: the frequency range used for fitting

## Constructing Bias Bounds

### The Bias Bound Formula

Given the smoothness envelope, the maximum possible bias is:

$$\bar{b}(x) = \int_{-\infty}^{\infty} |K^{FT}(h\xi) - 1| \cdot A |\xi|^{-r} d\xi$$

This integral can be computed analytically for many kernel functions.

### Interpretation

The bias bound $\bar{b}$ represents the **worst-case bias** consistent with the detected smoothness. The true bias satisfies:

$$|E[\hat{f}_h(x)] - f(x)| \leq \bar{b}(x)$$

## Confidence Interval Construction

### Standard CI (Ignoring Bias)

Traditional confidence intervals:

$$CI_{\text{naive}} = \hat{f}(x) \pm z_{\alpha/2} \hat{\sigma}(x)$$

These have incorrect coverage when bias is non-negligible.

### Bias-Bound CI

The bias-bound approach constructs:

$$CI_{\text{bias-bound}} = [\hat{f}(x) - \bar{b}(x) - z_{\alpha/2}\hat{\sigma}(x), \quad \hat{f}(x) + \bar{b}(x) + z_{\alpha/2}\hat{\sigma}(x)]$$

This accounts for the worst-case bias in both directions.

### Visualization

```{r ci-visualization}
# The plot shows both bands
plot(fit)
```

In the plot (labeled in the legend):
- **Bias bound**: the bias range $[\hat{f} - \bar{b}, \hat{f} + \bar{b}]$
- **95% CI**: the full confidence interval including sampling uncertainty

## Kernel Functions

### Infinite-Order Kernels

For the bias-bound approach, **infinite-order kernels** are recommended because they satisfy:

$$K^{FT}(\xi) = 1 \text{ for } |\xi| \leq 1$$

This means no bias from frequencies below $1/h$, simplifying the bias bound calculation.

### Available Kernels

| Kernel | Order | Fourier Transform |
|--------|-------|-------------------|
| Schennach2004 | $\infty$ | Smooth transition at $|\xi|=1$ |
| sinc | $\infty$ | Sharp cutoff at $|\xi|=1$ |
| normal | 2 | Gaussian decay |
| epanechnikov | 2 | Finite support |

```{r kernel-comparison, fig.width=6, fig.height=10.5, out.width="100%"}
library(gridExtra)

fit_sch <- biasBound_density(X, kernel.fun = "Schennach2004")
fit_sinc <- biasBound_density(X, kernel.fun = "sinc")

grid.arrange(
  plot(fit_sch) + ggtitle("Schennach2004 (recommended)"),
  plot(fit_sinc) + ggtitle("Sinc kernel"),
  ncol = 1
)
```

## Extension to Regression

### Conditional Expectation

For regression $E[Y|X=x]$, the same principles apply. The Nadaraya-Watson estimator:

$$\hat{m}(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i) Y_i}{\sum_{i=1}^{n} K_h(x - X_i)}$$

has bias that can be bounded using the Fourier representation of the conditional expectation function.

### Implementation

```{r regression-theory}
# Generate regression data
Y <- sin(2 * pi * X) + rnorm(500, sd = 0.3)

# Estimate with bias bounds
fit_reg <- biasBound_condExpectation(Y, X, h = 0.1)

# View smoothness parameters
coef(fit_reg)
```

## Bandwidth Selection

### Cross-Validation

The package uses leave-one-out cross-validation to select the MSE-optimal bandwidth:

$$h_{CV} = \arg\min_h \sum_{i=1}^{n} (\hat{f}_{-i,h}(X_i))^2 - 2\hat{f}_h(X_i)$$

```{r bw-selection}
h_cv <- select_bandwidth(X, method = "cv", kernel.fun = "Schennach2004")
h_silv <- select_bandwidth(X, method = "silverman", kernel.fun = "normal")

cat("CV bandwidth:", round(h_cv, 4), "\n")
cat("Silverman bandwidth:", round(h_silv, 4))
```

### Optimal vs. Undersmoothing

Unlike traditional methods, the bias-bound approach uses **optimal bandwidths** without sacrificing valid inference:

```{r optimal-vs-under, fig.width=6, fig.height=10.5, out.width="100%"}
result_opt <- biasBound_density(X, h = h_cv, kernel.fun = "Schennach2004")
result_under <- biasBound_density(X, h = h_cv * 0.5, kernel.fun = "Schennach2004")

grid.arrange(
  plot(result_opt) + ggtitle(paste0("Optimal bandwidth (h = ", round(h_cv, 3), ")")),
  plot(result_under) + ggtitle(paste0("Undersmoothed (h = ", round(h_cv/2, 3), ")")),
  ncol = 1
)
```

The optimal bandwidth produces narrower confidence intervals while maintaining valid coverage.

## Summary

The bias-bound approach provides:

1. **Valid inference** with optimal bandwidths
2. **Automatic smoothness detection** via Fourier analysis
3. **Explicit bias accounting** in confidence intervals
4. **Efficiency gains** over undersmoothing

## References

Schennach, S. M. (2020). A Bias Bound Approach to Non-parametric Inference. *The Review of Economic Studies*, 87(5), 2439-2472. [doi:10.1093/restud/rdz065](https://doi.org/10.1093/restud/rdz065)

## See Also

- [Get Started](rbbnp.html): Quick introduction
- [Density Estimation](density-estimation.html): Detailed density guide
- [Regression](regression.html): Conditional expectation estimation