Package {rbbnp}


Type: Package
Title: A Bias Bound Approach to Non-Parametric Inference
Version: 1.1.0
Description: A novel bias-bound approach for non-parametric inference is introduced, focusing on both density and conditional expectation estimation. It constructs valid confidence intervals that account for the presence of a non-negligible bias and thus make it possible to perform inference with optimal mean squared error minimizing bandwidths. This package is based on Schennach (2020) <doi:10.1093/restud/rdz065>.
License: GPL (≥ 3)
Encoding: UTF-8
LazyData: true
URL: https://xinyudai.net/rbbnp-dev/
Imports: dplyr, ggplot2 (≥ 3.4.0), gridExtra, pracma, purrr, Rglpk, tidyr
RoxygenNote: 7.3.3
Depends: R (≥ 3.5)
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-06-07 05:53:15 UTC; clawdbot
Author: Xinyu DAI [aut, cre], Susanne M Schennach [aut]
Maintainer: Xinyu DAI <xinyu_dai@brown.edu>
Repository: CRAN
Date/Publication: 2026-06-09 02:30:02 UTC

rbbnp: Bias-Bounded Nonparametric Inference

Description

The rbbnp package provides functions for bias-bounded nonparametric density and conditional expectation estimation based on the methodology of Schennach (2020).

Main Functions

The package provides two main estimation functions:

S3 Object System

Both main functions return S3 objects with the following classes:

S3 Methods

The following generic methods are available for working with estimation results:

print()

Concise summary of estimation results

summary()

Detailed statistics and parameter estimates

plot()

Visualization of results (density/regression plots, FT plots)

coef()

Extract bias bound parameters (A, r, h) and B for regression

fitted()

Extract fitted conditional expectation values (regression only)

confint()

Extract confidence intervals

Usage Example

# Density estimation workflow
X <- rnorm(100)
fit <- biasBound_density(X, h = 0.1)
print(fit)
summary(fit)
plot(fit)              # Density plot
plot(fit, type = "ft") # Fourier transform plot
coef(fit)              # Bias bound parameters
confint(fit)           # Confidence intervals

# Regression estimation workflow
Y <- X^2 + rnorm(100)
fit_reg <- biasBound_condExpectation(Y, X, h = 0.1)
plot(fit_reg)          # Regression plot
fitted(fit_reg)        # Fitted values

Author(s)

Maintainer: Xinyu DAI xinyu_dai@brown.edu

Authors:

References

Schennach, S. M. (2020). "Long Story Short: Omitted Variable Bias in Causal Machine Learning." NBER Working Paper.

See Also

Useful links:


Check if LP solver is available

Description

Check if LP solver is available

Usage

.has_lp_solver()

Value

Logical indicating if Rglpk is available


Uniform Fourier-transform error bound Delta-phi (Schennach 2020, Theorem 1)

Description

Uniform Fourier-transform error bound Delta-phi (Schennach 2020, Theorem 1)

Usage

.noise_floor_avg_dphi(Y, n, noise_floor = "compact")

Arguments

Y

Sample vector (or 1 for density).

n

Sample size.

noise_floor

"compact" (the ⁠sqrt(7) * (ln n)^(1/2)⁠ form, default), "general" (the ⁠(7 * sqrt(2) / 3) * ln n⁠ form), or "auto" (general when Y is non-constant, else compact).


Prepare LP problem matrices for envelope optimization

Description

Prepare LP problem matrices for envelope optimization

Usage

.prepare_lp_problem(ln_xi_range, avg_phi_log_values, r_max = 50)

Arguments

ln_xi_range

Numeric vector of log-frequencies

avg_phi_log_values

Numeric vector of log-Fourier-transform values

r_max

Maximum allowed value for r parameter (default 50)

Value

List with components: obj (objective coefficients), mat (constraint matrix), rhs (RHS vector), bounds (variable bounds)


Signal-to-noise frequency-window selection rule

Description

Selects the upper frequency cutoff as the point where the pointwise signal-to-noise ratio of the empirical Fourier transform first drops (on a sustained run) below tau. Using the pointwise (delta-method) standard error in place of Schennach (2020)'s worst-case uniform error bound keeps the rule usable at realistic sample sizes, and a minimum-window guard ensures it never returns an empty interval.

Usage

.snr_window(Y = 1, X, tau = NULL, run = 5L, per_log = 200)

Arguments

Y

Numeric vector (or 1 for density).

X

Numeric vector of sample data.

tau

SNR threshold: NULL (default) uses max(2, sqrt(2 * log(n))); a numeric value, or a function of n. Letting tau grow faster than ⁠sqrt(log n)⁠ preserves the asymptotic rate condition of Schennach (2020) while keeping the rule usable in finite samples.

run

Minimum consecutive below-threshold points for a sustained crossing.

per_log

Grid resolution (points per log-unit of frequency).

Value

A list with xi_lb and xi_ub.


Solve LP problem for envelope optimization

Description

Solve LP problem for envelope optimization

Usage

.solve_lp_envelope(lp_problem)

Arguments

lp_problem

List from .prepare_lp_problem()

Value

List with status, solution vector, and objective value


The Path to the Data Folder

Description

This variable provides the path to the data folder within the package.

Value

The path to the package's internal data folder as a character string.


The Path to the External Data Folder for Non-R Data Files

Description

This variable provides the path to the extdata folder within the package, where non-standard R data files are stored.

Value

The path to the package's external data folder (for non-standard R data files) as a character string.


Define the inverse Fourier transform function of W

Description

Define the inverse Fourier transform function of W

Usage

W_kernel(u, L = 10)

Arguments

u

A numerical value or vector representing the time or space domain.

L

The limit for numerical integration, defines the range of integration as [-L, L]. Defaults to 10.

Value

A numerical value or vector representing the inverse Fourier transform of the infinite order kernel at the given time or space point(s).


Define the Fourier transform of a infinite kernel proposed in Schennach 2004

Description

Define the Fourier transform of a infinite kernel proposed in Schennach 2004

Usage

W_kernel_ft(xi, xi_lb = 0.5, xi_ub = 1.5)

Arguments

xi

A numerical value or vector representing the frequency domain.

xi_lb

The lower bound for the frequency domain. Defaults to 0.5.

xi_ub

The upper bound for the frequency domain. Defaults to 1.5.

Value

A numerical value or vector representing the Fourier transform of the infinite order kernel at the given frequency/frequencies.


Bias bound approach for conditional expectation estimation

Description

Estimates the density at a given point or across a range, and provides visualization options for density, bias, and confidence intervals.

Usage

biasBound_condExpectation(
  Y,
  X,
  x = NULL,
  h = NULL,
  h_method = "cv",
  alpha = 0.05,
  est_Ar = NULL,
  resol = 100,
  xi_lb = NULL,
  xi_ub = NULL,
  methods_get_xi = "snr",
  noise_floor = "auto",
  envelope_use_Y = TRUE,
  integer_r = TRUE,
  ora_Ar = NULL,
  kernel.fun = "Schennach2004",
  if_approx_kernel = TRUE,
  kernel.resol = 1000
)

Arguments

Y

A numerical vector of sample data.

X

A numerical vector of sample data.

x

Optional. A scalar or range of points where the density is estimated. If NULL, a range is automatically generated.

h

A scalar bandwidth parameter. If NULL, the bandwidth is automatically selected using the method specified in 'h_method'.

h_method

Method for automatic bandwidth selection when h is NULL. Options are "cv" (cross-validation) and "silverman" (Silverman's rule of thumb). Default is "cv".

alpha

Confidence level for intervals. Default is 0.05.

est_Ar

Optional list of estimates for A and r. If NULL, they are computed using get_est_Ar().

resol

Resolution for the estimation range. Default is 100.

xi_lb

Optional. Lower bound for the interval of Fourier Transform frequency xi. Used for determining the range over which A and r is estimated. If NULL, it is automatically determined based on the methods_get_xi.

xi_ub

Optional. Upper bound for the interval of Fourier Transform frequency xi. Similar to xi_lb, it defines the upper range for A and r estimation. If NULL, the upper bound is determined based on the methods_get_xi.

methods_get_xi

A string selecting the frequency-window rule used when xi_lb/xi_ub are NULL: "snr" (default; a signal-to-noise cutoff that selects a valid window at realistic sample sizes), "Schennach" (the data-driven rule of Schennach 2020, Theorem 2), or "Schennach_loose" (the initial, un-refined interval).

noise_floor

Noise-floor form for the Schennach test: "auto" (default), "compact", or "general".

envelope_use_Y

If TRUE (default), fit the regression envelope to the cross-spectrum ⁠|phi_YX|⁠; if FALSE, fit it to the marginal spectrum ⁠|phi_X|⁠.

integer_r

If TRUE (default), clamp the fitted envelope slope up to r = 2 when it falls below the minimum smoothness assumed by Schennach (2020, Definition 2), i.e. r < 2, and refit A; this keeps the bias-bound integral finite. Slopes >= 2 are left unchanged.

ora_Ar

Optional list of oracle values for A and r (for research/comparison purposes).

kernel.fun

A string specifying the kernel function to be used. Options are "Schennach2004", "sinc", "normal", "epanechnikov".

if_approx_kernel

Logical. If TRUE, uses approximations for the kernel function.

kernel.resol

The resolution for kernel function approximation. See fun_approx.

Value

An object of class bbnp_regression with components:

fitted_values

E[Y|X=x] estimates (for range estimation)

x

Evaluation points

estimate

Point estimate (for single x)

conf_int

List containing lower, upper bounds and conf_level. Note that the confidence interval can be unbounded (i.e., contain -Inf or Inf) in regions where the estimated marginal density \hat f(x) is very close to zero, because the estimator is formed as a ratio involving 1/\hat f(x).

bias_bound

List containing b1x, byx, est_A, est_r, est_B, xi_interval

std_error

Standard errors

marginal_density

f(x) estimates

joint_density

f_YX estimates

call

The function call

bandwidth

Bandwidth used

n

Sample size

kernel

Kernel type

data

Original data (X, Y)

Use plot(), summary(), coef(), fitted(), and confint() methods to work with the result.

Examples


# Example 1: Point estimation at x = 1
X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y = Y, X = X, x = 1, h = 0.09)
print(fit)
fitted(fit)

# Example 2: Range estimation with plots
fit2 <- biasBound_condExpectation(Y = Y, X = X, h = NULL, h_method = "cv")
plot(fit2)              # Regression plot
plot(fit2, type = "ft") # Fourier transform plot
summary(fit2)


Bias bound approach for density estimation

Description

Estimates the density at a given point or across a range, and provides visualization options for density, bias, and confidence intervals.

Usage

biasBound_density(
  X,
  x = NULL,
  h = NULL,
  h_method = "cv",
  alpha = 0.05,
  resol = 100,
  xi_lb = NULL,
  xi_ub = NULL,
  methods_get_xi = "snr",
  noise_floor = "auto",
  envelope_use_Y = TRUE,
  integer_r = TRUE,
  ora_Ar = NULL,
  kernel.fun = "Schennach2004",
  if_approx_kernel = TRUE,
  kernel.resol = 1000
)

Arguments

X

A numerical vector of sample data.

x

Optional. A scalar or range of points where the density is estimated. If NULL, a range is automatically generated.

h

A scalar bandwidth parameter. If NULL, the bandwidth is automatically selected using the method specified in 'h_method'.

h_method

Method for automatic bandwidth selection when h is NULL. Options are "cv" (cross-validation) and "silverman" (Silverman's rule of thumb). Default is "cv".

alpha

Confidence level for intervals. Default is 0.05.

resol

Resolution for the estimation range. Default is 100.

xi_lb

Optional. Lower bound for the interval of Fourier Transform frequency xi. Used for determining the range over which A and r is estimated. If NULL, it is automatically determined based on the methods_get_xi.

xi_ub

Optional. Upper bound for the interval of Fourier Transform frequency xi. Similar to xi_lb, it defines the upper range for A and r estimation. If NULL, the upper bound is determined based on the methods_get_xi.

methods_get_xi

A string selecting the frequency-window rule used when xi_lb/xi_ub are NULL: "snr" (default; a signal-to-noise cutoff that selects a valid window at realistic sample sizes), "Schennach" (the data-driven rule of Schennach 2020, Theorem 2), or "Schennach_loose" (the initial, un-refined interval).

noise_floor

Noise-floor form for the Schennach test: "auto" (default), "compact", or "general".

envelope_use_Y

If TRUE (default), fit the regression envelope to the cross-spectrum ⁠|phi_YX|⁠; if FALSE, fit it to the marginal spectrum ⁠|phi_X|⁠.

integer_r

If TRUE (default), clamp the fitted envelope slope up to r = 2 when it falls below the minimum smoothness assumed by Schennach (2020, Definition 2), i.e. r < 2, and refit A; this keeps the bias-bound integral finite. Slopes >= 2 are left unchanged.

ora_Ar

Optional list of oracle values for A and r (for research/comparison purposes).

kernel.fun

A string specifying the kernel function to be used. Options are "Schennach2004", "sinc", "normal", "epanechnikov".

if_approx_kernel

Logical. If TRUE, uses approximations for the kernel function.

kernel.resol

The resolution for kernel function approximation. See fun_approx.

Value

An object of class bbnp_density with components:

density

Density estimates (for range estimation)

x

Evaluation points

estimate

Point estimate (for single x)

conf_int

List containing lower, upper bounds and conf_level

bias_bound

List containing b1x, est_A, est_r, xi_interval

std_error

Standard errors

call

The function call

bandwidth

Bandwidth used

n

Sample size

kernel

Kernel type

data

Original data

Use plot(), summary(), coef(), and confint() methods to work with the result.

Examples


# Example 1: Point estimation at x = 1
X <- rnorm(100)
fit <- biasBound_density(X = X, x = 1, h = 0.09)
print(fit)
coef(fit)

# Example 2: Range estimation with automatic bandwidth
fit2 <- biasBound_density(X = X, h = NULL, h_method = "cv")
plot(fit2)           # Density plot
plot(fit2, type = "ft")  # Fourier transform plot
summary(fit2)


Extract Coefficients from bbnp_density Object

Description

Extracts the estimated bias bound parameters and bandwidth

Usage

## S3 method for class 'bbnp_density'
coef(object, ...)

Arguments

object

An object of class bbnp_density

...

Additional arguments (unused)

Value

Named numeric vector with A (amplitude), r (decay rate), and h (bandwidth)

Examples


X <- rnorm(100)
fit <- biasBound_density(X, h = 0.1)
coef(fit)


Extract Coefficients from bbnp_regression Object

Description

Extracts the estimated bias bound parameters and bandwidth

Usage

## S3 method for class 'bbnp_regression'
coef(object, ...)

Arguments

object

An object of class bbnp_regression

...

Additional arguments (unused)

Value

Named numeric vector with A (amplitude), r (decay rate), B (Y bound), and h (bandwidth)

Examples


X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y, X, h = 0.1)
coef(fit)


Extract Confidence Intervals from bbnp_density Object

Description

Extracts confidence intervals for density estimates

Usage

## S3 method for class 'bbnp_density'
confint(object, parm = NULL, level = 0.95, ...)

Arguments

object

An object of class bbnp_density

parm

Not used (included for S3 generic compatibility)

level

Confidence level (default: 0.95). Note: this parameter is not used as the confidence level is fixed at object creation time.

...

Additional arguments (unused)

Value

For range estimation: a matrix with columns "lower" and "upper" For point estimation: a named vector with elements "lower" and "upper"

Examples


X <- rnorm(100)
fit <- biasBound_density(X, h = 0.1)
confint(fit)


Extract Confidence Intervals from bbnp_regression Object

Description

Extracts confidence intervals for conditional expectation estimates

Usage

## S3 method for class 'bbnp_regression'
confint(object, parm = NULL, level = 0.95, ...)

Arguments

object

An object of class bbnp_regression

parm

Not used (included for S3 generic compatibility)

level

Confidence level (default: 0.95). Note: this parameter is not used as the confidence level is fixed at object creation time.

...

Additional arguments (unused)

Value

For range estimation: a matrix with columns "lower" and "upper" For point estimation: a named vector with elements "lower" and "upper"

Examples


X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y, X, h = 0.1)
confint(fit)


Create a configuration object for bias bound estimations

Description

Create a configuration object for bias bound estimations

Usage

create_biasBound_config(
  X,
  Y = NULL,
  h = NULL,
  h_method = "cv",
  use_fft = TRUE,
  alpha = 0.05,
  resol = 100,
  xi_lb = NULL,
  xi_ub = NULL,
  methods_get_xi = "snr",
  noise_floor = "auto",
  envelope_use_Y = TRUE,
  integer_r = TRUE,
  kernel.fun = "Schennach2004",
  if_approx_kernel = TRUE,
  kernel.resol = 1000
)

Arguments

X

A numerical vector of sample data.

Y

Optional. A numerical vector of sample data for conditional expectation.

h

A scalar bandwidth parameter. If NULL, the bandwidth is automatically selected using the method specified in 'h_method'.

h_method

Method for automatic bandwidth selection when h is NULL. Options are "cv" (cross-validation) and "silverman" (Silverman's rule of thumb). Default is "cv".

use_fft

Ignored. Maintained for backward compatibility.

alpha

Confidence level for intervals.

resol

Resolution for the estimation range.

xi_lb

Lower bound for the interval of Fourier Transform frequency.

xi_ub

Upper bound for the interval of Fourier Transform frequency.

methods_get_xi

Method to determine the xi interval: "snr" (default), "Schennach", or "Schennach_loose".

noise_floor

Noise-floor form for the Schennach test: "auto" (default), "compact", or "general".

envelope_use_Y

If TRUE (default), fit the regression envelope to the cross-spectrum ⁠|phi_YX|⁠; if FALSE, fit it to the marginal spectrum ⁠|phi_X|⁠.

integer_r

If TRUE (default), when the fitted envelope slope falls below the minimum smoothness assumed by Schennach (2020, Definition 2), i.e. r < 2, clamp it up to r = 2 and refit A; this keeps the bias-bound integral finite. Slopes >= 2 are left unchanged.

kernel.fun

Kernel function to be used. Options include "normal", "epanechnikov", "Schennach2004", and "sinc".

if_approx_kernel

Use approximations for the kernel function.

kernel.resol

Resolution for kernel approximation.

Value

A configuration object (list) with all parameters


Create kernel functions based on configuration

Description

Create kernel functions based on configuration

Usage

create_kernel_functions(
  kernel.fun = "Schennach2004",
  if_approx_kernel = TRUE,
  kernel.resol = 1000
)

Arguments

kernel.fun

A string specifying the kernel function to be used.

if_approx_kernel

Logical. If TRUE, uses approximations for the kernel function.

kernel.resol

The resolution for kernel function approximation.

Value

A list containing kernel function, its Fourier transform, and the kernel type


Cross-Validation for Bandwidth Selection

Description

Implements least-squares cross-validation for bandwidth selection with any kernel function. Uses FFT-based algorithm for n >= 100 (fast, O(m log m) complexity) and exact computation for n < 100 (accurate). The FFT method bins data onto a regular grid and computes the LSCV objective via convolution in the frequency domain.

Usage

cv_bandwidth(
  X,
  h_grid = NULL,
  kernel_func,
  kernel_type = "normal",
  grid_size = 512
)

Arguments

X

A numerical vector of sample data.

h_grid

A numerical vector of bandwidth values to evaluate. If NULL (default), a grid is automatically generated based on the range and distribution of the data.

kernel_func

The kernel function to use for cross-validation.

kernel_type

A string identifying the kernel type, used only for reference bandwidth.

grid_size

Number of grid points for FFT-based evaluation (used when n >= 100). Default is 512. Larger values increase accuracy but reduce speed. Automatically rounded up to the next power of 2.

Value

A scalar representing the optimal bandwidth that minimizes the cross-validation score.

Examples

# Generate sample data
X <- rnorm(100)
# Get optimal bandwidth using cross-validation with a normal kernel
kernel_functions <- create_kernel_functions("normal")
h_opt <- cv_bandwidth(X, kernel_func = kernel_functions$kernel,
                     kernel_type = kernel_functions$kernel_type)

Epanechnikov Kernel

Description

Epanechnikov Kernel

Usage

epanechnikov_kernel(u)

Arguments

u

A numerical value or vector representing the input to the kernel function.

Value

Returns the value of the Epanechnikov kernel function at the given input.


Fourier Transform Epanechnikov Kernel

Description

Fourier Transform Epanechnikov Kernel

Usage

epanechnikov_kernel_ft(xi)

Arguments

xi

A numerical value or vector representing the frequency domain.

Value

Returns the value of the Fourier transform of the Epanechnikov kernel at the given frequency/frequencies.


Extract Fitted Values from bbnp_regression Object

Description

Extracts the fitted conditional expectation values E[Y|X=x]

Usage

## S3 method for class 'bbnp_regression'
fitted(object, ...)

Arguments

object

An object of class bbnp_regression

...

Additional arguments (unused)

Value

Numeric vector of fitted values for range estimation, or a single numeric value for point estimation

Examples


X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y, X, h = 0.1)
fitted(fit)


Approximation Function for Intensive Calculations

Description

This function provides a lookup-based approximation for calculations that are computationally intensive. Once computed, it stores the results in an environment and uses linear interpolation for new data points to speed up subsequent computations.

Usage

fun_approx(u, u_lb = -100, u_ub = 100, resol = 10000, fun = W_kernel)

Arguments

u

A vector of values where the function should be evaluated.

u_lb

Lower bound for the precomputed range. Defaults to -10.

u_ub

Upper bound for the precomputed range. Defaults to 10.

resol

The resolution or number of sample points in the precomputed range. Defaults to 1000.

fun

A function for which the approximation is computed. Defaults to the W function.

Details

The fun_approx function works by initially creating a lookup table of function values based on the range specified by u_lb and u_ub and the resolution resol. This precomputation only happens once for a given set of parameters (u_lb, u_ub, resol, and fun). Subsequent calls to fun_approx with the same parameters use the lookup table to find the closest precomputed points to the requested u values and then return an interpolated result.

Linear interpolation is used between the two closest precomputed points in the lookup table. This ensures a smooth approximation for values in between sample points.

This function is especially useful for computationally intensive functions where recalculating function values is expensive or time-consuming. By using a combination of precomputation and interpolation, fun_approx provides a balance between accuracy and speed.

Value

A vector of approximated function values corresponding to u.


Generate Sample Data

Description

This function used for generate some sample data for experiment

Usage

gen_sample_data(size, dgp, seed = NULL)

Arguments

size

control the sample size.

dgp

data generating process, have options "normal", "chisq", "mixed", "poly", "2_fold_uniform".

seed

random seed number.

Value

A numeric vector of length size. The elements of the vector are generated according to the specified dgp:

normal

Normally distributed values with mean 0 and standard deviation 2.

chisq

Chi-squared distributed values with df = 10.

mixed

Half normally distributed (mean 0, sd = 2) and half chi-squared distributed (df = 10) values.

poly

Values from a polynomial cumulative distribution function on [0,1].

2_fold_uniform

Sum of two uniformly distributed random numbers.

Examples

# Generate 500 samples from 2-fold uniform distribution
X <- gen_sample_data(500, "2_fold_uniform", seed = 123)


Kernel point estimation

Description

Computes the point estimate using the specified kernel function.

Usage

get_avg_f1x(X, x, h, inf_k)

Arguments

X

A numerical vector of sample data.

x

A scalar representing the point where the density is estimated.

h

A scalar bandwidth parameter.

inf_k

Kernel function used for the computation.

Value

A scalar representing the kernel density estimate at point x.


Kernel point estimation

Description

Computes the point estimate using the specified kernel function.

Usage

get_avg_fyx(Y, X, x, h, inf_k)

Arguments

Y

A numerical vector representing the sample data of variable Y.

X

A numerical vector representing the sample data of variable X.

x

A scalar representing the point where the density is estimated.

h

A scalar bandwidth parameter.

inf_k

Kernel function used for the computation.

Value

A scalar representing the kernel density estimate at point x.


Compute Sample Average of Fourier Transform Magnitude

Description

Compute Sample Average of Fourier Transform Magnitude

Usage

get_avg_phi(Y = 1, X, xi)

Arguments

Y

A numerical vector representing the sample data of variable Y.

X

A numerical vector representing the sample data of variable X.

xi

A single numerical value representing the frequency at which the Fourier transform is computed.

Value

Returns the sample estimation of expected Fourier transform at frequency xi.


Compute log sample average of fourier transform and get mod

Description

Compute log sample average of fourier transform and get mod

Usage

get_avg_phi_log(Y = 1, X, ln_xi)

Arguments

Y

A numerical vector representing the sample data of variable Y.

X

A numerical vector representing the sample data of variable X.

ln_xi

A single numerical value representing the log frequency at which the Fourier transform is computed.

Value

Returns the log sample estimation of expected Fourier transform at frequency xi.


get the conditional variance of Y on X for given x

Description

Vectorized implementation using matrix operations for improved performance. For large datasets (n > 10,000), memory usage is approximately 8n^2 bytes.

Usage

get_conditional_var(X, Y, x, h, kernel_func)

Arguments

X

A numerical vector representing the sample data of variable X.

Y

A numerical vector representing the sample data of variable Y.

x

The specific point at which the conditional variance is to be calculated.

h

A bandwidth parameter used in the kernel function for smoothing.

kernel_func

A kernel function used to weigh observations in the neighborhood of point x.

Details

Performance: Achieves >10x speedup for n=1,000 and >20x speedup for n=10,000 compared to the original loop-based implementation.

Memory: Allocates an n x n distance matrix. For n=10,000, requires ~800 MB. Issues a warning when estimated memory exceeds 4 GB and stops with an error when estimated memory exceeds 10 GB.

Value

Returns a non-negative scalar representing the estimated conditional variance of Y given X at the point x. Returns 0 if the computed variance is negative.


Original loop-based implementation of conditional variance (preserved for testing)

Description

Original loop-based implementation of conditional variance (preserved for testing)

Usage

get_conditional_var_loop(X, Y, x, h, kernel_func)

Arguments

X

A numerical vector representing the sample data of variable X.

Y

A numerical vector representing the sample data of variable Y.

x

The specific point at which the conditional variance is to be calculated.

h

A bandwidth parameter used in the kernel function for smoothing.

kernel_func

A kernel function used to weigh observations in the neighborhood of point x.

Value

Returns a non-negative scalar representing the estimated conditional variance of Y given X at the point x. Returns 0 if the computed variance is negative.


Estimate A and r parameters for Fourier envelope

Description

Estimates parameters A and r that bound the Fourier transform decay. Uses linear programming by default, with automatic fallback to grid search.

Usage

get_est_Ar(
  Y = 1,
  X,
  xi_interval,
  r_stepsize = 150,
  method = "auto",
  use_Y = FALSE,
  integer_r = FALSE
)

Arguments

Y

Numeric vector (default 1 for density estimation)

X

Numeric vector of sample data

xi_interval

List with xi_lb and xi_ub

r_stepsize

Grid resolution for fallback method (default 150). Only used when method="grid". Ignored for LP method.

method

Character: "lp" (default), "grid", or "auto"

Details

The LP-based method (default) is typically 1-5x faster than grid search, with speedup depending on sample size and Fourier transform computation cost. The grid search method is preserved for validation and as fallback if LP solver is unavailable.

The function finds the minimal-area envelope line in log-log space by minimizing the integral subject to envelope constraints on the Fourier transform decay.

Value

Named numeric vector with est_A and est_r


Estimate A and r parameters via grid search (LEGACY)

Description

This is the original grid search implementation, kept for fallback and validation.

Usage

get_est_Ar_grid(Y = 1, X, xi_interval, r_stepsize = 150, use_Y = FALSE)

Arguments

Y

A numerical vector representing the sample data of variable Y.

X

A numerical vector representing the sample data of variable X.

xi_interval

A list with elements xi_lb and xi_ub representing the lower and upper bounds of the frequency interval.

r_stepsize

An integer value representing the number of steps in the r range. This controls the granularity of the estimation. Higher values lead to finer granularity but increase computation time.

Details

The function internally defines a range for the natural logarithm of frequency values (ln_xi_range) and a range for the parameter r (r_range). It then defines an optimization function optim_ln_A to minimize the integral of a given function over the ln_xi_range. The actual estimation is done by finding the r and A value that minimizes the the area of the line \ln A - r \ln \xi under the constraint that the line should not go below the Fourier transform curve.

Value

A named vector with elements est_A and est_r representing the estimated values of A and r, respectively.


Estimate A and r parameters via linear programming

Description

Finds the minimal-area envelope line in log-log space using linear programming.

Usage

get_est_Ar_lp(Y = 1, X, xi_interval, r_max = 50, use_Y = FALSE)

Arguments

Y

Numeric vector (default 1 for density estimation)

X

Numeric vector of sample data

xi_interval

List with xi_lb and xi_ub

r_max

Maximum allowed value for r (default 50)

Value

Named numeric vector with est_A and est_r


get the estimation of B

Description

get the estimation of B

Usage

get_est_B(Y)

Arguments

Y

A numerical vector representing the sample data of variable Y.

Value

The mean of the absolute values of the elements in Y, representing the estimated value of B.


Estimation of bias b1x

Description

Computes the bias estimate for given parameters.

Usage

get_est_b1x(X, h, est_Ar, inf_k_ft, ...)

Arguments

X

A numerical vector representing the sample data of variable X.

h

A scalar bandwidth parameter.

est_Ar

A vector containing the estimated A and r parameters.

inf_k_ft

A kernel Fourier transform function.

...

Additional arguments passed to the quadgk integration function.

Value

A scalar representing the bias b1x estimate.


Estimation of bias byx

Description

Estimation of bias byx

Usage

get_est_byx(Y, X, ...)

Arguments

Y

A numerical vector representing the sample data of variable Y.

X

A numerical vector representing the sample data of variable X.

...

Additional arguments passed to other methods.

Value

A scalar representing the bias byx estimate.


get the estimation of Vy

Description

get the estimation of Vy

Usage

get_est_vy(Y)

Arguments

Y

A numerical vector representing the sample data of variable Y.


Estimation of sigma

Description

Computes the sigma estimate for given parameters.

Usage

get_sigma(X, x, h, inf_k)

Arguments

X

A numerical vector of sample data.

x

A scalar representing the point where the density is estimated.

h

A scalar bandwidth parameter.

inf_k

Kernel function used for the computation.

Value

A scalar representing the sigma estimate at point x. Returns 0 if the density estimate is negative.


Estimation of sigma_yx

Description

Estimation of sigma_yx

Usage

get_sigma_yx(Y, X, x, h, inf_k)

Arguments

Y

A numerical vector representing the sample data of variable Y.

X

A numerical vector representing the sample data of variable X.

x

The specific point at which sigma_yx is to be estimated.

h

A bandwidth parameter used in the kernel function for smoothing.

inf_k

A kernel function used to weigh observations in the neighborhood of point x.

Value

Returns a scalar representing the estimated value of sigma_yx at the point x. Returns 0 if either fyx or conditional variance is negative.


get xi interval

Description

get xi interval

Usage

get_xi_interval(
  Y = 1,
  X,
  methods = "snr",
  noise_floor = c("auto", "compact", "general")
)

Arguments

Y

A numerical vector representing the sample data of variable Y.

X

A numerical vector representing the sample data of variable X.

methods

A character string: "snr" (default; a signal-to-noise cutoff that selects a valid frequency window at realistic sample sizes), "Schennach" (the data-driven rule of Schennach 2020, Theorem 2), or "Schennach_loose" (the initial, un-refined interval).

noise_floor

Noise-floor form for the feasibility test: "auto" (default; general for non-constant Y, compact otherwise), "compact", or "general". See .noise_floor_avg_dphi().

Details

The "Schennach" method computes the xi interval by performing a test based on the Schennach's theorem, adjusting the upper bound xi_ub if the test condition is met. The "Schennach_loose" method provides a looser calculation of the xi interval without performing the Schennach's test. The "snr" method selects the upper cutoff from the pointwise signal-to-noise ratio of the empirical Fourier transform (see .snr_window()); unlike "Schennach" it always selects a valid cutoff at realistic sample sizes and never returns an empty window.

Value

A list containing the lower (xi_lb) and upper (xi_ub) bounds of the xi interval.


Kernel Regression function

Description

Kernel Regression function

Usage

kernel_reg(X, Y, x, h, kernel_func)

Arguments

X

A numerical vector representing the sample data of variable X.

Y

A numerical vector representing the sample data of variable Y.

x

The point at which the regression function is to be estimated.

h

A bandwidth parameter that determines the weight assigned to each observation in X.

kernel_func

A function that computes the weight of each observation based on its distance to x.

Value

Returns a scalar representing the estimated value of the regression function at the point x.


S3 Constructor for bbnp_density Objects

Description

Creates a bbnp_density S3 object with validation

Usage

new_bbnp_density(
  density = NULL,
  x = NULL,
  estimate = NULL,
  conf_int = list(),
  bias_bound = list(),
  std_error = NULL,
  call = NULL,
  bandwidth = NULL,
  n = NULL,
  kernel = NULL,
  data = list(),
  internals = list()
)

Arguments

density

Numeric vector of density estimates (for range estimation)

x

Numeric vector of evaluation points

estimate

Numeric scalar point estimate (for point estimation)

conf_int

List containing lower, upper, and conf_level

bias_bound

List containing b1x, est_A, est_r, xi_interval

std_error

Numeric vector of standard errors

call

The original function call

bandwidth

Numeric scalar bandwidth used

n

Integer sample size

kernel

Character string kernel type

data

List containing original data (X)

internals

List containing internal objects (config, kernel_functions)

Value

An object of class c("bbnp_density", "bbnp")


S3 Constructor for bbnp_regression Objects

Description

Creates a bbnp_regression S3 object with validation

Usage

new_bbnp_regression(
  fitted_values = NULL,
  x = NULL,
  estimate = NULL,
  conf_int = list(),
  bias_bound = list(),
  std_error = NULL,
  marginal_density = NULL,
  joint_density = NULL,
  call = NULL,
  bandwidth = NULL,
  n = NULL,
  kernel = NULL,
  data = list(),
  internals = list()
)

Arguments

fitted_values

Numeric vector of E[Y|X=x] estimates

x

Numeric vector of evaluation points

estimate

Numeric scalar point estimate (for point estimation)

conf_int

List containing lower, upper, and conf_level

bias_bound

List containing b1x, byx, est_A, est_r, est_B, xi_interval

std_error

Numeric vector of standard errors

marginal_density

Numeric vector of f(x) estimates

joint_density

Numeric vector of f_YX estimates

call

The original function call

bandwidth

Numeric scalar bandwidth used

n

Integer sample size

kernel

Character string kernel type

data

List containing original data (X, Y)

internals

List containing internal objects (config, kernel_functions)

Value

An object of class c("bbnp_regression", "bbnp")


Normal Kernel Function

Description

Normal Kernel Function

Usage

normal_kernel(u)

Arguments

u

A numerical value or vector representing the input to the kernel function.

Value

Returns the value of the Normal kernel function at the given input.


Fourier Transform of Normal Kernel

Description

Fourier Transform of Normal Kernel

Usage

normal_kernel_ft(xi)

Arguments

xi

A numerical value or vector representing the frequency domain.

Value

Returns the value of the Fourier transform of the Normal kernel at the given frequency/frequencies.


Plot Method for bbnp_density Objects

Description

Creates visualizations of bias-bounded density estimation results

Usage

## S3 method for class 'bbnp_density'
plot(
  x,
  type = c("density", "ft"),
  fill_ci = .bbnp_pal[["ci"]],
  fill_bias = .bbnp_pal[["bias"]],
  alpha_ci = 0.55,
  alpha_bias = 0.55,
  ft_resol = 500,
  xi_range = NULL,
  expand = 2.5,
  ...
)

Arguments

x

An object of class bbnp_density

type

Character string specifying plot type. Options are:

  • "density" (default): Density estimate with bias bounds and confidence intervals

  • "ft": Fourier transform plot with estimated envelope

fill_ci

Color for confidence interval ribbon (default: a muted blue).

fill_bias

Color for bias bound ribbon (default: a muted terracotta).

alpha_ci

Transparency for confidence interval ribbon (default: 0.30)

alpha_bias

Transparency for bias bound ribbon (default: 0.45)

ft_resol

Resolution for Fourier transform plot (default: 500)

xi_range

Optional numeric c(lower, upper) giving the frequency range to display in the "ft" plot. If NULL (default) a wide range around the selected window is shown (see expand). This controls only what is drawn; it does not change the fitting window [xi_lb, xi_ub].

expand

For the "ft" plot when xi_range is NULL: how far past the selected window to display, as a multiple (default 2.5).

...

Additional arguments (unused)

Value

A ggplot2 object

Examples


X <- rnorm(100)
fit <- biasBound_density(X, h = 0.1)
plot(fit)
plot(fit, type = "ft")


Plot Method for bbnp_regression Objects

Description

Creates visualizations of bias-bounded conditional expectation estimation results

Usage

## S3 method for class 'bbnp_regression'
plot(
  x,
  type = c("regression", "ft"),
  fill_ci = .bbnp_pal[["ci"]],
  alpha_ci = 0.55,
  point_alpha = 0.28,
  point_color = .bbnp_pal[["point"]],
  ft_resol = 500,
  xi_range = NULL,
  expand = 2.5,
  ...
)

Arguments

x

An object of class bbnp_regression

type

Character string specifying plot type. Options are:

  • "regression" (default): Conditional expectation with confidence interval

  • "ft": Fourier transform plot with estimated envelope

fill_ci

Color for confidence interval ribbon (default: a muted blue).

alpha_ci

Transparency for confidence interval ribbon (default: 0.35)

point_alpha

Transparency for data points (default: 0.28)

point_color

Color for data points (default: a soft grey).

ft_resol

Resolution for Fourier transform plot (default: 500)

xi_range

Optional numeric c(lower, upper) giving the frequency range to display in the "ft" plot. If NULL (default) a wide range around the selected window is shown (see expand). This controls only what is drawn; it does not change the fitting window [xi_lb, xi_ub].

expand

For the "ft" plot when xi_range is NULL: how far past the selected window to display, as a multiple (default 2.5).

...

Additional arguments (unused)

Value

A ggplot2 object

Examples


X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y, X, h = 0.1)
plot(fit)
plot(fit, type = "ft")


Internal Helper: Plot Density Estimation

Description

Creates density plot with bias bounds and confidence intervals

Usage

plot_density_internal(x, fill_ci, fill_bias, alpha_ci, alpha_bias)

Arguments

x

An object of class bbnp_density

fill_ci

Color for confidence interval ribbon

fill_bias

Color for bias bound ribbon

alpha_ci

Transparency for confidence interval ribbon

alpha_bias

Transparency for bias bound ribbon

Value

A ggplot2 object


Plot the Fourier Transform (Deprecated)

Description

Deprecated: This function is deprecated and will be removed in a future version.

Use plot(fit, type = "ft") instead, where fit is a bbnp_density or bbnp_regression object.

Usage

plot_ft(X, xi_interval, ft_plot.resol = 500)

Arguments

X

A numerical vector of sample data.

xi_interval

A list containing the lower (xi_lb) and upper (xi_ub) bounds of the xi interval.

ft_plot.resol

An integer representing the resolution of the plot, specifically the number of points used to represent the Fourier transform. Defaults to 500.

Details

C = 1, the parameter in O(1/n^{0.25}), see more details in in Schennach (2020).

Value

A ggplot object representing the plot of the Fourier transform.

Examples

## Not run: 
# Old (deprecated):
# plot_ft(sample_data$X, xi_interval = list(xi_lb = 1, xi_ub = 50))

# New (recommended):
fit <- biasBound_density(sample_data$X, h = 0.1)
plot(fit, type = "ft")

## End(Not run)

Internal Helper: Plot Fourier Transform

Description

Creates a Fourier transform plot over a wide frequency range with the rule-selected fitting window shaded and the estimated envelope overlaid. For regression objects the cross-spectrum ⁠|phi_YX|⁠ is shown (matching the fitted envelope); for density objects the marginal ⁠|phi_X|⁠ is shown.

Usage

plot_ft_internal(x, ft_resol, xi_range = NULL, expand = 2.5)

Arguments

x

An object of class bbnp_density or bbnp_regression

ft_resol

Resolution for Fourier transform plot

xi_range

Optional display range c(lower, upper); NULL = wide auto range

expand

How far past the window to display when xi_range is NULL

Value

A ggplot2 object


Internal Helper: Plot Conditional Expectation

Description

Creates regression plot with confidence interval and data points

Usage

plot_regression_internal(x, fill_ci, alpha_ci, point_alpha, point_color)

Arguments

x

An object of class bbnp_regression

fill_ci

Color for confidence interval ribbon

alpha_ci

Transparency for confidence interval ribbon

point_alpha

Transparency for data points

point_color

Color for data points

Value

A ggplot2 object


Print Method for bbnp_density Objects

Description

Print Method for bbnp_density Objects

Usage

## S3 method for class 'bbnp_density'
print(x, digits = 4, ...)

Arguments

x

An object of class bbnp_density

digits

Number of digits to display (default: 4)

...

Additional arguments (unused)

Value

Invisibly returns the input object


Print Method for bbnp_regression Objects

Description

Print Method for bbnp_regression Objects

Usage

## S3 method for class 'bbnp_regression'
print(x, digits = 4, ...)

Arguments

x

An object of class bbnp_regression

digits

Number of digits to display (default: 4)

...

Additional arguments (unused)

Value

Invisibly returns the input object


Print Method for summary.bbnp_density Objects

Description

Print Method for summary.bbnp_density Objects

Usage

## S3 method for class 'summary.bbnp_density'
print(x, digits = 4, ...)

Arguments

x

An object of class summary.bbnp_density

digits

Number of digits to display (default: 4)

...

Additional arguments (unused)

Value

Invisibly returns the input object


Print Method for summary.bbnp_regression Objects

Description

Print Method for summary.bbnp_regression Objects

Usage

## S3 method for class 'summary.bbnp_regression'
print(x, digits = 4, ...)

Arguments

x

An object of class summary.bbnp_regression

digits

Number of digits to display (default: 4)

...

Additional arguments (unused)

Value

Invisibly returns the input object


Generate n samples from the distribution

Description

Generate n samples from the distribution

Usage

rpoly01(n, k = 5)

Arguments

n

The number of samples to generate.

k

The exponent in the distribution function, defaults to 5.

Value

A vector of n samples from the specified polynomial distribution.

CDF: f(x) = (x-1)^k + 1


Sample Data

Description

Sample Data

Usage

sample_data

Format

A data frame with 1000 rows and 2 variables:

X

Numeric vector, generated from 2 fold uniform distribution.

Y

Numeric vector, Y = -X^2 + 3*X + rnorm(1000)*X.


Select Optimal Bandwidth

Description

Selects an optimal bandwidth using the specified method.

Usage

select_bandwidth(
  X,
  Y = NULL,
  method = "cv",
  kernel.fun = "normal",
  if_approx_kernel = TRUE,
  kernel.resol = 1000
)

Arguments

X

A numerical vector of sample data.

Y

Optional. A numerical vector of sample data for conditional expectation estimation.

method

A string specifying the bandwidth selection method. Options are "cv" for cross-validation and "silverman" for Silverman's rule of thumb. Defaults to "cv".

kernel.fun

A string specifying the kernel type. Options include "normal", "epanechnikov", "Schennach2004", and "sinc".

if_approx_kernel

Logical. If TRUE, uses approximations for the kernel function.

kernel.resol

The resolution for kernel function approximation.

Value

A scalar representing the optimal bandwidth.

Examples

# Generate sample data
X <- rnorm(100)
# Get optimal bandwidth using cross-validation with normal kernel
h_opt <- select_bandwidth(X, method = "cv", kernel.fun = "normal")
# Get optimal bandwidth using Silverman's rule with Schennach kernel
h_opt <- select_bandwidth(X, method = "silverman", kernel.fun = "Schennach2004")

Silverman's Rule of Thumb for Bandwidth Selection

Description

Implements Silverman's rule of thumb for selecting an optimal bandwidth in kernel density estimation.

Usage

silverman_bandwidth(X, kernel_type = "normal")

Arguments

X

A numerical vector of sample data.

kernel_type

A string identifying the kernel type.

Value

A scalar representing the optimal bandwidth.

Examples

# Generate sample data
X <- rnorm(100)
# Get optimal bandwidth using Silverman's rule
h_opt <- silverman_bandwidth(X, kernel_type = "normal")

Infinite Kernel Function

Description

Infinite Kernel Function

Usage

sinc(u)

Arguments

u

A numerical value or vector where the sinc function is evaluated.

Value

The value of the sinc function at each point in u.


Define the closed form FT of the infinite order kernel sin(x)/(pi*x)

Description

Define the closed form FT of the infinite order kernel sin(x)/(pi*x)

Usage

sinc_ft(x)

Arguments

x

A numerical value or vector where the Fourier Transform is evaluated.

Value

The value of the Fourier Transform of the sinc function at each point in x.


Summary Method for bbnp_density Objects

Description

Summary Method for bbnp_density Objects

Usage

## S3 method for class 'bbnp_density'
summary(object, ...)

Arguments

object

An object of class bbnp_density

...

Additional arguments (unused)

Value

An object of class summary.bbnp_density


Summary Method for bbnp_regression Objects

Description

Summary Method for bbnp_regression Objects

Usage

## S3 method for class 'bbnp_regression'
summary(object, ...)

Arguments

object

An object of class bbnp_regression

...

Additional arguments (unused)

Value

An object of class summary.bbnp_regression


True density of 2-fold uniform distribution

Description

True density of 2-fold uniform distribution

Usage

true_density_2fold(x)

Arguments

x

A numerical value or vector where the true density function is evaluated.

Value

The value of the true density of the 2-fold uniform distribution at each point in x.