| Type: | Package |
| Title: | A Bias Bound Approach to Non-Parametric Inference |
| Version: | 1.1.0 |
| Description: | A novel bias-bound approach for non-parametric inference is introduced, focusing on both density and conditional expectation estimation. It constructs valid confidence intervals that account for the presence of a non-negligible bias and thus make it possible to perform inference with optimal mean squared error minimizing bandwidths. This package is based on Schennach (2020) <doi:10.1093/restud/rdz065>. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| LazyData: | true |
| URL: | https://xinyudai.net/rbbnp-dev/ |
| Imports: | dplyr, ggplot2 (≥ 3.4.0), gridExtra, pracma, purrr, Rglpk, tidyr |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 3.5) |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-06-07 05:53:15 UTC; clawdbot |
| Author: | Xinyu DAI [aut, cre], Susanne M Schennach [aut] |
| Maintainer: | Xinyu DAI <xinyu_dai@brown.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-09 02:30:02 UTC |
rbbnp: Bias-Bounded Nonparametric Inference
Description
The rbbnp package provides functions for bias-bounded nonparametric density and conditional expectation estimation based on the methodology of Schennach (2020).
Main Functions
The package provides two main estimation functions:
-
biasBound_density: Bias-bounded density estimation -
biasBound_condExpectation: Bias-bounded conditional expectation estimation
S3 Object System
Both main functions return S3 objects with the following classes:
-
bbnp_density: Returned bybiasBound_density() -
bbnp_regression: Returned bybiasBound_condExpectation()
S3 Methods
The following generic methods are available for working with estimation results:
print()Concise summary of estimation results
summary()Detailed statistics and parameter estimates
plot()Visualization of results (density/regression plots, FT plots)
coef()Extract bias bound parameters (A, r, h) and B for regression
fitted()Extract fitted conditional expectation values (regression only)
confint()Extract confidence intervals
Usage Example
# Density estimation workflow X <- rnorm(100) fit <- biasBound_density(X, h = 0.1) print(fit) summary(fit) plot(fit) # Density plot plot(fit, type = "ft") # Fourier transform plot coef(fit) # Bias bound parameters confint(fit) # Confidence intervals # Regression estimation workflow Y <- X^2 + rnorm(100) fit_reg <- biasBound_condExpectation(Y, X, h = 0.1) plot(fit_reg) # Regression plot fitted(fit_reg) # Fitted values
Author(s)
Maintainer: Xinyu DAI xinyu_dai@brown.edu
Authors:
Susanne M Schennach
References
Schennach, S. M. (2020). "Long Story Short: Omitted Variable Bias in Causal Machine Learning." NBER Working Paper.
See Also
Useful links:
Check if LP solver is available
Description
Check if LP solver is available
Usage
.has_lp_solver()
Value
Logical indicating if Rglpk is available
Uniform Fourier-transform error bound Delta-phi (Schennach 2020, Theorem 1)
Description
Uniform Fourier-transform error bound Delta-phi (Schennach 2020, Theorem 1)
Usage
.noise_floor_avg_dphi(Y, n, noise_floor = "compact")
Arguments
Y |
Sample vector (or 1 for density). |
n |
Sample size. |
noise_floor |
"compact" (the |
Prepare LP problem matrices for envelope optimization
Description
Prepare LP problem matrices for envelope optimization
Usage
.prepare_lp_problem(ln_xi_range, avg_phi_log_values, r_max = 50)
Arguments
ln_xi_range |
Numeric vector of log-frequencies |
avg_phi_log_values |
Numeric vector of log-Fourier-transform values |
r_max |
Maximum allowed value for r parameter (default 50) |
Value
List with components: obj (objective coefficients), mat (constraint matrix), rhs (RHS vector), bounds (variable bounds)
Signal-to-noise frequency-window selection rule
Description
Selects the upper frequency cutoff as the point where the pointwise signal-to-noise ratio of
the empirical Fourier transform first drops (on a sustained run) below tau. Using the
pointwise (delta-method) standard error in place of Schennach (2020)'s worst-case uniform
error bound keeps the rule usable at realistic sample sizes, and a minimum-window guard
ensures it never returns an empty interval.
Usage
.snr_window(Y = 1, X, tau = NULL, run = 5L, per_log = 200)
Arguments
Y |
Numeric vector (or 1 for density). |
X |
Numeric vector of sample data. |
tau |
SNR threshold: |
run |
Minimum consecutive below-threshold points for a sustained crossing. |
per_log |
Grid resolution (points per log-unit of frequency). |
Value
A list with xi_lb and xi_ub.
Solve LP problem for envelope optimization
Description
Solve LP problem for envelope optimization
Usage
.solve_lp_envelope(lp_problem)
Arguments
lp_problem |
List from .prepare_lp_problem() |
Value
List with status, solution vector, and objective value
The Path to the Data Folder
Description
This variable provides the path to the data folder within the package.
Value
The path to the package's internal data folder as a character string.
The Path to the External Data Folder for Non-R Data Files
Description
This variable provides the path to the extdata folder within the package,
where non-standard R data files are stored.
Value
The path to the package's external data folder (for non-standard R data files) as a character string.
Define the inverse Fourier transform function of W
Description
Define the inverse Fourier transform function of W
Usage
W_kernel(u, L = 10)
Arguments
u |
A numerical value or vector representing the time or space domain. |
L |
The limit for numerical integration, defines the range of integration as |
Value
A numerical value or vector representing the inverse Fourier transform of the infinite order kernel at the given time or space point(s).
Define the Fourier transform of a infinite kernel proposed in Schennach 2004
Description
Define the Fourier transform of a infinite kernel proposed in Schennach 2004
Usage
W_kernel_ft(xi, xi_lb = 0.5, xi_ub = 1.5)
Arguments
xi |
A numerical value or vector representing the frequency domain. |
xi_lb |
The lower bound for the frequency domain. Defaults to 0.5. |
xi_ub |
The upper bound for the frequency domain. Defaults to 1.5. |
Value
A numerical value or vector representing the Fourier transform of the infinite order kernel at the given frequency/frequencies.
Bias bound approach for conditional expectation estimation
Description
Estimates the density at a given point or across a range, and provides visualization options for density, bias, and confidence intervals.
Usage
biasBound_condExpectation(
Y,
X,
x = NULL,
h = NULL,
h_method = "cv",
alpha = 0.05,
est_Ar = NULL,
resol = 100,
xi_lb = NULL,
xi_ub = NULL,
methods_get_xi = "snr",
noise_floor = "auto",
envelope_use_Y = TRUE,
integer_r = TRUE,
ora_Ar = NULL,
kernel.fun = "Schennach2004",
if_approx_kernel = TRUE,
kernel.resol = 1000
)
Arguments
Y |
A numerical vector of sample data. |
X |
A numerical vector of sample data. |
x |
Optional. A scalar or range of points where the density is estimated. If NULL, a range is automatically generated. |
h |
A scalar bandwidth parameter. If NULL, the bandwidth is automatically selected using the method specified in 'h_method'. |
h_method |
Method for automatic bandwidth selection when h is NULL. Options are "cv" (cross-validation) and "silverman" (Silverman's rule of thumb). Default is "cv". |
alpha |
Confidence level for intervals. Default is 0.05. |
est_Ar |
Optional list of estimates for A and r. If NULL, they are computed using |
resol |
Resolution for the estimation range. Default is 100. |
xi_lb |
Optional. Lower bound for the interval of Fourier Transform frequency xi. Used for determining the range over which A and r is estimated. If NULL, it is automatically determined based on the methods_get_xi. |
xi_ub |
Optional. Upper bound for the interval of Fourier Transform frequency xi. Similar to xi_lb, it defines the upper range for A and r estimation. If NULL, the upper bound is determined based on the methods_get_xi. |
methods_get_xi |
A string selecting the frequency-window rule used when xi_lb/xi_ub are NULL: "snr" (default; a signal-to-noise cutoff that selects a valid window at realistic sample sizes), "Schennach" (the data-driven rule of Schennach 2020, Theorem 2), or "Schennach_loose" (the initial, un-refined interval). |
noise_floor |
Noise-floor form for the Schennach test: "auto" (default), "compact", or "general". |
envelope_use_Y |
If TRUE (default), fit the regression envelope to the cross-spectrum |
integer_r |
If TRUE (default), clamp the fitted envelope slope up to r = 2 when it falls below the minimum smoothness assumed by Schennach (2020, Definition 2), i.e. r < 2, and refit A; this keeps the bias-bound integral finite. Slopes >= 2 are left unchanged. |
ora_Ar |
Optional list of oracle values for A and r (for research/comparison purposes). |
kernel.fun |
A string specifying the kernel function to be used. Options are "Schennach2004", "sinc", "normal", "epanechnikov". |
if_approx_kernel |
Logical. If TRUE, uses approximations for the kernel function. |
kernel.resol |
The resolution for kernel function approximation. See |
Value
An object of class bbnp_regression with components:
fitted_values |
|
x |
Evaluation points |
estimate |
Point estimate (for single x) |
conf_int |
List containing lower, upper bounds and conf_level. Note that
the confidence interval can be unbounded (i.e., contain |
bias_bound |
List containing b1x, byx, est_A, est_r, est_B, xi_interval |
std_error |
Standard errors |
marginal_density |
f(x) estimates |
joint_density |
f_YX estimates |
call |
The function call |
bandwidth |
Bandwidth used |
n |
Sample size |
kernel |
Kernel type |
data |
Original data (X, Y) |
Use plot(), summary(), coef(), fitted(), and confint() methods to work with the result.
Examples
# Example 1: Point estimation at x = 1
X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y = Y, X = X, x = 1, h = 0.09)
print(fit)
fitted(fit)
# Example 2: Range estimation with plots
fit2 <- biasBound_condExpectation(Y = Y, X = X, h = NULL, h_method = "cv")
plot(fit2) # Regression plot
plot(fit2, type = "ft") # Fourier transform plot
summary(fit2)
Bias bound approach for density estimation
Description
Estimates the density at a given point or across a range, and provides visualization options for density, bias, and confidence intervals.
Usage
biasBound_density(
X,
x = NULL,
h = NULL,
h_method = "cv",
alpha = 0.05,
resol = 100,
xi_lb = NULL,
xi_ub = NULL,
methods_get_xi = "snr",
noise_floor = "auto",
envelope_use_Y = TRUE,
integer_r = TRUE,
ora_Ar = NULL,
kernel.fun = "Schennach2004",
if_approx_kernel = TRUE,
kernel.resol = 1000
)
Arguments
X |
A numerical vector of sample data. |
x |
Optional. A scalar or range of points where the density is estimated. If NULL, a range is automatically generated. |
h |
A scalar bandwidth parameter. If NULL, the bandwidth is automatically selected using the method specified in 'h_method'. |
h_method |
Method for automatic bandwidth selection when h is NULL. Options are "cv" (cross-validation) and "silverman" (Silverman's rule of thumb). Default is "cv". |
alpha |
Confidence level for intervals. Default is 0.05. |
resol |
Resolution for the estimation range. Default is 100. |
xi_lb |
Optional. Lower bound for the interval of Fourier Transform frequency xi. Used for determining the range over which A and r is estimated. If NULL, it is automatically determined based on the methods_get_xi. |
xi_ub |
Optional. Upper bound for the interval of Fourier Transform frequency xi. Similar to xi_lb, it defines the upper range for A and r estimation. If NULL, the upper bound is determined based on the methods_get_xi. |
methods_get_xi |
A string selecting the frequency-window rule used when xi_lb/xi_ub are NULL: "snr" (default; a signal-to-noise cutoff that selects a valid window at realistic sample sizes), "Schennach" (the data-driven rule of Schennach 2020, Theorem 2), or "Schennach_loose" (the initial, un-refined interval). |
noise_floor |
Noise-floor form for the Schennach test: "auto" (default), "compact", or "general". |
envelope_use_Y |
If TRUE (default), fit the regression envelope to the cross-spectrum |
integer_r |
If TRUE (default), clamp the fitted envelope slope up to r = 2 when it falls below the minimum smoothness assumed by Schennach (2020, Definition 2), i.e. r < 2, and refit A; this keeps the bias-bound integral finite. Slopes >= 2 are left unchanged. |
ora_Ar |
Optional list of oracle values for A and r (for research/comparison purposes). |
kernel.fun |
A string specifying the kernel function to be used. Options are "Schennach2004", "sinc", "normal", "epanechnikov". |
if_approx_kernel |
Logical. If TRUE, uses approximations for the kernel function. |
kernel.resol |
The resolution for kernel function approximation. See |
Value
An object of class bbnp_density with components:
density |
Density estimates (for range estimation) |
x |
Evaluation points |
estimate |
Point estimate (for single x) |
conf_int |
List containing lower, upper bounds and conf_level |
bias_bound |
List containing b1x, est_A, est_r, xi_interval |
std_error |
Standard errors |
call |
The function call |
bandwidth |
Bandwidth used |
n |
Sample size |
kernel |
Kernel type |
data |
Original data |
Use plot(), summary(), coef(), and confint() methods to work with the result.
Examples
# Example 1: Point estimation at x = 1
X <- rnorm(100)
fit <- biasBound_density(X = X, x = 1, h = 0.09)
print(fit)
coef(fit)
# Example 2: Range estimation with automatic bandwidth
fit2 <- biasBound_density(X = X, h = NULL, h_method = "cv")
plot(fit2) # Density plot
plot(fit2, type = "ft") # Fourier transform plot
summary(fit2)
Extract Coefficients from bbnp_density Object
Description
Extracts the estimated bias bound parameters and bandwidth
Usage
## S3 method for class 'bbnp_density'
coef(object, ...)
Arguments
object |
An object of class bbnp_density |
... |
Additional arguments (unused) |
Value
Named numeric vector with A (amplitude), r (decay rate), and h (bandwidth)
Examples
X <- rnorm(100)
fit <- biasBound_density(X, h = 0.1)
coef(fit)
Extract Coefficients from bbnp_regression Object
Description
Extracts the estimated bias bound parameters and bandwidth
Usage
## S3 method for class 'bbnp_regression'
coef(object, ...)
Arguments
object |
An object of class bbnp_regression |
... |
Additional arguments (unused) |
Value
Named numeric vector with A (amplitude), r (decay rate), B (Y bound), and h (bandwidth)
Examples
X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y, X, h = 0.1)
coef(fit)
Extract Confidence Intervals from bbnp_density Object
Description
Extracts confidence intervals for density estimates
Usage
## S3 method for class 'bbnp_density'
confint(object, parm = NULL, level = 0.95, ...)
Arguments
object |
An object of class bbnp_density |
parm |
Not used (included for S3 generic compatibility) |
level |
Confidence level (default: 0.95). Note: this parameter is not used as the confidence level is fixed at object creation time. |
... |
Additional arguments (unused) |
Value
For range estimation: a matrix with columns "lower" and "upper" For point estimation: a named vector with elements "lower" and "upper"
Examples
X <- rnorm(100)
fit <- biasBound_density(X, h = 0.1)
confint(fit)
Extract Confidence Intervals from bbnp_regression Object
Description
Extracts confidence intervals for conditional expectation estimates
Usage
## S3 method for class 'bbnp_regression'
confint(object, parm = NULL, level = 0.95, ...)
Arguments
object |
An object of class bbnp_regression |
parm |
Not used (included for S3 generic compatibility) |
level |
Confidence level (default: 0.95). Note: this parameter is not used as the confidence level is fixed at object creation time. |
... |
Additional arguments (unused) |
Value
For range estimation: a matrix with columns "lower" and "upper" For point estimation: a named vector with elements "lower" and "upper"
Examples
X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y, X, h = 0.1)
confint(fit)
Create a configuration object for bias bound estimations
Description
Create a configuration object for bias bound estimations
Usage
create_biasBound_config(
X,
Y = NULL,
h = NULL,
h_method = "cv",
use_fft = TRUE,
alpha = 0.05,
resol = 100,
xi_lb = NULL,
xi_ub = NULL,
methods_get_xi = "snr",
noise_floor = "auto",
envelope_use_Y = TRUE,
integer_r = TRUE,
kernel.fun = "Schennach2004",
if_approx_kernel = TRUE,
kernel.resol = 1000
)
Arguments
X |
A numerical vector of sample data. |
Y |
Optional. A numerical vector of sample data for conditional expectation. |
h |
A scalar bandwidth parameter. If NULL, the bandwidth is automatically selected using the method specified in 'h_method'. |
h_method |
Method for automatic bandwidth selection when h is NULL. Options are "cv" (cross-validation) and "silverman" (Silverman's rule of thumb). Default is "cv". |
use_fft |
Ignored. Maintained for backward compatibility. |
alpha |
Confidence level for intervals. |
resol |
Resolution for the estimation range. |
xi_lb |
Lower bound for the interval of Fourier Transform frequency. |
xi_ub |
Upper bound for the interval of Fourier Transform frequency. |
methods_get_xi |
Method to determine the xi interval: "snr" (default), "Schennach", or "Schennach_loose". |
noise_floor |
Noise-floor form for the Schennach test: "auto" (default), "compact", or "general". |
envelope_use_Y |
If TRUE (default), fit the regression envelope to the cross-spectrum
|
integer_r |
If TRUE (default), when the fitted envelope slope falls below the minimum smoothness assumed by Schennach (2020, Definition 2), i.e. r < 2, clamp it up to r = 2 and refit A; this keeps the bias-bound integral finite. Slopes >= 2 are left unchanged. |
kernel.fun |
Kernel function to be used. Options include "normal", "epanechnikov", "Schennach2004", and "sinc". |
if_approx_kernel |
Use approximations for the kernel function. |
kernel.resol |
Resolution for kernel approximation. |
Value
A configuration object (list) with all parameters
Create kernel functions based on configuration
Description
Create kernel functions based on configuration
Usage
create_kernel_functions(
kernel.fun = "Schennach2004",
if_approx_kernel = TRUE,
kernel.resol = 1000
)
Arguments
kernel.fun |
A string specifying the kernel function to be used. |
if_approx_kernel |
Logical. If TRUE, uses approximations for the kernel function. |
kernel.resol |
The resolution for kernel function approximation. |
Value
A list containing kernel function, its Fourier transform, and the kernel type
Cross-Validation for Bandwidth Selection
Description
Implements least-squares cross-validation for bandwidth selection with any kernel function. Uses FFT-based algorithm for n >= 100 (fast, O(m log m) complexity) and exact computation for n < 100 (accurate). The FFT method bins data onto a regular grid and computes the LSCV objective via convolution in the frequency domain.
Usage
cv_bandwidth(
X,
h_grid = NULL,
kernel_func,
kernel_type = "normal",
grid_size = 512
)
Arguments
X |
A numerical vector of sample data. |
h_grid |
A numerical vector of bandwidth values to evaluate. If NULL (default), a grid is automatically generated based on the range and distribution of the data. |
kernel_func |
The kernel function to use for cross-validation. |
kernel_type |
A string identifying the kernel type, used only for reference bandwidth. |
grid_size |
Number of grid points for FFT-based evaluation (used when n >= 100). Default is 512. Larger values increase accuracy but reduce speed. Automatically rounded up to the next power of 2. |
Value
A scalar representing the optimal bandwidth that minimizes the cross-validation score.
Examples
# Generate sample data
X <- rnorm(100)
# Get optimal bandwidth using cross-validation with a normal kernel
kernel_functions <- create_kernel_functions("normal")
h_opt <- cv_bandwidth(X, kernel_func = kernel_functions$kernel,
kernel_type = kernel_functions$kernel_type)
Epanechnikov Kernel
Description
Epanechnikov Kernel
Usage
epanechnikov_kernel(u)
Arguments
u |
A numerical value or vector representing the input to the kernel function. |
Value
Returns the value of the Epanechnikov kernel function at the given input.
Fourier Transform Epanechnikov Kernel
Description
Fourier Transform Epanechnikov Kernel
Usage
epanechnikov_kernel_ft(xi)
Arguments
xi |
A numerical value or vector representing the frequency domain. |
Value
Returns the value of the Fourier transform of the Epanechnikov kernel at the given frequency/frequencies.
Extract Fitted Values from bbnp_regression Object
Description
Extracts the fitted conditional expectation values E[Y|X=x]
Usage
## S3 method for class 'bbnp_regression'
fitted(object, ...)
Arguments
object |
An object of class bbnp_regression |
... |
Additional arguments (unused) |
Value
Numeric vector of fitted values for range estimation, or a single numeric value for point estimation
Examples
X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y, X, h = 0.1)
fitted(fit)
Approximation Function for Intensive Calculations
Description
This function provides a lookup-based approximation for calculations that are computationally intensive. Once computed, it stores the results in an environment and uses linear interpolation for new data points to speed up subsequent computations.
Usage
fun_approx(u, u_lb = -100, u_ub = 100, resol = 10000, fun = W_kernel)
Arguments
u |
A vector of values where the function should be evaluated. |
u_lb |
Lower bound for the precomputed range. Defaults to -10. |
u_ub |
Upper bound for the precomputed range. Defaults to 10. |
resol |
The resolution or number of sample points in the precomputed range. Defaults to 1000. |
fun |
A function for which the approximation is computed. Defaults to the |
Details
The fun_approx function works by initially creating a lookup table of function values based on
the range specified by u_lb and u_ub and the resolution resol. This precomputation only happens once
for a given set of parameters (u_lb, u_ub, resol, and fun). Subsequent calls to fun_approx with the
same parameters use the lookup table to find the closest precomputed points to the requested u values
and then return an interpolated result.
Linear interpolation is used between the two closest precomputed points in the lookup table. This ensures a smooth approximation for values in between sample points.
This function is especially useful for computationally intensive functions where recalculating
function values is expensive or time-consuming. By using a combination of precomputation and
interpolation, fun_approx provides a balance between accuracy and speed.
Value
A vector of approximated function values corresponding to u.
Generate Sample Data
Description
This function used for generate some sample data for experiment
Usage
gen_sample_data(size, dgp, seed = NULL)
Arguments
size |
control the sample size. |
dgp |
data generating process, have options "normal", "chisq", "mixed", "poly", "2_fold_uniform". |
seed |
random seed number. |
Value
A numeric vector of length size. The elements of the vector
are generated according to the specified dgp:
- normal
Normally distributed values with mean 0 and standard deviation 2.
- chisq
Chi-squared distributed values with df = 10.
- mixed
Half normally distributed (mean 0, sd = 2) and half chi-squared distributed (df = 10) values.
- poly
Values from a polynomial cumulative distribution function on
[0,1].- 2_fold_uniform
Sum of two uniformly distributed random numbers.
Examples
# Generate 500 samples from 2-fold uniform distribution
X <- gen_sample_data(500, "2_fold_uniform", seed = 123)
Kernel point estimation
Description
Computes the point estimate using the specified kernel function.
Usage
get_avg_f1x(X, x, h, inf_k)
Arguments
X |
A numerical vector of sample data. |
x |
A scalar representing the point where the density is estimated. |
h |
A scalar bandwidth parameter. |
inf_k |
Kernel function used for the computation. |
Value
A scalar representing the kernel density estimate at point x.
Kernel point estimation
Description
Computes the point estimate using the specified kernel function.
Usage
get_avg_fyx(Y, X, x, h, inf_k)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
x |
A scalar representing the point where the density is estimated. |
h |
A scalar bandwidth parameter. |
inf_k |
Kernel function used for the computation. |
Value
A scalar representing the kernel density estimate at point x.
Compute Sample Average of Fourier Transform Magnitude
Description
Compute Sample Average of Fourier Transform Magnitude
Usage
get_avg_phi(Y = 1, X, xi)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
xi |
A single numerical value representing the frequency at which the Fourier transform is computed. |
Value
Returns the sample estimation of expected Fourier transform at frequency xi.
Compute log sample average of fourier transform and get mod
Description
Compute log sample average of fourier transform and get mod
Usage
get_avg_phi_log(Y = 1, X, ln_xi)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
ln_xi |
A single numerical value representing the log frequency at which the Fourier transform is computed. |
Value
Returns the log sample estimation of expected Fourier transform at frequency xi.
get the conditional variance of Y on X for given x
Description
Vectorized implementation using matrix operations for improved performance. For large datasets (n > 10,000), memory usage is approximately 8n^2 bytes.
Usage
get_conditional_var(X, Y, x, h, kernel_func)
Arguments
X |
A numerical vector representing the sample data of variable X. |
Y |
A numerical vector representing the sample data of variable Y. |
x |
The specific point at which the conditional variance is to be calculated. |
h |
A bandwidth parameter used in the kernel function for smoothing. |
kernel_func |
A kernel function used to weigh observations in the neighborhood of point x. |
Details
Performance: Achieves >10x speedup for n=1,000 and >20x speedup for n=10,000 compared to the original loop-based implementation.
Memory: Allocates an n x n distance matrix. For n=10,000, requires ~800 MB. Issues a warning when estimated memory exceeds 4 GB and stops with an error when estimated memory exceeds 10 GB.
Value
Returns a non-negative scalar representing the estimated conditional variance of Y given X at the point x. Returns 0 if the computed variance is negative.
Original loop-based implementation of conditional variance (preserved for testing)
Description
Original loop-based implementation of conditional variance (preserved for testing)
Usage
get_conditional_var_loop(X, Y, x, h, kernel_func)
Arguments
X |
A numerical vector representing the sample data of variable X. |
Y |
A numerical vector representing the sample data of variable Y. |
x |
The specific point at which the conditional variance is to be calculated. |
h |
A bandwidth parameter used in the kernel function for smoothing. |
kernel_func |
A kernel function used to weigh observations in the neighborhood of point x. |
Value
Returns a non-negative scalar representing the estimated conditional variance of Y given X at the point x. Returns 0 if the computed variance is negative.
Estimate A and r parameters for Fourier envelope
Description
Estimates parameters A and r that bound the Fourier transform decay. Uses linear programming by default, with automatic fallback to grid search.
Usage
get_est_Ar(
Y = 1,
X,
xi_interval,
r_stepsize = 150,
method = "auto",
use_Y = FALSE,
integer_r = FALSE
)
Arguments
Y |
Numeric vector (default 1 for density estimation) |
X |
Numeric vector of sample data |
xi_interval |
List with xi_lb and xi_ub |
r_stepsize |
Grid resolution for fallback method (default 150). Only used when method="grid". Ignored for LP method. |
method |
Character: "lp" (default), "grid", or "auto" |
Details
The LP-based method (default) is typically 1-5x faster than grid search, with speedup depending on sample size and Fourier transform computation cost. The grid search method is preserved for validation and as fallback if LP solver is unavailable.
The function finds the minimal-area envelope line in log-log space by minimizing the integral subject to envelope constraints on the Fourier transform decay.
Value
Named numeric vector with est_A and est_r
Estimate A and r parameters via grid search (LEGACY)
Description
This is the original grid search implementation, kept for fallback and validation.
Usage
get_est_Ar_grid(Y = 1, X, xi_interval, r_stepsize = 150, use_Y = FALSE)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
xi_interval |
A list with elements |
r_stepsize |
An integer value representing the number of steps in the r range. This controls the granularity of the estimation. Higher values lead to finer granularity but increase computation time. |
Details
The function internally defines a range for the natural logarithm of frequency values (ln_xi_range)
and a range for the parameter r (r_range). It then defines an optimization function optim_ln_A
to minimize the integral of a given function over the ln_xi_range. The actual estimation is done by
finding the r and A value that minimizes the the area of the line \ln A - r \ln \xi under the constraint that the line should not go below the Fourier transform curve.
Value
A named vector with elements est_A and est_r representing the estimated
values of A and r, respectively.
Estimate A and r parameters via linear programming
Description
Finds the minimal-area envelope line in log-log space using linear programming.
Usage
get_est_Ar_lp(Y = 1, X, xi_interval, r_max = 50, use_Y = FALSE)
Arguments
Y |
Numeric vector (default 1 for density estimation) |
X |
Numeric vector of sample data |
xi_interval |
List with xi_lb and xi_ub |
r_max |
Maximum allowed value for r (default 50) |
Value
Named numeric vector with est_A and est_r
get the estimation of B
Description
get the estimation of B
Usage
get_est_B(Y)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
Value
The mean of the absolute values of the elements in Y, representing the estimated value of B.
Estimation of bias b1x
Description
Computes the bias estimate for given parameters.
Usage
get_est_b1x(X, h, est_Ar, inf_k_ft, ...)
Arguments
X |
A numerical vector representing the sample data of variable X. |
h |
A scalar bandwidth parameter. |
est_Ar |
A vector containing the estimated A and r parameters. |
inf_k_ft |
A kernel Fourier transform function. |
... |
Additional arguments passed to the quadgk integration function. |
Value
A scalar representing the bias b1x estimate.
Estimation of bias byx
Description
Estimation of bias byx
Usage
get_est_byx(Y, X, ...)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
... |
Additional arguments passed to other methods. |
Value
A scalar representing the bias byx estimate.
get the estimation of Vy
Description
get the estimation of Vy
Usage
get_est_vy(Y)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
Estimation of sigma
Description
Computes the sigma estimate for given parameters.
Usage
get_sigma(X, x, h, inf_k)
Arguments
X |
A numerical vector of sample data. |
x |
A scalar representing the point where the density is estimated. |
h |
A scalar bandwidth parameter. |
inf_k |
Kernel function used for the computation. |
Value
A scalar representing the sigma estimate at point x. Returns 0 if the density estimate is negative.
Estimation of sigma_yx
Description
Estimation of sigma_yx
Usage
get_sigma_yx(Y, X, x, h, inf_k)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
x |
The specific point at which sigma_yx is to be estimated. |
h |
A bandwidth parameter used in the kernel function for smoothing. |
inf_k |
A kernel function used to weigh observations in the neighborhood of point x. |
Value
Returns a scalar representing the estimated value of sigma_yx at the point x. Returns 0 if either fyx or conditional variance is negative.
get xi interval
Description
get xi interval
Usage
get_xi_interval(
Y = 1,
X,
methods = "snr",
noise_floor = c("auto", "compact", "general")
)
Arguments
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
methods |
A character string: "snr" (default; a signal-to-noise cutoff that selects a valid frequency window at realistic sample sizes), "Schennach" (the data-driven rule of Schennach 2020, Theorem 2), or "Schennach_loose" (the initial, un-refined interval). |
noise_floor |
Noise-floor form for the feasibility test: "auto" (default; general
for non-constant Y, compact otherwise), "compact", or "general". See
|
Details
The "Schennach" method computes the xi interval by performing a test based on the
Schennach's theorem, adjusting the upper bound xi_ub if the test condition is met.
The "Schennach_loose" method provides a looser calculation of the xi interval without
performing the Schennach's test.
The "snr" method selects the upper cutoff from the pointwise signal-to-noise ratio of the
empirical Fourier transform (see .snr_window()); unlike "Schennach" it always selects a
valid cutoff at realistic sample sizes and never returns an empty window.
Value
A list containing the lower (xi_lb) and upper (xi_ub) bounds of the xi interval.
Kernel Regression function
Description
Kernel Regression function
Usage
kernel_reg(X, Y, x, h, kernel_func)
Arguments
X |
A numerical vector representing the sample data of variable X. |
Y |
A numerical vector representing the sample data of variable Y. |
x |
The point at which the regression function is to be estimated. |
h |
A bandwidth parameter that determines the weight assigned to each observation in X. |
kernel_func |
A function that computes the weight of each observation based on its distance to x. |
Value
Returns a scalar representing the estimated value of the regression function at the point x.
S3 Constructor for bbnp_density Objects
Description
Creates a bbnp_density S3 object with validation
Usage
new_bbnp_density(
density = NULL,
x = NULL,
estimate = NULL,
conf_int = list(),
bias_bound = list(),
std_error = NULL,
call = NULL,
bandwidth = NULL,
n = NULL,
kernel = NULL,
data = list(),
internals = list()
)
Arguments
density |
Numeric vector of density estimates (for range estimation) |
x |
Numeric vector of evaluation points |
estimate |
Numeric scalar point estimate (for point estimation) |
conf_int |
List containing lower, upper, and conf_level |
bias_bound |
List containing b1x, est_A, est_r, xi_interval |
std_error |
Numeric vector of standard errors |
call |
The original function call |
bandwidth |
Numeric scalar bandwidth used |
n |
Integer sample size |
kernel |
Character string kernel type |
data |
List containing original data (X) |
internals |
List containing internal objects (config, kernel_functions) |
Value
An object of class c("bbnp_density", "bbnp")
S3 Constructor for bbnp_regression Objects
Description
Creates a bbnp_regression S3 object with validation
Usage
new_bbnp_regression(
fitted_values = NULL,
x = NULL,
estimate = NULL,
conf_int = list(),
bias_bound = list(),
std_error = NULL,
marginal_density = NULL,
joint_density = NULL,
call = NULL,
bandwidth = NULL,
n = NULL,
kernel = NULL,
data = list(),
internals = list()
)
Arguments
fitted_values |
Numeric vector of |
x |
Numeric vector of evaluation points |
estimate |
Numeric scalar point estimate (for point estimation) |
conf_int |
List containing lower, upper, and conf_level |
bias_bound |
List containing b1x, byx, est_A, est_r, est_B, xi_interval |
std_error |
Numeric vector of standard errors |
marginal_density |
Numeric vector of f(x) estimates |
joint_density |
Numeric vector of f_YX estimates |
call |
The original function call |
bandwidth |
Numeric scalar bandwidth used |
n |
Integer sample size |
kernel |
Character string kernel type |
data |
List containing original data (X, Y) |
internals |
List containing internal objects (config, kernel_functions) |
Value
An object of class c("bbnp_regression", "bbnp")
Normal Kernel Function
Description
Normal Kernel Function
Usage
normal_kernel(u)
Arguments
u |
A numerical value or vector representing the input to the kernel function. |
Value
Returns the value of the Normal kernel function at the given input.
Fourier Transform of Normal Kernel
Description
Fourier Transform of Normal Kernel
Usage
normal_kernel_ft(xi)
Arguments
xi |
A numerical value or vector representing the frequency domain. |
Value
Returns the value of the Fourier transform of the Normal kernel at the given frequency/frequencies.
Plot Method for bbnp_density Objects
Description
Creates visualizations of bias-bounded density estimation results
Usage
## S3 method for class 'bbnp_density'
plot(
x,
type = c("density", "ft"),
fill_ci = .bbnp_pal[["ci"]],
fill_bias = .bbnp_pal[["bias"]],
alpha_ci = 0.55,
alpha_bias = 0.55,
ft_resol = 500,
xi_range = NULL,
expand = 2.5,
...
)
Arguments
x |
An object of class bbnp_density |
type |
Character string specifying plot type. Options are:
|
fill_ci |
Color for confidence interval ribbon (default: a muted blue). |
fill_bias |
Color for bias bound ribbon (default: a muted terracotta). |
alpha_ci |
Transparency for confidence interval ribbon (default: 0.30) |
alpha_bias |
Transparency for bias bound ribbon (default: 0.45) |
ft_resol |
Resolution for Fourier transform plot (default: 500) |
xi_range |
Optional numeric |
expand |
For the |
... |
Additional arguments (unused) |
Value
A ggplot2 object
Examples
X <- rnorm(100)
fit <- biasBound_density(X, h = 0.1)
plot(fit)
plot(fit, type = "ft")
Plot Method for bbnp_regression Objects
Description
Creates visualizations of bias-bounded conditional expectation estimation results
Usage
## S3 method for class 'bbnp_regression'
plot(
x,
type = c("regression", "ft"),
fill_ci = .bbnp_pal[["ci"]],
alpha_ci = 0.55,
point_alpha = 0.28,
point_color = .bbnp_pal[["point"]],
ft_resol = 500,
xi_range = NULL,
expand = 2.5,
...
)
Arguments
x |
An object of class bbnp_regression |
type |
Character string specifying plot type. Options are:
|
fill_ci |
Color for confidence interval ribbon (default: a muted blue). |
alpha_ci |
Transparency for confidence interval ribbon (default: 0.35) |
point_alpha |
Transparency for data points (default: 0.28) |
point_color |
Color for data points (default: a soft grey). |
ft_resol |
Resolution for Fourier transform plot (default: 500) |
xi_range |
Optional numeric |
expand |
For the |
... |
Additional arguments (unused) |
Value
A ggplot2 object
Examples
X <- rnorm(100)
Y <- X^2 + rnorm(100)
fit <- biasBound_condExpectation(Y, X, h = 0.1)
plot(fit)
plot(fit, type = "ft")
Internal Helper: Plot Density Estimation
Description
Creates density plot with bias bounds and confidence intervals
Usage
plot_density_internal(x, fill_ci, fill_bias, alpha_ci, alpha_bias)
Arguments
x |
An object of class bbnp_density |
fill_ci |
Color for confidence interval ribbon |
fill_bias |
Color for bias bound ribbon |
alpha_ci |
Transparency for confidence interval ribbon |
alpha_bias |
Transparency for bias bound ribbon |
Value
A ggplot2 object
Plot the Fourier Transform (Deprecated)
Description
Deprecated: This function is deprecated and will be removed in a future version.
Use plot(fit, type = "ft") instead, where
fit is a bbnp_density or bbnp_regression object.
Usage
plot_ft(X, xi_interval, ft_plot.resol = 500)
Arguments
X |
A numerical vector of sample data. |
xi_interval |
A list containing the lower ( |
ft_plot.resol |
An integer representing the resolution of the plot, specifically the number of points used to represent the Fourier transform. Defaults to 500. |
Details
C = 1, the parameter in O(1/n^{0.25}), see more details in in Schennach (2020).
Value
A ggplot object representing the plot of the Fourier transform.
Examples
## Not run:
# Old (deprecated):
# plot_ft(sample_data$X, xi_interval = list(xi_lb = 1, xi_ub = 50))
# New (recommended):
fit <- biasBound_density(sample_data$X, h = 0.1)
plot(fit, type = "ft")
## End(Not run)
Internal Helper: Plot Fourier Transform
Description
Creates a Fourier transform plot over a wide frequency range with the
rule-selected fitting window shaded and the estimated envelope overlaid.
For regression objects the cross-spectrum |phi_YX| is shown (matching the
fitted envelope); for density objects the marginal |phi_X| is shown.
Usage
plot_ft_internal(x, ft_resol, xi_range = NULL, expand = 2.5)
Arguments
x |
An object of class bbnp_density or bbnp_regression |
ft_resol |
Resolution for Fourier transform plot |
xi_range |
Optional display range c(lower, upper); NULL = wide auto range |
expand |
How far past the window to display when xi_range is NULL |
Value
A ggplot2 object
Internal Helper: Plot Conditional Expectation
Description
Creates regression plot with confidence interval and data points
Usage
plot_regression_internal(x, fill_ci, alpha_ci, point_alpha, point_color)
Arguments
x |
An object of class bbnp_regression |
fill_ci |
Color for confidence interval ribbon |
alpha_ci |
Transparency for confidence interval ribbon |
point_alpha |
Transparency for data points |
point_color |
Color for data points |
Value
A ggplot2 object
Print Method for bbnp_density Objects
Description
Print Method for bbnp_density Objects
Usage
## S3 method for class 'bbnp_density'
print(x, digits = 4, ...)
Arguments
x |
An object of class bbnp_density |
digits |
Number of digits to display (default: 4) |
... |
Additional arguments (unused) |
Value
Invisibly returns the input object
Print Method for bbnp_regression Objects
Description
Print Method for bbnp_regression Objects
Usage
## S3 method for class 'bbnp_regression'
print(x, digits = 4, ...)
Arguments
x |
An object of class bbnp_regression |
digits |
Number of digits to display (default: 4) |
... |
Additional arguments (unused) |
Value
Invisibly returns the input object
Print Method for summary.bbnp_density Objects
Description
Print Method for summary.bbnp_density Objects
Usage
## S3 method for class 'summary.bbnp_density'
print(x, digits = 4, ...)
Arguments
x |
An object of class summary.bbnp_density |
digits |
Number of digits to display (default: 4) |
... |
Additional arguments (unused) |
Value
Invisibly returns the input object
Print Method for summary.bbnp_regression Objects
Description
Print Method for summary.bbnp_regression Objects
Usage
## S3 method for class 'summary.bbnp_regression'
print(x, digits = 4, ...)
Arguments
x |
An object of class summary.bbnp_regression |
digits |
Number of digits to display (default: 4) |
... |
Additional arguments (unused) |
Value
Invisibly returns the input object
Generate n samples from the distribution
Description
Generate n samples from the distribution
Usage
rpoly01(n, k = 5)
Arguments
n |
The number of samples to generate. |
k |
The exponent in the distribution function, defaults to 5. |
Value
A vector of n samples from the specified polynomial distribution.
CDF: f(x) = (x-1)^k + 1
Sample Data
Description
Sample Data
Usage
sample_data
Format
A data frame with 1000 rows and 2 variables:
- X
Numeric vector, generated from 2 fold uniform distribution.
- Y
Numeric vector,
Y = -X^2 + 3*X + rnorm(1000)*X.
Select Optimal Bandwidth
Description
Selects an optimal bandwidth using the specified method.
Usage
select_bandwidth(
X,
Y = NULL,
method = "cv",
kernel.fun = "normal",
if_approx_kernel = TRUE,
kernel.resol = 1000
)
Arguments
X |
A numerical vector of sample data. |
Y |
Optional. A numerical vector of sample data for conditional expectation estimation. |
method |
A string specifying the bandwidth selection method. Options are "cv" for cross-validation and "silverman" for Silverman's rule of thumb. Defaults to "cv". |
kernel.fun |
A string specifying the kernel type. Options include "normal", "epanechnikov", "Schennach2004", and "sinc". |
if_approx_kernel |
Logical. If TRUE, uses approximations for the kernel function. |
kernel.resol |
The resolution for kernel function approximation. |
Value
A scalar representing the optimal bandwidth.
Examples
# Generate sample data
X <- rnorm(100)
# Get optimal bandwidth using cross-validation with normal kernel
h_opt <- select_bandwidth(X, method = "cv", kernel.fun = "normal")
# Get optimal bandwidth using Silverman's rule with Schennach kernel
h_opt <- select_bandwidth(X, method = "silverman", kernel.fun = "Schennach2004")
Silverman's Rule of Thumb for Bandwidth Selection
Description
Implements Silverman's rule of thumb for selecting an optimal bandwidth in kernel density estimation.
Usage
silverman_bandwidth(X, kernel_type = "normal")
Arguments
X |
A numerical vector of sample data. |
kernel_type |
A string identifying the kernel type. |
Value
A scalar representing the optimal bandwidth.
Examples
# Generate sample data
X <- rnorm(100)
# Get optimal bandwidth using Silverman's rule
h_opt <- silverman_bandwidth(X, kernel_type = "normal")
Infinite Kernel Function
Description
Infinite Kernel Function
Usage
sinc(u)
Arguments
u |
A numerical value or vector where the sinc function is evaluated. |
Value
The value of the sinc function at each point in u.
Define the closed form FT of the infinite order kernel sin(x)/(pi*x)
Description
Define the closed form FT of the infinite order kernel sin(x)/(pi*x)
Usage
sinc_ft(x)
Arguments
x |
A numerical value or vector where the Fourier Transform is evaluated. |
Value
The value of the Fourier Transform of the sinc function at each point in x.
Summary Method for bbnp_density Objects
Description
Summary Method for bbnp_density Objects
Usage
## S3 method for class 'bbnp_density'
summary(object, ...)
Arguments
object |
An object of class bbnp_density |
... |
Additional arguments (unused) |
Value
An object of class summary.bbnp_density
Summary Method for bbnp_regression Objects
Description
Summary Method for bbnp_regression Objects
Usage
## S3 method for class 'bbnp_regression'
summary(object, ...)
Arguments
object |
An object of class bbnp_regression |
... |
Additional arguments (unused) |
Value
An object of class summary.bbnp_regression
True density of 2-fold uniform distribution
Description
True density of 2-fold uniform distribution
Usage
true_density_2fold(x)
Arguments
x |
A numerical value or vector where the true density function is evaluated. |
Value
The value of the true density of the 2-fold uniform distribution at each point in x.