---
title: "Introduction to smriti: Structural Variance Preservation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to smriti}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## The Imputation Uncertainty Principle
Machine-learning imputation algorithms such as `missForest` excel at
minimizing point-wise prediction error (RMSE). Because the error-minimizing
prediction is a conditional mean, the imputed values carry less spread than
the data they replace, causing **structural variance collapse**. In
longitudinal growth curve models (GCMs), this deflates the latent slope
variance ($\sigma^2_S$) and erodes the statistical power needed to track
patient trajectories over time.
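
The shrinkage is easy to reproduce in a small simulation. The sketch below
uses only base R and swaps in a linear regression (`lm()`) for `missForest`
as the conditional-mean imputer, an assumption made purely to keep the example
dependency-free. All variable names are illustrative.

```{r, eval=FALSE}
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)            # population var(y) = 0.5^2 + 1 = 1.25

# Remove 40% of y completely at random
miss  <- sample(n, size = 0.4 * n)
y_obs <- y
y_obs[miss] <- NA

# Conditional-mean imputation: fit on observed rows, predict the missing ones
fit   <- lm(y_obs ~ x)             # lm() drops rows with NA by default
y_imp <- y_obs
y_imp[miss] <- predict(fit, newdata = data.frame(x = x[miss]))

var_true    <- var(y)              # variance of the complete data
var_imputed <- var(y_imp)          # variance after imputation
var_filled  <- var(y_imp[miss])    # variance among the filled-in values only
c(true = var_true, imputed = var_imputed, filled_only = var_filled)
```

The filled-in values reflect only the part of `y` explained by `x`; the
residual spread is lost, which is why the imputed column is under-dispersed
relative to the complete data.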

The `smriti` package addresses this by decoupling prediction from structural
geometry. It uses a two-stage architecture:

1. **Initialization:** Non-parametric imputation fills the missing cells to
   produce a dense matrix.
2. **Lagrangian Projection:** A C++ gradient descent layer moves the
   imputed data toward a target covariance structure while preserving
   fidelity to the initial imputed values. The augmented loss function is

   $$L(X) = \frac{1}{2}\|X - X_{\text{imp}}\|_F^2
            + \frac{\lambda}{2}\|\operatorname{cov}(X) - \Sigma_{\text{target}}\|_F^2$$

   where the first term anchors the solution near the initial imputation
   $X_{\text{imp}}$, the second (weighted by $\lambda$) penalizes departure
   from the target covariance $\Sigma_{\text{target}}$, and $\|\cdot\|_F$
   denotes the Frobenius norm.
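
Differentiating this loss gives the direction that a gradient descent step
follows. The expression below is a sketch under two assumptions not stated
above: $\operatorname{cov}(X)$ is the unbiased sample covariance of an
$n \times p$ matrix $X$, written $\frac{1}{n-1} X^\top H X$ with centering
matrix $H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, and
$\Sigma_{\text{target}}$ is symmetric.

$$\nabla_X L(X) = (X - X_{\text{imp}})
            + \frac{2\lambda}{n-1}\, H X \left(\operatorname{cov}(X) - \Sigma_{\text{target}}\right)$$

With a step size $\eta$, a plain descent update would be
$X \leftarrow X - \eta\,\nabla_X L(X)$. Starting from $X = X_{\text{imp}}$,
the first term is zero, so the initial step is driven entirely by the
covariance mismatch.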

## The Robustness-Efficiency Tradeoff
Real-world clinical data often contain heavy-tailed or skewed measurements
and corrupted sensor artifacts. The `smriti_impute()` function handles this
through its `robust` argument, which selects how the target covariance is
estimated:

*   `robust = FALSE`: Uses pairwise-complete Pearson covariance, projected to
    the nearest positive-semidefinite (PSD) matrix to correct any non-PSD
    artifacts of pairwise deletion. Best for well-behaved, approximately
    normal data.
*   `robust = TRUE`: Constructs the target from pairwise Spearman correlations
    (rank-based, outlier-resistant) and column-wise median absolute deviation
    (MAD) scale estimates. The resulting matrix is likewise projected to the
    nearest PSD matrix, producing a target that resists severe outliers such
    as readings from broken sensors in electronic health record (EHR) data.
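
One way to picture the two routes is the base-R sketch below.
`build_target()` and `project_psd()` are hypothetical helpers written for
illustration, not functions exported by `smriti`. The PSD step is assumed here
to be eigenvalue clipping, which gives the Frobenius-nearest PSD matrix for a
symmetric input; the package may use a different projection.

```{r, eval=FALSE}
# Frobenius-nearest PSD matrix: symmetrize, then clip negative eigenvalues at 0
project_psd <- function(S) {
  S   <- (S + t(S)) / 2
  e   <- eigen(S, symmetric = TRUE)
  out <- e$vectors %*% diag(pmax(e$values, 0), nrow = length(e$values)) %*%
    t(e$vectors)
  (out + t(out)) / 2                        # remove floating-point asymmetry
}

# Hypothetical helper (not the smriti API): build the target covariance
build_target <- function(X, robust = FALSE) {
  X <- as.matrix(X)
  if (!robust) {
    S <- cov(X, use = "pairwise.complete.obs")
  } else {
    R <- cor(X, method = "spearman", use = "pairwise.complete.obs")
    # mad() rescales by 1.4826 so that it estimates the SD under normality
    s <- apply(X, 2, mad, na.rm = TRUE)
    S <- R * tcrossprod(s)                  # D R D with D = diag(s)
  }
  project_psd(S)
}

set.seed(2)
X <- matrix(rnorm(200 * 4), ncol = 4, dimnames = list(NULL, paste0("T", 1:4)))
X[sample(length(X), 40)] <- NA              # scatter missing cells
X[1, 1] <- 1000                             # one corrupted sensor reading

target_pearson <- build_target(X, robust = FALSE)
target_robust  <- build_target(X, robust = TRUE)

# Variance assigned to T1 under each route
c(pearson = target_pearson[1, 1], robust = target_robust[1, 1])
```

A single corrupted cell inflates the Pearson variance for `T1`, while the
MAD-based scale is far less affected.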

## Fidelity-Constraint Balance
The penalty weight `lambda` controls the trade-off between preserving the
original imputation values and matching the target covariance. At `lambda = 1.0`
(the default) both objectives are weighted equally. Increasing `lambda`
enforces the covariance constraint more strictly but allows greater deviation
from the initial imputation. The `learning_rate` (default `0.001`) sets the
gradient step size, and `max_iter` (default `2000`) caps the number of
iterations.
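
As a rough illustration of how `lambda`, `learning_rate`, and `max_iter`
interact, the sketch below runs plain gradient descent on the loss $L(X)$ in
base R. `project_cov()` is a hypothetical stand-in written for this vignette,
not the package's C++ routine. It assumes `cov()` with its usual $n - 1$
denominator and updates every cell; whether `smriti` holds observed cells
fixed is not covered here. The simulated data and the `lambda` grid are
likewise illustrative.

```{r, eval=FALSE}
# Hypothetical R stand-in for the projection stage (not the smriti API)
project_cov <- function(X_imp, target, lambda = 1.0,
                        learning_rate = 0.001, max_iter = 2000) {
  X <- X_imp
  n <- nrow(X)
  for (i in seq_len(max_iter)) {
    Xc <- sweep(X, 2, colMeans(X))          # column-centered copy of X
    D  <- cov(X) - target                   # covariance mismatch
    # Gradient of the loss: fidelity term plus lambda-weighted covariance term
    grad <- (X - X_imp) + (2 * lambda / (n - 1)) * Xc %*% D
    X <- X - learning_rate * grad
  }
  X
}

set.seed(3)
X_imp  <- matrix(rnorm(100 * 4, sd = 0.7), ncol = 4)  # under-dispersed input
target <- diag(4)                                      # unit-variance target

sweep_result <- sapply(c(1, 10, 100), function(lambda) {
  X <- project_cov(X_imp, target, lambda = lambda)
  c(lambda   = lambda,
    fidelity = sqrt(sum((X - X_imp)^2)),        # distance from the imputation
    cov_gap  = sqrt(sum((cov(X) - target)^2)))  # distance from the target
})
t(sweep_result)
```

In this sketch, larger `lambda` values move the result further from the
initial imputation and closer to the target covariance, which is the
trade-off described above.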

## Example: Shielding Against Corrupted EHR Data
```{r, eval=FALSE}
library(smriti)
library(missForest)

# Load clinical data with structural missingness and sensor artifacts
data <- read.csv("clinical_proxy.csv")

# Impute, then project toward the robust (Spearman + MAD) target covariance
clean_data <- smriti_impute(
  data       = data,
  time_cols  = c("T1", "T2", "T3", "T4"),
  robust     = TRUE,
  lambda     = 1.0
)
```
