---
title: "CharAnalysis: Diagnostic and analytical tools for peak detection and fire-history interpretations using high-resolution sediment-charcoal records"
author: "Philip Higuera"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    fig_width: 7
    fig_height: 5
vignette: >
  %\VignetteIndexEntry{CharAnalysis: Diagnostic and analytical tools for peak detection and fire-history interpretations using high-resolution sediment-charcoal records}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.align = "center",
  warning   = FALSE,
  message   = FALSE
)
```

## Overview

*CharAnalysis* reconstructs local fire histories from lake-sediment charcoal records. Charcoal preserved in lake sediments is a direct proxy for past fire activity: individual fire events near a lake deposit pulses of charcoal that appear as peaks above a slowly varying background signal. *CharAnalysis* formalises this peak-detection logic as a reproducible, quantitative workflow.

This R package (v2.0.0) is a direct translation of *CharAnalysis* v2.0 (MATLAB), validated against reference outputs on four benchmark datasets spanning a range of record lengths, sampling resolutions, ecosystems, and analysis configurations. Analytical methods are described in Higuera et al. (2009).

The full workflow proceeds in five steps:

1. **Pretreatment** — Resample the raw charcoal series to equal time steps and compute charcoal accumulation rate (CHAR, pieces cm^-2^ yr^-1^).
2. **Smoothing** — Estimate low-frequency trends (C~background~) using lowess or a moving-window statistic.
3. **Peak decomposition** — Compute the high-frequency residual C~peak~ = C~interp~ − C~background~ (or ratio C~interp~ / C~background~).
4. **Thresholding** — Define a noise threshold and flag C~peak~ values that exceed it as candidate fire events.  Thresholds can be global (one distribution fitted to the full record) or local (sliding-window distribution).
5. **Screening and metrics** — Apply a minimum-count significance test, then compute fire-return intervals (FRIs), Weibull statistics, smoothed fire frequency, signal-to-noise index (SNI), and goodness-of-fit.

---

## Installation

```{r install, eval = FALSE}
# Install from GitHub (requires devtools)
devtools::install_github("phiguera/CharAnalysis",
                         subdir = "CharAnalysis_2_0_R")

# Suggested packages for figures
install.packages(c("ggplot2", "patchwork", "ggtext"))
```

---

## Worked example: Code Lake

Code Lake (CO; Colorado, USA) is the primary validation dataset for *CharAnalysis*. The record spans approximately 7,300 calibrated years BP and is analysed here using a local Gaussian mixture model (GMM) threshold — the recommended default configuration.

### Input files

*CharAnalysis* reads two CSV files:

- **`CO_charData.csv`** — the charcoal data table (depth, age, volume, count).
- **`CO_charParams.csv`** — the analysis parameter file.

Both files are included in the package's `inst/extdata/` directory and can be located with `system.file()`:

```{r paths, eval = FALSE}
params_file <- system.file("extdata", "CO_charParams.csv",
                           package = "CharAnalysis")
```

For this vignette we reference the files directly from the repository:

```{r set-paths, eval = FALSE}
params_file <- "path/to/CO_charParams.csv"   # adjust to your local path
```

### Running the full pipeline

A single call to `CharAnalysis()` runs all five analytical steps and returns a named list of results.

```{r run, eval = FALSE}
library(CharAnalysis)

out <- CharAnalysis(params_file)
```

The progress messages show each step completing in sequence:

```
(1) Reading input file...
      ...done.
(1b) Validating input parameters...
(2) Pretreating charcoal data...
      ...done.
(3) Smoothing resampled CHAR to estimate low-frequency trends
    and calculating peak CHAR...
      ...done.
(4) Defining possible thresholds for peak identification...
      ...done.
(5) Identifying peaks and applying minimum-count screening...
      ...done.
(6) Post-processing: fire-return intervals, Weibull statistics...
      ...done.
(7) Pipeline complete.  Call char_write_results(...) to save output CSV.
```

### Exploring the output

`CharAnalysis()` returns a named list. The most commonly used elements are:

```{r output-structure, eval = FALSE}
names(out)
#> [1] "charcoal"      "pretreatment"  "smoothing"     "peak_analysis"
#> [5] "results"       "site"          "gap_in"        "char_thresh"
#> [9] "post"          "char_results"
```

#### The `charcoal` object

`out$charcoal` holds all time-series outputs, both raw and processed:

```{r charcoal, eval = FALSE}
# Inspect the first few rows of key time series
head(data.frame(
  age_BP    = out$charcoal$ybpI,          # interpolated age (yr BP)
  CHAR      = out$charcoal$accI,          # C_interpolated  (pieces cm-2 yr-1)
  C_bkg     = out$charcoal$accIS,         # C_background
  C_peak    = out$charcoal$peak,          # C_peak (residuals)
  peaks     = out$charcoal$charPeaks[, 4] # final-threshold peak flags (0/1)
))
```

#### The `char_thresh` object

`out$char_thresh` holds threshold values, SNI, and goodness-of-fit results:

```{r thresh, eval = FALSE}
# Threshold at the final percentile (column 4 = threshValues[4])
range(out$char_thresh$pos[, 4], na.rm = TRUE)

# Signal-to-noise index (SNI): values > 3 indicate a strong signal
summary(out$char_thresh$SNI)
```

#### Post-processing metrics

`out$post` holds fire-history summary metrics:

```{r post, eval = FALSE}
# Fire-return intervals (FRIs) and mean FRI
cat("Number of FRIs:", length(out$post$FRI), "\n")
cat("Mean FRI:", round(mean(out$post$FRI), 1), "yr\n")

# Per-zone Weibull statistics (zone 1)
fri_z1 <- out$post$FRI_params_zone[1, ]
cat(sprintf(
  "Zone 1 — nFRI: %d  mFRI: %.1f yr  WBLb: %.1f  WBLc: %.2f\n",
  fri_z1[1], fri_z1[2], fri_z1[5], fri_z1[8]
))
```

#### The `char_results` matrix

`out$char_results` is the complete N × 33 output matrix, with columns matching the MATLAB `charResults` CSV exactly:

```{r char-results, eval = FALSE}
dim(out$char_results)
#> [1] 500  33

# Total number of fire events identified
sum(out$charcoal$charPeaks[, 4], na.rm = TRUE)
#> [1] 39
```

---

### Output figures

*CharAnalysis* provides five publication-quality ggplot2 figures. All are produced by `char_plot_all()`, or individually by their dedicated functions.

#### Figure 3 — C~interp~, C~background~, and C~peak~

```{r fig3, eval = FALSE}
char_plot_peaks(out)
```

The top panel shows resampled CHAR (black bars) with the smoothed C~background~ trend (grey line). The bottom panel shows C~peak~ with the positive and negative threshold lines (red), identified fire events (+ symbols), and peaks that failed the minimum-count screen (grey dots).

#### Figure 5 — Cumulative peaks through time

```{r fig5, eval = FALSE}
char_plot_cumulative(out)
```

The slope of the cumulative curve at any point reflects the instantaneous fire frequency. Changes in slope indicate periods of higher or lower fire activity.

#### Figure 6 — FRI distributions by zone

```{r fig6, eval = FALSE}
char_plot_fri(out)
```

Each panel shows a histogram of fire-return intervals within one stratigraphic zone (20-yr bins, normalised to proportions). A two-parameter Weibull probability density function is overlaid where the Kolmogorov-Smirnov goodness-of-fit test passes (p > 0.10 for n < 30; p > 0.05 for n ≥ 30). Weibull scale (b) and shape (c) parameters, mean FRI, and sample size are annotated.

#### Figure 7 — Continuous fire history

```{r fig7, eval = FALSE}
char_plot_fire_history(out)
```

Three stacked panels show (from top to bottom): peak magnitude (integrated C~peak~ above threshold per fire event, pieces cm^-2^ peak^-1^); fire-return intervals through time as filled squares with the smoothed mean FRI curve (black line) and bootstrapped 95% CI ribbon (grey); and lowess-smoothed fire frequency (fires per 1000 yr).

#### Figure 8 — Between-zone CHAR comparisons

```{r fig8, eval = FALSE}
char_plot_zones(out)
```

The left panel shows empirical cumulative distribution functions of raw CHAR within each zone, with pairwise two-sample Kolmogorov-Smirnov test p-values annotated. The right panel shows box plots (10th, 25th, 50th, 75th, 90th percentiles) of raw CHAR by zone.

#### Saving all figures to PDF

`out_dir` is a required argument when `save = TRUE`; the example below writes to a temporary directory so it can be run on any system, but you would normally substitute a path of your choosing (for example, `"Results"` inside your project folder).

```{r save-figs, eval = FALSE}
char_plot_all(out, save = TRUE, out_dir = tempdir())
# Saves to tempdir():
#   CO_03_CHAR_analysis.pdf
#   CO_05_cumulative_peaks.pdf
#   CO_06_FRI_distributions.pdf
#   CO_07_continuous_fire_hx.pdf
#   CO_08_zone_comparisons.pdf
```

> **Note:** Figures 9 (threshold sensitivity detail) and 10 (multi-site comparisons) from the MATLAB v2.0 interface are not implemented in this R package. All core analytical outputs are available through Figures 1–8 above.

---

### Writing results to CSV

`char_write_results()` writes the 33-column output matrix to a CSV file whose column names and format match the MATLAB `charResults` output exactly. `out_dir` is required (no default); substitute a path of your choosing for the temporary directory used here.

```{r write, eval = FALSE}
char_write_results(out$char_results,
                 site    = out$site,
                 out_dir = tempdir())
# Writes: <tempdir>/CO_charResults.csv
```

The output CSV contains one row per interpolated time step and 33 columns covering all analytical outputs from C~interp~ through to per-zone Weibull confidence intervals. Column headers match the MATLAB reference file exactly to facilitate direct numerical comparison.

---

## Key parameters

The parameter file (`*_charParams.csv`) controls all aspects of the analysis. The most commonly adjusted parameters are:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `yrInterp` | 15 | Resampling resolution (yr). Set to 0 for automatic (median raw resolution). |
| `yr` | 500 | Smoothing window width (yr) for C~background~ estimation. |
| `threshType` | 2 | Threshold type: 1 = global, 2 = local (sliding window). |
| `threshMethod` | 3 | Noise distribution: 2 = Gaussian, 3 = Gaussian mixture model. |
| `threshValues` | 0.95, 0.99, 0.999, 0.99 | Percentile thresholds; the final value defines the working threshold. |
| `minCountP` | 0.05 | Alpha level for the minimum-count significance screen. |
| `peakFrequ` | 1000 | Window width (yr) for smoothed fire frequency and FRI statistics. |
| `zoneDiv` | record bounds | Zone boundaries (yr BP) for stratified FRI and Weibull analysis. |

---

## Comparison with MATLAB v2.0

*CharAnalysis* v2.0.0 (R) is a direct translation of *CharAnalysis* v2.0 (MATLAB). Outputs are quantitatively equivalent on all validated reference datasets. The table below summarises results across the four validation datasets; full details are in `inst/z_Validation_report_R_vs_MATLAB_V_2.0.md`.

| Dataset | Site | Smoothing | charBkg max\|diff\| | Peaks R v2.0.0 | Peaks MATLAB v2.0 |
|---------|------|-----------|-------------------|---------|-------------|
| CO | Code Lake, AK | Method 1 (lowess) | ~5 × 10^-6^ | 39 | 48 |
| CH10 | Chickaree Lake, CO | Method 2 (robust lowess) | 0.267 | 59 | 50 |
| SI17 | Silver Lake, CO | Method 2 (robust lowess) | 0.130 | 25 | 31 |
| RA07 | Raven Lake, AK | Method 2 (robust lowess) | < 0.001 | 15 | 17 |

Two sources of numerical difference are documented:

**1. Robust lowess background (smoothing method 2)** — MATLAB's Curve Fitting Toolbox `smooth(..., 'rlowess')` and the R `char_lowess()` port produce slightly different C~background~ series. For gap-free records (RA07) the difference is negligible (< 0.001). For records with NaN gaps (CH10) the difference is larger (≤ 0.267) because the two implementations handle gap positions differently inside the bisquare robustness iteration. Smoothing method 1 (plain lowess) is not affected and agrees to within floating-point noise on all datasets.

**2. Gaussian mixture model (GMM) peak counts** — The R package ports the MATLAB `GaussianMixture.m` EM algorithm directly. Floating-point arithmetic accumulates differently across languages during EM iterations, causing the two implementations to reach slightly different threshold values in some local windows. Peak counts differ by 10–20% across datasets, with the direction varying (R sometimes higher, sometimes lower). All threshold and peak differences are downstream consequences of this single source; interpolation and peak-magnitude outputs agree to within numerical precision.

**Weibull confidence intervals** — Bootstrap CIs use random resampling and will differ between R and MATLAB runs regardless of any other differences. Weibull point estimates (scale *b*, shape *c*) agree within a few percent on datasets where peak counts are similar.

---

## Citation

If you use *CharAnalysis* in published research, please cite Higuera et al. (2009), the first study to apply the core analytical tools implemented in the program. If you used *CharAnalysis* v2.0 (MATLAB or R v2.0.0) specifically, please also cite the software:

> Higuera, P.E., L.B. Brubaker, P.M. Anderson, F.S. Hu, and T.A. Brown. 2009. Vegetation mediated the impacts of postglacial climate change on fire regimes in the south-central Brooks Range, Alaska. *Ecological Monographs* 79:201–219. https://doi.org/10.1890/07-2019.1

> Higuera, P.E. 2026. *CharAnalysis*: Diagnostic and analytical tools for peak analysis in sediment-charcoal records (Version 2.0). Zenodo. https://doi.org/10.5281/zenodo.19304064

---

## References

Higuera, P.E., L.B. Brubaker, P.M. Anderson, F.S. Hu, and T.A. Brown. 2009. Vegetation mediated the impacts of postglacial climate change on fire regimes in the south-central Brooks Range, Alaska. *Ecological Monographs* 79:201–219. https://doi.org/10.1890/07-2019.1