---
title: "Solving nonlinear programs with Uno"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Solving nonlinear programs with Uno}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = requireNamespace("Uno", quietly = TRUE)
)
library(Uno)
```

## Introduction

**Uno** wraps the [Uno](https://github.com/cvanaret/Uno) C++ solver
(Unifying Nonlinear Optimization) through its C API. One describes a
nonlinear program with R callbacks — objective, gradient, constraints,
constraint Jacobian, and Lagrangian Hessian — and Uno solves it. The
package is the R analog of the `unopy` Python binding, and is intended
as the nonlinear (DNLP) solver backend for
[CVXR](https://cvxr.rbind.io).

```{r version}
uno_version()
```

Two solver paths are available, selected with the `preset` argument:

* **`ipopt`** — an interior-point method whose symmetric-indefinite KKT systems
  are factored by **MUMPS**, reached at run time through the
  [rmumps](https://cran.r-project.org/package=rmumps) package. It handles
  indefiniteness (nonconvex problems) through inertia correction.
* **`filtersqp`** — a filter SQP method whose QP subproblems are solved by
  **HiGHS** (built from source with this package). HiGHS solves *convex* QPs, so
  this preset is the right choice for convex problems.

We provide some examples to illustrate how the
**callbacks** and their **sparsity structures** fit together.

## Problem Description

Uno minimizes (or maximizes) a smooth objective subject to smooth constraints
and bounds,

$$
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f(x) \\
\text{subject to} \quad & c^{\mathrm{L}} \le c(x) \le c^{\mathrm{U}}, \\
                        & x^{\mathrm{L}} \le x \le x^{\mathrm{U}},
\end{aligned}
$$

where bounds may be infinite. As with most NLP solvers, the *type* of each
constraint is encoded entirely in its bounds: an equality sets the lower and
upper bound equal, and a one-sided inequality leaves the open side infinite.

## Anatomy of `uno_solve()`

```r
uno_solve(
  n, lb, ub, sense,                 # variables, bounds, minimize/maximize
  obj, grad,                        # objective f and its gradient
  m, cl, cu, cons,                  # constraints c and their bounds
  jac_rows, jac_cols, jac,          # constraint Jacobian (COO form)
  hess_rows, hess_cols, hess,       # Lagrangian Hessian (lower triangle, COO)
  x0, preset, base_indexing, verbose,
  options = list(),                 # Uno options, applied after the preset
  lagrangian_sign = "negative",     # sign convention of the Hessian you return
  ...
)
```

### Variables, bounds, and sense

`n` is the number of variables; `lb`/`ub` are length-`n` bound vectors (use
`-Inf`/`Inf` for free, and `lb[i] == ub[i]` to fix a variable). `sense` is
`"minimize"` or `"maximize"`.

```{r maximize}
# maximize -(x - 3)^2 on [0, 10]  ->  x = 3
mx <- uno_solve(
  n = 1L, lb = 0, ub = 10, sense = "maximize",
  obj  = function(x) -(x[1] - 3)^2,
  grad = function(x) -2 * (x[1] - 3),
  m = 0L, cl = numeric(), cu = numeric(), cons = function(x) numeric(),
  jac_rows = integer(), jac_cols = integer(), jac = function(x) numeric(),
  hess_rows = 1L, hess_cols = 1L, hess = function(x, sigma, lambda) sigma * (-2),
  x0 = 0, preset = "ipopt", base_indexing = 1L, verbose = FALSE,
  options = list(logger = "SILENT")
)
mx$primal
```

### Objective and gradient

`obj(x)` returns the scalar $f(x)$; `grad(x)` returns the **dense** length-`n`
gradient $\nabla f(x)$ (all `n` partials in variable order, including zeros).

### Constraints and their bounds

`m` is the number of constraints (`0` for an unconstrained problem, in which
case the constraint and Jacobian callbacks return empty vectors). `cons(x)`
returns the length-`m` vector $c(x)$, and `cl`/`cu` are its bounds:

| Constraint            | `cl` | `cu`   |
|-----------------------|:----:|:------:|
| $c_j(x) = b$          | `b`  | `b`    |
| $c_j(x) \ge a$        | `a`  | `Inf`  |
| $c_j(x) \le d$        |`-Inf`| `d`    |
| $a \le c_j(x) \le d$  | `a`  | `d`    |

### The constraint Jacobian (COO form)

The Jacobian $J_{jk} = \partial c_j/\partial x_k$ is supplied in **coordinate
(COO) form**: you declare its sparsity pattern *once* as parallel
`jac_rows`/`jac_cols` index vectors, and `jac(x)` returns just the nonzero
*values* in that same order.

The index vectors must be **integer** (`1L`, `2L`, …, or wrapped with
`as.integer()`), and their base is controlled by `base_indexing`:

* `base_indexing = 1L` — Fortran/R-style **one-based** indices (used throughout
  this vignette);
* `base_indexing = 0L` — C-style **zero-based** indices.

The base declared must match the indices you actually provided in
`jac_rows`/`jac_cols` (and `hess_rows`/`hess_cols`).

### The Lagrangian Hessian (lower triangle)

Like most interior-point/SQP solvers, Uno does not use the Hessian of the
objective, rather the Hessian of the **Lagrangian**, supplied as the **lower
triangle** in COO form (`hess_rows`/`hess_cols` with `row >= col`). The callback
receives the objective scale `sigma` ($\sigma$) and the constraint multipliers
`lambda`:

```r
hess <- function(x, sigma, lambda) ...   # lower-triangular nonzero values
```

The one subtlety unique to Uno is the **sign convention**. By default Uno uses

$$
\nabla^2_{xx} L = \sigma\, \nabla^2 f(x) \;-\; \sum_j \lambda_j\, \nabla^2 c_j(x)
\qquad (\texttt{lagrangian\_sign = "negative"}),
$$

so the constraint-Hessian terms are **subtracted**. If your derivations instead
produce the IPOPT/`sparsediff` convention

$$
\nabla^2_{xx} L = \sigma\, \nabla^2 f(x) \;+\; \sum_j \lambda_j\, \nabla^2 c_j(x)
\qquad (\texttt{lagrangian\_sign = "positive"}),
$$

pass `lagrangian_sign = "positive"` so the terms get the right sign. **The
convention you pass must match the convention your `hess` returns.** When all
constraints are *linear* their Hessians vanish and the choice is irrelevant.

### Options and the return value

`options` is a named list of Uno options applied *after* the preset (so they
override it); each value is coerced to the option's declared type, and an
unknown name or unacceptable value raises an error. We use
`list(logger = "SILENT")` below to keep Uno quiet.

`uno_solve()` returns a named list. The `optimization_status` and
`solution_status` are **named integers** of the form `c(SUCCESS = 0L)` — the
value is Uno's enum code and the name its canonical label — so you can key a
status map on `names(status)` and still read the bare code with `status[[1L]]`.
The list also carries `objective`, `primal`, the dual solution
(`constraint_dual`, `lower_bound_dual`, `upper_bound_dual`), KKT residuals, and
per-callback evaluation counters.

```{r quiet}
quiet <- list(logger = "SILENT")
```

## Example 1: HS071 — the full machinery

The Hock–Schittkowski problem #71 exercises everything at once: a nonlinear
objective, one nonlinear inequality, one nonlinear equality, and bounds.

$$
\begin{aligned}
\min_{x \in \mathbb{R}^4} \quad & x_1 x_4 (x_1 + x_2 + x_3) + x_3 \\
\text{s.t.}\quad & x_1 x_2 x_3 x_4 \ge 25, \\
                 & x_1^2 + x_2^2 + x_3^2 + x_4^2 = 40, \\
                 & 1 \le x_i \le 5.
\end{aligned}
$$

Both constraints depend on every variable, so the Jacobian is dense (8
nonzeros). The Lagrangian Hessian is a full symmetric $4\times4$ matrix; its
lower triangle has 10 nonzeros. We write the Hessian in the **positive**
convention (matching, e.g., the `ipopt` package's vignette).

```{r hs071}
hs071 <- uno_solve(
  n = 4L, lb = rep(1, 4), ub = rep(5, 4), sense = "minimize",

  obj  = function(x) x[1] * x[4] * (x[1] + x[2] + x[3]) + x[3],
  grad = function(x) c(
    x[4] * (2 * x[1] + x[2] + x[3]),
    x[1] * x[4],
    x[1] * x[4] + 1,
    x[1] * (x[1] + x[2] + x[3])
  ),

  m = 2L, cl = c(25, 40), cu = c(Inf, 40),   # g1 >= 25 ; g2 == 40
  cons = function(x) c(prod(x), sum(x^2)),

  # dense 2 x 4 Jacobian, one-based COO
  jac_rows = rep(1:2, each = 4L),
  jac_cols = rep(1:4, times = 2L),
  jac = function(x) c(
    x[2] * x[3] * x[4], x[1] * x[3] * x[4], x[1] * x[2] * x[4], x[1] * x[2] * x[3],
    2 * x[1], 2 * x[2], 2 * x[3], 2 * x[4]
  ),

  # lower triangle of the 4 x 4 Lagrangian Hessian, row by row
  hess_rows = as.integer(c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)),
  hess_cols = as.integer(c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4)),
  hess = function(x, sigma, lambda) {
    l1 <- lambda[1]; l2 <- lambda[2]
    c(
      sigma * (2 * x[4])                  + l2 * 2,        # (1,1)
      sigma * x[4]            + l1 * (x[3] * x[4]),        # (2,1)
      l2 * 2,                                             # (2,2)
      sigma * x[4]            + l1 * (x[2] * x[4]),        # (3,1)
      l1 * (x[1] * x[4]),                                 # (3,2)
      l2 * 2,                                             # (3,3)
      sigma * (2 * x[1] + x[2] + x[3]) + l1 * (x[2] * x[3]),  # (4,1)
      sigma * x[1]            + l1 * (x[1] * x[3]),        # (4,2)
      sigma * x[1]            + l1 * (x[1] * x[2]),        # (4,3)
      l2 * 2                                              # (4,4)
    )
  },

  x0 = c(1, 5, 5, 1), preset = "ipopt", base_indexing = 1L, verbose = FALSE,
  options = quiet, lagrangian_sign = "positive"
)

names(hs071$optimization_status)
hs071$objective                    # 17.014
hs071$primal                       # (1, 4.743, 3.821, 1.379)
hs071$constraint_dual
```

## Example 2: HS015 — nonconvex, the default (negative) convention

HS015 is nonconvex, which makes it a good illustration of two things: the
**default** Lagrangian sign convention, and *why nonconvex problems need the
interior-point preset*.

$$
\min_{x}\; 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2
\quad\text{s.t.}\quad x_1 x_2 \ge 1,\; x_1 + x_2^2 \ge 0,\; x_1 \le 0.5 .
$$

The constraints are nonlinear, so here the sign convention matters. We leave
`lagrangian_sign` at its default `"negative"` and therefore **subtract** the
`lambda` terms in the Hessian.

```{r hs015}
hs015 <- uno_solve(
  n = 2L, lb = c(-Inf, -Inf), ub = c(0.5, Inf), sense = "minimize",

  obj  = function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2,
  grad = function(x) c(
    400 * x[1]^3 - 400 * x[1] * x[2] + 2 * x[1] - 2,
    200 * (x[2] - x[1]^2)
  ),

  m = 2L, cl = c(1, 0), cu = c(Inf, Inf),
  cons = function(x) c(x[1] * x[2], x[1] + x[2]^2),

  jac_rows = as.integer(c(1, 2, 1, 2)),
  jac_cols = as.integer(c(1, 1, 2, 2)),
  jac = function(x) c(x[2], 1, x[1], 2 * x[2]),

  hess_rows = as.integer(c(1, 2, 2)),
  hess_cols = as.integer(c(1, 1, 2)),
  hess = function(x, sigma, lambda) c(
    sigma * (1200 * x[1]^2 - 400 * x[2] + 2),   # (1,1): from the objective
    -400 * sigma * x[1] - lambda[1],            # (2,1): note the minus signs
    200 * sigma - 2 * lambda[2]                 # (2,2)
  ),

  x0 = c(-2, 1), preset = "ipopt", base_indexing = 1L, verbose = FALSE,
  options = quiet                                # lagrangian_sign = "negative" (default)
)

names(hs015$optimization_status)
hs015$objective                    # 306.5
hs015$primal                       # (0.5, 2)
```

## Example 3: Rosenbrock — unconstrained, with and without a Hessian

The Rosenbrock "banana" function is the classic unconstrained test problem,
with global minimum $0$ at $(1, 1)$. With no constraints (`m = 0`) there is no
Jacobian, and the Lagrangian Hessian reduces to $\sigma \nabla^2 f$
(so `lambda` is unused).

```{r rosenbrock}
rosen <- uno_solve(
  n = 2L, lb = c(-Inf, -Inf), ub = c(Inf, Inf), sense = "minimize",

  obj  = function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2,
  grad = function(x) c(
    -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
    200 * (x[2] - x[1]^2)
  ),

  m = 0L, cl = numeric(), cu = numeric(), cons = function(x) numeric(),
  jac_rows = integer(), jac_cols = integer(), jac = function(x) numeric(),

  hess_rows = as.integer(c(1, 2, 2)),
  hess_cols = as.integer(c(1, 1, 2)),
  hess = function(x, sigma, lambda) sigma * c(
    2 - 400 * x[2] + 1200 * x[1]^2,   # (1,1)
    -400 * x[1],                      # (2,1)
    200                               # (2,2)
  ),

  x0 = c(-1.2, 1), preset = "ipopt", base_indexing = 1L, verbose = FALSE,
  options = quiet
)
rosen$primal
```

If coding an exact Hessian is inconvenient, pass `hess = NULL` (with empty
`hess_rows`/`hess_cols`); Uno then builds an L-BFGS approximation from
gradients. This is only available with the **`ipopt`** preset — the HiGHS QP
solver behind `filtersqp` requires an explicit Hessian.

```{r rosenbrock-lbfgs}
rosen_lm <- uno_solve(
  n = 2L, lb = c(-Inf, -Inf), ub = c(Inf, Inf), sense = "minimize",
  obj  = function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2,
  grad = function(x) c(
    -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
    200 * (x[2] - x[1]^2)
  ),
  m = 0L, cl = numeric(), cu = numeric(), cons = function(x) numeric(),
  jac_rows = integer(), jac_cols = integer(), jac = function(x) numeric(),
  hess_rows = integer(), hess_cols = integer(), hess = NULL,
  x0 = c(-1.2, 1), preset = "ipopt", base_indexing = 1L, verbose = FALSE,
  options = quiet
)
rosen_lm$primal
```

## Example 4: Maximum-entropy distribution — equality constraints

This example is a staple of statistics and information-theory courses:
among all probability distributions $p$ on $\{1, \dots, K\}$ with a
prescribed mean $\mu$, find the one of maximum entropy. Equivalently,
minimize the negative entropy $\sum_i p_i \log p_i$ subject to $\sum_i
p_i = 1$ and $\sum_i i\,p_i = \mu$, $p_i \ge 0$.

Both constraints are **linear**, so their Hessians vanish — the Lagrangian
Hessian is just $\sigma\,\mathrm{diag}(1/p_i)$, a diagonal matrix, and the sign
convention is irrelevant.

```{r maxent}
K <- 5L; support <- 1:K; mu <- 2

ent <- uno_solve(
  n = K, lb = rep(1e-8, K), ub = rep(1, K), sense = "minimize",   # p_i > 0

  obj  = function(p) sum(p * log(p)),
  grad = function(p) log(p) + 1,

  m = 2L, cl = c(1, mu), cu = c(1, mu),                # two equalities
  cons = function(p) c(sum(p), sum(support * p)),
  jac_rows = rep(1:2, each = K),
  jac_cols = rep(seq_len(K), times = 2L),
  jac = function(p) c(rep(1, K), support),             # constant Jacobian

  hess_rows = seq_len(K), hess_cols = seq_len(K),      # diagonal only
  hess = function(p, sigma, lambda) sigma * (1 / p),

  x0 = rep(1 / K, K), preset = "ipopt", base_indexing = 1L, verbose = FALSE,
  options = quiet
)

round(ent$primal, 5)
c(total = sum(ent$primal), mean = sum(support * ent$primal))
```

Maximum-entropy theory says the answer is a discrete exponential distribution
$p_i \propto e^{-\theta i}$, with $\theta$ the multiplier on the mean
constraint — and `constraint_dual` recovers exactly that:

```{r maxent-check}
theta <- ent$constraint_dual[2]
cf <- exp(-theta * support); cf <- cf / sum(cf)
round(cf, 5)                       # matches ent$primal
```

## Example 5: a convex QP via the SQP (HiGHS) preset

The examples above all use the interior-point `ipopt` preset because they are
either nonconvex (HS015, Rosenbrock) or just convenient to solve that way. For a
**convex** problem the SQP preset `filtersqp`, backed by HiGHS, applies. Here is
a small convex QP, $\min (x_1-1)^2 + (x_2-2.5)^2$ subject to $x_1 + x_2 \le 3$,
$x \ge 0$:

```{r qp-filtersqp}
qp <- uno_solve(
  n = 2L, lb = c(0, 0), ub = c(Inf, Inf), sense = "minimize",
  obj  = function(x) (x[1] - 1)^2 + (x[2] - 2.5)^2,
  grad = function(x) c(2 * (x[1] - 1), 2 * (x[2] - 2.5)),
  m = 1L, cl = -Inf, cu = 3,
  cons = function(x) x[1] + x[2],
  jac_rows = as.integer(c(1, 1)), jac_cols = as.integer(c(1, 2)),
  jac = function(x) c(1, 1),
  hess_rows = as.integer(c(1, 2)), hess_cols = as.integer(c(1, 2)),
  hess = function(x, sigma, lambda) sigma * c(2, 2),
  x0 = c(0, 0), preset = "filtersqp", base_indexing = 1L, verbose = FALSE,
  options = quiet
)
qp$primal
qp$objective
```

## Choosing a preset

* **Convex** problems — including the smooth problems produced by CVXR's DNLP
  reduction — work with either preset. `filtersqp` (HiGHS) is natural for convex
  QP-shaped subproblems; `ipopt` (MUMPS) also works. `filtersqp` always requires
  an explicit Hessian (HiGHS cannot use an L-BFGS approximation).
* **Nonconvex** problems (HS015, Rosenbrock) produce indefinite subproblems that
  HiGHS cannot solve. Use the interior-point `ipopt` preset, which copes with
  indefiniteness through inertia correction in MUMPS.

## Solver options

Any Uno option can be passed through `options`; values are coerced to the
option's declared type and applied after the preset:

```{r options, eval = FALSE}
# cap the iteration count and tighten the tolerance
uno_solve(..., preset = "ipopt",
          options = list(max_iterations = 200L, tolerance = 1e-8))

# pick the linear solver explicitly, and silence output
uno_solve(..., preset = "ipopt",
          options = list(linear_solver = "MUMPS", logger = "SILENT"))
```

An unknown option name, or a value the option rejects, raises an error rather
than being silently ignored.

## Monitoring and early termination

`uno_solve()` accepts an optional `iter_callback`, a `function(info)` that Uno
calls at each *acceptable iterate*. `info` is a named list carrying the current
`primals`, the duals (`lower_bound_dual`, `upper_bound_dual`, `constraint_dual`),
the `objective_multiplier`, and the KKT residuals (`primal_feasibility`,
`stationarity`, `complementarity`). The callback's return value controls the
solve: return `FALSE` to continue, or `TRUE` to **stop early** — in which case
the `optimization_status` is `USER_TERMINATION`.

Used as a pure monitor (always returning `FALSE`), it lets you watch
convergence. Here we record the stationarity residual at each iterate of the
Rosenbrock solve:

```{r itercb-monitor}
ros_obj  <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
ros_grad <- function(x) c(-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
                          200 * (x[2] - x[1]^2))
ros_hess <- function(x, sigma, lambda)
  sigma * c(2 - 400 * x[2] + 1200 * x[1]^2, -400 * x[1], 200)

stationarity <- numeric(0)
mon <- uno_solve(
  n = 2L, lb = c(-Inf, -Inf), ub = c(Inf, Inf), sense = "minimize",
  obj = ros_obj, grad = ros_grad,
  m = 0L, cl = numeric(), cu = numeric(), cons = function(x) numeric(),
  jac_rows = integer(), jac_cols = integer(), jac = function(x) numeric(),
  hess_rows = as.integer(c(1, 2, 2)), hess_cols = as.integer(c(1, 1, 2)), hess = ros_hess,
  x0 = c(-1.2, 1), preset = "ipopt", base_indexing = 1L, verbose = FALSE, options = quiet,
  iter_callback = function(info) {
    stationarity[[length(stationarity) + 1L]] <<- info$stationarity
    FALSE                                            # keep going
  }
)
names(mon$optimization_status)
length(stationarity)                                 # one entry per acceptable iterate
signif(range(stationarity), 3)                       # residual shrinks toward 0
```

Returning `TRUE` stops the solve. For example, to bail out after five acceptable
iterates — note the super-assignment `<<-`, needed so the counter persists across
calls rather than writing a local copy:

```{r itercb-stop}
iter <- 0L
stopped <- uno_solve(
  n = 2L, lb = c(-Inf, -Inf), ub = c(Inf, Inf), sense = "minimize",
  obj = ros_obj, grad = ros_grad,
  m = 0L, cl = numeric(), cu = numeric(), cons = function(x) numeric(),
  jac_rows = integer(), jac_cols = integer(), jac = function(x) numeric(),
  hess_rows = as.integer(c(1, 2, 2)), hess_cols = as.integer(c(1, 1, 2)), hess = ros_hess,
  x0 = c(-1.2, 1), preset = "ipopt", base_indexing = 1L, verbose = FALSE, options = quiet,
  iter_callback = function(info) { iter <<- iter + 1L; iter >= 5L }
)
iter                                 # stopped after the 5th acceptable iterate
names(stopped$optimization_status)   # "USER_TERMINATION"
```

An error thrown inside the callback is caught and treated as "do not terminate,"
so a buggy monitor cannot crash the solve.

## Additional Notes

* Errors thrown inside a callback are caught and reported back to Uno as an
  evaluation error — they will not crash the R session. If this
  happens, please report the bug with a reproducible example.
* `uno_solve()` also accepts a `log_callback` (a sink for Uno's output stream);
  see `?uno_solve` for the full argument list.
