1 Preamble

suppressPackageStartupMessages(library(plantmix))


2 Models

2.1 Overview

The phenotypic variation among varietal mixtures can be decomposed into contributions of general and specific mixing abilities, noted GMA and SMA respectively. Note that these models tackle the case where only a single phenotypic observation is available per plot of a mixed stand: there is no access to genotypic performances within the mixture.

Forst et al (2019) introduced three models to jointly analyze stands of any order, that is, monovarietal stands (order 1) and mixed stands (order > 1). This article also distinguished between inter-genotypic interactions, noted \(SMA_{ij}\), and intra-genotypic interactions, noted \(SMA_{ii}\). Moreover, all models were framed into the formalism of linear mixed models, GMAs and SMAs being random effects, in order to deal with incomplete designs.

All three models have a contribution of \(GMA\), but they distinguish from each other based on the kind of interactions that are explicitly modeled:

  • model 1 assumes no interaction (no \(SMA\));

  • model 2 assumes inter-genotypic interactions (\(SMA_{ij}\)) in mixed stands and intra-genotypic interactions (\(SMA_{ii}\)) in monovarietal stands only, both kinds of interactions coming from a single distribution;

  • model 3 assumes inter-genotypic interactions in mixed stands (\(SMA_{ij}\)) and intra-genotypic interactions (\(SMA_{ii}\)) in both monovarietal and mixed stands, and here again with both kinds of interactions coming from a single distribution.

Other models are also conceivable, such as:

  • 2’, a variant of model 2: \(SMA_{ij}\) in mixed stands and no other interactions;

  • 2’’, another variant of model 2: \(SMA_{ij}\) in mixed stands and \(SMA_{ii}\) in monovarietal stands, but with a different distribution for each kind of interactions;

  • 3’, a variant of model 3: \(SMA_{ij}\) in mixed stands and \(SMA_{ii}\) in both monovarietal and mixed stands, but with a different distribution for each kind of interactions.

2.2 Notations

  • \(n\) is the index of a given plot that can contain a monovarietal or a mixed stand;

  • \(K(n)\) is the number of genotypes in plot \(n\) (1 if monovarietal stand, >1 if mixed);

  • \(y_{nb}\) is the response variable corresponding to the yield of plot \(n\) in block \(b\);

  • \(\mu\) is the intercept;

  • \(\alpha_b\) is the effect of block \(b\), modeled as “fixed”;

  • \(GMA_{k(n)}\) is the general mixture ability of genotype \(k\) in plot \(n\), modeled as “random”;

  • \(SMA_{k(n)k'(n)}\) is the specific mixture ability of the genotype pair \(k\) and \(k'\) in plot \(n\), modeled as “random”.

2.3 Major assumptions

  1. Only first-order (i.e., pairwise) interactions are taken into account; high-order interactions are ignored.

  2. Genotypes in mixed stands are assumed to be sowned in equal proportions, \(1 / K(n)\), and to be harvested in the same proportions (but the proportions can be adapted as indicated at the end of section 4.7 in the article).

  3. The magnitude of interactions is inversely proportional to mixture size.

2.4 Model 1

There are only GMA effects and no SMA effect:

\[ y_{nb} = \mu + \alpha_b + \frac{1}{K(n)} \sum_{k(n)=1}^{K(n)} GMA_{k(n)} + \epsilon_{nb} \]

2.5 Model 2

There are both GMA and SMA effects:

  • in mixed stands, SMAs only contain inter-genotypic interactions;

  • monovarietal stands also have SMAs, corresponding to intra-genotypic interactions.

\[ y_{nb} = \mu + \alpha_b + \frac{1}{K(n)} \sum_{k(n)=1}^{K(n)} GMA_{k(n)} + \frac{1}{C_{K(n)}^2} \sum_{k(n)=1}^{K(n)-1} \sum_{k'(n)=k(n)+1}^{K(n)} SMA_{k(n)k'(n)} + \epsilon_{nb} \]

where \(C_{K(n)}^2\) corresponds to the number of combinations of \(K(n)\) genotypes taken two at a time (used for mixed stands only).

Note that interactions are not oriented, i.e., \(SMA_{k(n)k'(n)} = SMA_{k'(n)k(n)}\).

2.6 Model 3

There are also both GMA and SMA effects:

  • in mixed stands, SMAs contains inter-genotypic interactions as well as intra-genotypic interactions;

  • monovarietal stands also have SMAs, corresponding to intra-genotypic interactions.

\[ y_{nb} = \mu + \alpha_b + \frac{1}{K(n)} \sum_{k(n)=1}^{K(n)} GMA_{k(n)} + \frac{1}{K(n)^2} \sum_{k(n)=1}^{K(n)} \sum_{k'(n)=1}^{K(n)} SMA_{k(n)k'(n)} + \epsilon_{nb} \]

2.7 Models in vector form

All three models can be written in vector form:

\[ \boldsymbol{y} = X \, \boldsymbol{\beta} + Z_G \, \boldsymbol{GMA} + (\, Z_S \, \boldsymbol{SMA} \, ) + \boldsymbol{\epsilon} \]

where vectors are in bold.

The design matrices are \(X\), \(Z_G\) and \(Z_S\). They make the correspondence between \(\boldsymbol{y}\) and the levels of the effects, respectively \(\boldsymbol{\beta}\) (fixed), \(\boldsymbol{GMA}\) (random) and \(\boldsymbol{SMA}\) (random). All three matrices have as many rows as the length of \(\boldsymbol{y}\).

  • \(\boldsymbol{GMA} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{GMA} A_1)\) where \(A_1\) is known

  • \(\boldsymbol{SMA} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA} A_2)\) for both \(SMA_{ij}\) and \(SMA_{ii}\) where \(A_2\) is also known

  • \(\boldsymbol{\epsilon} \sim \mathcal{N}\boldsymbol{0}, \sigma^2_e Id)\)

2.8 Other models

  • model 2’, with \(SMA_{ij}\) in mixed stands and no other interactions: same as model 2 except that there is no \(SMA_{ii}\)

  • model 2’’, with \(SMA_{ij}\) in mixed stands and \(SMA_{ii}\) only in monovarietal stands, with a different distribution for each kind of interactions: same as model 2 except that \(\boldsymbol{SMA}_{ij} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA_{ij}} A_2)\) and \(\boldsymbol{SMA}_{ii} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA_{ii}} A_2)\)

  • model 3’, with \(SMA_{ij}\) in mixed stands and \(SMA_{ii}\) in both monovarietal and mixed stands, with a different distribution for each kind of interactions: same as model 3 except that \(\boldsymbol{SMA}_{ij} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA_{ij}} A_2)\) and \(\boldsymbol{SMA}_{ii} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA_{ii}} A_2)\)

2.9 Examples of design matrices

For pedagogical purposes, let us imagine a design with three genotypes (g1, g2 and g3) containing one monovarietal stand (stand1), one two-way mixture (stand2) and one three-way mixture (stand3).

id genos type order
1 stand1 g1 monovarietal 1
2 stand2 g1-g2 mixed 2
3 stand3 g1-g2-g3 mixed 3

2.9.1 Design matrix for GMAs

\(Z_G\) has as many columns as the number of genotypes in the experimental design. From the example data frame above, \(Z_G\) hence has 3 rows and 3 columns.

Weights:

  • for a monovarietal stand: \(1 \, / \, K(n) = 1 \, / \, 1\)

  • for a binary mixture: \(1 \, / \, K(n) = 1 \, / \, 2\)

  • for a ternary mixture: \(1 \, / \, K(n) = 1 \, / \, 3\)

  • etc

g1 g2 g3
g1 1.00 0.00 0.00
g1-g2 0.50 0.50 0.00
g1-g2-g3 0.33 0.33 0.33

2.9.2 Design matrix for SMAs

\(Z_S\) has as many columns as the total number of possible intra-genotypic interactions plus the total number of possible (pairwise) inter-genotypic interactions. From the example data frame above, \(Z_S\) hence has \(3\) rows and \(3 + C_3^2 = 6\) columns.

2.9.2.1 Model 2

Weights:

  • for a binary mixture: \(1 \, / \, C_2^2 = 1 \, / \, 1\)

  • for a ternary mixture: \(1 \, / \, C_3^2 = 1 \, / \, 3\)

  • for a quaternary mixture: \(1 \, / \, C_4^2 = 1 \, / \, 6\)

  • etc

g1-g1 g1-g2 g1-g3 g2-g3
g1 1 0.00 0.00 0.00
g1-g2 0 1.00 0.00 0.00
g1-g2-g3 0 0.33 0.33 0.33

2.9.2.2 Model 3

Weights:

  • for a binary mixture: \(1 \, / \, K(n)^2 = 1 \, / \, 4\)

  • for a ternary mixture: \(1 \, / \, K(n)^2 = 1 \, / \, 9\)

g1-g1 g1-g2 g1-g3 g2-g2 g2-g3 g3-g3
g1 1.00 0.00 0.00 0.00 0.00 0.00
g1-g2 0.25 0.50 0.00 0.25 0.00 0.00
g1-g2-g3 0.11 0.22 0.22 0.11 0.22 0.11

2.9.2.3 Model 2’

Same as model 2 but no \(SMA_{ii}\):

g1-g2 g1-g3 g2-g3
g1 0.00 0.00 0.00
g1-g2 1.00 0.00 0.00
g1-g2-g3 0.33 0.33 0.33

2.9.2.4 Model 2’’

Same as model 2 but \(SMA_{ii}\) and \(SMA_{ij}\) with different distributions:

Design matrix for \(SMA_{ij}\) only:

g1-g2 g1-g3 g2-g3
g1 0.00 0.00 0.00
g1-g2 1.00 0.00 0.00
g1-g2-g3 0.33 0.33 0.33

Design matrix for \(SMA_{ii}\) only:

Z_S_ii
g1 1
g1-g2 0
g1-g2-g3 0

2.9.2.5 Model 3’

Same as model 3 but \(SMA_{ii}\) and \(SMA_{ij}\) with different distributions:

Design matrix for \(SMA_{ij}\) only:

g1-g2 g1-g3 g2-g3
g1 0.00 0.00 0.00
g1-g2 0.50 0.00 0.00
g1-g2-g3 0.22 0.22 0.22

Design matrix for \(SMA_{ii}\) only:

g1-g1 g2-g2 g3-g3
g1 1.00 0.00 0.00
g1-g2 0.25 0.25 0.00
g1-g2-g3 0.11 0.11 0.11

3 Data simulation

For the purpose of illustration, varietal mixtures will be assembled from a panel of 25 genotypes. Only 75 binary, balanced mixtures will be observed, among all 300 possible mixtures. Phenotypes will be simulated according to model 2 (see above), organized into a field trial with each mixture present in each block.

Ignored below, but available in the inference functions:

  • using data in monovarietal conditions while analyzing mixture data;

  • taking genomic relationships into account.

3.1 Sparse design

nbGenos <- 25
levGenos <- sprintf(
  fmt = paste0("geno%0", floor(log10(nbGenos)) + 1, "i"),
  1:nbGenos
)
nbMixes <- 75 # only binary and balanced
design <- getDesignBinaryVarMix(levGenos, nbMixes, seed = 12345)
tmp <- getMixturesPerGeno(getMixtureList(design$combs))
table(sapply(tmp, length)) # each genotype is in the same nb of mixtures -> balanced design
## 
##  6 
## 25

nbBlocks <- 3
levBlocks <- LETTERS[1:nbBlocks]
dat <- do.call(rbind, lapply(levBlocks, function(block) {
  data.frame(
    stand = paste0(design$combs$comp1, "_", design$combs$comp2),
    block = block,
    stringsAsFactors = TRUE
  )
}))
str(dat)
## 'data.frame':    225 obs. of  2 variables:
##  $ stand: Factor w/ 75 levels "geno01_geno03",..: 1 7 13 15 8 22 2 9 35 32 ...
##  $ block: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...

3.2 Design matrices

The design matrices can be generated with the mkZGMA and mkZSMA functions. See also mkAllZSMA to make the SMA design matrices all at once.

listContr <- list(block = "contr.sum")
X <- model.matrix(~ 1 + block, data = dat, contrasts = listContr)
Z_GMA <- mkZGMA(dat, "stand", sep = "_")
Z_SMA <- mkZSMA(dat, "stand", sep = "_", inc_SMA_ii = "no")

3.3 Parameters

truth <- list(
  "intercept" = 80,
  "var_GMA" = 10,
  "var_SMA" = 2,
  "var_error" = 1
)
set.seed(1234)
truth[["blockEffs"]] <- sample(x = c(-1, 1), size = nbBlocks - 1, replace = TRUE) *
  rnorm(n = nbBlocks - 1, mean = 3, sd = 5)
truth[["GMA"]] <- rnorm(n = nbGenos, mean = 0, sd = sqrt(truth$var_GMA))
truth[["SMA"]] <- rnorm(n = nbMixes, mean = 0, sd = sqrt(truth$var_SMA))
truth[["errors"]] <- rnorm(n = nrow(dat), mean = 0, sd = sqrt(truth$var_error))

3.4 Phenotypes

y <- X %*% c(truth$intercept, truth$blockEffs) +
  Z_GMA %*% truth$GMA + Z_SMA %*% truth$SMA +
  truth$errors
dat$yield <- y[, 1]

boxplot(yield ~ block, data = dat, las = 1, main = "Simulated data")

4 Inference

Models 1 and 2 will be fitted and compared.

In this first vignette, only the lme4 package will be used, but see the second vignette that shows how to use TMB allowing to provide a genomic relationship matrix and to quantify the uncertainty around estimates of variance components.

4.1 Model fits

listZ1 <- list("GMA" = mkZGMA(dat, "stand", sep = "_"))
system.time(
  fit1 <- fitGMASMA(yield ~ 1 + block, dat, listZ1, pkg = "lme4", contrasts = listContr)
)
##    user  system elapsed 
##   0.146   0.004   0.155

listZ2 <- list(
  "GMA" = mkZGMA(dat, "stand", sep = "_"),
  "SMA" = mkZSMA(dat, "stand", sep = "_", inc_SMA_ii = "no")
)
system.time(
  fit2 <- fitGMASMA(yield ~ 1 + block, dat, listZ2, pkg = "lme4", contrasts = , listContr)
)
##    user  system elapsed 
##   0.174   0.000   0.175

4.2 Checks

BLUEs <- fixef(fit2)
data.frame(
  "true" = c(truth$intercept, truth$blockEffs),
  "estim" = BLUEs
)
##                  true      estim
## (Intercept) 80.000000  83.278413
## blockB       4.387146   4.134128
## blockC       8.422206 -17.217861

estV <- as.data.frame(lme4::VarCorr(fit2))
data.frame(
  "true" = c(truth$var_GMA, truth$var_SMA, truth$var_error),
  "estim" = c(estV$vcov[estV$grp == "GMA"], estV$vcov[estV$grp == "SMA"], estV$vcov[estV$grp == "Residual"]),
  row.names = c("GMA", "SMA", "error")
)
##       true    estim
## GMA     10 8.276224
## SMA      2 1.391127
## error    1 1.009792

BLUPs <- ranef(fit2)
cor(truth$GMA, BLUPs$GMA[, "(Intercept)"])
## [1] 0.8504539
cor(truth$SMA, BLUPs$SMA[, "(Intercept)"])
## [1] 0.7285536
op <- par(mfrow = c(1, 2))
for (MA in c("GMA", "SMA")) {
  plot(BLUPs[[MA]][, "(Intercept)"], truth[[MA]],
    xlab = paste0("BLUP(", MA, ")"), ylab = paste0("true ", MA),
    main = "Accuracy with lme4", las = 1, pch = 19
  )
  abline(a = 0, b = 1, v = 0, h = 0, lty = 2)
  abline(lm(truth[[MA]] ~ BLUPs[[MA]][, "(Intercept)"]), col = "red")
}

par(op)

4.3 Model comparisons

fits <- list("mod1" = fit1, "mod2" = fit2)
t(summarizeGMASMAs(fits))
##           mod1      mod2     
## AIC       882.5341  823.1463 
## R2        0.9800357 0.9922544
## RMSE      1.353726  0.8431985
## var.GMA   9.047029  8.276224 
## var.SMA   NA        1.391127 
## var.SMAij NA        NA       
## var.SMAii NA        NA       
## var.err   2.067555  1.009792

As expected, model 2 is better than model 1 on this example (as it was used to simulate it).

5 Reference

See the article for more details:

  • Forst E, Enjalbert J, Allard V, Ambroise C, Krissaane I, Mary-Huard T, Robin S, Goldringer I. 2019. A generalized statistical framework to assess mixing ability from incomplete mixing designs using binary or higher order variety mixtures and application to wheat. Field Crops Research. 242:107571. doi: 10.1016/j.fcr.2019.107571.

6 Appendix

t1 <- proc.time()
t1 - t0
##    user  system elapsed 
##   3.262   0.096   3.372
print(sessionInfo(), locale = FALSE)
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Debian GNU/Linux 12 (bookworm)
## 
## Matrix products: default
## BLAS:   /usr/local/R/4.5.1/lib/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0  LAPACK version 3.11.0
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] plantmix_1.0.2 lme4_1.1-37    Matrix_1.7-3   ggplot2_3.5.2  knitr_1.50    
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       jsonlite_2.0.0     dplyr_1.1.4        compiler_4.5.1    
##  [5] Rcpp_1.1.0         tidyselect_1.2.1   dichromat_2.0-0.1  jquerylib_0.1.4   
##  [9] splines_4.5.1      scales_1.4.0       boot_1.3-31        yaml_2.3.10       
## [13] fastmap_1.2.0      lattice_0.22-7     R6_2.6.1           generics_0.1.4    
## [17] igraph_2.1.4       rbibutils_2.3      MASS_7.3-65        tibble_3.3.0      
## [21] nloptr_2.2.1       minqa_1.2.8        TMB_1.9.17         bslib_0.9.0       
## [25] pillar_1.11.0      RColorBrewer_1.1-3 rlang_1.1.7        cachem_1.1.0      
## [29] xfun_0.52          sass_0.4.10        cli_3.6.5          withr_3.0.2       
## [33] magrittr_2.0.3     Rdpack_2.6.4       digest_0.6.37      grid_4.5.1        
## [37] lifecycle_1.0.5    nlme_3.1-168       reformulas_0.4.1   vctrs_0.6.5       
## [41] evaluate_1.0.4     glue_1.8.0         farver_2.1.2       rmarkdown_2.29    
## [45] tools_4.5.1        pkgconfig_2.0.3    htmltools_0.5.8.1