Towards Genomic-Enabled Sweetpotato Breeding:

Optimizing Two-Stage Analysis of Multi-Environment Augmented Trials

Saulo Chaves

San Diego, USA
January 2026

Objectives

  • To provide a brief overview of the importance of sweetpotato breeding in developing countries
  • To demonstrate the importance of genomic selection as a feasible strategy in sweetpotato breeding
  • To show the utility of a proper two-stage analysis when facing complex situations

Content

  1. Background
    • Sweetpotato breeding
    • Genomic selection
  2. Population
    • Material and experimental design
  3. Models and outputs
    • Two-stage models
    • Single-stage models
  4. Wrap-up

Background

Sweetpotato

  • High production and consumption in Sub-Saharan Africa
    • 3rd most important food crop in 7 Eastern and Central African countries
    • 4th in 6 Southern African countries
    • 8th in 4 Western African countries

Sweetpotato

  • Tolerates marginal growing conditions
  • Provides more edible energy per hectare per day than wheat, rice and cassava
  • A valuable source of vitamins A, B, C, and E

Sweetpotato

Orange-fleshed sweetpotato

“Just 125 g of fresh roots from most orange-fleshed varieties contain enough beta-carotene to provide the daily pro-vitamin A needs of a preschooler”

CIP

Sweetpotato breeding


Genomic selection

Modelling strategies

  • Single-stage model

\[ \boldsymbol{y} = \boldsymbol{X}\boldsymbol{b} + \boldsymbol{Z}\boldsymbol{g} + \boldsymbol{\varepsilon} \]

    • Plot-level multi-environment model
      • Gold-standard practice
      • No loss of information
  • Two-stage model
    • Single-environment model
      • Obtain adjusted means: \(\boldsymbol{y} = \boldsymbol{X}_1\boldsymbol{b} + \boldsymbol{X}_2\boldsymbol{g} + \boldsymbol{\varepsilon}\)
    • Weighted multi-environment model
      • Obtain GEBVs: \(\boldsymbol{\bar{y}} = \boldsymbol{X}\boldsymbol{e} + \boldsymbol{Z}\boldsymbol{g} + \boldsymbol{\varepsilon}\)
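The two-stage idea can be sketched in a few lines of NumPy. This is only an illustration on simulated data, not the analysis behind these slides: the trial dimensions, the simulated effects, the value of `sigma2_g`, and the use of an identity matrix in place of a genomic relationship matrix are all assumptions made for the example. Stage 1 computes adjusted means and their variances within each environment; stage 2 fits environments as fixed and genotypes as random, weighting each adjusted mean by the inverse of its variance.

```python
import numpy as np

rng = np.random.default_rng(42)
n_gen, n_env, n_rep = 5, 3, 4             # hypothetical trial dimensions

# Simulated plot-level data: environment mean + genotype effect + noise
g_true = rng.normal(0.0, 1.0, n_gen)
env_mean = np.array([10.0, 12.0, 8.0])
env_sd = np.array([0.5, 1.0, 2.0])        # environments differ in precision

# Stage 1: one analysis per environment -> adjusted means and their variances
adj_mean = np.empty((n_env, n_gen))
adj_var = np.empty((n_env, n_gen))
for j in range(n_env):
    noise = rng.normal(0.0, env_sd[j], (n_gen, n_rep))
    plots = env_mean[j] + g_true[:, None] + noise
    adj_mean[j] = plots.mean(axis=1)
    resid_var = plots.var(axis=1, ddof=1).mean()   # pooled residual variance
    adj_var[j] = resid_var / n_rep                 # variance of each mean

# Stage 2: weighted mixed model on the adjusted means (environment-major order)
y_star = adj_mean.ravel()
W = np.diag(1.0 / adj_var.ravel())                 # inverse-variance weights
X = np.kron(np.eye(n_env), np.ones((n_gen, 1)))    # environment incidence
Z = np.kron(np.ones((n_env, 1)), np.eye(n_gen))    # genotype incidence
sigma2_g = 1.0                                     # assumed known for the sketch

# Henderson's mixed model equations with Var(g) = sigma2_g * I
lhs = np.block([[X.T @ W @ X, X.T @ W @ Z],
                [Z.T @ W @ X, Z.T @ W @ Z + np.eye(n_gen) / sigma2_g]])
rhs = np.concatenate([X.T @ W @ y_star, Z.T @ W @ y_star])
sol = np.linalg.solve(lhs, rhs)
e_hat, g_hat = sol[:n_env], sol[n_env:]            # environment and genotype effects
```

Replacing the identity matrix with a marker-based relationship matrix would turn `g_hat` into genomic predictions; in practice the variance components would be estimated (for example by REML) rather than fixed in advance.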

Modelling strategies

  • If single-stage models are the gold standard, why use two-stage models?
    • Limited computational resources
    • Data availability (e.g., historical data)
    • Complexity of single-stage models

Hypothesis and objective

Because the efficiency of the two-stage approach relies heavily on the quality of the adjusted means obtained in the first stage, we tested the hypothesis that dBLUPs or pedigree-based dBLUPs (dABLUPs) are more appropriate inputs for second-stage models than BLUEs.

Population

Population

  • 1,138 test clones
    • Two pools: partial diallel
    • ~20 parents each
  • 39 parents
  • 8 checks
  • Trait: storage root yield

Genotyping

Models and outputs

Single-environment analyses

\[ \boldsymbol y = \boldsymbol 1 \mu + \boldsymbol{X}_1 \boldsymbol b + \boldsymbol{X}_2 \boldsymbol t + \boldsymbol{Z}_1 \boldsymbol g + \boldsymbol {Z}_2 \boldsymbol r + \boldsymbol Z_3 \boldsymbol c + \boldsymbol Z_4 \boldsymbol{rc} + \boldsymbol \varepsilon \]

  • \(\boldsymbol g\) as fixed \(\to\) BLUEs
  • \(\boldsymbol g\) as random with \(\boldsymbol g \sim \mathcal N \left(\boldsymbol 0, \sigma^2_g \boldsymbol I_V \right) \to\) dBLUPs
  • \(\boldsymbol g\) as random with \(\boldsymbol g \sim \mathcal N \left(\boldsymbol 0, \begin{bmatrix} \sigma^2_a \color{red}{\boldsymbol A} & \boldsymbol 0 \\ \boldsymbol 0 & \sigma^2_{na} \color{red}{\boldsymbol I_V} \end{bmatrix} \right) \to\) dABLUPs
    • \(\boldsymbol g^\prime = \left[ \boldsymbol a \; \boldsymbol {na} \right]\) and \(\boldsymbol Z_1 = \left[\boldsymbol Z_{1_a} \; \boldsymbol Z_{1_{na}} \right]\)
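The practical difference between treating \(\boldsymbol g\) as fixed or random can be illustrated with a toy augmented layout. The sketch below is a simplification under stated assumptions: it keeps only an intercept and genotype effects (no blocks, trial, rows, or columns), the data and the variance components `sigma2_g` and `sigma2_e` are made up, and it does not reproduce the dBLUP or dABLUP computation. It shows only that random genotype effects are shrunk towards zero, and more strongly for unreplicated test clones than for replicated checks.

```python
import numpy as np

# Hypothetical toy trial: 2 checks with 3 plots each, 4 unreplicated test clones
geno = np.array([0, 0, 0, 1, 1, 1, 2, 3, 4, 5])
y = np.array([20.1, 19.5, 20.6, 15.2, 14.8, 15.9, 22.0, 12.5, 18.3, 16.7])
n_gen = geno.max() + 1
Z = np.eye(n_gen)[geno]                   # plot-by-genotype incidence matrix

# g fixed: with no other effects, the BLUEs are the per-genotype averages
n_plots = Z.sum(axis=0)
blue = (Z.T @ y) / n_plots

# g random with g ~ N(0, sigma2_g * I): Henderson's mixed model equations
sigma2_g, sigma2_e = 4.0, 2.0             # assumed known for the sketch
lam = sigma2_e / sigma2_g
ones = np.ones((len(y), 1))
lhs = np.block([[ones.T @ ones, ones.T @ Z],
                [Z.T @ ones, Z.T @ Z + lam * np.eye(n_gen)]])
rhs = np.concatenate([ones.T @ y, Z.T @ y])
sol = np.linalg.solve(lhs, rhs)
mu_hat, blup = sol[0], sol[1:]

# Each BLUP equals the centred genotype mean times a shrinkage factor n/(n + lam)
shrinkage = n_plots / (n_plots + lam)
```

With these assumed variances, the checks (three plots each) keep a larger share of their deviation from the mean than the single-plot test clones, which is why the precision of each first-stage estimate has to be carried into the second stage.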

Second-stage model

\[ \boldsymbol y^\star = \boldsymbol 1 \mu + \boldsymbol X \boldsymbol e + \boldsymbol Z \boldsymbol g + \boldsymbol \varepsilon \]

  • Full weight (FW): \(\boldsymbol{ \varepsilon} \sim \mathcal N \left(\boldsymbol 0, \boldsymbol \Omega^{-1} \right)\)
  • Diagonal weight (DW): \(\boldsymbol{ \varepsilon} \sim \mathcal N \left[\boldsymbol 0, D\left(\boldsymbol \Omega^{-1}\right) \right]\)
  • \(\boldsymbol g\) partitioned into additive and non-additive: \(\boldsymbol g^\prime = \left[ \boldsymbol a \; \boldsymbol {na} \right]\)
    • Additive: factor analytic covariance structure: \(\boldsymbol a \sim \mathcal{N}[ \boldsymbol 0, \underbrace{\left(\boldsymbol \Lambda \boldsymbol \Lambda^\prime + \boldsymbol \Psi\right)}_{\boldsymbol \Sigma_g} \otimes \boldsymbol G]\)
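The contrast between full and diagonal weighting can be sketched with generalized least squares. In the example below, `omega_inv` stands for \(\boldsymbol \Omega^{-1}\), read here as the covariance matrix of the first-stage adjusted means; its values, the three adjusted means in `y_star`, and the intercept-only design are hypothetical. The genetic and environment terms of the second-stage model are left out so that only the effect of the residual structure is visible.

```python
import numpy as np

# Hypothetical covariance matrix of three adjusted means from stage 1
omega_inv = np.array([[1.0, 0.6, 0.2],
                      [0.6, 2.0, 0.5],
                      [0.2, 0.5, 1.5]])
y_star = np.array([10.0, 12.0, 11.0])     # hypothetical adjusted means
X = np.ones((3, 1))                       # intercept-only design


def gls(y, X, R):
    """Generalized least squares estimate and its variance under residual covariance R."""
    R_inv = np.linalg.inv(R)
    var = np.linalg.inv(X.T @ R_inv @ X)
    return var @ X.T @ R_inv @ y, var


R_fw = omega_inv                          # full weight: keeps the covariances
R_dw = np.diag(np.diag(omega_inv))        # diagonal weight: variances only

b_fw, v_fw = gls(y_star, X, R_fw)
b_dw, v_dw = gls(y_star, X, R_dw)
```

The two estimates differ because the diagonal version ignores the correlations between adjusted means that share first-stage effects. For many genotypes the full matrix is more expensive to store and invert, which is the usual reason for considering the diagonal approximation.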

Single-stage model

\[ \boldsymbol y = \boldsymbol X \boldsymbol f + \boldsymbol{Z}_1 \boldsymbol g + \boldsymbol {Z}_2 \boldsymbol r + \boldsymbol Z_3 \boldsymbol c + \boldsymbol Z_4 \boldsymbol{rc} + \boldsymbol \varepsilon \]

  • \(\boldsymbol f \to\) fixed effects
  • \(\boldsymbol g^\prime = \left[ \boldsymbol a \; \boldsymbol {na} \right]\)
    • Additive: factor analytic covariance structure
  • Block-diagonal covariance structure for design-related effects, spatial adjustments and residuals
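The factor analytic structure for the additive effects, \(\boldsymbol \Sigma_g = \boldsymbol \Lambda \boldsymbol \Lambda^\prime + \boldsymbol \Psi\) combined with \(\boldsymbol G\) through a Kronecker product, can be assembled as below. The numbers of environments, factors, and genotypes, the random loadings and specific variances, and the simulated marker matrix used to build `G` are all assumptions for illustration; in a real analysis the loadings and specific variances are estimated from the data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_env, n_fac, n_gen = 4, 2, 3             # hypothetical dimensions

# Factor analytic parameters (random placeholders, not estimates)
Lambda = rng.normal(size=(n_env, n_fac))          # environment loadings
Psi = np.diag(rng.uniform(0.1, 0.5, n_env))       # specific variances
Sigma_g = Lambda @ Lambda.T + Psi                 # between-environment genetic covariance

# Hypothetical relationship matrix built from simulated centred markers
M = rng.normal(size=(n_gen, 10))
G = M @ M.T / 10 + 1e-3 * np.eye(n_gen)           # small ridge keeps G invertible

# Covariance of additive effects, ordered as genotypes within environments
cov_a = np.kron(Sigma_g, G)

# Genetic correlations between environments implied by the FA structure
d = np.sqrt(np.diag(Sigma_g))
cor_g = Sigma_g / np.outer(d, d)

# Parameter counts: FA with k factors versus an unstructured matrix
n_par_fa = n_env * n_fac + n_env - n_fac * (n_fac - 1) // 2
n_par_us = n_env * (n_env + 1) // 2
```

The parameter counts show the motivation for the structure: the FA form grows roughly linearly with the number of environments for a fixed number of factors, whereas an unstructured covariance matrix grows quadratically.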

Factor Analytic model selection

Genotype-by-environment interaction

Partitioning the genotype-by-environment interaction (Bančič et al., 2024)

Selection

Cross-validations

Proper weighting matters more than the choice of first-stage input (BLUEs, dBLUPs, or dABLUPs)!

Cross-validations

Making predictions separately for each pool did not harm the model’s performance and reduced the computational cost!

Wrap-up

Key messages

  • Despite the possible loss of information, two-stage models are still useful
    • Early-stage multi-environment unreplicated trials
  • Proper weighting is crucial for obtaining correct results from two-stage models
    • Models weighted using the full matrix performed better than those weighted using diagonal weights
  • It is possible to simplify the models by focusing on per-pool predictions

Acknowledgments

Funding and management:

  • Bill & Melinda Gates Foundation
  • CGIAR
  • CIP
  • UFV
  • ESALQ-USP

People:

  • UFV: Guilherme da Silva Pereira, Kaio Olimpio G. Dias
  • CIP: Reuben Ssali, Bert de Boeck, Thiago Mendes, Hannele Lindqvist-Kreuze, Hugo Campos
  • ESALQ-USP: José Tiago Barroso Chagas
  • NCSU: Craig Yencho

Thank you!

📧 saulochaves@usp.br