1 Introduction
Welcome to ProbBreed! This guide will help you use the package’s functions, from fitting a Bayesian model to computing probabilities. Feel free to contact us via GitHub or e-mail if you run into any issues. Further details can be found in each function’s documentation (e.g., ?bayes_met) and in the papers by Dias et al. (2022) and Chaves et al. (2024).
library(ProbBreed)
2 Step one - fit the Bayesian model
The first step is to fit a multi-environment Bayesian model using the bayes_met function. This function has a set of predefined models that are fitted using rstan. Some important details need to be stressed:
1- Currently, the function has nine options of models (Figure 1).
Figure 1: Model options in the bayes_met function, defined by the repl, year and reg arguments. Users must replace Repl, Block, Year, and Region with the name of the column that contains the information on replicates, blocks (if applicable), years (if available) and regions (if available). RCBD and IBD are acronyms for randomized complete block design and incomplete block design, respectively. A model using only adjusted means can only be fitted with reg = NULL and year = NULL.
These models differ according to the information considered: only locations, locations and years, locations and breeding regions, or locations, years and breeding regions. If you consider an “environment” to be a combination of environmental factors (for instance, location and planting dates within location), this information must be in the loc argument. All models will have gen and loc. If you want to consider the effect of regions (or mega-environments) and/or time factors (for instance, years or harvests), then reg and year must name the columns that contain the information on these effects. Otherwise, these arguments are set to NULL. You may control the experimental design effects with the repl argument. If repl = NULL, bayes_met will assume that you are entering adjusted means for each environment, i.e., data without replicates. If you have data from trials laid out in randomized complete blocks, set repl = 'Replicate'. Finally, if the multi-environment trials were laid out in an incomplete block design, set repl = c('Replicate', 'Block'). When repl = NULL, reg and year must also be NULL.
2- You may change the number of iterations, chains, and cores. The best choice will vary according to the data and the processing capacity of your machine. The default is 2000 iterations (including warm-up), 2 cores and 4 chains. Be aware that the more iterations and chains, the longer the computation time. In return, results are likely to be more reliable, and mixing/convergence issues should diminish. By default, if more than one core is set, the function runs one chain per core. The number of cores you can use depends on the processing capacity of your machine. Choose wisely.
3- Sometimes, bayes_met may yield warnings about possible mixing/convergence issues. More details on these problems and how to deal with them are available here. Usually, increasing the number of iterations is enough. If this does not work, bayes_met has several arguments that are passed to rstan::sampling(), with which advanced users can change the default sampling parameters (more details in the bayes_met documentation: help("bayes_met")) and, hopefully, reduce these issues. Nevertheless, before taking any measures, or even discarding the model, we recommend running the posterior predictive checks using the extr_outs function. Checking the \(\hat{R}\) statistic, the Bayesian p-values and the Empirical vs Sampled density plot is particularly useful. If bayes_met showed warnings but the posterior predictive checks do not indicate serious issues, you can rely on the results and proceed with the analysis.
4- You can choose between a model with homogeneous or heterogeneous residual variances using res.het = FALSE and res.het = TRUE, respectively. When res.het = TRUE, there will be one residual variance for each location (or environment). When repl = NULL, res.het must be TRUE.
5- The Bayesian models implemented in ProbBreed have predefined priors described by Dias et al. (2022). In summary:
\[
x \sim \mathcal{N}\left(0,S^{[x]}\right)
\]
\[
\sigma \sim \mathcal{HalfCauchy}\left(0, S^{[\sigma]}\right)
\]
where \(x\) can be any effect but the residual, and \(\sigma\) is the standard deviation of the likelihood. If a heterogeneous model is set (res.het = TRUE), then \(\sigma_k \sim \mathcal{HalfCauchy}\left(0, S^{[\sigma_k]}\right)\).
The hyperpriors are set as follows:
\[ S^{[x]} \sim \mathcal{HalfCauchy}(0, \phi) \]
where \(\phi\) is the known global hyperparameter, defined such that \(\phi = \max(y) \times 10\). By doing this, we restrain the hyperparameter to be finite and allow the model to take full advantage of the data to infer the posterior distribution, since it provides weakly informative prior distributions and avoids subjectivity that could hamper the results. The half-Cauchy distribution was used to guarantee that variance components will not be negative.
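To make the prior hierarchy of item 5 more concrete, the following base R sketch draws from it outside of Stan. It is only an illustration under two assumptions that do not come from the package: the phenotype vector y is made up, and \(S^{[x]}\) is treated as the standard deviation of the normal prior (Stan's parameterization).

```r
set.seed(1)

# Hypothetical phenotypes (not from the package), used only to set phi
y   <- c(4.2, 5.1, 6.8, 7.3, 5.9)
phi <- max(y) * 10   # global hyperparameter, as defined above

# A half-Cauchy(0, scale) draw is the absolute value of a Cauchy(0, scale) draw
rhalfcauchy <- function(n, scale) abs(rcauchy(n, location = 0, scale = scale))

n_draws <- 1000
S_x <- rhalfcauchy(n_draws, scale = phi)    # hyperprior: S ~ HalfCauchy(0, phi)
x   <- rnorm(n_draws, mean = 0, sd = S_x)   # prior: x ~ N(0, S), S taken as a sd (assumption)

# The heavy tail of the half-Cauchy shows up in the upper quantiles of S
quantile(S_x, probs = c(0.5, 0.9, 0.99))
```

The median of the scale draws stays near \(\phi\), while the upper quantiles are much larger, which is what makes this prior weakly informative.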
Let us see how the nine models can be fitted using bayes_met(). The following models are for educational purposes only and will not provide a valid output. We will use the terms “location”, “region” and “years” to avoid the possible confusion caused by the term “environment”, which can have several meanings.
2.1 Entry-mean model
\[ y_{jk} = \mu + l_k + g_j + \varepsilon_{jk} \]
where \(y_{jk}\) is the phenotypic record of the \(j^{\text{th}}\) genotype in the \(k^{\text{th}}\) location, \(\mu\) is the intercept, \(l_k\) is the main effect of the \(k^{\text{th}}\) location, \(g_j\) is the main effect of the \(j^{\text{th}}\) genotype, and \(\varepsilon_{jk}\) is the residual effect.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = NULL,
                year = NULL,
                reg = NULL,
                res.het = TRUE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
Within-environment probabilities of superior performance, and probabilities of superior stability are unavailable for this type of model, since we have no explicit genotype \(\times\) environment interaction effect in this case.
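For readers unsure what a data set of adjusted means looks like, here is a minimal base R sketch that simulates one from the entry-mean model above. All sizes and effect magnitudes are hypothetical; only the column names (Genotype, Location, Phenotype) follow the call shown above.

```r
set.seed(42)

n_gen <- 5   # hypothetical number of genotypes
n_loc <- 3   # hypothetical number of locations
gen   <- paste0("G", seq_len(n_gen))
loc   <- paste0("L", seq_len(n_loc))

mu  <- 6                                  # intercept
l_k <- rnorm(n_loc, mean = 0, sd = 1.0)   # location main effects
g_j <- rnorm(n_gen, mean = 0, sd = 0.5)   # genotype main effects

# One row per genotype-location combination (adjusted means, no replicates)
MyData <- expand.grid(Genotype = gen, Location = loc, stringsAsFactors = FALSE)

j <- match(MyData$Genotype, gen)
k <- match(MyData$Location, loc)

# y_jk = mu + l_k + g_j + e_jk
MyData$Phenotype <- mu + l_k[k] + g_j[j] + rnorm(nrow(MyData), sd = 0.3)

head(MyData)
```

Because each genotype-location pair appears exactly once, there is no replication from which a separate genotype-by-location effect could be told apart from the residual.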
2.2 Randomized complete blocks design
2.2.1 Only locations
\[ y_{jkp} = \mu + l_k + b_{p(k)} + g_j + gl_{jk} + \varepsilon_{jkp} \]
where \(y_{jkp}\) is the phenotypic record of the \(j^{\text{th}}\) genotype, allocated in the \(p^{\text{th}}\) block, in the \(k^{\text{th}}\) location. All other effects were previously defined, except \(b_{p(k)}\), which is the effect of the \(p^{\text{th}}\) block in the \(k^{\text{th}}\) location, and \(gl_{jk}\), which corresponds to the genotype-by-location interaction.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                year = NULL,
                reg = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.2.2 Locations and regions
\[ y_{jkwp} = \mu + m_w + l_k + b_{p(k)} + g_j + gl_{jk} + gm_{jw} + \varepsilon_{jkwp} \]
All effects were previously defined but \(m_w\) and \(gm_{jw}\), which are the main effects of region and genotypes-by-regions interaction, respectively.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = "Region",
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.2.3 Locations and years
\[ y_{jkhp} = \mu + t_h + l_k + b_{p(k)} + g_j + gl_{jk} + gt_{jh} + \varepsilon_{jkhp} \]
where \(t_h\) and \(gt_{jh}\) are the main effect of years and the genotype-by-year interaction effect, respectively.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = NULL,
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.2.4 Locations, regions and years
\[ y_{jkhwp} = \mu + t_h + m_w + l_k + b_{p(k)} + g_j + gl_{jk} + gt_{jh} + gm_{jw} + \varepsilon_{jkhwp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = "Region",
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.3 Incomplete blocks design
2.3.1 Only locations
\[ y_{jkqp} = \mu + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + \varepsilon_{jkqp} \]
where the \(y_{jkqp}\) is the phenotypic record of the \(j^{\text{th}}\) genotype, allocated in the \(p^{\text{th}}\) block of the \(q^{\text{th}}\) replicate, in the \(k^{\text{th}}\) location. All other effects were previously defined but \(r_{q(k)}\), which is the effect of the \(q^{\text{th}}\) replicate in the \(k^{\text{th}}\) location. Note that the indices of \(b\) also change.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = NULL,
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
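The nesting of blocks within replicates (the \(b_{p(qk)}\) term) is easier to see in a data layout. The base R sketch below builds a hypothetical layout for a single location, with 12 genotypes, two replicates and blocks of size four. The sizes and labels are invented for illustration; how the package expects nested block labels to be coded should be checked in the function documentation.

```r
set.seed(7)

genotypes  <- paste0("G", 1:12)   # hypothetical genotype labels
n_rep      <- 2
block_size <- 4
n_block    <- length(genotypes) / block_size   # blocks per replicate

layout <- do.call(rbind, lapply(seq_len(n_rep), function(q) {
  data.frame(
    Location  = "L1",
    Replicate = q,
    # Blocks are numbered within each replicate (p nested in q)
    Block     = rep(seq_len(n_block), each = block_size),
    # Each replicate holds every genotype once, in a fresh random order
    Genotype  = sample(genotypes)
  )
}))

# Every block holds `block_size` plots
table(layout$Replicate, layout$Block)
```

Each replicate is complete (all genotypes appear once), while each block holds only a subset of them, which is what makes the blocks incomplete.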
2.3.2 Locations and regions
\[ y_{jkwqp} = \mu + m_w + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gm_{jw} + \varepsilon_{jkwqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = "Region",
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.3.3 Locations and years
\[ y_{jkhqp} = \mu + t_h + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gt_{jh} + \varepsilon_{jkhqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = NULL,
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.3.4 Locations, regions and years
\[ y_{jkhwqp} = \mu + t_h + m_w + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gt_{jh} + gm_{jw} + \varepsilon_{jkhwqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = "Region",
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.4 Example dataset
For the next steps, we will use the maize data, which is included in the package. This dataset was used by Dias et al. (2022) in the paper that proposed the methods. It contains 32 single-cross hybrids and four commercial checks (36 genotypes in total) evaluated in 16 locations across five regions or mega-environments. These trials were laid out in an incomplete block design, using a block size of 6 and two replications per trial. This dataset was kindly provided by Embrapa Milho e Sorgo.
mod = bayes_met(data = maize,
                gen = "Hybrid",
                loc = "Location",
                repl = c("Rep","Block"),
                trait = "GY",
                reg = "Region",
                year = NULL,
                res.het = TRUE,
                iter = 4000,
                cores = 4,
                chains = 4)
ProbBreed uses rstan to fit Bayesian models, so packages and functions designed to manage and explore these models can be used on the output object of the bayes_met function. Some interesting options are rstantools, shinystan, and bayesplot.
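The iteration settings determine how many posterior draws are available for the Monte Carlo calculations in the later steps. The short sketch below counts them for the maize model, assuming the warm-up follows rstan's default of half the iterations; that is an assumption about this call, not something stated in this guide.

```r
iter   <- 4000
chains <- 4
warmup <- floor(iter / 2)   # rstan's default warm-up; assumed to apply here

# Post-warm-up draws pooled over all chains
S <- chains * (iter - warmup)
S
```

Under that assumption there are 8000 draws, which agrees with the probabilities printed in Section 4 being multiples of 1/8000 (for example, 0.925750 = 7406/8000).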
3 Step two - explore the model’s outputs
After fitting the model, the next step is to extract outputs using extr_outs. This function also provides other useful information, like variance components and posterior predictive checks. Using the mod object from the previous step:
outs = extr_outs(model = mod,
                 probs = c(0.05, 0.95),
                 verbose = TRUE)
where:
- model is the model fitted using bayes_met
- probs are the probabilities considered to calculate the quantiles and the highest posterior density (HPD)
This function provides an object of class extr, which is a list with the posterior distribution of each effect, the data generated by the model, the maximum posterior values of each effect, the variances of each effect, and quality parameters of the model (see below).
outs$variances
effect var sd naive.se HPD_0.05 HPD_0.95
1 Rep 0.028 0.033 0.000 0.000 0.092
2 Block 0.219 0.049 0.001 0.143 0.303
3 Hybrid 0.222 0.074 0.001 0.123 0.360
4 Location 7.956 3.752 0.042 3.821 15.125
5 Hybrid:Location 0.373 0.070 0.001 0.263 0.492
6 Region 4.441 16.508 0.185 0.016 16.736
7 Hybrid:Region 0.046 0.039 0.000 0.000 0.116
8 error_env1 0.875 0.195 0.002 0.596 1.225
9 error_env2 0.956 0.255 0.003 0.601 1.413
10 error_env3 1.402 0.322 0.004 0.951 1.988
11 error_env4 0.595 0.153 0.002 0.387 0.881
12 error_env5 1.072 0.238 0.003 0.730 1.504
13 error_env6 1.456 0.328 0.004 0.997 2.044
14 error_env7 0.287 0.075 0.001 0.188 0.424
15 error_env8 2.002 0.473 0.005 1.326 2.864
16 error_env9 0.575 0.164 0.002 0.358 0.878
17 error_env10 0.624 0.143 0.002 0.425 0.886
18 error_env11 1.251 0.284 0.003 0.840 1.760
19 error_env12 0.467 0.123 0.001 0.297 0.695
20 error_env13 0.734 0.178 0.002 0.487 1.060
21 error_env14 1.822 0.398 0.004 1.253 2.527
22 error_env15 0.818 0.190 0.002 0.555 1.171
23 error_env16 1.807 0.419 0.005 1.215 2.566
outs$ppcheck
Diagnostics
p.val_max 0.3359
p.val_min 0.3240
p.val_median 0.5528
p.val_mean 0.4981
p.val_sd 0.5342
Eff_No_parameters 184.1117
WAIC2 3925.8574
mean_Rhat 1.0011
Eff_sample_size 0.6517
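One common way to obtain posterior predictive (“Bayesian”) p-values like those listed above is to compare a statistic of the observed data with the same statistic computed on each data set generated by the model. The base R sketch below illustrates that idea with simulated stand-ins for the observed and generated data. Whether extr_outs uses exactly this definition is an assumption.

```r
set.seed(123)

n <- 50     # hypothetical number of observations
S <- 2000   # hypothetical number of posterior draws

y    <- rnorm(n, mean = 6, sd = 1)                        # stand-in for observed phenotypes
yrep <- matrix(rnorm(S * n, mean = 6, sd = 1), nrow = S)  # stand-in for model-generated data

# Proportion of generated data sets whose statistic exceeds the observed one
bayes_pval <- function(stat) mean(apply(yrep, 1, stat) > stat(y))

pvals <- c(max    = bayes_pval(max),
           min    = bayes_pval(min),
           median = bayes_pval(median),
           mean   = bayes_pval(mean),
           sd     = bayes_pval(sd))
round(pvals, 3)
```

Under this definition, values far from 0 and 1 indicate that the model reproduces the corresponding feature of the data, while values close to 0 or 1 point to a misfit.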
The S3 method plot will generate some useful plots, like the density plots and histograms built from the posterior effects (Figure 2). It can also build trace plots.
plot(outs, category = "density")
plot(outs, category = "histogram")
A particularly useful plot is the comparison between the empirical and sampled phenotype, which illustrates the model’s convergence (Figure 3). The more the density plots overlap, the more successful the model was in sampling the effects.
plot(outs)
See more plot options in help('plot.extr').
4 Step three - compute the probabilities
With the outputs extracted by extr_outs, we can calculate the probabilities of superior performance and superior stability of the evaluated genotypes with prob_sup:
results = prob_sup(extr = outs,
                   int = .2,
                   increase = TRUE,
                   save.df = FALSE,
                   verbose = FALSE)
where:
- increase: Whether the objective is to increase (TRUE, default) or decrease (FALSE) the trait mean.
- extr: An extr object that contains the outputs of the extr_outs() function.
- int: The selection intensity, expressed in decimal values. In the example, the selection intensity is 20%.
- save.df: TRUE if you want to save the data frames containing the calculated probabilities in the working directory, FALSE otherwise.
This function generates an object of class probsup, which consists of two lists: across and within. The first list contains the probabilities of superior performance and superior stability across environments, while the second contains the probabilities of superior performance within environments. probsup is also compatible with the S3 method plot. See the details below or in help("plot.probsup").
4.1 Across-environments results
The across list has the following outputs:
4.1.1 HPD of the posterior genotypic main effects
head(results$across$g_hpd)
gen g HPD95 HPD97.5 HPD5 HPD7.5
1 G1 0.64802254 1.04277679 1.11536085 0.2448963 0.17518977
2 G10 -0.48119025 -0.09811394 -0.02581486 -0.8853352 -0.95593432
3 G11 -0.06566819 0.31472466 0.38618334 -0.4392575 -0.51121607
4 G12 0.24970415 0.64208234 0.71103315 -0.1183366 -0.18457682
5 G13 0.37696273 0.76841281 0.84375660 -0.0166694 -0.08544481
6 G14 -0.26388539 0.11287745 0.18337804 -0.6380324 -0.71035304
plot(results, category = "hpd")
4.1.2 Probability of superior performance
Let \(\Omega\) be a subset of the high-performance selected genotypes according to the intensity of selection. A given genotype \(j\) will belong to \(\Omega\) if its genotypic marginal value \(\left(\hat{g}_j\right)\) is high (or low) enough compared to its peers. We can emulate the occurrence of \(S\) trials \(\left(s = 1, 2, \dots, S\right)\) by leveraging discretized samples of Monte Carlo posterior distributions of the fitted Bayesian models. Then, the probability of the \(j^{\text{th}}\) genotype belonging to \(\Omega\) is its ratio of success \(\left(\hat{g}_j \in \Omega\right)\) events over the total number of sampled events \(\left[S = \left(\hat{g}_j \in \Omega\right) + \left(\hat{g}_j \notin \Omega\right)\right]\), as follows:
\[
Pr\left(\hat{g}_j \in \Omega \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_j^{(s)} \in \Omega \vert y\right)
\]
where \(I\left(\hat{g}_j^{(s)} \in \Omega \vert y\right)\) is an indicator variable that can assume two values: (1) if \(\hat{g}_j \in \Omega\) in the \(s^{\text{th}}\) sample, and (0) otherwise. The results provided by prob_sup can be accessed as follows:
head(results$across$perfo)
ID prob
36 G9 0.998000
1 G1 0.925750
22 G29 0.808250
24 G30 0.655125
5 G13 0.607750
35 G8 0.568375
plot(results)
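The Monte Carlo estimator above can be sketched in a few lines of base R. The example uses simulated stand-ins for the posterior draws of the genotypic effects, and the way the size of \(\Omega\) is rounded is an assumption. It illustrates the formula and is not the package's internal code.

```r
set.seed(2024)

S     <- 4000   # hypothetical number of posterior draws
J     <- 10     # hypothetical number of genotypes
int   <- 0.2    # selection intensity
n_sel <- ceiling(int * J)   # size of the selected subset (rounding rule is an assumption)

# Stand-in posterior draws of genotypic effects: S rows (draws) x J columns (genotypes)
true_g <- seq(-1, 1, length.out = J)
g_post <- sapply(true_g, function(m) rnorm(S, mean = m, sd = 0.4))
colnames(g_post) <- paste0("G", seq_len(J))

# Indicator: is genotype j among the top n_sel in draw s? (increase = TRUE case)
in_omega <- t(apply(g_post, 1, function(g) rank(-g) <= n_sel))

# Monte Carlo estimate: average the indicator over the draws
prob <- sort(colMeans(in_omega), decreasing = TRUE)
head(prob)
```

Because exactly n_sel genotypes are selected in every draw, the probabilities add up to n_sel over all genotypes. For a trait that should decrease, `rank(g)` would replace `rank(-g)`.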
4.1.3 Pairwise probability of superior performance
To directly compare a selection candidate with a commercial check, or another promising genotype, we can calculate the pairwise probability of superior performance. This metric computes the probability of a genotype \(j\) performing better than a genotype \(j^\prime\):
\[
Pr\left(\hat{g}_{j} > \hat{g}_{j^\prime} \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y\right)
\]
Note that this equation applies when the aim is to increase the trait value. If the aim is to decrease it (for example, plant height or susceptibility to a disease), you can set increase = FALSE, and the following equation is employed:
\[ Pr\left(\hat{g}_{j} < \hat{g}_{j^\prime} \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{j}^{(s)} < \hat{g}_{j^\prime}^{(s)} \vert y\right) \]
The results are accessed using the following commands:
results$across$pair_perfo[1:5, 1:5]
G1 G10 G11 G12 G13
G1 0.000000 0.000375 0.012750 0.103000 0.197500
G10 0.999625 0.000000 0.904250 0.991625 0.996250
G11 0.987250 0.095750 0.000000 0.852500 0.925625
G12 0.897000 0.008375 0.147500 0.000000 0.664125
G13 0.802500 0.003750 0.074375 0.335875 0.000000
plot(results, category = "pair_perfo")
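A minimal base R sketch of the pairwise comparison follows, again with simulated stand-ins for the posterior draws. The helper name pairwise_prob is hypothetical, and the orientation chosen here (the row genotype beating the column genotype) is a choice of this sketch. The orientation of the package's own matrix should be checked against its heatmaps.

```r
set.seed(99)

S <- 4000   # hypothetical number of posterior draws

# Stand-in posterior draws of the genotypic effects, one column per genotype
g_post <- cbind(G1 = rnorm(S,  0.6, 0.3),
                G2 = rnorm(S,  0.0, 0.3),
                G3 = rnorm(S, -0.4, 0.3))

# out[a, b] estimates Pr(g_a > g_b | y), or Pr(g_a < g_b | y) when increase = FALSE
pairwise_prob <- function(draws, increase = TRUE) {
  J   <- ncol(draws)
  out <- matrix(0, J, J, dimnames = list(colnames(draws), colnames(draws)))
  for (a in seq_len(J)) {
    for (b in seq_len(J)) {
      if (a != b) {
        wins <- if (increase) draws[, a] > draws[, b] else draws[, a] < draws[, b]
        out[a, b] <- mean(wins)   # share of draws in which a beats b
      }
    }
  }
  out
}

pair_prob <- pairwise_prob(g_post)
round(pair_prob, 3)
```

With continuous draws there are no ties, so each pair of mirrored cells adds up to one. The same pattern is visible in the output printed above.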
4.1.4 Probability of superior stability
Considering genotypes that have low genotype-by-environment interaction variance \(\left[var\left(ge_{jk}\right)\right]\) as stable, the probability of superior stability is given by the following equation:
\[ Pr\left[var\left(\widehat{ge}_{jk}\right) \in \Omega \vert y\right] = \frac{1}{S} \sum_{s=1}^S I\left[var\left(\widehat{ge}_{jk}^{(s)}\right) \in \Omega \vert y\right] \]
where \(I\left[var\left(\widehat{ge}_{jk}^{(s)}\right) \in \Omega \vert y\right]\) indicates whether \(var\left(\widehat{ge}_{jk}^{(s)}\right)\) is in \(\Omega\) (1) or not (0) for the \(j^{\text{th}}\) genotype in the \(k^{\text{th}}\) environment. Note that this probability can only be computed across environments since it depends on \(var\left(\widehat{ge}_{jk}\right)\). The results are accessed using the following commands:
head(results$across$stabi$gl)
ID prob
8 G16 0.503375
25 G31 0.457375
28 G34 0.428500
4 G12 0.399000
29 G35 0.398875
24 G30 0.380875
head(results$across$stabi$gm)
ID prob
13 G20 0.266000
24 G30 0.263500
28 G34 0.260375
18 G25 0.253375
32 G5 0.252875
5 G13 0.250250
plot(results, category = "stabi")
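The stability probability can be sketched the same way, this time ranking genotypes by the variance of their interaction effects across environments in each draw. The array of draws below is simulated, and the rounding of the subset size is an assumption.

```r
set.seed(11)

S <- 2000; J <- 8; K <- 6   # hypothetical draws, genotypes, environments
int   <- 0.2
n_sel <- ceiling(int * J)   # rounding rule is an assumption

# Stand-in posterior draws of ge_jk: S draws x J genotypes x K environments.
# Genotype j gets interaction effects with standard deviation sd_j,
# so genotypes with a low index are the more stable ones.
sd_j    <- seq(0.1, 1.5, length.out = J)
ge_post <- array(rnorm(S * J * K, sd = rep(rep(sd_j, each = S), times = K)),
                 dim = c(S, J, K),
                 dimnames = list(NULL, paste0("G", seq_len(J)), NULL))

# Variance of the interaction effects across environments, per draw and genotype (S x J)
var_ge <- apply(ge_post, c(1, 2), var)

# Indicator: is genotype j among the n_sel lowest variances in draw s?
in_omega <- t(apply(var_ge, 1, function(v) rank(v) <= n_sel))

prob_stab <- sort(colMeans(in_omega), decreasing = TRUE)
head(prob_stab)
```

Here the selection always favours low variances, regardless of whether the trait itself should increase or decrease.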
4.1.5 Pairwise probability of superior stability
We can also compare the stability of two selection candidates, or of an experimental genotype and a commercial check, for example. The following equation is used:
\[ Pr\left[var\left(\widehat{ge}_{jk}\right) < var\left(\widehat{ge}_{ik}\right) \vert y\right] = \frac{1}{S} \sum_{s=1}^S{I \left[var\left(\widehat{ge}_{jk}^{(s)}\right) < var\left(\widehat{ge}_{ik}^{(s)}\right) \vert y\right]} \]
Note that, in this case, we are interested in the genotype that has a lower variance of the genotype-by-environment (or genotype-by-region) interaction effects.
results$across$pair_stabi$gl[1:5, 1:5]
G1 G10 G11 G12 G13
G1 0.000000 0.754625 0.780375 0.885500 0.787250
G10 0.245375 0.000000 0.530125 0.702500 0.546625
G11 0.219625 0.469875 0.000000 0.681250 0.533125
G12 0.114500 0.297500 0.318750 0.000000 0.340375
G13 0.212750 0.453375 0.466875 0.659625 0.000000
results$across$pair_stabi$gm[1:5, 1:5]
G1 G10 G11 G12 G13
G1 0.000000 0.532750 0.562875 0.549000 0.57625
G10 0.467250 0.000000 0.522750 0.518125 0.53475
G11 0.437125 0.477250 0.000000 0.493750 0.51450
G12 0.451000 0.481875 0.506250 0.000000 0.52000
G13 0.423750 0.465250 0.485500 0.480000 0.00000
plot(results, category = "pair_stabi")
4.1.6 Joint probability of superior performance and superior stability
Assuming that the genotypic main effects are independent of the variance of the genotype-by-environment interaction effects, the joint probability of superior performance and superior stability follows the same idea as the probability of occurrence of two independent events:
\[ Pr\left[\hat{g}_j \in \Omega \cap var\left(\widehat{ge}_{jk}\right) \in \Omega\right] = Pr\left(\hat{g}_j \in \Omega\right) \times Pr\left[var\left(\widehat{ge}_{jk}\right) \in \Omega\right] \]
The results are accessed using the following commands:
head(results$across$joint)
ID level category prob
145 G1 Location Joint 0.0245323750
146 G10 Location Joint 0.0000198125
147 G11 Location Joint 0.0066500000
148 G12 Location Joint 0.1475302500
149 G13 Location Joint 0.1271716875
150 G14 Location Joint 0.0007494375
plot(results, category = "joint")
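As a worked example of the product rule, the snippet below combines the two probabilities printed earlier for genotype G30 (performance, and stability at the location level). The variable names are only illustrative.

```r
# Values for G30 taken from the outputs printed above
prob_perfo <- 0.655125   # from results$across$perfo
prob_stabi <- 0.380875   # from results$across$stabi$gl

# Joint probability under the independence assumption
prob_joint <- prob_perfo * prob_stabi
prob_joint
```

The result is about 0.2495. It is lower than either individual probability, as it must be for a product of two values below one.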
4.2 Within-environments results
The probabilities of superior performance can be investigated within individual environments. This is useful for specific recommendations. Here are the available results:
4.2.1 Probability of superior performance
The idea is exactly the same as before, but instead of using the marginal genotypic effect of each genotype, we use the within-environment genotypic effect \(\left(g_{jk} = g_j + ge_{jk}\right)\):
\[ Pr\left(\hat{g}_{jk} \in \Omega_k \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y\right) \]
The results are accessed using the following commands:
head(results$within$perfo$gl)
gen E1 E10 E11 E12 E13 E14 E15 E16
1 G1 0.982250 0.276500 0.403625 0.443000 0.283000 0.950875 0.302375 0.970000
2 G10 0.008625 0.001000 0.001375 0.001375 0.025500 0.069000 0.000000 0.039500
3 G11 0.004125 0.103375 0.107625 0.004125 0.599500 0.114625 0.750375 0.085125
4 G12 0.158875 0.087500 0.045500 0.064250 0.066875 0.309875 0.162500 0.563500
5 G13 0.203750 0.195750 0.149500 0.136000 0.071750 0.592625 0.809000 0.066375
6 G14 0.000000 0.004625 0.111375 0.161500 0.029125 0.065000 0.000625 0.022125
E2 E3 E4 E5 E6 E7 E8 E9
1 0.895000 0.613125 0.909750 0.525750 0.328625 0.007125 0.874875 0.395000
2 0.091375 0.191500 0.001125 0.048375 0.027750 0.002750 0.000000 0.046500
3 0.018500 0.029875 0.015750 0.109125 0.331125 0.014000 0.042875 0.232375
4 0.411375 0.599875 0.194375 0.387250 0.445000 0.825500 0.157875 0.314125
5 0.231000 0.704375 0.672750 0.605000 0.826875 0.448000 0.106750 0.526625
6 0.003750 0.053000 0.127250 0.110500 0.610625 0.026625 0.032125 0.062125
plot(results, category = "perfo", level = "within")
The grey cells are environments in which the genotype specified in the row was not evaluated.
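The within-environment calculation can be sketched in base R by adding the interaction draws to the main-effect draws and ranking the genotypes inside each environment. All draws below are simulated, the rounding of the subset size is an assumption, and the sketch assumes every genotype was evaluated in every environment.

```r
set.seed(5)

S <- 2000; J <- 6; K <- 3   # hypothetical draws, genotypes, environments
int   <- 0.2
n_sel <- ceiling(int * J)   # rounding rule is an assumption

gen <- paste0("G", seq_len(J))
env <- paste0("E", seq_len(K))

# Stand-in posterior draws: main effects (S x J) and interaction effects (S x J x K)
g_mean  <- seq(-0.5, 0.5, length.out = J)
g_post  <- matrix(rnorm(S * J, mean = rep(g_mean, each = S), sd = 0.3),
                  nrow = S, dimnames = list(NULL, gen))
ge_post <- array(rnorm(S * J * K, sd = 0.4), dim = c(S, J, K),
                 dimnames = list(NULL, gen, env))

# Within-environment value g_jk = g_j + ge_jk, then rank inside each environment
prob_within <- sapply(env, function(k) {
  g_jk <- g_post + ge_post[, , k]
  colMeans(t(apply(g_jk, 1, function(g) rank(-g) <= n_sel)))
})

round(prob_within, 3)   # rows: genotypes, columns: environments
```

Each column is a separate selection problem, so the probabilities in every column add up to n_sel.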
4.2.2 Pairwise probability of superior performance
This is a direct comparison between genotypes evaluated in the same environment. We use the following equations:
- When increase = TRUE
\[ Pr\left(\hat{g}_{jk} > \hat{g}_{j^\prime k} \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{jk}^{(s)} > \hat{g}_{j^\prime k}^{(s)} \vert y\right) \]
- When increase = FALSE
\[
Pr\left(\hat{g}_{jk} < \hat{g}_{j^\prime k} \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{jk}^{(s)} < \hat{g}_{j^\prime k}^{(s)} \vert y\right)
\]
Since the pairwise comparisons are performed per environment, it is not possible to illustrate them all in a single plot. Instead, plot generates a list of plots, with each element corresponding to an environment. It is strongly recommended that you store this list in an object. Otherwise, plot will simply try to draw them all at once. For that reason, plot will double-check whether you are using an object to store the results. It will print the following question in the console: “Are you using an object to store the outputs of this function? Enter Y/n:”. If you type “n”, it will stop and give you the following message: “There is more than one plot, so it is not a good idea to plot them all at once. Please, store the output of this function in an object”. If you type “Y”, it will build the plots.
The pairwise comparisons and the ggplots are stored in lists. Therefore, the results are accessed using the following commands, taking the environment “E13” and the region “TB” as examples:
pairsupwithin = plot(results, category = "pair_perfo", level = "within")
pairsupwithin$gl$E13
pairsupwithin$gm$TB
Are you using an object to store the outputs of this function? Enter Y/n:
As in the other heatmaps, we are evaluating the probability of the genotypes on the x-axis being superior to the genotypes on the y-axis.
5 Wrap up
The probabilities estimated with ProbBreed in this tutorial are related to some key questions that constantly arise in plant breeding:
What is the risk of recommending a selection candidate?
What is the probability of a given selection candidate having good performance across or within environments?
What is the probability of a selection candidate having better performance than a cultivar check?
How probable is it that a given selection candidate performs similarly across environments?
What are the chances that a given selection candidate is more stable than a cultivar check?
What is the probability that a given selection candidate has a superior and invariable performance across environments?
We hope this package helps you answer some of these questions. Feel free to contact us if there are any issues or suggestions for improvement.