1 Introduction
Welcome to ProbBreed! This guide will help you use the package's functions, from fitting a Bayesian model to computing probabilities. Feel free to contact us via GitHub or e-mail if there are any issues. Further details are found in each function's documentation (e.g., ?bayes_met) and in the papers by Dias et al. (2022) and Chaves et al. (2024).
library(ProbBreed)
2 Step one - fit the Bayesian model
The first step is to fit a multi-environment Bayesian model using the bayes_met function. This function has a set of predefined models that are used to run Bayesian analyses via rstan. There are some important details that need to be stressed:
1- Currently, the function has nine options of models (Figure 1). These models differ according to the information considered: only locations; locations and years; locations and breeding regions; or locations, years and breeding regions. Should you consider an “environment” as a combination of environmental factors (for instance, location and planting dates within location), this information must be in the loc argument. All models will have gen and loc. If you want to consider the effect of regions (or mega-environments) and/or time factors (for instance, years or harvests), then reg and year must indicate the columns that contain the information on these effects. Otherwise, these arguments are set to NULL. You may control the experimental design effects in the repl argument. If repl = NULL, bayes_met will assume that you are entering adjusted means of each environment, i.e., data without replicates. If you have data from trials laid out in randomized complete blocks, use repl = 'Replicate'. Finally, if the multi-environment trials were laid out in an incomplete blocks design, use repl = c('Replicate', 'Block'). By default, when repl = NULL, reg and year must also be NULL.
2- You may change the number of iterations, chains, and cores. This will vary according to the data and the processing capacity of your machine. The default is set to 2000 iterations (including warm-up), 2 cores and 4 chains. Be aware that the more iterations and chains, the longer the computation time. At the same time, the results are likely to be more reliable, and mixing/convergence issues will diminish. By default, if more than one core is set, the function runs one chain per core. The number of cores you can use depends on the processing capacity of your machine. Choose wisely.
3- Sometimes, bayes_met may yield warnings about possible mixing/convergence issues. More details on these problems and how to deal with them are available here. Usually, increasing the number of iterations is enough. If this does not work, bayes_met has several arguments passed to rstan::sampling() with which advanced users can configure the default sampling parameters (more details in the bayes_met documentation: help("bayes_met")) and, hopefully, diminish these issues. Nevertheless, before taking any measures - or even disposing of the model - we recommend doing the posterior predictive check using the extr_outs function. Checking the \(\hat{R}\) statistic, the Bayesian p-values and the Empirical vs Sampled density plot is particularly useful. If bayes_met showed warnings but the posterior predictive checks do not indicate serious issues, you can rely on the results and proceed with the analysis.
4- You can choose between a model with homogeneous or heterogeneous residual variances using res.het = FALSE or res.het = TRUE, respectively. When res.het = TRUE, there will be one residual variance for each location (or environment). By default, when repl = NULL, res.het must be TRUE.
5- The Bayesian models implemented in ProbBreed have predefined priors, described by Dias et al. (2022). In summary: \[
x \sim \mathcal{N}\left(0,S^{[x]}\right)
\] \[
\sigma \sim \mathcal{HalfCauchy}\left(0, S^{[\sigma]}\right)
\]
where \(x\) can be any effect but the residual, and \(\sigma\) is the standard deviation of the likelihood. If a heterogeneous residual variance model is set (res.het = TRUE), then \(\sigma_k \sim \mathcal{HalfCauchy}\left(0, S^{[\sigma_k]}\right)\).
The hyperpriors are set as follows: \[ S^{[x]} \sim \mathcal{HalfCauchy}(0, \phi) \] where \(\phi\) is a known global hyperparameter defined such that \(\phi = \max(y) \times 10\). By doing this, we constrain the hyperparameter to be finite and allow the model to take full advantage of the data to infer the posterior distribution, since this provides weakly informative prior distributions, avoiding subjectivity that could hamper the results. The half-Cauchy distribution is used to guarantee that variance components will not be negative.
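For illustration only, here is a minimal sketch of how the global hyperparameter \(\phi\) would be obtained from the data (the data frame and column names are the same placeholders used in the examples below):
# phi = max(y) * 10: the largest observed phenotypic value times 10 (placeholder names)
phi = max(MyData$Phenotype, na.rm = TRUE) * 10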
Let us see how the nine models can be fitted using bayes_met(). The following models are only for educational purposes, and will not provide a valid output. We will use the terms “location”, “region” and “years” to avoid the possible confusion caused by the term “environment”, which can have several meanings.
2.1 Entry-mean model
\[ y_{jk} = \mu + l_k + g_j + \varepsilon_{jk} \]
where \(y_{jk}\) is the phenotypic record of the \(j^{\text{th}}\) genotype in the \(k^{\text{th}}\) location, \(\mu\) is the intercept, \(l_k\) is the main effect of the \(k^{\text{th}}\) location, \(g_j\) is the main effect of the \(j^{\text{th}}\) genotype, and \(\varepsilon_{jk}\) is the residual effect.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = NULL,
                year = NULL,
                reg = NULL,
                res.het = TRUE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
Within-environment probabilities of superior performance and probabilities of superior stability are unavailable for this type of model, since there is no explicit genotype \(\times\) environment interaction effect in this case.
2.2 Randomized complete blocks design
2.2.1 Only locations
\[ y_{jkp} = \mu + l_k + b_{p(k)} + g_j + gl_{jk} + \varepsilon_{jkp} \]
where \(y_{jkp}\) is the phenotypic record of the \(j^{\text{th}}\) genotype, allocated in the \(p^{\text{th}}\) block, in the \(k^{\text{th}}\) location. All other effects were previously defined except \(b_{p(k)}\), which is the effect of the \(p^{\text{th}}\) block within the \(k^{\text{th}}\) location, and \(gl_{jk}\), which corresponds to the genotype-by-location interaction.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                year = NULL,
                reg = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.2.2 Locations and regions
\[ y_{jkwp} = \mu + m_w + l_k + b_{p(k)} + g_j + gl_{jk} + gm_{jw} + \varepsilon_{jkwp} \]
All effects were previously defined except \(m_w\) and \(gm_{jw}\), which are the main effect of regions and the genotype-by-region interaction, respectively.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = "Region",
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.2.3 Locations and years
\[ y_{jkhp} = \mu + t_h + l_k + b_{p(k)} + g_j + gl_{jk} + gt_{jh} + \varepsilon_{jkhp} \]
where \(t_h\) and \(gt_{jh}\) are the main effect of years and the genotype-by-years interaction effect, respectively.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = NULL,
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.2.4 Locations, regions and years
\[ y_{jkhwp} = \mu + t_h + m_w + l_k + b_{p(k)} + g_j + gl_{jk} + gt_{jh} + gm_{jw} + \varepsilon_{jkhwp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = "Region",
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.3 Incomplete blocks design
2.3.1 Only locations
\[ y_{jkqp} = \mu + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + \varepsilon_{jkqp} \]
where \(y_{jkqp}\) is the phenotypic record of the \(j^{\text{th}}\) genotype, allocated in the \(p^{\text{th}}\) block of the \(q^{\text{th}}\) replicate, in the \(k^{\text{th}}\) location. All other effects were previously defined except \(r_{q(k)}\), which is the effect of the \(q^{\text{th}}\) replicate within the \(k^{\text{th}}\) location. Note that the indices of \(b\) also change.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = NULL,
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.3.2 Locations and regions
\[ y_{jkwqp} = \mu + m_w + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gm_{jw} + \varepsilon_{jkwqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = "Region",
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.3.3 Locations and years
\[ y_{jkhqp} = \mu + t_h + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gt_{jh} + \varepsilon_{jkhqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = NULL,
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.3.4 Locations, regions and years
\[ y_{jkhwqp} = \mu + t_h + m_w + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gt_{jh} + gm_{jw} + \varepsilon_{jkhwqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = "Region",
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
2.4 Example dataset
For the next steps, we will use the maize dataset, which is included in the package. This dataset was used by Dias et al. (2022) in the paper that proposed the methods. It contains 32 single-cross hybrids and four commercial checks (36 genotypes in total) evaluated in 16 locations across five regions or mega-environments. These trials were laid out in an incomplete blocks design, using a block size of 6 and two replications per trial. This dataset was kindly provided by Embrapa Milho e Sorgo.
mod = bayes_met(data = maize,
                gen = "Hybrid",
                loc = "Location",
                repl = c("Rep", "Block"),
                trait = "GY",
                reg = "Region",
                year = NULL,
                res.het = TRUE,
                iter = 4000,
                cores = 4,
                chains = 4)
ProbBreed uses rstan to fit the Bayesian models, so packages and functions designed to manage and explore these models can be used on the output object of the bayes_met function. Some interesting options are rstantools, shinystan, and bayesplot.
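For example, here is a sketch of two generic diagnostics (it assumes the object returned by bayes_met can be handled like a regular rstan stanfit, as suggested above):
# Interactive exploration of the fitted model in the browser
shinystan::launch_shinystan(mod)
# Trace plot of the log-posterior using rstan's built-in diagnostics
rstan::traceplot(mod, pars = "lp__")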
3 Step two - explore the model’s outputs
After fitting the model, the next step is to extract its outputs using extr_outs. This function also provides other useful information, like variance components and posterior predictive checks. Using the mod object from the previous step:
outs = extr_outs(model = mod,
                 probs = c(0.05, 0.95),
                 verbose = TRUE)
where:
model is the model fitted using bayes_met
probs are the probabilities considered to calculate the quantiles and the highest posterior density (HPD)
This function provides an object of class extr, which is a list with the posterior distribution of each effect, the data generated by the model, the maximum posterior values of each effect, the variances of each effect, and quality parameters of the model (see below).
outs$variances
effect var sd naive.se HPD_0.05 HPD_0.95
1 Rep 0.025 0.034 0.000 0.000 0.091
2 Block 0.219 0.047 0.001 0.148 0.301
3 Hybrid 0.216 0.076 0.001 0.117 0.359
4 Location 7.887 3.845 0.043 3.794 14.791
5 Hybrid:Location 0.361 0.068 0.001 0.254 0.477
6 Region 4.746 20.828 0.233 0.020 16.855
7 Hybrid:Region 0.061 0.040 0.000 0.006 0.136
8 error_env1 0.875 0.196 0.002 0.597 1.223
9 error_env2 0.942 0.257 0.003 0.593 1.420
10 error_env3 1.400 0.318 0.004 0.950 1.979
11 error_env4 0.597 0.152 0.002 0.389 0.877
12 error_env5 1.075 0.232 0.003 0.743 1.490
13 error_env6 1.457 0.326 0.004 0.995 2.071
14 error_env7 0.282 0.072 0.001 0.186 0.414
15 error_env8 2.009 0.479 0.005 1.319 2.879
16 error_env9 0.567 0.155 0.002 0.357 0.858
17 error_env10 0.624 0.146 0.002 0.416 0.889
18 error_env11 1.255 0.284 0.003 0.845 1.773
19 error_env12 0.452 0.116 0.001 0.292 0.664
20 error_env13 0.749 0.189 0.002 0.483 1.096
21 error_env14 1.827 0.403 0.005 1.270 2.553
22 error_env15 0.822 0.198 0.002 0.550 1.184
23 error_env16 1.818 0.407 0.005 1.237 2.546
outs$ppcheck
Diagnostics
p.val_max 0.3349
p.val_min 0.3201
p.val_median 0.5388
p.val_mean 0.4904
p.val_sd 0.5374
Eff_No_parameters 183.7621
WAIC2 3924.2884
mean_Rhat 1.0006
Eff_sample_size 0.7963
The S3 method plot will generate some useful plots, like the density plots and histograms built from the posterior effects (Figure 2). It can also build trace plots.
plot(outs, category = "density")
plot(outs, category = "histogram")
A particularly useful plot is the comparison between the empirical and the sampled phenotype, which illustrates the model's convergence (Figure 3). The more the density plots overlap, the more successful the model was in sampling the effects.
plot(outs)
See more plot options in help('plot.extr').
4 Step three - compute the probabilities
With the outputs extracted by extr_outs, we can calculate the probabilities of superior performance and superior stability of the evaluated genotypes with prob_sup:
results = prob_sup(extr = outs,
                   int = .2,
                   increase = TRUE,
                   save.df = FALSE,
                   verbose = FALSE)
where:
increase: the objective is to increase (TRUE, default) or decrease (FALSE) the trait mean
extr: an extr object that contains the outputs of the extr_outs() function
int: the selection intensity, expressed in decimal values. In the example, the selection intensity is 20%
save.df: TRUE if you want to save the data frames containing the calculated probabilities in the working directory, FALSE otherwise
This function generates an object of class probsup, which consists of two lists: across and within. The first contains the probabilities of superior performance and superior stability across environments, while the second contains the probabilities of superior performance within environments. probsup is also compatible with the S3 method plot. See the details below or in help("plot.probsup").
4.1 Across-environments results
The across list has the following outputs:
4.1.1 HPD of the posterior genotypic main effects
head(results$across$g_hpd)
gen g HPD95 HPD97.5 HPD5 HPD7.5
1 G1 0.63126869 1.0259712 1.110047110 0.235731433 0.16498478
2 G10 -0.47653152 -0.0817441 -0.005033713 -0.871636673 -0.95611857
3 G11 -0.07243611 0.3106306 0.379956875 -0.449724066 -0.52916081
4 G12 0.24867322 0.6426858 0.721897025 -0.137741834 -0.20659020
5 G13 0.37807601 0.7693715 0.851806417 -0.009092229 -0.08398443
6 G14 -0.25286212 0.1262817 0.202358345 -0.644663245 -0.72620990
plot(results, category = "hpd")
4.1.2 Probability of superior performance
Let \(\Omega\) be a subset of the high-performance selected genotypes according to the intensity of selection. A given genotype \(j\) will belong to \(\Omega\) if its genotypic marginal value \(\left(\hat{g}_j\right)\) is high (or low) enough compared to its peers. We can emulate the occurrence of \(S\) trials \(\left(s = 1, 2, \dots, S\right)\) by leveraging discretized samples of Monte Carlo posterior distributions of the fitted Bayesian models. Then, the probability of the \(j^{\text{th}}\) genotype belonging to \(\Omega\) is its ratio of success \(\left(\hat{g}_j \in \Omega\right)\) events over the total number of sampled events \(\left[S = \left(\hat{g}_j \in \Omega\right) + \left(\hat{g}_j \notin \Omega\right)\right]\), as follows:
\[
Pr\left(\hat{g}_j \in \Omega \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_j^{(s)} \in \Omega \vert y\right)
\] where \(I\left(\hat{g}_j^{(s)} \in \Omega \vert y\right)\) is an indicator variable that can assume two values: (1) if \(\hat{g}_j \in \Omega\) in the \(s^{\text{th}}\) sample, and (0) otherwise. The results provided by prob_sup can be accessed as follows:
head(results$across$perfo)
ID prob
36 G9 0.996250
1 G1 0.920375
22 G29 0.798875
24 G30 0.643500
5 G13 0.606750
35 G8 0.558875
plot(results)
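To make the indicator notation concrete, here is a minimal, self-contained sketch of how such a probability can be computed from posterior samples (a toy matrix of simulated draws, not the package's internal code):
set.seed(123)
# Toy posterior: S = 1000 samples (rows) of the genotypic values of 10 genotypes (columns)
g_samples = matrix(rnorm(1000 * 10), nrow = 1000,
                   dimnames = list(NULL, paste0("G", 1:10)))
n_sel = ceiling(0.2 * ncol(g_samples))   # selection intensity of 20%
# For each posterior sample, flag the genotypes ranked in the top 'n_sel' (i.e., in Omega),
# then average the indicator over samples to obtain Pr(g_j in Omega | y)
in_omega = t(apply(g_samples, 1, function(g) rank(-g) <= n_sel))
sort(colMeans(in_omega), decreasing = TRUE)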
4.1.3 Pairwise probability of superior performance
To directly compare a selection candidate with a commercial check, or another promising genotype, we can calculate the pairwise probability of superior performance. This metric computes the probability of a genotype \(j\) performing better than a genotype \(j^\prime\):
\[
Pr\left(\hat{g}_{j} > \hat{g}_{j^\prime} \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y\right)
\] Note that this equation applies if the selection direction is to increase the trait value. If the aim is to decrease it (for example, plant height or susceptibility to a disease), you can set increase = FALSE, and the following equation is employed:
\[ Pr\left(\hat{g}_{j} < \hat{g}_{j^\prime} \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{j}^{(s)} < \hat{g}_{j^\prime}^{(s)} \vert y\right) \] The results are accessed using the following commands:
results$across$pair_perfo[1:5, 1:5]
G1 G10 G11 G12 G13
G1 0.000000 0.000250 0.012625 0.11600 0.210125
G10 0.999750 0.000000 0.892500 0.98975 0.995375
G11 0.987375 0.107500 0.000000 0.84200 0.923875
G12 0.884000 0.010250 0.158000 0.00000 0.662250
G13 0.789875 0.004625 0.076125 0.33775 0.000000
plot(results, category = "pair_perfo")
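Continuing the toy g_samples matrix from the sketch above (hypothetical data, not the package's code), the pairwise probability is simply the proportion of posterior samples in which one genotype beats the other:
# Pr(G1 > G2 | y): proportion of samples in which G1 exceeds G2
mean(g_samples[, "G1"] > g_samples[, "G2"])
# Full pairwise matrix: entry [j, j'] = Pr(g_j > g_j' | y)
pair_prob = outer(seq_len(ncol(g_samples)), seq_len(ncol(g_samples)),
                  Vectorize(function(j, jp) mean(g_samples[, j] > g_samples[, jp])))
dimnames(pair_prob) = list(colnames(g_samples), colnames(g_samples))
round(pair_prob[1:5, 1:5], 3)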
4.1.4 Probability of superior stability
Considering genotypes that have low genotype-by-environment interaction variance \(\left[var\left(ge_{jk}\right)\right]\) as stable, the probability of superior stability is given by the following equation:
\[ Pr\left[var\left(\widehat{ge}_{jk}\right) \in \Omega \vert y\right] = \frac{1}{S} \sum_{s=1}^S I\left[var\left(\widehat{ge}_{jk}^{(s)}\right) \in \Omega \vert y\right] \] where \(I\left[var\left(\widehat{ge}_{jk}^{(s)}\right) \in \Omega \vert y\right]\) indicates whether \(var\left(\widehat{ge}_{jk}^{(s)}\right)\) belongs to \(\Omega\) (1) or not (0) for the \(j^{\text{th}}\) genotype. Note that this probability can only be computed across environments since it depends on \(var\left(\widehat{ge}_{jk}\right)\). Here is the code:
head(results$across$stabi$gl)
ID prob
8 G16 0.504875
25 G31 0.447750
28 G34 0.410125
4 G12 0.404125
29 G35 0.389875
24 G30 0.371375
head(results$across$stabi$gm)
ID prob
28 G34 0.282625
24 G30 0.281250
13 G20 0.276500
32 G5 0.264500
36 G9 0.259000
15 G22 0.257625
plot(results, category = "stabi")
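As with performance, this probability can be illustrated with a toy sketch (a hypothetical array of posterior samples of the genotype-by-location effects, not the package's code):
set.seed(123)
# Toy posterior: 1000 samples x 10 genotypes x 16 locations of GE interaction effects
ge_samples = array(rnorm(1000 * 10 * 16), dim = c(1000, 10, 16),
                   dimnames = list(NULL, paste0("G", 1:10), NULL))
# Per sample, the stability statistic of each genotype is var(ge_jk) across locations
ge_var = apply(ge_samples, c(1, 2), var)          # 1000 x 10 matrix
n_sel = ceiling(0.2 * 10)
# Omega now holds the genotypes with the *lowest* interaction variance in each sample
stab_omega = t(apply(ge_var, 1, function(v) rank(v) <= n_sel))
sort(colMeans(stab_omega), decreasing = TRUE)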
4.1.5 Pairwise probability of superior stability
We can also compare the stability of two selection candidates, or of an experimental genotype and a commercial check, for example. The following equation is used:
\[ Pr\left[var\left(\widehat{ge}_{jk}\right) < var\left(\widehat{ge}_{ik}\right) \vert y\right] = \frac{1}{S} \sum_{s=1}^S{I \left[var\left(\widehat{ge}_{jk}^{(s)}\right) < var\left(\widehat{ge}_{ik}^{(s)}\right) \vert y\right]} \]
Note that, in this case, we are interested in the genotype that has a lower variance of the genotype-by-environment (or genotype-by-region) interaction effects.
results$across$pair_stabi$gl[1:5, 1:5]
G1 G10 G11 G12 G13
G1 0.000000 0.744125 0.768750 0.885750 0.778375
G10 0.255875 0.000000 0.526125 0.709875 0.551500
G11 0.231250 0.473875 0.000000 0.690750 0.523750
G12 0.114250 0.290125 0.309250 0.000000 0.338500
G13 0.221625 0.448500 0.476250 0.661500 0.000000
results$across$pair_stabi$gm[1:5, 1:5]
G1 G10 G11 G12 G13
G1 0.00000 0.555000 0.578250 0.571750 0.583250
G10 0.44500 0.000000 0.525375 0.525125 0.534625
G11 0.42175 0.474625 0.000000 0.500000 0.509000
G12 0.42825 0.474875 0.500000 0.000000 0.506625
G13 0.41675 0.465375 0.491000 0.493375 0.000000
plot(results, category = "pair_stabi")
4.1.6 Joint probability of superior performance and superior stability
Assuming that the genotypic main effects are independent of the variance of the genotype-by-environment interaction effects, the joint probability of superior performance and superior stability follows the same idea as the probability of occurrence of two independent events:
\[ Pr\left[\hat{g}_j \in \Omega \cap var\left(\widehat{ge}_{jk}\right) \in \Omega\right] = Pr\left(\hat{g}_j \in \Omega\right) \times Pr\left[var\left(\widehat{ge}_{jk}\right) \in \Omega\right] \]
The results are accessed using the following commands:
head(results$across$joint)
ID level category prob
145 G1 Location Joint 0.0280714375
146 G10 Location Joint 0.0000773125
147 G11 Location Joint 0.0066858281
148 G12 Location Joint 0.1502839844
149 G13 Location Joint 0.1213500000
150 G14 Location Joint 0.0010070000
plot(results, category = "joint")
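Because the two events are treated as independent, the joint probability is just the product of the two marginal probabilities. Continuing the toy objects from the sketches above (hypothetical data, not the package's code):
# Joint probability of superior performance AND superior stability (independence assumed)
joint_prob = colMeans(in_omega) * colMeans(stab_omega)
sort(joint_prob, decreasing = TRUE)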
4.2 Within-environments results
The probabilities of superior performance can be investigated within individual environments. This is useful for specific recommendations. Here are the available results:
4.2.1 Probability of superior performance
The idea is exactly the same as previously stated, but instead of using the marginal genotypic effect of each genotype, we use the within-environment genotypic effect \(\left(g_{jk} = g_j + ge_{jk}\right)\):
\[ Pr\left(\hat{g}_{jk} \in \Omega_k \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y\right) \]
The results are accessed using the following commands:
head(results$within$perfo$gl)
gen E1 E10 E11 E12 E13 E14 E15 E16
1 G1 0.984125 0.268750 0.376750 0.423875 0.273000 0.952625 0.325500 0.969875
2 G10 0.010375 0.001250 0.001875 0.002750 0.021375 0.069375 0.000125 0.040875
3 G11 0.005125 0.094500 0.097500 0.005000 0.599500 0.123625 0.736125 0.093625
4 G12 0.158000 0.084250 0.045500 0.053250 0.061375 0.304750 0.160500 0.569500
5 G13 0.205500 0.182500 0.130500 0.129000 0.063875 0.592625 0.805375 0.072250
6 G14 0.000000 0.005375 0.111750 0.156750 0.028750 0.067000 0.000625 0.022875
E2 E3 E4 E5 E6 E7 E8 E9
1 0.898875 0.625875 0.906625 0.530625 0.306750 0.005125 0.852625 0.376125
2 0.096750 0.195250 0.000875 0.042625 0.024875 0.002000 0.000125 0.042250
3 0.019625 0.031250 0.017500 0.106750 0.329625 0.014625 0.043500 0.223250
4 0.418500 0.605375 0.195750 0.376750 0.435250 0.820375 0.164750 0.321500
5 0.231500 0.697125 0.685750 0.613250 0.820375 0.457125 0.113125 0.535500
6 0.003000 0.045875 0.125750 0.115875 0.616250 0.026250 0.038750 0.068500
plot(results, category = "perfo", level = "within")
The grey cells are environments in which the genotype specified in the row was not evaluated.
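Again using the toy objects from the across-environments sketches (hypothetical data, not the package's code), the within-environment version simply replaces the marginal effect with \(g_j + ge_{jk}\) before ranking:
# Within-environment genotypic value per sample: g_jk = g_j + ge_jk
# 'g_samples' is 1000 x 10 and 'ge_samples' is 1000 x 10 x 16 (toy objects from above)
g_within = sweep(ge_samples, c(1, 2), g_samples, FUN = "+")   # 1000 x 10 x 16
n_sel = ceiling(0.2 * 10)
# Pr(g_jk in Omega_k | y): rank the genotypes within each location, sample by sample
prob_within = apply(g_within, 3, function(loc_mat)
  colMeans(t(apply(loc_mat, 1, function(g) rank(-g) <= n_sel))))
round(prob_within[, 1:4], 3)   # genotypes x first four locations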
4.2.2 Pairwise probability of superior performance
A direct comparison between genotypes evaluated in the same environment. We use the following equations:
- When increase = TRUE:
\[ Pr\left(\hat{g}_{jk} > \hat{g}_{j^\prime k} \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{jk}^{(s)} > \hat{g}_{j^\prime k}^{(s)} \vert y\right) \]
- When increase = FALSE:
\[
Pr\left(\hat{g}_{jk} < \hat{g}_{j^\prime k} \vert y\right) = \frac{1}{S} \sum_{s=1}^S I\left(\hat{g}_{jk}^{(s)} < \hat{g}_{j^\prime k}^{(s)} \vert y\right)
\] Since the pairwise comparisons are performed per environment, it is not possible to illustrate them all in a single plot. In fact, plot generates a list of plots, with each element corresponding to an environment. It is strongly recommended that you store this list in an object; otherwise, plot will simply try to plot them all at once. For that reason, plot will double-check whether you are using an object to store the results by printing the following question in the console: “Are you using an object to store the outputs of this function? Enter Y/n:”. If you type “n”, it will stop and give you the following message: “There is more than one plot, so it is not a good idea to plot them all at once. Please, store the output of this function in an object”. If you type “Y”, it will build the plots.
The pairwise comparisons and the ggplots are stored in lists. Therefore, the results are accessed using the following commands, taking the environment “E13” and the region “TB” as examples:
pairsupwithin = plot(results, category = "pair_perfo", level = "within")
pairsupwithin$gl$E13
pairsupwithin$gm$TB
Are you using an object to store the outputs of this function? Enter Y/n:
Like the other heatmaps, we are evaluating the probability of the genotypes on the x-axis being superior to the genotypes on the y-axis.
5 Wrap up
The estimated probabilities demonstrated in this tutorial with ProbBreed are related to some key questions that constantly arise in plant breeding:
What is the risk of recommending a selection candidate?
What is the probability of a given selection candidate having good performance across or within environments?
What is the probability of a selection candidate having better performance than a check cultivar?
How probable is it that a given selection candidate performs similarly across environments?
What are the chances that a given selection candidate is more stable than a cultivar check?
What is the probability of a given selection candidate having a superior and invariable performance across environments?
We hope this package is useful for you to answer some of these questions. Feel free to contact us if there are any issues or suggestions for improvement.