1 Introduction
Welcome to ProbBreed! This guide will help you use the package's functions, from fitting a Bayesian model to computing probabilities. Feel free to contact us via GitHub or e-mail if there are any issues. Further details are found in each function's documentation (e.g., ?bayes_met) and in the original paper by Dias et al. (2022).
2 Issues with rstan
Long story short: install rstan using the tutorial below before installing ProbBreed!
ProbBreed internally uses rstan (Stan Development Team, 2023a), the R interface to Stan (Stan Development Team, 2023b), to fit Bayesian models. We noticed that the version of rstan currently on CRAN may not work properly. Several online tutorials offer solutions to this problem. A solution that worked for us and other colleagues is in this link. Below, we summarize it in four steps.
2.1 Step one: Update R
We suggest keeping R up-to-date. We will describe a solution for overcoming the rstan issue in the newer versions of R (4.3.#). The link above has solutions for older versions.
2.2 Step two: Install Rtools
If you already have an older Rtools version installed on your machine, you may need to uninstall it and manually delete the folder where its data is stored. Then, identify the installed R version and download the proper Rtools using this link. The Rtools for R 4.3 is available at this link.
2.3 Step three: Install rstan, StanHeaders and BH
Use the code below to install rstan 2.26, StanHeaders and the BH package:
install.packages("StanHeaders", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
install.packages("rstan", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
install.packages('BH')
2.4 Step four: Install ProbBreed
Once rstan is properly installed, ProbBreed can be installed and used without further concerns:
install.packages("ProbBreed")
or, for the development version:
# install.packages("devtools")
devtools::install_github("saulo-chaves/ProbBreed")
2.5 Issues in macOS and Linux
For specific issues when using ProbBreed in macOS and Linux, refer to the following links:
- For macOS
- For Linux
If these links do not help you, feel free to contact us or open an issue in the rstan forum.
3 Vignette
3.1 Step one - load ProbBreed
library(ProbBreed)
3.2 Step two - bayes_met
The first step is to fit a multi-environment Bayesian model using the bayes_met function. This function has a set of predefined models that are used to run Bayesian analyses via rstan. There are some important details that need to be stressed:
1- Currently, the function has twelve options of models (Figure 1). These models differ according to the considered information: only locations; locations and years; environments (location-year combinations) and breeding regions; and locations, years and breeding regions. Should you consider an "environment" as a combination of environmental factors (for instance, location and planting dates within location), this information must go in the loc argument. All models will have gen and loc. If you want to consider the effect of regions (or mega-environments) and/or time factors (for instance, years or harvests), then reg and year must indicate the columns that contain the information on these effects. Otherwise, these arguments are set to NULL. You control the experimental design effects with the repl argument. If repl = NULL, bayes_met will assume that you are entering adjusted means of each environment, i.e., data without replicates. If you have data from trials laid out in randomized complete blocks, use repl = 'Replicate'. Finally, if the multi-environment trials were laid out in an incomplete block design, use repl = c('Replicate', 'Block').
2- You may change the number of iterations, chains, and cores. This will vary according to the data and the processing capacity of your machine. The default is 2000 iterations (including warm-up), 2 cores and 4 chains. Be aware that the more iterations and chains, the longer the computation time. At the same time, the results are likely more reliable, and mixing/convergence issues will diminish. By default, if more than one core is set, the function runs one chain per core. The number of cores depends on the processing capacity of your machine. Choose wisely.
3- Sometimes, bayes_met may yield warnings on possible mixing/convergence issues. More details on these problems and how to deal with them are available here. Usually, increasing the number of iterations is enough. If this does not work, bayes_met has several arguments passed to rstan::sampling() with which advanced users can configure the default sampling parameters (more details in the bayes_met documentation: help("bayes_met")) and, hopefully, diminish these issues. Nevertheless, before taking any measures - or even disposing of the model - we recommend doing the posterior predictive check using the extr_outs function (see Step three). Checking the \(\hat{R}\) statistic, the Bayesian p-values and the Empirical vs Sampled density plot is particularly useful. If bayes_met showed warnings but the posterior predictive checks do not indicate serious issues, you can rely on the results and proceed with the analysis.
4- You can choose between a model with homogeneous or heterogeneous residual variances using res.het = FALSE or res.het = TRUE, respectively. When res.het = TRUE, there will be one residual variance per location.
5- The Bayesian models implemented in ProbBreed have predefined priors described by Dias et al. (2022). In summary: \[
x \sim N(0, S^{[x]})
\] \[
\sigma \sim HalfCauchy(0, S^{[\sigma]})
\]
where \(x\) can be any effect but the residual, and \(\sigma\) is the standard deviation of the likelihood. If a heterogeneous residual variance model is set (res.het = TRUE), then \(\sigma_k \sim HalfCauchy(0, S^{[\sigma_k]})\).
The hyperpriors are set as follows: \[ S^{[x]} \sim HalfCauchy(0, \phi) \] where \(\phi\) is the known global hyperparameter, defined such that \(\phi = max(y) \times 10\). By doing this, we restrain the hyperparameter to be finite and allow the model to take full advantage of the data to infer the posterior distribution, since it provides weakly informative prior distributions, avoiding subjectivity that could hamper the results. The half-Cauchy distribution guarantees that variance components will not be negative.
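For concreteness, the global hyperparameter can be computed directly from the data. A minimal base R sketch (the vector y of phenotypic records is hypothetical):

```r
# Hypothetical phenotypic records (e.g., grain yield)
y <- c(55.2, 61.8, 48.9, 70.3, 64.1)

# Known global hyperparameter: phi = max(y) * 10, which keeps the
# half-Cauchy scale hyperpriors finite yet weakly informative
phi <- max(y) * 10
phi
```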
Let us see how the twelve models can be fitted using bayes_met(). The following calls are only for educational purposes and will not provide a valid output. We will use the terms "location", "region" and "year" to avoid the possible confusion caused by the term "environment", which can have several meanings.
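Whatever the model, bayes_met() expects a long-format data frame with one row per observation and columns matching the arguments above. A minimal sketch of such a structure (all column names here are hypothetical):

```r
# Toy long-format MET data: one row per genotype-location-replicate plot.
# Columns are mapped to the model through the gen, loc, repl and trait
# arguments of bayes_met(); extra columns (e.g., Region, Year) are only
# needed when reg or year are declared.
MyData <- expand.grid(
  Genotype  = paste0("G", 1:3),
  Location  = paste0("L", 1:2),
  Replicate = 1:2
)
set.seed(1)
MyData$Phenotype <- rnorm(nrow(MyData), mean = 60, sd = 5)

str(MyData)  # 12 rows: 3 genotypes x 2 locations x 2 replicates
```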
3.2.1 Locations: only means
\[ y_{jk} = \mu + l_k + g_j + gl_{jk} + \varepsilon_{jk} \]
where the \(y_{jk}\) is the phenotypic record of the \(j^{\text{th}}\) genotype in the \(k^{\text{th}}\) location, \(\mu\) is the intercept, \(l_k\) is the main effects of the \(k^{\text{th}}\) location, \(g_j\) is the main effect of the \(j^{\text{th}}\) genotype, \(gl_{jk}\) is the interaction between these two effects, and \(\varepsilon_{jk}\) is the residual effect.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = NULL,
                year = NULL,
                reg = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.2 Locations: randomized complete block design
\[ y_{jkp} = \mu + l_k + b_{p(k)} + g_j + gl_{jk} + \varepsilon_{jkp} \]
where the \(y_{jkp}\) is the phenotypic record of the \(j^{\text{th}}\) genotype, allocated in the \(p^{\text{th}}\) block, in the \(k^{\text{th}}\) location. All other effects were previously defined but \(b_{p(k)}\), which is the effect of the \(p^{\text{th}}\) block in the \(k^{\text{th}}\) location.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                year = NULL,
                reg = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.3 Locations: incomplete block design
A lattice, for example:
\[ y_{jkqp} = \mu + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + \varepsilon_{jkqp} \]
where the \(y_{jkqp}\) is the phenotypic record of the \(j^{\text{th}}\) genotype, allocated in the \(p^{\text{th}}\) block of the \(q^{\text{th}}\) replicate, in the \(k^{\text{th}}\) location. All other effects were previously defined but \(r_{q(k)}\), which is the effect of the \(q^{\text{th}}\) replicate in the \(k^{\text{th}}\) location. Note that the indices of \(b\) also change.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = NULL,
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.4 Regions and locations: only means
\[ y_{jkw} = \mu + m_w + l_k + g_j + gl_{jk} + gm_{jw} + \varepsilon_{jkw} \]
All effects were previously defined but \(m_w\) and \(gm_{jw}\), which are the main effects of region and genotypes-by-regions interaction, respectively.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = NULL,
                year = NULL,
                res.het = FALSE,
                reg = "Region",
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.5 Regions and locations: randomized complete block design
\[ y_{jkwp} = \mu + m_w + l_k + b_{p(k)} + g_j + gl_{jk} + gm_{jw} + \varepsilon_{jkwp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = "Region",
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.6 Regions and locations: incomplete block design
\[ y_{jkwqp} = \mu + m_w + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gm_{jw} + \varepsilon_{jkwqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = "Region",
                year = NULL,
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.7 Years and locations: only means
\[ y_{jkh} = \mu + t_h + l_k + g_j + gl_{jk} + gt_{jh} + \varepsilon_{jkh} \] where \(t_h\) and \(gt_{jh}\) are the main effect of years and the genotypes-by-years interaction effect, respectively.
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = NULL,
                year = 'Year',
                res.het = FALSE,
                reg = NULL,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.8 Years and locations: randomized complete block design
\[ y_{jkhp} = \mu + t_h + l_k + b_{p(k)} + g_j + gl_{jk} + gt_{jh} + \varepsilon_{jkhp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = NULL,
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.9 Years and locations: incomplete block design
\[ y_{jkhqp} = \mu + t_h + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gt_{jh} + \varepsilon_{jkhqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = NULL,
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.10 Years, regions and locations: only means
\[ y_{jkhw} = \mu + t_h + m_w + l_k + g_j + gl_{jk} + gt_{jh} + gm_{jw} + \varepsilon_{jkhw} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = NULL,
                year = 'Year',
                reg = 'Region',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.11 Years, regions and locations: randomized complete block design
\[ y_{jkhwp} = \mu + t_h + m_w + l_k + b_{p(k)} + g_j + gl_{jk} + gt_{jh} + gm_{jw} + \varepsilon_{jkhwp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = "Block",
                reg = "Region",
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.2.12 Years, regions and locations: incomplete block design
\[ y_{jkhwqp} = \mu + t_h + m_w + l_k + r_{q(k)} + b_{p(qk)} + g_j + gl_{jk} + gt_{jh} + gm_{jw} + \varepsilon_{jkhwqp} \]
mod = bayes_met(data = MyData,
                gen = "Genotype",
                loc = "Location",
                repl = c("Replicate", "Block"),
                reg = "Region",
                year = 'Year',
                res.het = FALSE,
                trait = "Phenotype",
                iter = 2000, cores = 2, chains = 4)
3.3 Example dataset
For the next steps, we will use the soy data, which is contained within the package. This dataset belongs to the USDA Northern Region Uniform Soybean Tests and is a subset of the data from Krause et al. (2023). It contains the empirical best linear unbiased estimates of genotypic means of seed yield from 39 experimental genotypes evaluated in 14 locations across three regions or mega-environments. The full dataset can be found in the SoyURT package. Note that we do not have an experimental design in this case, so the correct model has repl = NULL. In addition, the dataset belongs to a single year, so year = NULL.
mod = bayes_met(data = soy,
                gen = "Gen",
                loc = "Loc",
                repl = NULL,
                reg = "Reg",
                year = NULL,
                res.het = FALSE,
                trait = "Y",
                iter = 10000,
                cores = 4,
                chains = 4)
ProbBreed uses rstan to fit Bayesian models, so packages and functions designed to manage and explore these models can be used on the output object of the bayes_met function. Some interesting options are rstantools, shinystan, and bayesplot.
3.4 Step three - extr_outs
After fitting the model, the next step is to extract the outputs. The extr_outs() function is a prerequisite for estimating the probability of superior performance using other functions. extr_outs() also provides important diagnostics regarding the convergence and reliability of the model.
Using the mod
object from the previous step:
outs = extr_outs(data = soy,
                 trait = 'Y',
                 model = mod,
                 probs = c(0.05, 0.95),
                 check.stan.diag = FALSE,
                 verbose = TRUE)
where:
- model is the model fitted using bayes_met
- probs are the probabilities considered to calculate the quantiles and the highest posterior density (HPD) intervals
- check.stan.diag asks whether you want (TRUE, the default) the diagnostic plots provided by rstan::stan_diag() or not (FALSE).
This function provides a list with the posterior distribution of each effect, the data generated by the model, the variances of each effect, the maximum posterior values of each effect, and quality parameters of the model (see below):
outs$variances
effect var sd naive.se HPD_0.05 HPD_0.95
1 genotype 3.284 1.417 0.006 1.331 5.864
2 location 255.655 136.307 0.556 116.515 503.423
3 gen.loc 6.989 5.019 0.020 0.309 15.430
4 region 4150.795 64757.982 264.373 0.883 9695.849
5 gen.reg 1.125 1.087 0.004 0.012 3.261
6 error 11.086 5.016 0.020 2.974 18.346
outs$ppcheck
Diagnostics
p.val_max 0.9711
p.val_min 0.2730
p.val_median 0.6677
p.val_mean 0.5066
p.val_sd 0.5158
Eff_No_parameters 125.8434
WAIC2 2554.2142
mean_Rhat 1.0172
Eff_sample_size 0.1165
It also provides a list containing built-in histograms, density plots, and trace plots of each effect, built with the ggplot2 package (Wickham, 2016). For example, the density plot comparing the data generated by the Bayesian model with the observed data:
outs$plots$densities$sampled.y
Another example is the histogram of the genotypic main effects:
outs$plots$histogram$genotype
These plots are available for all effects declared in extr_outs(). Feel free to explore them. Optionally, the function also provides diagnostic plots from rstan::stan_diag(). In fact, extr_outs lets you pass further options to that function. For more information, please refer to ?rstan::stan_diag.
outs = extr_outs(data = soy,
                 trait = 'Y',
                 model = mod,
                 res.het = FALSE,
                 check.stan.diag = TRUE,
                 verbose = TRUE)
3.5 Step four - prob_sup
With the outputs extracted by extr_outs(), we can calculate the probabilities of superior performance and superior stability of the evaluated genotypes with prob_sup():
results = prob_sup(data = soy,
                   trait = "Y",
                   gen = "Gen",
                   loc = "Loc",
                   mod.output = outs,
                   reg = 'Reg',
                   year = NULL,
                   int = .2,
                   increase = TRUE,
                   save.df = FALSE,
                   interactive = FALSE,
                   verbose = FALSE)
The arguments data, trait, gen, loc, year and reg were previously described. The new arguments of this function are:
- increase: increasing (TRUE, the default) or decreasing (FALSE) the trait mean
- mod.output: an object that contains the outputs extracted by extr_outs()
- int: the selection intensity, expressed in decimal values
- save.df: TRUE if you want to save the data frames containing the calculated probabilities in the working directory, FALSE otherwise
- interactive: TRUE if you want to convert the ggplots into ggplotlys (Sievert, 2020), i.e., make them interactive plots, FALSE otherwise.
The outputs of this function are divided into two lists: marginal and conditional. The first contains the probabilities of superior performance and superior stability across environments (marginal), while the second contains the probabilities of superior performance within environments (conditional). Both lists also include the pairwise probabilities of superior performance and stability.
3.5.1 Marginal probabilities
The marginal list contains two other lists: one with data frames detailing the results of each computed probability, and another with ggplots illustrating those probabilities. Below, we detail each result:
3.5.1.1 Highest posterior density of genotypic main effects
This metric is illustrated in a caterpillar plot (Figure 2):
$marginal$plots$g_hpd results
3.5.1.2 Probability of superior performance
Let \(\Omega\) be the subset of high-performance genotypes selected according to the selection intensity. A given genotype \(j\) will belong to \(\Omega\) if its genotypic marginal value (\(\hat{g}_j\)) is high (or low) enough compared to its peers. We can emulate the occurrence of \(S\) trials (\(s = 1, 2, \dots, S\)) by leveraging discretized samples from the Monte Carlo posterior distributions of the fitted Bayesian models. Then, the probability of the \(j^{\text{th}}\) genotype belonging to \(\Omega\) is its ratio of success (\(\hat{g}_j \in \Omega\)) events over the total number of sampled events [\(S = (\hat{g}_j \in \Omega) + (\hat{g}_j \notin \Omega)\)], as follows:
\[
Pr(\hat{g}_j \in \Omega \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_j^{(s)} \in \Omega \vert y)
\] where \(I(\hat{g}_j^{(s)} \in \Omega \vert y)\) is an indicator variable that assumes 1 if \(\hat{g}_j \in \Omega\) in the \(s^{\text{th}}\) sample, and 0 otherwise. The results provided by prob_sup() can be accessed using the following commands:
head(results$marginal$df$perfo)
ID prob
36 G36 0.94545
31 G31 0.82250
20 G20 0.79905
9 G09 0.66595
38 G38 0.59905
1 G01 0.54605
results$marginal$plots$perfo
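The equation above can be sketched in base R on simulated posterior draws. This is only an illustration of the computation, not ProbBreed's internal code; g_samples, int and all other names are hypothetical:

```r
# Sketch: probability of superior performance from posterior samples.
# g_samples: hypothetical S x J matrix of posterior draws of the genotypic
# main effects (S Monte Carlo samples, J genotypes).
set.seed(123)
S <- 1000; J <- 10
g_samples <- sapply(1:J, function(j) rnorm(S, mean = j / 10, sd = 0.5))
colnames(g_samples) <- sprintf("G%02d", 1:J)

int <- 0.2                   # selection intensity
n_sel <- ceiling(int * J)    # size of the selected subset Omega

# In each draw s, flag the genotypes whose sampled effect falls in Omega
# (here: the n_sel largest values, i.e., selection to increase the trait)
in_omega <- t(apply(g_samples, 1, function(g_s) rank(-g_s) <= n_sel))

# Pr(g_j in Omega | y): ratio of successes over the S draws
prob_perf <- colMeans(in_omega)
sort(prob_perf, decreasing = TRUE)
```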
3.5.1.3 Pairwise probability of superior performance
To directly compare a selection candidate with a commercial check, or another promising genotype, we can calculate the pairwise probability of superior performance. This metric computes the probability of a genotype \(j\) performing better than a genotype \(j^\prime\):
\[
Pr(\hat{g}_{j} > \hat{g}_{j^\prime} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y)
\] Note that this equation applies when the direction of selection is to increase the trait value. If the aim is to decrease it (for example, plant height or susceptibility to a disease), you can set increase = FALSE, and the following equation is employed:
\[ Pr(\hat{g}_{j} < \hat{g}_{j^\prime} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{j}^{(s)} < \hat{g}_{j^\prime}^{(s)} \vert y) \] The results are accessed using the following commands:
head(results$marginal$df$pair_perfo)
x prob y
2 G01 0.97360 G02
3 G01 0.77295 G03
4 G01 0.95590 G04
5 G01 0.99120 G05
6 G01 0.74290 G06
7 G01 0.74170 G07
results$marginal$plots$pair_perfo
The results indicate the probability of the genotypes on the x-axis performing better than those on the y-axis.
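A base R sketch of the pairwise computation on simulated posterior draws (an illustration only; g1 and g2 are hypothetical vectors of draws for a candidate and a check):

```r
# Sketch: pairwise probability of superior performance from posterior draws.
# g1, g2: hypothetical vectors of S posterior draws for genotypes j and j'.
set.seed(42)
S <- 1000
g1 <- rnorm(S, mean = 0.8, sd = 0.5)   # candidate genotype
g2 <- rnorm(S, mean = 0.2, sd = 0.5)   # commercial check

# Pr(g_j > g_j' | y): share of draws in which the candidate beats the check
pr_better <- mean(g1 > g2)

# With increase = FALSE the inequality is simply reversed
pr_lower <- mean(g1 < g2)

c(pr_better = pr_better, pr_lower = pr_lower)
```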
3.5.1.4 Probability of superior stability
Considering genotypes that have low genotype-by-environment interaction variance [\(var(ge_{jk})\)] as stable, the probability of superior stability is given by the following equation:
\[ Pr[var(\widehat{ge}_{jk}) \in \Omega \vert y] = \frac{1}{S} \sum_{s=1}^S I[var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y] \] where \(I[var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y]\) indicates whether \(var(\widehat{ge}_{jk}^{(s)})\) belongs to \(\Omega\) (1) or not (0) for the \(j^{\text{th}}\) genotype in the \(k^{\text{th}}\) environment. Note that this probability can only be computed across environments since it depends on \(var(\widehat{ge}_{jk})\). Pairwise probabilities of superior stability are also possible in this context:
- Genotype-by-location interaction
head(results$marginal$df$stabi_gl)
ID prob
23 G23 0.47675
34 G34 0.43345
35 G35 0.38025
8 G08 0.30390
27 G27 0.30015
13 G13 0.29990
results$marginal$plots$stabi_gl
- Genotype-by-region interaction
head(results$marginal$df$stabi_gm)
ID prob
15 G15 0.23735
32 G32 0.23725
34 G34 0.23690
16 G16 0.23250
1 G01 0.23210
23 G23 0.23175
results$marginal$plots$stabi_gm
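The stability probability can likewise be sketched in base R on simulated draws of the interaction effects (an illustration only; the array ge_samples and all names are hypothetical):

```r
# Sketch: probability of superior stability from posterior draws.
# ge_samples: hypothetical S x J x K array of draws of genotype-by-location
# interaction effects (S samples, J genotypes, K locations).
set.seed(7)
S <- 500; J <- 8; K <- 6
sds <- seq(0.2, 1.6, length.out = J)      # toy genotype-specific spread
ge_samples <- array(NA_real_, dim = c(S, J, K))
for (j in 1:J) ge_samples[, j, ] <- rnorm(S * K, sd = sds[j])

int <- 0.25
n_sel <- ceiling(int * J)                  # size of Omega

# var(ge_jk) for each draw s and genotype j, taken across the K locations
ge_var <- apply(ge_samples, c(1, 2), var)  # S x J matrix

# Genotype j is "stable" in draw s if its variance is among the n_sel lowest
in_omega <- t(apply(ge_var, 1, function(v) rank(v) <= n_sel))

prob_stab <- colMeans(in_omega)
round(prob_stab, 3)
```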
3.5.1.5 Pairwise probability of superior stability
We can also compare the stability of two selection candidates, or of an experimental genotype and a commercial check, for example. The following equation is used:
\[ Pr[var(\widehat{ge}_{jk}) < var(\widehat{ge}_{j^\prime k}) \vert y] = \frac{1}{S} \sum_{s=1}^S{I [var(\widehat{ge}_{jk}^{(s)}) < var(\widehat{ge}_{j^\prime k}^{(s)}) \vert y]} \]
Note that, in this case, we are interested in the genotype that has the lower variance of the genotype-by-environment (or genotype-by-region) interaction effects.
- Genotype-by-location interaction
head(results$marginal$df$pair_stabi_gl)
x prob y
2 G01 0.55415 G02
3 G01 0.48535 G03
4 G01 0.52830 G04
5 G01 0.62005 G05
6 G01 0.51410 G06
7 G01 0.44720 G07
results$marginal$plots$pair_stabi_gl
The results indicate the probability of the genotypes on the x-axis being more stable than those on the y-axis.
- Genotype-by-region interaction
head(results$marginal$df$pair_stabi_gm)
x prob y
2 G01 0.59370 G02
3 G01 0.51150 G03
4 G01 0.52400 G04
5 G01 0.56620 G05
6 G01 0.54615 G06
7 G01 0.50260 G07
results$marginal$plots$pair_stabi_gm
The results indicate the probability of the genotypes on the x-axis being more stable than those on the y-axis.
3.5.1.6 Joint probability of superior performance and superior stability
Assuming that the genotypic main effects are independent of the variances of the genotype-by-environment interaction effects, the joint probability of superior performance and superior stability follows the same idea as the probability of occurrence of two independent events:
\[ Pr[\hat{g}_j \in \Omega \cap var(\widehat{ge}_{jk}) \in \Omega] = Pr(\hat{g}_j \in \Omega) \times Pr[var(\widehat{ge}_{jk}) \in \Omega] \]
The results are accessed using the following commands:
head(results$marginal$df$joint_prob)
ID level category prob
157 G01 Location Joint 0.1102474950
158 G02 Location Joint 0.0010212300
159 G03 Location Joint 0.0446639700
160 G04 Location Joint 0.0028105000
161 G05 Location Joint 0.0001216475
162 G06 Location Joint 0.0447423700
results$marginal$plots$joint_prob
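Under the independence assumption, the joint probability is simply the product of the two marginal probabilities. A trivial sketch with hypothetical values:

```r
# Sketch: joint probability under independence (hypothetical values).
pr_perf <- 0.85   # Pr(g_j in Omega), superior performance
pr_stab <- 0.40   # Pr(var(ge_jk) in Omega), superior stability

pr_joint <- pr_perf * pr_stab
pr_joint   # 0.34
```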
3.5.2 Conditional output
The probabilities of superior performance can also be investigated within individual environments, which is useful if you want to recommend an experimental genotype for a specific location. The conditional list is divided into two sublists: the first, df, contains the results compiled into data frames; the second, plots, has the corresponding ggplots.
3.5.2.1 Probability of superior performance
It follows exactly the same idea as previously stated, but instead of the marginal genotypic effect of each genotype, we use the conditional genotypic effect (\(g_{jk} = g_j + ge_{jk}\)):
\[ Pr(\hat{g}_{jk} \in \Omega_k \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y) \]
The results are accessed using the following commands:
head(results$conditional$df$prob_env)
env reg gen prob
1 E01 R1 G01 0.17765
2 E01 R1 G02 0.01025
3 E01 R1 G03 0.09220
4 E01 R1 G04 0.01265
5 E01 R1 G05 0.00360
6 E01 R1 G06 0.19790
results$conditional$plots$prob_env
The grey cells are environments in which the genotype specified in the row was not evaluated.
results$conditional$plots$prob_reg
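A base R sketch of the conditional genotypic value and a within-location probability (an illustration only; all draws and names are hypothetical, and prob_sup() defines \(\Omega_k\) by the selection intensity rather than a fixed threshold):

```r
# Sketch: conditional genotypic value within location k, g_jk = g_j + ge_jk,
# from hypothetical posterior draws.
set.seed(99)
S <- 1000
g_j   <- rnorm(S, mean = 0.5, sd = 0.3)    # marginal genotypic effect
ge_jk <- rnorm(S, mean = -0.2, sd = 0.4)   # interaction with location k

g_jk <- g_j + ge_jk   # conditional value used by the within-location metric

# Example conditional probability for genotype j in location k
# (prob_sup() instead checks membership in Omega_k among the genotypes
# evaluated in that location)
mean(g_jk > 0)
```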
3.5.2.2 Pairwise probability of superior performance
A direct comparison between genotypes evaluated in the same environment. prob_sup() uses the following equations:
- When increase = TRUE:
\[ Pr(\hat{g}_{jk} > \hat{g}_{j^\prime k} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} > \hat{g}_{j^\prime k}^{(s)} \vert y) \]
- When increase = FALSE:
\[ Pr(\hat{g}_{jk} < \hat{g}_{j^\prime k} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} < \hat{g}_{j^\prime k}^{(s)} \vert y) \]
The pairwise comparisons and the ggplots are stored in lists. Therefore, the results are accessed using the following commands, taking the environment “E05” and the region “R2” as examples:
head(results$conditional$df$pwprob_env$E05)
y x pwprob
1 G02 G01 0.95790
2 G03 G01 0.34910
3 G03 G02 0.03285
4 G04 G01 0.95620
5 G04 G03 0.96010
6 G04 G02 0.51755
results$conditional$plots$pwprob_env$E05
head(results$conditional$df$pwprob_reg$R2)
y x pwprob
1 G02 G01 0.96788333
2 G03 G01 0.60066667
3 G03 G02 0.09798333
4 G04 G01 0.95316667
5 G04 G03 0.86826667
6 G04 G02 0.36655000
results$conditional$plots$pwprob_reg$R2
Like the other heatmaps, these show the probability of the genotypes on the x-axis being superior to those on the y-axis.
4 Wrap up
The estimated probabilities demonstrated in this tutorial with ProbBreed relate to some key questions that constantly arise in plant breeding:
What is the risk of recommending a selection candidate?
What is the probability of a given selection candidate having good performance across or within environments?
What is the probability of a selection candidate having better performance than a check cultivar?
How probable is it that a given selection candidate performs similarly across environments?
What are the chances that a given selection candidate is more stable than a check cultivar?
What is the probability that a given selection candidate has a superior and invariable performance across environments?
We hope this package helps you answer some of these questions. Feel free to contact us if there are any issues or suggestions for improvement.