Required Files Below

Data Files Here – Store all files in a folder named “RNGP” (Reaction Norm Genomic Prediction - RNGP) This will be your working directory

Description

One Sentence Summary

The aim of this study is to predict grain yield (tons/hect.) of a set of maize hybrids sharing a common parent by using their genomic and reaction norm data.

General Summary

Grain yield is one of the most important agronomic traits of interest when studying maize. Grain yield is a complex trait that is controlled by many small-effect loci. Therefore, it is difficult to narrow down on the individual genes acting on this trait without reliable modeling scenarios and sufficient data. Aside from genetic variation, grain yield performance of genotypes is also impacted by their environment and genotype-by-environment (GxE) interaction effects. GxE effects arise when a genotype performs differently depending on the environment in which it is in. Thus, it is crucial to account for these additional sources of variation in the model to generate accurate predictions for breeding scenarios where certain genotypes/environments have not been observed/experimented.

GxE can be modeled in terms of the differential mean response across environments \((A)\) or in terms of all the combinations of the genotype:environment \((B)\). The former may be considered to be more computationally efficient than the latter because it requires less terms to be iterated through when performing genomic predictions.

Regression on the mean model (Finlay-Wilkinson): \(y\) is the vector for the dependent variable, \(\mu\) is the grand mean, \(G_{i}\) is the effect of the \(ith\) genotype, \(E_{j}\) is the effect of the \(jth\) environment, \(b_{i}E_{j}\) is the reaction norm parameter effect of the \(ith\) genotype in the \(jth\) environment, and \(\varepsilon\) is the residual error not explained by the model.

\[y=\mu+G_{i}+E_{j}+b_{i}E_{j}+\varepsilon\]

Additive effects model: \(y\) is the vector for the dependent variable, \(\mu\) is the grand means, \(G_{i}\) is the effect of the \(ith\) genotype, \(E_{j}\) is the effect of the \(jth\) envvironment, \(GE_{ij}\) is the GxE interaction effect of the \(ith\) genotype in the \(jth\) environment, and \(\varepsilon\) is the residual error not explained by the model.

\[y=\mu+G_{i}+E_{j}+GE_{ij}+\varepsilon\]

This study used the Genomes to Fields (G2F) 2020 and 2021 data set with the intention of conducting genomic prediction for grain yield (tons/hect.) using reaction norm parameters. Three hybrid tester populations derived from the cross between ex-PVP inbreds (PHK76, PHP02, and PHZ51) and the doubled haploid Wisconsin Stiff Stalk MAGIC population (W10004) were analyzed for a total of 174 hybrids. The study consisted of 29 unique environments across the U.S. to get the full scope of GxE patterns. The photothermal ratio (PTR) was used as a biologically-relevant weather parameter to numerically order the environments to obtain quantified values for grain yield stability for each hybrid. Two different genomic prediction models were conducted (refer to \(A\) and \(B\)) to compare the results.

Visualize Environments/Trials

Data File(s) Needed:

  • envRtype.txt

Code

Load envRtype.txt – This data set contains the subset of environments that will be considered for the analysis. The hybrid populations used in the study were planted in these locations. There are 29 total environments (state:year combinations).

#load libraries
pacman::p_load(readr, ggplot2, maps, ggrepel)

#refer to data set in github repository
git_path <- "https://raw.githubusercontent.com/fatmaoz25/ReactionNormGenomicPrediction/refs/heads/main"
locations_file <- "envRtype.txt"
locations_path <- file.path(git_path, locations_file)

#table summary
locations <- read.table(locations_path, header=T)
head(locations)
##         env      lat       lon      start     end100     end125 location year
## 1 DEH1.2020 38.63014 -75.46610 2020-05-14 2020-08-22 2020-09-16     DEH1 2020
## 2 DEH1.2021 38.63500 -75.45500 2021-04-27 2021-08-05 2021-08-30     DEH1 2021
## 3 GAH1.2020 31.50728 -83.55828 2020-04-01 2020-07-10 2020-08-04     GAH1 2020
## 4 GAH1.2021 31.50728 -83.55828 2021-03-25 2021-07-03 2021-07-28     GAH1 2021
## 5 IAH1.2021 41.21390 -91.53710 2021-05-01 2021-08-09 2021-09-03     IAH1 2021
## 6 IAH2.2021 42.06830 -94.85980 2021-04-27 2021-08-05 2021-08-30     IAH2 2021
##              Tester
## 1 PHK76,PHP02,PHZ51
## 2 PHK76,PHP02,PHZ51
## 3             PHZ51
## 4             PHZ51
## 5       PHK76,PHZ51
## 6 PHK76,PHP02,PHZ51
dim(locations)
## [1] 29  9
#generate U.S. map with state boundaries
us_map <- map_data("state")

The figure below displays the environments where each of the hybrid populations were planted and considered for this study. There were 11 environments containing the PHK76 population, 14 environments with PHP02, and 20 environments with PHZ51. Not every tester was featured in every environment because of growth limitations of regionally adapted genotypes. PHK76 is adapted to the Midwest, PHP02 to the North, and PHZ51 to the South.

## [1] "Number of Environments with PHK76: 11"
## [1] "Number of Environments with PHP02: 14"
## [1] "Number of Environments with PHZ51: 20"

Analysis of Variance (ANOVA)

Data File(s) Needed:

  • g2f_2020_phenotypic_clean_data.csv
  • g2f_2021_phenotypic_clean_data.csv

Code

Load g2f_2020_phenotypic_clean_data.csv and g2f_2021_phenotypic_clean_data.csv – The following code combines the 2020 and 2021 G2F data frames for the 29 environments that are of interest for this study.The columns of the data frame are displayed to show the various types of data collected for each observation/hybrid that was planted. For this study, we will be focusing on the column for grain yield (bushels/acre) since this is the trait we want to ultimately analyze.

#load libraries
pacman::p_load(readr, nlme, sjstats, lme4, RColorBrewer, dplyr, ggplot2, tidyverse, ggrepel, caret, MuMIn)

#refer to data set in github repository
git_path <- "https://raw.githubusercontent.com/fatmaoz25/ReactionNormGenomicPrediction/refs/heads/main"
df20_file <- "g2f_2020_phenotypic_clean_data.csv"
df20_path <- file.path(git_path, df20_file)
df20 <- read.csv(df20_path)


df20 <- df20[df20$Field.Location %in% c("DEH1","GAH1","INH1","MNH1","NEH1","NEH2","NEH3","OHH1","TXH1","TXH2","WIH1","WIH2"),] #environments for 2020

df21_file <- "g2f_2021_phenotypic_clean_data.csv"
df21_path <- file.path(git_path, df21_file)
df21 <- read.csv(df21_path)
df21 <- df21[df21$Field.Location %in% c( "DEH1", "GAH1", "ILH1", "MNH1", "NCH1", "NEH1", "SCH1", "TXH2", "TXH3", "WIH1", "WIH2", "WIH3","IAH1", "IAH2", "IAH3", "IAH4", "MIH1"),] #environments for 2021

df <- rbind(df20,df21)
df$Env <- paste(df$Field.Location,df$Year, sep = ".")
df <- df[!df$Replicate %in% c(3,4),]
colnames(df)
##  [1] "Year"                                 
##  [2] "Field.Location"                       
##  [3] "State"                                
##  [4] "City"                                 
##  [5] "Plot.length..center.center.in.feet."  
##  [6] "Plot.area..ft2."                      
##  [7] "Alley.length..in.inches."             
##  [8] "Row.spacing..in.inches."              
##  [9] "Rows.per.plot"                        
## [10] "X..Seed.per.plot"                     
## [11] "Experiment"                           
## [12] "Source"                               
## [13] "Pedigree"                             
## [14] "Family"                               
## [15] "Tester"                               
## [16] "Replicate"                            
## [17] "Block"                                
## [18] "Plot"                                 
## [19] "Plot_ID"                              
## [20] "Range"                                
## [21] "Pass"                                 
## [22] "Date.Plot.Planted..MM.DD.YY."         
## [23] "Date.Plot.Harvested..MM.DD.YY."       
## [24] "Anthesis..MM.DD.YY."                  
## [25] "Silking..MM.DD.YY."                   
## [26] "Anthesis..days."                      
## [27] "Silking..days."                       
## [28] "Plant.Height..cm."                    
## [29] "Ear.Height..cm."                      
## [30] "Stand.Count....of.plants."            
## [31] "Root.Lodging....of.plants."           
## [32] "Stalk.Lodging....of.plants."          
## [33] "Grain.Moisture...."                   
## [34] "Test.Weight..lbs."                    
## [35] "Plot.Weight..lbs."                    
## [36] "Grain.Yield..bu.A."                   
## [37] "Plot.Discarded..enter..yes..or.blank."
## [38] "Comments"                             
## [39] "Filler"                               
## [40] "Snap....of.plants."                   
## [41] "Env"

Next, we have to run an analysis of variance (ANOVA) to partition the variance components for grain yield in order to obtain heritability estimates. Heritability reveals how well a phenotype is at indicating the breeding value of a genotype that has multiple replicates in multi-environment trials (MET). It is calculated as the ratio between the genotypic variance and the phenotypic variance, thus we can estimate how much of the phenotypic variance is attributed to genetic influences.

This study uses the Model II ANOVA in particular, which requires a continuous dependent variable with random categorical predictors. This implies that the levels incorporated for the different predictors are sampled from a random population of all possible levels. Treating all factors as random is ideal for this situation since we are not restricting our inferences to ONLY the environments and genotypes observed in this study.

The data used here contains both cross-sectional and longitudinal clustering. The genotypes are assigned to multiple environment, range, and pass levels (cross-sectional clustering) with two replicates each (longitudinal clustering). This clustering violates independence assumptions of a simple linear regression model where there is only a single fixed effect (complete pooling), so we have to use a mixed effects linear model (MLM) to account for hierarchical data structures (partial pooling).

The following code formats the variables of interest as factors to make them compatible for the lmer() function. Since each of the observations within these variables is labeled as a random, independent number, we do not want the model to assume that these observations are numerical/ordinal (ordered). Instead we want the model to treat them as nominal (unordered) labels. The default estimation method for lmer() is restricted maximum likelihood (REML). REML is an unbiased estimator for variance components because it removes degrees of freedom for every fixed effect estimated for the predictors. Thus, it does not underestimate the variance and works well for smaller sample sizes within the clusters (unlike maximum likelihood (ML)).

The following equation shows the random effects model that was used to dissect the variance components and to calculate the best linear unbiased predictor (BLUP) values for grain yield:

\[Grain \ Yield_{ijklm}=\mu+Genotype_{i}+Environment_{j}+GE_{ij}+Range_{k}+Row_{l}+Rep_{(j)m}+\varepsilon_{ijklm}\]

\(\mu\) is the mean grain yield of the population, \(Genotype_{i}\) is the effect of the genotype, \(Environment_{j}\) is the effect of the environment, \(GE_{ij}\) is the effect of GxE interactions, \(Range_{k}\) is the effect of the range, \(Row_{l}\) is the effect of the row, \(Rep_{(j)m}\) is the effect of the replicate, and \(\varepsilon_{ijklm}\) is the residual error not explained by the model. Since all the predictors are treated as random, they are assumed to be normally distributed

*note: grain yield is converted from bushels/acre to tons/hect. for the model.

df$Range <- as.factor(df$Range)
df$Pass <- as.factor(df$Pass)
df$Replicate  <- as.factor(df$Replicate )
df$Pedigree <- as.factor(df$Pedigree)
df$Env <- as.factor(df$Env)
df$Grain.Yield..bu.A. <- as.numeric(df$Grain.Yield..bu.A.)

anova<- lmer(Grain.Yield..bu.A.*.0673 ~
               (1|Pedigree)+
               (1|Env)+
               (1|Pedigree:Env)+
               (1|Range)+
               (1|Pass)+
               (1|Replicate:Env), df )
summary(anova)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Grain.Yield..bu.A. * 0.0673 ~ (1 | Pedigree) + (1 | Env) + (1 |  
##     Pedigree:Env) + (1 | Range) + (1 | Pass) + (1 | Replicate:Env)
##    Data: df
## 
## REML criterion at convergence: 104045.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.5521 -0.4731  0.0265  0.5070  5.3271 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  Pedigree:Env  (Intercept) 1.27451  1.1289  
##  Pedigree      (Intercept) 1.30753  1.1435  
##  Pass          (Intercept) 1.91912  1.3853  
##  Replicate:Env (Intercept) 0.16515  0.4064  
##  Range         (Intercept) 0.03614  0.1901  
##  Env           (Intercept) 3.78035  1.9443  
##  Residual                  2.69978  1.6431  
## Number of obs: 24171, groups:  
## Pedigree:Env, 16680; Pedigree, 1215; Pass, 178; Replicate:Env, 58; Range, 50; Env, 29
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  10.2983     0.3876   26.57

Next, we have to compute some metrics to assess the performance of the ANOVA model. The coefficient of variation (cv) shows the relative variability of the datapoints about the mean. The cv is .8%, indicating that the data has tight clustering and low dispersion.

The root mean square error (RMSE) tells us how close the predicted values are to the actual values. In this case, the computed rmse is 1.38, which means the model has good to moderate fit.

The \(R^2\) value of 0.75 indicates the fraction of the variance explained by the fixed and random effects, which is moderately high.

## [1] "Coefficient of Variation 0.00888620130300381"
## [1] "rmse 1.37666553946181"
## [1] "R 0.758572987963674"

To determine the heritability for grain yield, the variance components due to each factor must be extracted to plug-in scaled values for the following equation:

\[Heritability=\frac{\sigma_{G}}{ \sigma_{G}+\frac{\sigma_{GE}}{n_{E}}+\frac{\sigma_\varepsilon}{n_{E}*n_{rep}}}\] where \(\sigma_{G}\) is the genotype effects’ variance, \(\sigma_{GE}\) is the GxE effects’ variance, \(\sigma_{\varepsilon}\) is the residual error variance, \(n_{E}\) is the number of environments, and \(n_{rep}\) is the number of reps.

The heritability of 85% is sufficient for a trait as complex as grain yield. This indicates that the expressed phenotypes for each genotype are heritable between the two reps and across different environments.

VC<-as.data.frame(print(VarCorr(anova ), comp=c("Variance")))
##  Groups        Name        Variance
##  Pedigree:Env  (Intercept) 1.274513
##  Pedigree      (Intercept) 1.307525
##  Pass          (Intercept) 1.919123
##  Replicate:Env (Intercept) 0.165155
##  Range         (Intercept) 0.036137
##  Env           (Intercept) 3.780349
##  Residual                  2.699776
VC$Percent<-round(VC$vcov/sum(VC$vcov)*100,2)
      
heritability <- VC[2,6] / (VC[2,6] +  VC[1,6]/length(unique(df$Env)) +  (VC[7,6]/length(unique(df$Env))*2))

VC[,7] <-rmse
VC[,8] <-heritability
VC[,9] <-R
names(VC)[7:9] <- c("Rmse","Heritability","R")
head(VC)
##             grp        var1 var2       vcov     sdcor Percent     Rmse
## 1  Pedigree:Env (Intercept) <NA> 1.27451264 1.1289432   11.40 1.376666
## 2      Pedigree (Intercept) <NA> 1.30752544 1.1434708   11.69 1.376666
## 3          Pass (Intercept) <NA> 1.91912294 1.3853241   17.16 1.376666
## 4 Replicate:Env (Intercept) <NA> 0.16515498 0.4063926    1.48 1.376666
## 5         Range (Intercept) <NA> 0.03613693 0.1900972    0.32 1.376666
## 6           Env (Intercept) <NA> 3.78034869 1.9443119   33.81 1.376666
##   Heritability        R
## 1    0.8503098 0.758573
## 2    0.8503098 0.758573
## 3    0.8503098 0.758573
## 4    0.8503098 0.758573
## 5    0.8503098 0.758573
## 6    0.8503098 0.758573
head(VC)
##             grp        var1 var2       vcov     sdcor Percent     Rmse
## 1  Pedigree:Env (Intercept) <NA> 1.27451264 1.1289432   11.40 1.376666
## 2      Pedigree (Intercept) <NA> 1.30752544 1.1434708   11.69 1.376666
## 3          Pass (Intercept) <NA> 1.91912294 1.3853241   17.16 1.376666
## 4 Replicate:Env (Intercept) <NA> 0.16515498 0.4063926    1.48 1.376666
## 5         Range (Intercept) <NA> 0.03613693 0.1900972    0.32 1.376666
## 6           Env (Intercept) <NA> 3.78034869 1.9443119   33.81 1.376666
##   Heritability        R
## 1    0.8503098 0.758573
## 2    0.8503098 0.758573
## 3    0.8503098 0.758573
## 4    0.8503098 0.758573
## 5    0.8503098 0.758573
## 6    0.8503098 0.758573
dim(VC)
## [1] 7 9

The grain yield BLUPs were calculated by adding the predicted GxE effects for an individual (using coef()) with the conditional mode of an individual’s environmental and genotype effect (using ranef()).

E <- ranef(anova)$'Env'
G <- ranef(anova)$'Pedigree'
GE <- coef(anova)$'Pedigree:Env'
head(E)
##           (Intercept)
## DEH1.2020  -1.2274882
## DEH1.2021   1.1569459
## GAH1.2020  -2.0455058
## GAH1.2021  -0.1697844
## IAH1.2021  -0.7298402
## IAH2.2021   1.1206849
dim(E)
## [1] 29  1
head(G)
##                  (Intercept)
## 2369/LH123HT       1.4257757
## 6046-BECKS         1.5006263
## ARPA W22 (X17EA)  -4.9711441
## B14A/H95          -0.2823990
## B14A/MO17         -0.1026897
## B14A/OH43          0.0660036
dim(E)
## [1] 29  1
head(GE)
##                        (Intercept)
## 2369/LH123HT:DEH1.2020    10.69428
## 2369/LH123HT:DEH1.2021    10.70033
## 2369/LH123HT:GAH1.2020    11.44766
## 2369/LH123HT:GAH1.2021    10.33503
## 2369/LH123HT:IAH1.2021    11.93277
## 2369/LH123HT:IAH2.2021    11.88455
dim(GE)
## [1] 16680     1

After calculating the BLUPs, they are stored in a separate data frame where ONLY the genotypes with a Wisconsin stiff stalk parent are kept.

##       Yield                 interaction          Pedigree       Env Tester
## 1  9.850044 W10004_0006/PHK76:DEH1.2020 W10004_0006/PHK76 DEH1.2020  PHK76
## 2 11.844513 W10004_0006/PHK76:DEH1.2021 W10004_0006/PHK76 DEH1.2021  PHK76
## 3  9.213949 W10004_0006/PHK76:IAH1.2021 W10004_0006/PHK76 IAH1.2021  PHK76
## 4 10.960722 W10004_0006/PHK76:IAH2.2021 W10004_0006/PHK76 IAH2.2021  PHK76
## 5 12.002941 W10004_0006/PHK76:IAH3.2021 W10004_0006/PHK76 IAH3.2021  PHK76
## 6 10.751848 W10004_0006/PHK76:IAH4.2021 W10004_0006/PHK76 IAH4.2021  PHK76
##        Inbred Location Year
## 1 W10004_0006     DEH1 2020
## 2 W10004_0006     DEH1 2021
## 3 W10004_0006     IAH1 2021
## 4 W10004_0006     IAH2 2021
## 5 W10004_0006     IAH3 2021
## 6 W10004_0006     IAH4 2021
## [1] 15490     8

Reaction Norm Modeling

Data Files Needed:

  • Yield.wide.na.csv
  • env.index.phk76.GY.csv
  • env.index.php02.GY.csv
  • env.index.phz51.GY.csv

Code:

The following code goes through each tester population to plot the grain yield plasticity parameters for each hybrid – the fit slopes and intercepts when plotting the hybrid grain yield BLUPs against the PTR values identified from the environmental index search.

  • PHK76 Analysis
pacman::p_load(ggplot2,ggrepel,ggpubr,dplyr,data.table)

git_path <- "https://raw.githubusercontent.com/fatmaoz25/ReactionNormGenomicPrediction/refs/heads/main"

yield_file <- "Yield.wide.na.csv"
yield_path <- file.path(git_path, yield_file)
GY.wide <- read.csv(yield_path)

GY.phk76 <- GY.wide[,c(2,3,4,5,6,7,8,9,10,11,12,13)]
GY.phk76 <- cbind(GY.phk76[,1], stack(GY.phk76[,2:ncol(GY.phk76)]))
names(GY.phk76) <- c("Inbred", "GY", "env")

GY.phk76 <- GY.phk76 %>% tidyr::separate(env, into = c("Tester", "env"), sep = "_")
head(GY.phk76)
##        Inbred        GY Tester       env
## 1 W10004_0007 10.395594  PHK76 DEH1.2020
## 2 W10004_0011  8.371664  PHK76 DEH1.2020
## 3 W10004_0014 11.050319  PHK76 DEH1.2020
## 4 W10004_0018  9.368259  PHK76 DEH1.2020
## 5 W10004_0026 10.437017  PHK76 DEH1.2020
## 6 W10004_0032  9.778872  PHK76 DEH1.2020
dim(GY.phk76)
## [1] 1914    4
phk_file <- "env.index.phk76.GY.csv"
phk_path <- file.path(git_path, phk_file)
env.index.phk76.GY <- read.csv(phk_path)
head(env.index.phk76.GY)
##       X       env window_size start_day end_day mean_temperature      Mean
## 1   226 DEH1.2020           2       100     101        0.8361215  9.406983
## 2  8227 DEH1.2021           2       100     101        0.9914683 11.611270
## 3 16228 IAH1.2021           2       100     101        0.7875916  9.866236
## 4 24229 IAH2.2021           2       100     101        0.9933251 11.465615
## 5 32230 IAH3.2021           2       100     101        0.9527715 11.319659
## 6 40231 IAH4.2021           2       100     101        0.8669274 10.809764
##         cor
## 1 0.9582252
## 2 0.9582252
## 3 0.9582252
## 4 0.9582252
## 5 0.9582252
## 6 0.9582252
dim(env.index.phk76.GY)
## [1] 11  8
GY.phk76.stability <- dplyr::left_join(GY.phk76, env.index.phk76.GY, by="env")
head(GY.phk76.stability)
##        Inbred        GY Tester       env   X window_size start_day end_day
## 1 W10004_0007 10.395594  PHK76 DEH1.2020 226           2       100     101
## 2 W10004_0011  8.371664  PHK76 DEH1.2020 226           2       100     101
## 3 W10004_0014 11.050319  PHK76 DEH1.2020 226           2       100     101
## 4 W10004_0018  9.368259  PHK76 DEH1.2020 226           2       100     101
## 5 W10004_0026 10.437017  PHK76 DEH1.2020 226           2       100     101
## 6 W10004_0032  9.778872  PHK76 DEH1.2020 226           2       100     101
##   mean_temperature     Mean       cor
## 1        0.8361215 9.406983 0.9582252
## 2        0.8361215 9.406983 0.9582252
## 3        0.8361215 9.406983 0.9582252
## 4        0.8361215 9.406983 0.9582252
## 5        0.8361215 9.406983 0.9582252
## 6        0.8361215 9.406983 0.9582252
dim(GY.phk76.stability)
## [1] 1914   11
  • PHP02 Analysis
git_path <- "https://raw.githubusercontent.com/fatmaoz25/ReactionNormGenomicPrediction/refs/heads/main"

yield_file <- "Yield.wide.na.csv"
yield_path <- file.path(git_path, yield_file)
GY.wide <- read.csv(yield_path)

GY.php02 <- GY.wide[,c(2,14:27)]
head(GY.php02)
##        Inbred PHP02_DEH1.2020 PHP02_DEH1.2021 PHP02_IAH2.2021 PHP02_IAH3.2021
## 1 W10004_0007        9.797786        12.84561        11.73630       11.843965
## 2 W10004_0011        9.815967        11.56275        11.23017       11.297544
## 3 W10004_0014       11.129001        14.00365        14.94558       14.603535
## 4 W10004_0018        8.519591        10.47899        10.12657        9.690322
## 5 W10004_0026       10.933065        12.81375        13.40871       12.058114
## 6 W10004_0032        9.693562        12.10590        12.20176       12.389796
##   PHP02_IAH4.2021 PHP02_MIH1.2021 PHP02_MNH1.2020 PHP02_MNH1.2021
## 1        11.59665       14.424982       12.551783        11.60085
## 2        10.37560       12.093223       10.532706        10.96900
## 3        13.52296       15.548537       14.191185        13.62540
## 4        10.06994        9.963197        8.576707        10.02438
## 5        12.39005       13.851457       12.152802        12.09161
## 6        11.26940       13.467011       11.654503        11.44477
##   PHP02_OHH1.2020 PHP02_WIH1.2020 PHP02_WIH1.2021 PHP02_WIH2.2020
## 1        14.45873        12.82516        14.20236       13.888174
## 2        13.58197        11.69032        13.30720        9.696058
## 3        16.49231        14.85796        15.97108       12.645385
## 4        12.64049        10.55475        12.36739       10.381469
## 5        15.08682        13.62246        13.84801       13.674190
## 6        14.18127        11.78967        12.78811       10.761356
##   PHP02_WIH2.2021 PHP02_WIH3.2021
## 1        13.63586       11.557067
## 2        11.87131       10.402453
## 3        14.88880       14.251496
## 4        11.57282        8.698027
## 5        13.77506       12.408980
## 6        11.82213       10.774981
GY.php02 <- cbind(GY.php02[,1], stack(GY.php02[,2:ncol(GY.php02)]))
names(GY.php02) <- c("Inbred", "GY", "env")
GY.php02 <- GY.php02 %>% tidyr::separate(env, into = c("Tester", "env"), sep = "_")

head(GY.php02)
##        Inbred        GY Tester       env
## 1 W10004_0007  9.797786  PHP02 DEH1.2020
## 2 W10004_0011  9.815967  PHP02 DEH1.2020
## 3 W10004_0014 11.129001  PHP02 DEH1.2020
## 4 W10004_0018  8.519591  PHP02 DEH1.2020
## 5 W10004_0026 10.933065  PHP02 DEH1.2020
## 6 W10004_0032  9.693562  PHP02 DEH1.2020
php_file <- "env.index.php02.GY.csv"
php_path <- file.path(git_path, php_file)
env.index.php02.GY <- read.csv(php_path)
head(env.index.php02.GY)
##       X       env window_size start_day end_day mean_temperature      Mean
## 1   193 DEH1.2020           2        67      68        0.6759941  9.872847
## 2  8194 DEH1.2021           2        67      68        1.0813678 12.255644
## 3 40198 IAH2.2021           2        67      68        0.8612614 12.149298
## 4 48199 IAH3.2021           2        67      68        1.1491964 11.808447
## 5 56200 IAH4.2021           2        67      68        0.8160279 11.323348
## 6 80203 MIH1.2021           2        67      68        1.1038191 13.092889
##         cor
## 1 0.6926673
## 2 0.6926673
## 3 0.6926673
## 4 0.6926673
## 5 0.6926673
## 6 0.6926673
GY.php02.stability <- dplyr::left_join(GY.php02, env.index.php02.GY, by="env")
head(GY.php02.stability)
##        Inbred        GY Tester       env   X window_size start_day end_day
## 1 W10004_0007  9.797786  PHP02 DEH1.2020 193           2        67      68
## 2 W10004_0011  9.815967  PHP02 DEH1.2020 193           2        67      68
## 3 W10004_0014 11.129001  PHP02 DEH1.2020 193           2        67      68
## 4 W10004_0018  8.519591  PHP02 DEH1.2020 193           2        67      68
## 5 W10004_0026 10.933065  PHP02 DEH1.2020 193           2        67      68
## 6 W10004_0032  9.693562  PHP02 DEH1.2020 193           2        67      68
##   mean_temperature     Mean       cor
## 1        0.6759941 9.872847 0.6926673
## 2        0.6759941 9.872847 0.6926673
## 3        0.6759941 9.872847 0.6926673
## 4        0.6759941 9.872847 0.6926673
## 5        0.6759941 9.872847 0.6926673
## 6        0.6759941 9.872847 0.6926673
  • PHZ51 Analysis
git_path <- "https://raw.githubusercontent.com/fatmaoz25/ReactionNormGenomicPrediction/refs/heads/main"

yield_file <- "Yield.wide.na.csv"
yield_path <- file.path(git_path, yield_file)
GY.wide <- read.csv(yield_path)

GY.phz51 <- GY.wide[,c(2,28:47)]
head(GY.phz51)
##        Inbred PHZ51_DEH1.2020 PHZ51_DEH1.2021 PHZ51_GAH1.2020 PHZ51_GAH1.2021
## 1 W10004_0007        9.288831        11.21449        8.949733       10.305163
## 2 W10004_0011        7.091116        10.25501        6.591163        8.899289
## 3 W10004_0014        9.398324        11.84821        8.225487        9.566614
## 4 W10004_0018        6.668914        10.32386        6.935572        8.801453
## 5 W10004_0026        7.800206        11.08133        8.321225       10.032927
## 6 W10004_0032        7.589699        11.23896        7.073529        9.342681
##   PHZ51_IAH1.2021 PHZ51_IAH2.2021 PHZ51_IAH3.2021 PHZ51_IAH4.2021
## 1        8.101275        11.82649       12.210923       10.163747
## 2        7.812028        10.34810        8.924262        9.195727
## 3       10.358976        12.02660       11.069983       11.259271
## 4        8.906650        10.80153       10.629378        9.640500
## 5        8.950674        10.78451        9.093160        9.524618
## 6        9.313303        11.44250       10.764499       10.134623
##   PHZ51_NCH1.2021 PHZ51_NEH1.2020 PHZ51_NEH1.2021 PHZ51_NEH2.2020
## 1        8.574247        9.314713       11.192281       10.652657
## 2        7.183664        7.953082        7.573318        7.175881
## 3        8.550132        9.632448       11.335435       10.293771
## 4        7.597329        7.205523        6.610941        9.824061
## 5        8.611617        7.208446        9.142053       12.676529
## 6        8.669857        8.602385       10.711584       10.680057
##   PHZ51_NEH3.2020 PHZ51_SCH1.2021 PHZ51_TXH1.2020 PHZ51_TXH2.2020
## 1        10.59540        7.200764        9.207735        9.392135
## 2        10.14526        5.761004        7.924498        5.921937
## 3        11.48039        8.362518       10.577292        7.474514
## 4        10.13814        6.655994        8.097084        5.973930
## 5        10.18102        7.049889        9.664098        7.461703
## 6        10.26019        7.836149       10.140324        8.036569
##   PHZ51_TXH2.2021 PHZ51_TXH3.2021 PHZ51_WIH2.2020 PHZ51_WIH2.2021
## 1        8.875863        4.202074        9.474609        11.80527
## 2        8.953560        3.310938        9.301504        10.97433
## 3        8.763002        4.377614       12.311423        13.74152
## 4        7.387736        3.414941       11.224346        11.00785
## 5       10.116463        3.720163       10.653223        11.03379
## 6        9.775729        3.715120       12.204420        11.63943
GY.phz51 <- cbind(GY.phz51[,1], stack(GY.phz51[,2:ncol(GY.phz51)]))
names(GY.phz51) <- c("Inbred", "GY", "env")
GY.phz51 <- GY.phz51 %>% tidyr::separate(env, into = c("Tester", "env"), sep = "_")

head(GY.phz51)
##        Inbred       GY Tester       env
## 1 W10004_0007 9.288831  PHZ51 DEH1.2020
## 2 W10004_0011 7.091116  PHZ51 DEH1.2020
## 3 W10004_0014 9.398324  PHZ51 DEH1.2020
## 4 W10004_0018 6.668914  PHZ51 DEH1.2020
## 5 W10004_0026 7.800206  PHZ51 DEH1.2020
## 6 W10004_0032 7.589699  PHZ51 DEH1.2020
phz_file <- "env.index.phz51.GY.csv"
phz_path <- file.path(git_path, phz_file)
env.index.phz51.GY <- read.csv(phz_path)
head(env.index.phz51.GY)
##       X       env window_size start_day end_day mean_temperature      Mean
## 1  4644 DEH1.2020          45        46      90        0.7344577  8.204401
## 2 12645 DEH1.2021          45        46      90        0.8972993 10.899782
## 3 20646 GAH1.2020          45        46      90        0.8173709  7.855889
## 4 28647 GAH1.2021          45        46      90        0.8828835  9.798067
## 5 36648 IAH1.2021          45        46      90        1.0103203  9.140951
## 6 44649 IAH2.2021          45        46      90        0.9488085 11.282331
##         cor
## 1 0.7326232
## 2 0.7326232
## 3 0.7326232
## 4 0.7326232
## 5 0.7326232
## 6 0.7326232
GY.phz51.stability <- dplyr::left_join(GY.phz51, env.index.phz51.GY, by="env")
head(GY.phz51.stability)
##        Inbred       GY Tester       env    X window_size start_day end_day
## 1 W10004_0007 9.288831  PHZ51 DEH1.2020 4644          45        46      90
## 2 W10004_0011 7.091116  PHZ51 DEH1.2020 4644          45        46      90
## 3 W10004_0014 9.398324  PHZ51 DEH1.2020 4644          45        46      90
## 4 W10004_0018 6.668914  PHZ51 DEH1.2020 4644          45        46      90
## 5 W10004_0026 7.800206  PHZ51 DEH1.2020 4644          45        46      90
## 6 W10004_0032 7.589699  PHZ51 DEH1.2020 4644          45        46      90
##   mean_temperature     Mean       cor
## 1        0.7344577 8.204401 0.7326232
## 2        0.7344577 8.204401 0.7326232
## 3        0.7344577 8.204401 0.7326232
## 4        0.7344577 8.204401 0.7326232
## 5        0.7344577 8.204401 0.7326232
## 6        0.7344577 8.204401 0.7326232

The data sets for the individual tester populations from the above code are combined below to create a combined data set to display the reaction norms of each hybrid across the numerically-ordered environments according to increasing mean PTR.

  • Reaction Norm Plotting
GY.stability <- rbind(GY.phk76.stability,GY.php02.stability,GY.phz51.stability)
head(GY.stability)
##        Inbred        GY Tester       env   X window_size start_day end_day
## 1 W10004_0007 10.395594  PHK76 DEH1.2020 226           2       100     101
## 2 W10004_0011  8.371664  PHK76 DEH1.2020 226           2       100     101
## 3 W10004_0014 11.050319  PHK76 DEH1.2020 226           2       100     101
## 4 W10004_0018  9.368259  PHK76 DEH1.2020 226           2       100     101
## 5 W10004_0026 10.437017  PHK76 DEH1.2020 226           2       100     101
## 6 W10004_0032  9.778872  PHK76 DEH1.2020 226           2       100     101
##   mean_temperature     Mean       cor
## 1        0.8361215 9.406983 0.9582252
## 2        0.8361215 9.406983 0.9582252
## 3        0.8361215 9.406983 0.9582252
## 4        0.8361215 9.406983 0.9582252
## 5        0.8361215 9.406983 0.9582252
## 6        0.8361215 9.406983 0.9582252
GY.stability <- GY.stability %>% group_by(env,Tester) %>% mutate(ScaledGY = scale(GY, center = TRUE, scale = TRUE)) %>% as.data.frame()

GY.stability <- GY.stability %>% group_by(Inbred,Tester) %>% dplyr::mutate(Slope = coef(lm(ScaledGY ~ mean_temperature))[2], Intercept = coef(lm(ScaledGY ~ mean_temperature))[1], Predicted.ScaledGY = predict(lm(ScaledGY ~ mean_temperature))) %>% as.data.frame()

GY.stability <- as.data.frame(lapply(GY.stability, unlist))
GY.stability$Slope <- as.numeric(GY.stability$Slope)


result <- GY.stability %>%  group_by(env,Tester) %>% dplyr::mutate(Meann = mean(Predicted.ScaledGY)) %>% as.data.frame()
result <- result %>% distinct(env,Tester, .keep_all = TRUE)
head(result)
##        Inbred       GY Tester       env     X window_size start_day end_day
## 1 W10004_0007 10.39559  PHK76 DEH1.2020   226           2       100     101
## 2 W10004_0007 12.45757  PHK76 DEH1.2021  8227           2       100     101
## 3 W10004_0007 10.80473  PHK76 IAH1.2021 16228           2       100     101
## 4 W10004_0007 12.04884  PHK76 IAH2.2021 24229           2       100     101
## 5 W10004_0007 12.16717  PHK76 IAH3.2021 32230           2       100     101
## 6 W10004_0007 11.76778  PHK76 IAH4.2021 40231           2       100     101
##   mean_temperature      Mean       cor  ScaledGY       Slope Intercept
## 1        0.8361215  9.406983 0.9582252 1.0576008 -0.08205393 0.9387461
## 2        0.9914683 11.611270 0.9582252 0.7582314 -0.08205393 0.9387461
## 3        0.7875916  9.866236 0.9582252 0.9700140 -0.08205393 0.9387461
## 4        0.9933251 11.465615 0.9582252 0.6833282 -0.08205393 0.9387461
## 5        0.9527715 11.319659 0.9582252 0.7458227 -0.08205393 0.9387461
## 6        0.8669274 10.809764 0.9582252 1.1312101 -0.08205393 0.9387461
##   Predicted.ScaledGY         Meann
## 1          0.8701390  7.089187e-17
## 2          0.8573922 -2.055545e-18
## 3          0.8741211  8.607148e-17
## 4          0.8572398 -6.220844e-18
## 5          0.8605674  1.315849e-17
## 6          0.8676113  4.745394e-17
p12<- ggplot(GY.stability, aes(x=mean_temperature, y=Predicted.ScaledGY, group=Inbred, color=Slope)) +
  geom_line(alpha=0.4) +
  # geom_smooth(se=F)+
  geom_point(alpha=0.3)+
  # geom_text( result, mapping=aes(label = Env, x=Mean, y=Meann-3), color='black', angle=90, size=1, alpha=0.8)+
  geom_point(result, mapping=aes(x=mean_temperature, y=2.48 ),color="black" ,shape=24,alpha=0.5   )+
  geom_text_repel(result,mapping=aes(x=mean_temperature, y=2.48, label=env ),color="black",size=2,
                  force_pull   = 0, # do not pull toward data points
                  nudge_y      = 0.05,
                  direction    = "x",
                  angle        = 90,
                  hjust        = 0,
                  segment.size = 0.2,
                  max.iter = 1e4, max.time = 1
  )+
  #scale_color_brewer(palette = "Dark2")+
  # geom_errorbar(aes(ymin=Mean-sd, ymax=Mean+sd), width=.2,
  #                position=position_dodge(0.05))+
  scale_y_continuous(name="Predicted GY")+
  theme(legend.position="right") +
  theme_bw()+
  facet_grid(~Tester,scales ="free_x")+
  scale_x_continuous(name = "Environmental index"  )+
  theme(#legend.background = element_blank(),
    legend.background =  element_rect(fill = alpha("gray80", 0.3) , size=0.2, linetype="solid"),
    legend.key.height = unit(0.5, 'cm'),
    panel.grid.major  = element_blank(),
    panel.grid.minor  = element_blank(),
    legend.key.size = unit(12,"pt"),
    axis.text.y = element_text(angle = 90, vjust=0.8,  hjust=0.5),
    legend.position   = c(0.75, 0.25))+
  scale_colour_gradient2("Slope", low = "navy", mid = "cyan", high = "red", midpoint = mean(GY.stability$Slope, na.rm=T) )
#  guides(color = guide_legend(override.aes = list(size = 1,alpha = 1) ) )+
#  ggtitle("A) Slope of the temporal plant height across environments")

p12

Genomic Prediction

Data Files Needed:

  • All.Pred.Ability.reactionnorm.csv
  • All.Pred.Ability.reactionnorm.Formatted.csv
  • All.Pred.Ability.additive.csv
  • All.Pred.Ability.additive.Formatted.csv

Code:

The scripts for the genomic prediction models (\(A\) and \(B\)) were run using the Texas A&M’s High-Performance Research Computing (HPRC) due to the computational power required. The code for the genomic prediction models are not included within this RMarkdown file, but can be found in the same GitHub repository.

The code below uses the outputs of the genomic prediction models to make the prediction accuracy figures. They are split according to common plant breeding scenarios (CV0, CV00, CV1, and CV2) as found in (Jarquin, 2017).

pacman::p_load(readr, ggplot2, ggrepel)

git_path <- "https://raw.githubusercontent.com/fatmaoz25/ReactionNormGenomicPrediction/refs/heads/main"

combined_data_file <- "All.Pred.Ability.additive.csv"
combined_data_path <- file.path(git_path, combined_data_file)
combined_data <- read.csv(combined_data_path)

df <- combined_data
df <- df %>% group_by(CV,Tester) %>% dplyr::summarise(mean =round(mean(cor),2) ,
                                                      sd =round(sd(cor),2)) %>% as.data.frame()

df$CV <- factor(df$CV ,levels = c("CV1", "CV2", "CV0", "CV00"))
df$facet <- df$CV
df$facet <- gsub("CV1", "Unested hybrids in \ntested environments", df$facet)
df$facet <- gsub("CV2", "Tested hybrids in \ntested environments", df$facet)
df$facet <- gsub("CV00", "Untested hybrids in \nuntested environments", df$facet)
df$facet <- gsub("CV0", "Tested hybrids in \nuntested environments", df$facet)

head(df)
##     CV Tester mean   sd                                       facet
## 1  CV0  PHK76 0.82 0.05   Tested hybrids in \nuntested environments
## 2  CV0  PHP02 0.78 0.06   Tested hybrids in \nuntested environments
## 3  CV0  PHZ51 0.65 0.09   Tested hybrids in \nuntested environments
## 4 CV00  PHK76 0.35 0.11 Untested hybrids in \nuntested environments
## 5 CV00  PHP02 0.34 0.12 Untested hybrids in \nuntested environments
## 6 CV00  PHZ51 0.27 0.12 Untested hybrids in \nuntested environments
df$facet <- factor(df$facet ,levels = c("Tested hybrids in \ntested environments",
                                        "Unested hybrids in \ntested environments",
                                        "Tested hybrids in \nuntested environments",
                                        "Untested hybrids in \nuntested environments"))

p <- ggplot(df, aes(x=Tester, y=mean, fill=Tester)) +
  geom_bar(stat="identity", position=position_dodge()) +
  geom_text(aes(label=paste(sprintf("%.2f", mean),  "\u00B1",  sprintf("%.2f", sd), sep=" "),   y=0.12),
            position=position_dodge(0.9),  angle=90, color="black", size=3.5)+
  geom_errorbar(aes(ymin=mean, ymax=mean+sd), width=.2,
                position=position_dodge(.9)) +
  facet_wrap(~facet)+
  scale_y_continuous("Prediction ability")+
  scale_x_discrete("Population")+
  scale_fill_brewer(palette="Set2")+
  theme(legend.position = c(0.9,0.35),
        legend.background = element_blank())
p

pacman::p_load(readr, ggplot2, ggrepel)

git_path <- "https://raw.githubusercontent.com/fatmaoz25/ReactionNormGenomicPrediction/refs/heads/main"

combined_data_file <- "All.Pred.Ability.reactionnorm.csv"
combined_data_path <- file.path(git_path, combined_data_file)
combined_data <- read.csv(combined_data_path)

df <- combined_data
df <- df %>% group_by(CV,Tester) %>% dplyr::summarise(mean =round(mean(cor),2) ,
                                                      sd =round(sd(cor),2)) %>% as.data.frame()

df$CV <- factor(df$CV ,levels = c("CV1", "CV2", "CV0", "CV00"))
df$facet <- df$CV
df$facet <- gsub("CV1", "Unested hybrids in \ntested environments", df$facet)
df$facet <- gsub("CV2", "Tested hybrids in \ntested environments", df$facet)
df$facet <- gsub("CV00", "Untested hybrids in \nuntested environments", df$facet)
df$facet <- gsub("CV0", "Tested hybrids in \nuntested environments", df$facet)

head(df)
##     CV Tester mean   sd                                       facet
## 1  CV0  phk76 0.74 0.07   Tested hybrids in \nuntested environments
## 2  CV0  php02 0.70 0.07   Tested hybrids in \nuntested environments
## 3  CV0  phz51 0.62 0.08   Tested hybrids in \nuntested environments
## 4 CV00  phk76 0.36 0.12 Untested hybrids in \nuntested environments
## 5 CV00  php02 0.39 0.11 Untested hybrids in \nuntested environments
## 6 CV00  phz51 0.26 0.13 Untested hybrids in \nuntested environments
df$facet <- factor(df$facet ,levels = c("Tested hybrids in \ntested environments",
                                        "Unested hybrids in \ntested environments",
                                        "Tested hybrids in \nuntested environments",
                                        "Untested hybrids in \nuntested environments"))

p <- ggplot(df, aes(x=Tester, y=mean, fill=Tester)) +
  geom_bar(stat="identity", position=position_dodge()) +
  geom_text(aes(label=paste(sprintf("%.2f", mean),  "\u00B1",  sprintf("%.2f", sd), sep=" "),   y=0.12),
            position=position_dodge(0.9),  angle=90, color="black", size=3.5)+
  geom_errorbar(aes(ymin=mean, ymax=mean+sd), width=.2,
                position=position_dodge(.9)) +
  facet_wrap(~facet)+
  scale_y_continuous("Prediction ability")+
  scale_x_discrete("Population")+
  scale_fill_brewer(palette="Set2")+
  theme(legend.position = c(0.9,0.35),
        legend.background = element_blank())
p