Introduction.

This document contains information about candidate varieties to grow to test predictions about CDBN variety performance. There are four types of variety for which I’d like to test these predictions:

1. Varieties that are predicted to perform well across all sites (with low GxE).
2. Varieties that are predicted to perform well in major bean growing areas (with high GxE).
3. Varieties that are predicted to perform well at specific sites.
4. Varieties that are predicted to perform poorly at specific sites.

The tables below also contain information on possible seed sources for these field trials. I am currently bulking seed in Puerto Rico, but many of these varieties are also in the MDP, DDP, or ADP, so you might already have seed for them.

i. Methods, in brief:

These predictions are based on a Finlay-Wilkinson (FW) model fitted using a Bayesian Gibbs sampler and a matrix of variety relatedness (A). The Gibbs sampler shrinks the estimates for each variety towards the average performance across varieties, and generally gives better predictive power than an ordinary least squares model. The A matrix was calculated in Tassel using its recommended methodology (centered IBS). The SNP matrix used in Tassel was generated from GBS data collected by Phil McClean, Rian Lee, and Alice MacQueen using the ApeKI enzyme; reads were aligned to the P. vulgaris genome V 2.0 using bwa mem, and SNPs were called using NGSEP.
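For readers who want the model in symbols, the Finlay-Wilkinson regression is commonly written as shown here. This is the standard textbook form rather than an equation copied from the fitted output: the symbols $\mu$, $h_j$, and $\varepsilon_{ij}$ are notation assumed for illustration, and the way A enters (as the covariance of the variety effects) is an assumption about how the sampler was configured.

```latex
y_{ij} = \mu + g_i + (1 + b_i)\,h_j + \varepsilon_{ij},
\qquad
g \sim N\!\left(0,\ \sigma_g^2 A\right),
\quad
b \sim N\!\left(0,\ \sigma_b^2 A\right)
```

Here $y_{ij}$ is the yield of variety $i$ at location $j$, $\mu$ is the overall mean, $g_i$ is the genetic effect of the variety, $h_j$ is the effect of the location, and $1 + b_i$ is the slope of the variety's line. Treating $g$ and $b$ as random effects is what produces the shrinkage described above.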

The FW model was fitted for 312 bean varieties across 30 CDBN locations. The 30 locations were picked from 77 CDBN locations by selecting locations that had grown ten check varieties (CELRK, Fleetwood, Montcalm, NW63, Othello, Midnight, Redkloud, UI114, UI59, & Viva) at least once. The year each variety was grown was ignored in this model. Future work will account for the effect of year by incorporating daily weather data into the model.
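The location filter described above can be sketched in a few lines of base R. This is a minimal illustration on invented data, not the code used for the analysis: the data frame `trials`, its location codes, and the shortened list of checks are all assumptions, and it reads the rule as "every check was grown at the location at least once".

```r
# Toy trial records: one row per variety grown at a location (hypothetical data).
trials <- data.frame(
  Location_code = c("AAAA", "AAAA", "AAAA", "BBBB", "BBBB",
                    "CCCC", "CCCC", "CCCC", "CCCC"),
  CDBN_ID       = c("Viva", "Othello", "UI59", "Viva", "UI59",
                    "Viva", "Othello", "UI59", "Viva"),
  stringsAsFactors = FALSE
)

# Checks that every retained location must have grown at least once
# (three here for brevity; the report uses ten).
checks <- c("Viva", "Othello", "UI59")

# For each location, test whether all checks appear among the varieties grown there.
grown_by_loc   <- split(trials$CDBN_ID, trials$Location_code)
has_all_checks <- vapply(grown_by_loc, function(v) all(checks %in% v), logical(1))

# Locations that pass the filter.
kept_locations <- names(has_all_checks)[has_all_checks]
kept_locations
```

Location "BBBB" is dropped because Othello was never grown there; repeated entries of a check (as at "CCCC") do not matter, since only presence is tested.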

In all sections below, you can optionally view the R code for each section by clicking on the “Code” button on the right. Just below are some sections for loading the data and preparing the data frames.

# loadWorkbook(), readWorksheet(), and getSheets() come from XLConnect; as_tibble(), %>%, and the dplyr verbs come from the tidyverse.
library(XLConnect)
library(tidyverse)

load_all_experiments(laptop = TRUE)

# Read every sheet of the FW results workbook into a named list, then keep tibble copies of the sheets used below.
wbA <- loadWorkbook("FW_GibbsA_Full-312var-30env_for-R-and-Tassel_2018-01-05_v03.xlsx")
FW2b_GibbsA_lst <- readWorksheet(wbA, sheet = getSheets(wbA))
FW2b_GibbsA_lst$FW_data <- as_tibble(FW2b_GibbsA_lst$Data)
FW2b_GibbsA_lst$FW_data_var <- as_tibble(FW2b_GibbsA_lst$Varieties)
FW2b_GibbsA_lst$FW_data_env <- as_tibble(FW2b_GibbsA_lst$Environment)

# Read the CDBN metadata workbook and keep its shipping manifest sheet.
wbB <- loadWorkbook("../../CDBN Variety Info/CDBN_Metadata_PR_2017-10-27.xlsx")
PR_list <- readWorksheet(wbB, sheet = getSheets(wbB))
PR_rows <- as_tibble(PR_list$`Shipping Manifest`)

# Load the Tassel GWAS outputs for type II stability (deviation of each variety from the FW model slope, b) and the genetic effect (intercept for each variety, g).
FW_g <- load_Tassel_MLM(path = "FW_GWAS_MLM_Outputs/", phenotype = "FW_GibbsA_312var_30env_g")
FW_b <- load_Tassel_MLM(path = "FW_GWAS_MLM_Outputs/", phenotype = "FW_GibbsA_312var_30env_b")
FW_SDg <- load_Tassel_MLM(path = "FW_GWAS_MLM_Outputs/", phenotype = "FW_GibbsA_312var_30env_SD_g")

# Join the prediction data with information about possible seed sources - are varieties in the MDP, DDP, or ADP? Are they in the set I'm bulking in Puerto Rico? If not, I likely won't be able to test predictions for these varieties.
# Germplasm is assumed to have been loaded by load_all_experiments().
Seed_data <- Germplasm %>%
  dplyr::select(CDBN_ID, Seq_ID, Market_class_ahm:Race, Year, GBS_Panel, MDP_ID, In.DDP, Seed.From.1, Seed.From.2)

Seed_data <- Seed_data %>%
  left_join(PR_rows)

FWPred <- FW2b_GibbsA_lst$FW_data %>%
  left_join(Seed_data)

ii. How to interpret a Finlay-Wilkinson Plot

The following plot displays Finlay-Wilkinson results for three check varieties: Fleetwood, Viva, and Montcalm. Thirty locations from the CDBN are arranged along the x-axis in order of how well bean varieties yield, on average, at that location. Othello, WA (WAOT) is the highest yielding location in the dataset, and Lubbock, TX (TXLU) is the lowest yielding. The location codes always start with two letters indicating the state, followed by the first two letters of the site name, so you can usually guess the site if you already know the CDBN locations. I’ve included all 30 locations on the x-axis of this plot, so the labels are unreadable in some sections; in the remaining plots I remove some location labels so that every label shown is readable.
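As a small illustration of the naming rule, the sketch below builds a code from a state abbreviation and a site name. `make_location_code()` is a hypothetical helper written for this example; it is not part of the CDBN tooling.

```r
# Build a four-letter code: two-letter state abbreviation + first two letters of the site name.
# make_location_code() is a hypothetical helper for illustration only.
make_location_code <- function(state, site) {
  toupper(paste0(state, substr(site, 1, 2)))
}

# Three sites named in this report.
codes <- make_location_code(c("WA", "TX", "MO"), c("Othello", "Lubbock", "Columbia"))
codes
```

The rule is a heuristic rather than a guarantee: applied to "Scottsbluff" it would give NESC, whereas the code used in this report for that site is NESB.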

In these plots, the points indicate actual data, here actual yield data from a year in the CDBN at that location. The lines indicate the predictions for variety performance. The dotted line is the predicted average variety performance across all sites. Vertical deviation from this dotted line indicates a genetic effect of that variety on performance (the values labeled “g” in the tables below). A change in the slope of a variety’s line relative to the dotted line indicates a difference in the type II stability of this variety, which is a measure of GxE (the values labeled “b” in the tables below).
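To make the roles of g and b concrete, here is a minimal sketch of how a Finlay-Wilkinson line turns them into predicted yields. All numbers are invented, and `fw_predict()`, `mu`, and the three environment values are assumptions for illustration; `h` stands for the environment value plotted on the x-axis.

```r
# Predicted yield under a Finlay-Wilkinson line.
# mu: overall mean yield; g: variety effect (vertical shift);
# b: slope deviation of the variety; h: environment effect (x-axis of the plots).
fw_predict <- function(mu, g, b, h) {
  mu + g + (1 + b) * h
}

mu <- 2500                                        # hypothetical overall mean (kg/ha)
h  <- c(poor = -1000, average = 0, good = 1000)   # hypothetical environment effects

average_line <- fw_predict(mu, g = 0,   b = 0,    h = h)  # the dotted reference line
stable_var   <- fw_predict(mu, g = 100, b = -0.5, h = h)  # flatter than average
responsive   <- fw_predict(mu, g = 100, b = 0.5,  h = h)  # steeper than average

rbind(average_line, stable_var, responsive)
```

With these toy values, the two varieties share the same g and so the same prediction in the average environment, but the variety with negative b loses less in the poor environment and gains less in the good one.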

1. Consistently yielding varieties across all sites (with low GxE).

Here is a table of the ten varieties with the flattest slopes across 30 CDBN locations that I anticipate having a good number of seed to send out (I have at least one 28 foot row for these in my Puerto Rico growout). The slope of the FW model here is equivalent to 1 + b.

(Low_gxe <- Seed_data %>%
  left_join(FW2b_GibbsA_lst$Varieties) %>%
  filter(Type.of.Bulk %in% c("2x 28 foot row", "28 foot row")) %>%
  arrange(b) %>%
  dplyr::select(CDBN_ID, g, SD_g, b, SD_b, Type.of.Bulk, Market_class_ahm:GBS_Panel, In.DDP:Seed.From.2) %>%
  rename("Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>%
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(CDBN_ID != "Sapphire") %>%
  head(10))

As you can see from the following table, the yields predicted for each variety at the five sites you all are running are very similar (Predicted_Yield), even though the actual yields (Yield_kg_ha) can diverge from these predictions.

Five_Loc <- c("MOCO", "NDHA", "MISA", "NESB", "WAOT")

FWPred %>%
  filter(Location_code %in% Five_Loc & CDBN_ID %in% Low_gxe$CDBN_ID) %>%
  arrange(CDBN_ID, yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Location_code, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, Location_code, .keep_all = TRUE) #%>%
#  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row"))
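One simple way to check how similar the predictions are across sites is to compute, for each variety, the range of its predicted yields. The sketch below does this on invented numbers; the data frame `pred` and the variety names "flat" and "steep" are hypothetical.

```r
# Toy predictions for two varieties at three sites (hypothetical values, kg/ha).
pred <- data.frame(
  CDBN_ID       = rep(c("flat", "steep"), each = 3),
  Location_code = rep(c("MOCO", "MISA", "WAOT"), times = 2),
  yhat          = c(2400, 2500, 2600, 1500, 2600, 3900),
  stringsAsFactors = FALSE
)

# Spread of predicted yield per variety: max minus min across sites.
spread <- tapply(pred$yhat, pred$CDBN_ID, function(x) diff(range(x)))
spread
```

A small spread corresponds to a flat line in the plots, and a large spread to a steep one.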

1a. Candidate varieties with low GxE

Tentatively, here are the three varieties with flat slopes and little GxE that I would like to grow at five CDBN sites this year.

1b. Plots of other candidate varieties

Here are plots of the other six varieties with flat slopes, except for Redkloud, a check variety that we probably already know enough about! Some of these varieties have few or no datapoints on the left half of the x-axis to inform the model, which makes them less reliable candidates in my mind. I’d welcome any feedback about line choice.

FW2b_data %>%
  left_join(FW2b_GibbsA_env, by = "Location_code") %>%
  filter(CDBN_ID %in% c("88728", "Fleetside", "Garnet")) %>%
  ggplot(mapping = aes(x = h, y = yhat)) +
  geom_line(aes(group = CDBN_ID, color = CDBN_ID)) +
  geom_point(aes(y = y, color = CDBN_ID), shape = 21) +
  theme(legend.position = c(0.25, .9), axis.text.x = element_text(angle = 45, hjust = 1)) + 
  geom_abline(intercept = 2500, slope = 1, linetype = "dotted") +
  scale_x_continuous(breaks = envlabel$h, labels = envlabel$Location_code) +
  coord_cartesian(ylim=c(-200, 5200)) + 
  labs(x = "Environment", y = "Yield (kg/ha)")

FW2b_data %>%
  left_join(FW2b_GibbsA_env, by = "Location_code") %>%
  filter(CDBN_ID %in% c("Ivory", "JM126", "Mogul")) %>%
  ggplot(mapping = aes(x = h, y = yhat)) +
  geom_line(aes(group = CDBN_ID, color = CDBN_ID)) +
  geom_point(aes(y = y, color = CDBN_ID), shape = 21) +
  theme(legend.position = c(0.25, .9), axis.text.x = element_text(angle = 45, hjust = 1)) + 
  geom_abline(intercept = 2500, slope = 1, linetype = "dotted") +
  scale_x_continuous(breaks = envlabel$h, labels = envlabel$Location_code) +
  coord_cartesian(ylim=c(-200, 5200)) + 
  labs(x = "Environment", y = "Yield (kg/ha)")

2. Best varieties in major bean growing areas (with high GxE).

Here is a table of ten varieties with the steepest slopes that I would like to send out. The idea here is that it might be more valuable to improve beans for the actual areas of the country where they are grown, at the expense of most of the poor sites in the CDBN, which are not in common bean growing regions.

(High_gxe <- Seed_data %>%
  left_join(FW2b_GibbsA_lst$Varieties) %>%
  filter(Type.of.Bulk %in% c("2x 28 foot row", "28 foot row")) %>%
  filter(!is.na(b)) %>%  # drop varieties without a slope estimate so tail() cannot return NA rows
  arrange(b) %>%
  dplyr::select(CDBN_ID, g, SD_g, b, SD_b, Type.of.Bulk, Market_class_ahm:GBS_Panel, In.DDP:Seed.From.2) %>%
  rename("Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>%
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  tail(10))
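A caution when pulling the steepest slopes out of a sorted table: missing values sort to the end whichever direction is used (this holds for base R's `order()` by default and for dplyr's `arrange()`). Taking the tail of an ascending sort can therefore return rows with no estimate, while taking the head of a descending sort cannot. The sketch below shows the difference on invented data; the data frame `vars` is hypothetical.

```r
# Toy slope estimates; variety "B" has no estimate (hypothetical data).
vars <- data.frame(
  CDBN_ID = c("A", "B", "C", "D"),
  b       = c(0.3, NA, -0.2, 0.1),
  stringsAsFactors = FALSE
)

# Ascending sort: NA is placed last, so tail() picks it up.
asc <- vars[order(vars$b), ]
steepest_tail <- tail(asc, 2)$CDBN_ID

# Descending sort: NA is still last, so head() returns only real estimates.
desc <- vars[order(vars$b, decreasing = TRUE), ]
steepest_head <- head(desc, 2)$CDBN_ID

steepest_tail
steepest_head
```

Dropping rows with a missing b before sorting is an equally valid fix and keeps the ascending order.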

Many of these varieties are predicted to perform very poorly in Columbia, Missouri (MOCO), but quite well elsewhere in the US.

FWPred %>%
  filter(Location_code %in% Five_Loc & CDBN_ID %in% High_gxe$CDBN_ID) %>%
  arrange(CDBN_ID, yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Location_code, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, Location_code, .keep_all = TRUE) #%>%
#  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row"))

2a. Candidate varieties with high GxE

Here are the four varieties I’d like to grow at all five CDBN sites from this category. I could reduce this number if need be.

FW2b_data %>%
  left_join(FW2b_GibbsA_env, by = "Location_code") %>%
  filter(CDBN_ID %in% c("Buster", "AC_Ole", "UI465", "Matterhorn")) %>%
  ggplot(mapping = aes(x = h, y = yhat)) +
  geom_line(aes(group = CDBN_ID, color = CDBN_ID)) +
  geom_point(aes(y = y, color = CDBN_ID), shape = 21) +
  theme(legend.position = c(0.2, .9), axis.text.x = element_text(angle = 45, hjust = 1)) + 
  geom_abline(intercept = 2500, slope = 1, linetype = "dotted") +
  scale_x_continuous(breaks = envlabel$h, labels = envlabel$Location_code) +
  coord_cartesian(ylim=c(-200, 5200)) + 
  labs(x = "Environment", y = "Yield (kg/ha)")

2b. Plots of other candidate varieties

FW2b_data %>%
  left_join(FW2b_GibbsA_env, by = "Location_code") %>%
  filter(CDBN_ID %in% c("115M", "Avalanche", "US1140")) %>%
  ggplot(mapping = aes(x = h, y = yhat)) +
  geom_line(aes(group = CDBN_ID, color = CDBN_ID)) +
  geom_point(aes(y = y, color = CDBN_ID), shape = 21) +
  theme(legend.position = c(0.25, .9), axis.text.x = element_text(angle = 45, hjust = 1)) + 
  geom_abline(intercept = 2500, slope = 1, linetype = "dotted") +
  scale_x_continuous(breaks = envlabel$h, labels = envlabel$Location_code) +
  coord_cartesian(ylim=c(-200, 5200)) + 
  labs(x = "Environment", y = "Yield (kg/ha)")

FW2b_data %>%
  left_join(FW2b_GibbsA_env, by = "Location_code") %>%
  filter(CDBN_ID %in% c("BillZ", "Max", "Mackinac")) %>%
  ggplot(mapping = aes(x = h, y = yhat)) +
  geom_line(aes(group = CDBN_ID, color = CDBN_ID)) +
  geom_point(aes(y = y, color = CDBN_ID), shape = 21) +
  theme(legend.position = c(0.25, .9), axis.text.x = element_text(angle = 45, hjust = 1)) + 
  geom_abline(intercept = 2500, slope = 1, linetype = "dotted") +
  scale_x_continuous(breaks = envlabel$h, labels = envlabel$Location_code) +
  coord_cartesian(ylim=c(-200, 5200)) + 
  labs(x = "Environment", y = "Yield (kg/ha)")

3. Highest yielding varieties at specific sites.

I ordered the sites from worst for beans to best for beans here, according to the FW analysis. That order is: MOCO, NDHA, MISA, NESB, WAOT.

For each site, three tables follow. The first lists the five varieties with the highest yield predictions for which I should have a good amount of seed from Puerto Rico. The second lists the ten highest yielding varieties in the MDP, DDP, or ADP, which other breeders might have seed for. The third lists the 20 varieties predicted to perform the best at that site.
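Because the same three tables are built for every site, the logic can be wrapped in one function. The sketch below is a base R illustration on invented data, not the code used in this report: `site_tables()` is a hypothetical helper, the toy data frame uses the renamed column names from the tables, and the panel label "CDBN" is made up.

```r
# Hypothetical helper: rank varieties at one site and return the three tables.
# Set lowest = TRUE to rank from the lowest predicted yield instead.
site_tables <- function(pred, site, lowest = FALSE) {
  at_site <- pred[pred$Location_code == site, ]
  at_site <- at_site[order(at_site$Predicted_Yield, decreasing = !lowest), ]
  at_site <- at_site[!duplicated(at_site$CDBN_ID), ]   # one row per variety

  # %in% treats NA as "no match", so missing seed information is excluded.
  has_pr_seed <- at_site$PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")
  in_panel    <- at_site$GBS_Panel %in% c("MDP", "ADP") | at_site$In_DDP %in% "DDP"

  list(
    puerto_rico = head(at_site[has_pr_seed, ], 5),
    panels      = head(at_site[in_panel, ], 10),
    all         = head(at_site, 20)
  )
}

# Invented predictions for four varieties.
pred <- data.frame(
  CDBN_ID         = c("V1", "V2", "V3", "V4", "V1"),
  Location_code   = c("MOCO", "MOCO", "MOCO", "MOCO", "WAOT"),
  Predicted_Yield = c(2100, 2600, 1800, 2400, 4000),
  PR_Growout_Size = c("28 foot row", NA, "2x 28 foot row", NA, "28 foot row"),
  GBS_Panel       = c("MDP", "CDBN", "ADP", "MDP", "MDP"),
  In_DDP          = c(NA, "DDP", NA, NA, NA),
  stringsAsFactors = FALSE
)

top <- site_tables(pred, "MOCO")                 # highest predicted yield first
low <- site_tables(pred, "MOCO", lowest = TRUE)  # lowest predicted yield first
top$puerto_rico$CDBN_ID
low$puerto_rico$CDBN_ID
```

The `lowest` argument covers the low-yield tables later in the report, so the only things that change from site to site are the site code and the ranking direction.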

3a. MOCO (Columbia, Missouri)

My top choices for varieties at this site are: 55012, JM126, & Ivory.

5 highest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "MOCO") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 highest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "MOCO") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 highest yielding varieties in the CDBN

FWPred %>%
  filter(Location_code == "MOCO") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

3b. NDHA (Hatton, North Dakota)

My top choices for varieties at Hatton are: Buster, BillZ, & Yolano.

5 highest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "NDHA") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 highest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "NDHA") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 highest yielding varieties in the CDBN

FWPred %>%
  filter(Location_code == "NDHA") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

3c. MISA (Saginaw, Michigan)

My top choice for Saginaw is UNS_117. MISA has grown almost everything…

5 highest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "MISA") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 highest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "MISA") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 highest yielding varieties in the CDBN

FWPred %>%
  filter(Location_code == "MISA") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

3d. NESB (Scottsbluff, Nebraska)

My top choices for this category for Scottsbluff are: Buster and Montrose (however, Buster is already in the high GxE set).

5 highest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "NESB") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 highest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "NESB") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 highest yielding varieties in the CDBN

FWPred %>%
  filter(Location_code == "NESB") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

3e. WAOT (Othello, Washington)

My top choices for this category at Othello are: Max, Jackpot, and Lariat.

5 highest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "WAOT") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 highest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "WAOT") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 highest yielding varieties in the CDBN

FWPred %>%
  filter(Location_code == "WAOT") %>%
  arrange(desc(yhat)) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

4. Lowest yielding varieties at individual sites.

I ordered the sites from worst for beans to best for beans here, as determined by the Finlay-Wilkinson analysis.

4a. MOCO (Columbia, Missouri)

My top choices for this category for Columbia are: Max, 115M

5 lowest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "MOCO") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 lowest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "MOCO") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 lowest yielding varieties

FWPred %>%
  filter(Location_code == "MOCO") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

4b. NDHA (Hatton, North Dakota)

My top choices for low predicted yield for Hatton are: AC_Calmont, CDC_Expression

5 lowest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "NDHA") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 lowest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "NDHA") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 lowest yielding varieties in the CDBN

FWPred %>%
  filter(Location_code == "NDHA") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

4c. MISA (Saginaw, Michigan)

My top choices for low predicted yield for Saginaw are: Cardinal, CDC_Expression

5 lowest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "MISA") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 lowest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "MISA") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 lowest yielding varieties

FWPred %>%
  filter(Location_code == "MISA") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

4d. NESB (Scottsbluff, Nebraska)

My top choices for low predicted yield at Scottsbluff are: Mogul & Cardinal.

5 lowest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "NESB") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 lowest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "NESB") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 lowest yielding varieties

FWPred %>%
  filter(Location_code == "NESB") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

4e. WAOT (Othello, Washington)

My top choices for low predicted yield at Othello are: 88728, Sapphire, Emerson, & Pindak.

5 lowest yielding varieties with seed from Puerto Rico

If Alice’s bulk in Puerto Rico goes well, there should be seed from 1-2 28 foot rows for these varieties.

FWPred %>%
  filter(Location_code == "WAOT") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(PR_Growout_Size %in% c("2x 28 foot row", "28 foot row")) %>%
  head(5)

10 lowest yielding varieties in MDP, DDP, and ADP

FWPred %>%
  filter(Location_code == "WAOT") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  filter(GBS_Panel %in% c("MDP","ADP") | In_DDP == "DDP") %>%
  head(10)

20 lowest yielding varieties

FWPred %>%
  filter(Location_code == "WAOT") %>%
  arrange(yhat) %>%
  dplyr::select(CDBN_ID, y, yhat, SD_yhat, Market_class_ahm, Race, Type.of.Bulk, GBS_Panel, In.DDP, Year, Seed.From.1, Seed.From.2) %>%
  rename("Yield_kg_ha" = y, "Predicted_Yield" = yhat, "SD_of_Pred_Yield" = SD_yhat, "Market_class" = Market_class_ahm, "PR_Growout_Size" = Type.of.Bulk, "In_DDP" = In.DDP, "First_CDBN_Year" = Year, "Seed_Source_1" = Seed.From.1, "Seed_Source_2" = Seed.From.2) %>% 
  distinct(CDBN_ID, .keep_all = TRUE) %>%
  head(20)

5. Summary & Tentative Varieties

Here is a summary of the varieties I’m tentatively planning to grow to test predictions from the Finlay-Wilkinson model. I decided to focus mostly on Durango varieties, as these seem to have the most variation in GxE in the CDBN dataset. In the list below, a question mark after a variety name means that I might not have seed for that variety, but it is in another diversity panel. Might someone else have seed for it?

So, focusing mostly on Durango varieties, here are the varieties I think should be grown at:

MOCO

High yield: 55012, JM126, Ivory

Low yield: Max

Low GxE: 53026, 55063

High GxE: Buster, AC_Ole, UI465, Matterhorn

NDHA

High yield: Buster, BillZ, Yolano

Low yield: AC_Calmont, CDC_Expression

Low GxE: 53026, 55012, 55063

High GxE: Buster, AC_Ole, UI465, Matterhorn

MISA

High yield: UNS_117

Low yield: CDC_Expression, Cardinal?

Low GxE: 53026, 55012, 55063

High GxE: Buster, AC_Ole, UI465, Matterhorn

NESB

High yield: Buster, Montrose?

Low yield: Mogul, Cardinal?

Low GxE: 53026, 55012, 55063

High GxE: Buster, AC_Ole, UI465, Matterhorn

WAOT

High yield: Max, Jackpot?, Lariat?

Low yield: 88728?, Sapphire, Emerson?, Pindak?

Low GxE: 53026, 55012, 55063

High GxE: Buster, AC_Ole, UI465, Matterhorn

I’d be happy to grow this set of varieties, but please note that this is not my final word on varieties! I am hoping to finish sets of models of variety performance along climate gradients by the end of March. I think this approach will help address some of the non-linearity in the Finlay-Wilkinson model and improve the predictive ability of this model, which right now hovers at ~66%. I’m particularly hoping it will correct how some varieties outperform linear predictions in the NDHA - MISA region of the plots.