This document contains some of the yield predictions made using the Cooperative Dry Bean Nursery (CDBN) dataset and a Finlay-Wilkinson model.

These predictions are based on a Finlay-Wilkinson (FW) model fitted with a Bayesian Gibbs sampler and a matrix of variety relatedness (A). The Gibbs sampler shrinks each variety's estimates toward the average performance across varieties, and generally gives better predictive power than an ordinary least squares fit. The A matrix was calculated in Tassel using the recommended method (centered IBS). The SNP matrix used in Tassel was generated from GBS data that Phil McClean and I collected using the ApeKI enzyme; reads were aligned to the P. vulgaris genome V2.0 with bwa mem, and SNPs were called with NGSEP.
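For reference, one common way to write the FW model in a Bayesian setting with a relatedness matrix is shown below. This specification, including the priors, is an assumption for illustration; the exact priors used for these predictions are not stated here.

```latex
y_{ijk} = \mu + g_i + (1 + b_i)\,h_j + \varepsilon_{ijk}, \qquad
\mathbf{g} \sim N(\mathbf{0}, \sigma_g^2 \mathbf{A}), \quad
\mathbf{b} \sim N(\mathbf{0}, \sigma_b^2 \mathbf{A}), \quad
\mathbf{h} \sim N(\mathbf{0}, \sigma_h^2 \mathbf{I}), \quad
\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)
```

Here y_ijk is the k-th yield observation (for example, one year) of variety i at location j, μ is the overall mean, g_i is the genetic main effect of variety i, h_j is the environmental effect of location j, and b_i is the deviation of variety i's slope from the average slope of 1. Putting A in the prior covariance of g and b is what lets related varieties share information.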

Hatton, ND Predicted Yield Table

This is a table of variety yield predictions for Hatton, ND. You can see that some varieties have actual yield data at Hatton, and some just have predicted yield values. Many of these varieties are also part of other sequenced bean panels, such as the MDP, DDP, and ADP.

Varieties that have only predicted yield values would likely be the most interesting varieties to test at Hatton to validate predictions from this model.
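Once those varieties have been grown at Hatton, one simple check of the model would be to compare predicted and observed yields. The sketch below assumes paired lists of predicted and observed yields for the same varieties; Pearson correlation and root-mean-square error are common accuracy measures, but the choice of metric, the function name `prediction_accuracy`, and the numbers are all hypothetical.

```python
from math import sqrt
from statistics import mean


def prediction_accuracy(predicted: list[float], observed: list[float]) -> tuple[float, float]:
    """Return (Pearson r, RMSE) between predicted and observed yields."""
    p_bar, o_bar = mean(predicted), mean(observed)
    cov = sum((p - p_bar) * (o - o_bar) for p, o in zip(predicted, observed))
    ss_p = sum((p - p_bar) ** 2 for p in predicted)
    ss_o = sum((o - o_bar) ** 2 for o in observed)
    r = cov / sqrt(ss_p * ss_o)  # agreement in ranking / linear trend
    rmse = sqrt(mean([(p - o) ** 2 for p, o in zip(predicted, observed)]))  # absolute error
    return r, rmse


# Made-up predicted vs. observed yields for three varieties.
r, rmse = prediction_accuracy([2000.0, 2500.0, 3000.0], [2100.0, 2400.0, 3100.0])
print(f"r = {r:.2f}, RMSE = {rmse:.0f}")
```

Correlation rewards getting the ranking right, which matters most for choosing varieties; RMSE also penalizes predictions that are consistently too high or too low.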

Specific Variety Performance Plots

Here are a few plots of specific varieties across 30 sites in the CDBN.

Check Variety Finlay-Wilkinson Plot

To explain this type of plot, here are Finlay-Wilkinson results for three check varieties: Fleetwood, Viva, and Montcalm. Thirty locations from the CDBN are arranged along the x-axis in order of how well bean varieties yield, on average, at each location. Othello, WA (WAOT) is the highest-yielding location in the dataset, and Lubbock, TX (TXLU) is the lowest. Each location code starts with the two-letter state abbreviation, followed by the first two letters of the site name, so if you know the CDBN sites well you can usually guess the site from its code. Hatton, ND is about two-thirds of the way along the x-axis.
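A minimal sketch of the two conventions described above, assuming the location code is built exactly as stated and that locations are ordered by a simple environmental index (a location's mean yield minus the mean of all location means). The fitted model estimates the environmental effects jointly rather than from raw means, and the function names and yields here are hypothetical.

```python
from statistics import mean


def location_code(state: str, site: str) -> str:
    """CDBN-style code: two-letter state abbreviation + first two letters of the site name."""
    return (state[:2] + site[:2]).upper()


def environmental_index(yields: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (location, index) pairs sorted from lowest- to highest-yielding.

    The index is the location's mean yield minus the mean of all location means,
    a simple stand-in for the environmental effect estimated by the model.
    """
    loc_means = {loc: mean(vals) for loc, vals in yields.items()}
    grand_mean = mean(list(loc_means.values()))
    return sorted(
        ((loc, m - grand_mean) for loc, m in loc_means.items()),
        key=lambda pair: pair[1],
    )


# Made-up yields for three varieties at three locations.
yields = {
    location_code("WA", "Othello"): [3500.0, 3900.0, 4100.0],
    location_code("TX", "Lubbock"): [900.0, 1100.0, 1000.0],
    location_code("ND", "Hatton"): [2400.0, 2600.0, 2500.0],
}

for loc, h in environmental_index(yields):
    print(f"{loc}: {h:+.0f}")
```

Sorting by this index reproduces the x-axis ordering of the plots: low-yielding sites on the left, high-yielding sites on the right.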

In this plot, the points are observed data: each is the actual yield from one year of the CDBN at that location. The lines are the predicted performance of each variety. The dotted line is the predicted performance of an average variety at each site. A vertical offset from this dotted line indicates a genetic effect of that variety on yield. A difference between the slope of a variety's line and the slope of the dotted line indicates a difference in the type II stability of this variety, which is a measure of GxE.
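To make the offset and slope reading concrete, here is a minimal sketch of the classical Finlay-Wilkinson regression fitted by ordinary least squares for a single variety: its yields are regressed on each location's environmental index h_j. The intercept estimates the overall mean plus the variety's genetic effect g_i, and the slope estimates 1 + b_i, where b_i = 0 means the variety's line is parallel to the average-variety line. This is the unshrunk OLS version, not the Bayesian Gibbs sampler used for the predictions in this document; the function name `fw_fit` and all values are hypothetical.

```python
from statistics import mean


def fw_fit(h: list[float], y: list[float]) -> tuple[float, float]:
    """Classical Finlay-Wilkinson regression for one variety, by ordinary least squares.

    Fits y_j = a + s * h_j, where h_j is the environmental index of location j.
    The intercept a estimates mu + g_i; the slope s estimates 1 + b_i.
    """
    h_bar, y_bar = mean(h), mean(y)
    s_hh = sum((hj - h_bar) ** 2 for hj in h)
    s_hy = sum((hj - h_bar) * (yj - y_bar) for hj, yj in zip(h, y))
    slope = s_hy / s_hh
    intercept = y_bar - slope * h_bar
    return intercept, slope


# Hypothetical variety observed at three locations with indices -1000, 0, +1000.
h = [-1000.0, 0.0, 1000.0]
y = [1500.0, 2600.0, 3700.0]
a, s = fw_fit(h, y)
mu = 2500.0  # hypothetical overall mean yield
print(f"g_i = {a - mu:.0f}, b_i = {s - 1:.2f}")
```

A positive g_i shifts the variety's line above the dotted line; a positive b_i makes it steeper, so the variety gains more than average at good sites and loses more at poor ones.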

Puerto Rico Growout Varieties Finlay-Wilkinson Plots

Assuming that the Puerto Rico seed is OK to grow at CDBN sites, here are two plots of the varieties for which I am most likely to have plenty of seed from that bulk. The first shows varieties with flat slopes (low GxE), and the second shows varieties with steep slopes (more GxE).

Low GxE is a stated preference for many crop breeders, but high GxE might be good here, in the sense that it would prioritize varieties that are high yielding at the best sites. The best sites here also happen to be representative of the majority of the bean-growing region in the United States.
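The trade-off follows from the FW prediction μ + g_i + (1 + b_i) h_j: a variety with a steep slope (b_i > 0) gains the most at high-index sites and loses the most at low-index ones, so rankings can flip between the best and worst sites. A minimal sketch ranking varieties at a hypothetical best and worst site; the variety labels, effect sizes, and environmental indices are invented for illustration.

```python
def predicted_yield(mu: float, g: float, b: float, h: float) -> float:
    """Finlay-Wilkinson prediction: mu + g_i + (1 + b_i) * h_j."""
    return mu + g + (1.0 + b) * h


def rank_at_site(effects: dict[str, tuple[float, float]], mu: float, h: float) -> list[str]:
    """Rank varieties (highest first) by predicted yield at a site with index h."""
    return sorted(
        effects,
        key=lambda v: predicted_yield(mu, effects[v][0], effects[v][1], h),
        reverse=True,
    )


# Hypothetical (g_i, b_i) pairs: a flat, a steep, and a parallel variety.
effects = {"flat": (200.0, -0.3), "steep": (0.0, 0.3), "parallel": (100.0, 0.0)}
mu = 2500.0
print("best site: ", rank_at_site(effects, mu, h=1400.0))
print("worst site:", rank_at_site(effects, mu, h=-1400.0))
```

Prioritizing by predicted yield at the best sites, rather than by flatness of slope, favors the steep variety even though it has the smallest main effect.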

First, here are the candidate varieties for which I am growing two 28-foot rows.

These would be ideal to include if we use only my seed, but unfortunately they do not have the most divergent slopes of the varieties in the CDBN dataset.

Next, here are the candidates for which I am growing one 28-foot row. These look a little better.

Future Directions

I have a late March deadline for the completion of two additional analyses that may prioritize different varieties for testing.

First, on the 12th of February I completed a major new version of the CDBN data, with cleaner phenotypes and many other changes that make the location, weather, variety, and phenotypic data more consistent. I need to redo a number of analyses with this new version.

Second, instead of predicting which genomic regions affect yield along a yield gradient (a Finlay-Wilkinson analysis), I will predict which genomic regions affect phenotypes at different points along climate gradients, using various environmental gradient models.

Here is one example of such a model, suggested by Jeff White: what proportion of the variation in days to flowering is explained by the cumulative growing degree days (GDD) from planting to flowering, using 8 degrees Celsius as the base temperature (T_base) below which plants do not grow?
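A minimal sketch of the calculation, assuming the common averaging method for daily growing degree days, GDD = max(0, (T_max + T_min)/2 - T_base), with no upper temperature cap; other GDD formulas exist, and the one used in the actual analysis may differ. The proportion of variation explained is taken here as the R² of a simple linear regression of days to flowering on cumulative GDD across site-years. Function names and data are hypothetical.

```python
from statistics import mean

T_BASE = 8.0  # base temperature in degrees Celsius, below which plants do not grow


def daily_gdd(t_max: float, t_min: float, t_base: float = T_BASE) -> float:
    """Growing degree days for one day (averaging method, no upper cap)."""
    return max(0.0, (t_max + t_min) / 2.0 - t_base)


def cumulative_gdd(daily_temps: list[tuple[float, float]], t_base: float = T_BASE) -> float:
    """Sum daily GDD over (t_max, t_min) pairs from planting to flowering."""
    return sum(daily_gdd(t_max, t_min, t_base) for t_max, t_min in daily_temps)


def r_squared(x: list[float], y: list[float]) -> float:
    """Proportion of variance in y explained by a simple linear regression on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)


# One hypothetical planting-to-flowering window: a warm day, a mild day, a cold day.
window_gdd = cumulative_gdd([(28.0, 12.0), (24.0, 10.0), (10.0, 2.0)])
print(window_gdd)

# Hypothetical site-years: cumulative GDD to flowering vs. days to flowering.
gdd_to_flower = [560.0, 610.0, 650.0, 700.0]
days_to_flower = [38.0, 41.0, 45.0, 52.0]
print(f"R^2 = {r_squared(gdd_to_flower, days_to_flower):.2f}")
```

The cold day contributes zero rather than a negative value, which is why the max(0, ...) clamp matters when T_base is as high as 8 degrees Celsius.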