Sm🄰rt PCA biplots with 𝖇𝖎𝖕𝖑𝖔🆃𝖆𝖇𝖑𝖊 basic Shiny app

Author

James Silva Garcia

Published

April 20, 2023

Welcome

I am glad you are reading this post. Today I want to offer you a practical intro to the world of Principal Component Analysis (PCA) biplots (a.k.a. GGE-biplots in the context of plant breeding). The goal is to obtain meaningful two-way (bi) visualizations (plots) of a numerical data table that originally contained three or more numerical columns. In typical applications, rows in the original data table correspond to samples or subjects, with columns used to capture the average response profile for each subject. Classical applications include analyzing the average yield performance of a collection of varieties tested across experimental sites (useful for interpreting genotype-by-environment interaction) or describing the average performance profile of several agronomic characteristics for a collection of varieties (i.e., a genotype-by-characteristic analysis).

I designed the 𝖇𝖎𝖕𝖑𝖔🆃𝖆𝖇𝖑𝖊 basic Shiny app to provide a friendly user interface that carries out PCA computations and delivers neat biplot visualizations along the way. When putting together your input data table, all you need to do is remember what each row and each column represents for your analysis and make sure the header of the first column is entered as the case-sensitive text RowName. Because the generated visualizations may include many overlapping label annotations, it is also recommended that you abbreviate your sample or subject ID labels and column headers in a meaningful way (using no more than 12 characters for each row ID label or column header).
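If you prepare your tables with a script, the two rules above are easy to check before uploading. The following Python sketch is only an illustration of those rules; the function name `check_input_table` is my own invention and is not part of the app.

```python
# Minimal pre-upload checks for an input table (a sketch, not the app's own code).
MAX_LABEL_LENGTH = 12  # limit recommended in this post


def check_input_table(header, row_labels):
    """Return a list of problems found; an empty list means the table looks fine."""
    problems = []
    # The first header must match exactly; the comparison is case-sensitive.
    if not header or header[0] != "RowName":
        problems.append("first column header must be exactly 'RowName'")
    # Column headers (after RowName) and row ID labels share the same length limit.
    for label in list(header[1:]) + list(row_labels):
        if len(label) > MAX_LABEL_LENGTH:
            problems.append(f"label '{label}' is longer than {MAX_LABEL_LENGTH} characters")
    return problems


header = ["RowName", "LocA", "LocB", "LocC", "LocD", "LocE", "LocF"]
row_labels = ["GrA01", "GrA02", "GrB06"]
print(check_input_table(header, row_labels))
print(check_input_table(["rowname", "LocA"], ["VeryLongVarietyName"]))
```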

The input data table used in this post corresponds to a fictitious genotype-by-environment table of average yield performance for 40 varieties across six sites, as depicted below.

All input data tables used in this post are shared as Appendices at the end (you can copy/paste each table into its own Excel file and save it accordingly). You are welcome to visit the 𝖇𝖎𝖕𝖑𝖔🆃𝖆𝖇𝖑𝖊 basic Shiny app at any time at https://jamessilva0070.shinyapps.io/biploTable-basic/. The following sections provide basic guidelines on how to use this app.

Input data upload

To feed the app with data, go to the [Upload data] menu item, and then use the [Browse…] button to navigate your file system and choose an Excel input data file (e.g., “biploTable-Example1A.xlsx”).

Perform PCA

Once your input data table has been uploaded, go to the [Perform PCA] menu item and click the [Run PCA] button to trigger computations. The [PCA summary] results table is displayed by default. For this data, the first two principal components (PCs) account for about 72% of the total variability, with nearly 59% accounted for by the first PC; in other words, reducing the six original dimensions to just two preserves ~72% of the total original variability. The information ratio statistic shown in the last column of the PCA summary can be used to decide how many PCs to keep for dimensionality reduction: keep the PCs with an information ratio of 1 or greater. For this sample data, however, we will use the first two PCs regardless, because a biplot needs two axes.
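To make the summary columns concrete, here is a small Python sketch. It rests on two assumptions of mine: the information ratio is taken to be each PC's share of the variability multiplied by the number of PCs (the app may compute it differently), and the variances are hypothetical values chosen only so that the first two shares resemble the ~59% and ~72% quoted above.

```python
# Hypothetical PC variances (eigenvalues) for a six-column table; NOT the app's output.
variances = [3.54, 0.78, 0.60, 0.48, 0.36, 0.24]


def pca_summary(variances):
    """Return one (proportion, cumulative proportion, information ratio) tuple per PC."""
    total = sum(variances)
    k = len(variances)
    summary = []
    cumulative = 0.0
    for variance in variances:
        proportion = variance / total
        cumulative += proportion
        # Assumed definition: share of variability times the number of PCs.
        summary.append((proportion, cumulative, proportion * k))
    return summary


for i, (prop, cum, ratio) in enumerate(pca_summary(variances), start=1):
    decision = "keep" if ratio >= 1 else "-"
    print(f"PC{i}: {prop:6.1%}  cumulative {cum:6.1%}  ratio {ratio:4.2f}  {decision}")
```

Under this assumed definition, a PC earns a ratio of 1 or more only when it explains more than an average PC would.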

Annotation design options

You may use the [Annotation design] menu item to customize the look of the different biplot views. As the following image shows, many customizations are possible; the main features are described next.

Row annotation

The first PC is often associated with a weighted average performance. To capitalize on this fact, the app calculates an average performance value across the response columns for each entry and uses quantiles of these averages to group subjects into five color-coded average performance categories (80-100%, 60-80%, 40-60%, 20-40%, and 0-20%). Likewise, a Unicode “PointShape” character is assigned to each performance category (solid circled numbers, by default); however, a [Custom file] option lets users customize how colors and plotting shapes are assigned to build a proper row annotation design. The default row annotation options were kept to generate our first biplot view (shown later).
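The grouping idea can be sketched in a few lines of Python. This is only an approximation of what is described above: the exact quantile method used by the app is not stated, so the `inclusive` method below is an assumption, and with only ten rows from Example 1A the categories will differ from those obtained with all 40 varieties.

```python
from statistics import mean, quantiles

# Ten rows copied from the Example 1A raw data (LocA to LocF).
yields = {
    "GrA01": [10.80, 9.95, 8.10, 8.70, 10.20, 6.15],
    "GrA02": [13.30, 9.05, 11.05, 8.00, 8.90, 7.95],
    "GrA03": [12.50, 8.60, 8.45, 6.80, 8.90, 7.45],
    "GrA04": [12.90, 8.75, 10.75, 7.55, 9.05, 6.75],
    "GrA05": [12.50, 8.95, 8.35, 7.10, 8.95, 7.05],
    "GrB06": [9.70, 8.55, 8.65, 8.55, 8.30, 6.70],
    "GrC11": [14.10, 11.70, 10.75, 10.10, 7.55, 5.80],
    "GrC12": [13.75, 10.70, 11.10, 9.30, 9.30, 7.75],
    "GrC23": [8.75, 7.25, 6.75, 6.25, 5.70, 4.20],
    "GrC24": [9.70, 6.20, 7.35, 5.20, 4.70, 3.75],
}


def performance_categories(table):
    """Map each row label to a category from 1 (0-20%) to 5 (80-100%)."""
    averages = {label: mean(values) for label, values in table.items()}
    # Four cut points split the averages into five groups (assumed quantile method).
    cuts = quantiles(averages.values(), n=5, method="inclusive")
    return {
        label: 1 + sum(average > cut for cut in cuts)
        for label, average in averages.items()
    }


categories = performance_categories(yields)
for label, category in sorted(categories.items(), key=lambda item: -item[1]):
    print(label, category)
```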

Column annotation

The app also generates a default annotation for columns as illustrated below.

Annotation cheatsheet

The annotation cheatsheet (reproduced below) is provided to help users design proper annotations.

Here is a brief explanation of how to use it. By default, the variety GrC12 would be represented with a green (i.e., the p4 colorCode) solid circled number 5 (passed on to the graphical interface using the PointShape Unicode value -10126). In other words, the position of the variety GrC12 would be marked with the point shape ➎.
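The PointShape values listed in the appendices appear to be Unicode code points written with a minus sign in front. Under that assumption, this short Python sketch shows how to preview the character behind a given value (the helper name `point_shape_to_char` is mine):

```python
def point_shape_to_char(point_shape):
    """Convert a PointShape value such as -10126 into its Unicode character."""
    # Assumption: the value is a Unicode code point with a leading minus sign.
    return chr(abs(point_shape))


# Values taken from the annotation tables in the appendices.
for value in (-10122, -10126, -10112, -9679):
    print(value, point_shape_to_char(value))
```

Previewing characters this way can save a round trip to the app when you are building a custom annotation file.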

Column-focused biplots

The first set of biplot views is called [Column-focused biplots] and, as its name indicates, is designed to preserve the metric across columns of the original input data (site-metric preserving for our current analysis). The first view is called the Column-focused Which-Won-Where (CF-WWW) biplot, and it can be used to study relationships across sites.

Using the default column annotation, sites are represented by empty circled numbers (1 to 6) and connected with purple segments to the biplot origin. The narrower the angle between two segments, the higher the correlation between the corresponding sites; according to this, sites LocA, LocD, and LocF are strongly correlated. Site segments at an angle close to 90° are uncorrelated (like LocB and LocC), while site segments at an angle approaching 180° would have a strong negative correlation.
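The angle rule can be checked numerically: the cosine of the angle between two site segments approximates the correlation between the two columns. The match is only approximate in a two-dimensional biplot, since part of the variability is left out. The Python sketch below computes a Pearson correlation and the angle it implies; the helper names are mine, and the sample values are the first six rows of LocA and LocD from Example 1A.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def implied_angle(r):
    """Angle in degrees whose cosine equals the correlation r."""
    r = max(-1.0, min(1.0, r))  # guard against tiny rounding overshoots
    return math.degrees(math.acos(r))


loc_a = [10.80, 13.30, 12.50, 12.90, 12.50, 9.70]
loc_d = [8.70, 8.00, 6.80, 7.55, 7.10, 8.55]
r = pearson(loc_a, loc_d)
print(f"r = {r:.2f}, implied angle = {implied_angle(r):.0f} degrees")
```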

Likewise, the default row annotation represents varieties with color-coded solid circled numbers that show their average performance categories. The outermost varieties are connected to form an irregular polygon that encloses all tested varieties; dashed lines are drawn from the biplot origin perpendicular to each side of the polygon and partition the biplot area into several sections. Sites falling within the same section (like sites LocA, LocD, and LocF) share similar characteristics and form a mega-environment. Although this is not an appropriate view for ranking varieties, it also indicates that, for example, varieties GrC30 and GrC31 were the top performers in LocB (a fact revealed by markers far from the origin and at a narrow angle to the LocB segment); additionally, GrC24 was a poor performer in LocB (its marker is far from the origin in the direction opposite to the LocB segment).
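The enclosing polygon is the convex hull of the variety markers. As a rough sketch of how such a polygon can be found, here is the classic monotone chain algorithm in Python. The PC scores are hypothetical, and I am not claiming this is the routine the app uses.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would make a clockwise (or straight) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Hypothetical (PC1, PC2) scores for five varieties; not taken from the app.
scores = {"V1": (0.0, 0.0), "V2": (2.0, 0.0), "V3": (2.0, 2.0),
          "V4": (0.0, 2.0), "V5": (1.0, 1.0)}
hull = convex_hull(list(scores.values()))
vertex_labels = [label for label, xy in scores.items() if xy in hull]
print(vertex_labels)
```

Varieties that end up as hull vertices are the candidates for "winning" in the sites of their section; the interior variety V5 never is.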

As the six sites fell into three different biplot sections, varieties should be ranked independently within each mega-environment (using row-focused biplots, for mega-environments with three or more sites). To generate row-focused biplots for the mega-environment containing sites LocA, LocD, and LocF, go back to the [Upload data] menu item, click the [Table modifications] tab, and use the [Select columns to exclude] user input to eliminate sites LocB, LocC, and LocE from the analysis; then take a look at the [Row-focused biplots] menu item (results not discussed here). However, an interesting data-driven approach to enhance our analysis will be proposed later.

The second view is called the Column-focused Column Evaluation View (CF-CEV) biplot and it can be used to rank sites.

The default annotation design is the same as the one used for the CF-WWW view. One additional solid diamond marker is drawn to represent the average column coordinates (ACC, or AEC = average environment coordinates for our current case). A solid black line through the ACC and the origin is drawn across the biplot area; this is called the average column axis (ACA, or AEA = average environment axis for our current case). Another black segment, perpendicular to the ACA, is also drawn. Projections from site segments onto the ACA are useful for ranking sites by their variety-discriminating ability (outcome summarized in the table below).
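The geometry behind this view is simple enough to sketch in Python. The site coordinates below are hypothetical, and ranking by the length of the projection onto the ACA is my reading of the description above rather than a documented detail of the app.

```python
import math

# Hypothetical (PC1, PC2) biplot coordinates for three sites; not taken from the app.
site_coords = {"S1": (3.0, 1.0), "S2": (2.0, -1.0), "S3": (1.0, 3.0)}


def average_axis(coords):
    """Return the ACC (mean point) and the unit vector of the ACA through the origin."""
    n = len(coords)
    acc = (sum(x for x, _ in coords.values()) / n,
           sum(y for _, y in coords.values()) / n)
    length = math.hypot(acc[0], acc[1])
    return acc, (acc[0] / length, acc[1] / length)


def project(point, unit):
    """Return (position along the ACA, signed offset along its perpendicular)."""
    along = point[0] * unit[0] + point[1] * unit[1]
    across = -point[0] * unit[1] + point[1] * unit[0]
    return along, across


acc, aca_unit = average_axis(site_coords)
ranking = sorted(site_coords, key=lambda s: project(site_coords[s], aca_unit)[0],
                 reverse=True)
print("ACC:", acc)
print("Sites ranked by projection onto the ACA:", ranking)
```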

Data-driven augmentation approach

PCA biplots are often used to identify samples with extreme performance. For example, in our multi-environment variety trial data we would be interested in detecting varieties that outperform their competitors. As our intention is to maximize yield performance, we are interested in detecting varieties that perform well in all locations. One clever way to guesstimate what the profile of such a variety would be is to augment our input data with a dummy variety (referred to as the GOOD dummy variety) whose yield performance profile is defined by the estimated maximum yield within each site. Additionally, we can augment our yield data with a POOR dummy variety defined by the minimum yield within each site. The proposed data-driven row-augmentation process is illustrated below.
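If you would rather script the row augmentation than build it in a spreadsheet, the idea fits in a few lines of Python. The sketch below uses only the nine Example 1A rows that hold a column maximum or minimum, which is enough to reproduce the GOOD and POOR rows listed in the Example 1B appendix; the function name `augment_rows` is my own.

```python
# Rows copied from the Example 1A raw data (LocA to LocF).
table = {
    "GrA02": [13.30, 9.05, 11.05, 8.00, 8.90, 7.95],
    "GrC08": [10.60, 9.45, 9.30, 8.85, 10.40, 7.00],
    "GrC11": [14.10, 11.70, 10.75, 10.10, 7.55, 5.80],
    "GrC13": [10.85, 9.45, 12.05, 9.75, 8.65, 6.95],
    "GrC24": [9.70, 6.20, 7.35, 5.20, 4.70, 3.75],
    "GrC25": [7.80, 9.20, 7.55, 6.35, 6.45, 4.35],
    "GrC31": [14.45, 11.20, 10.65, 8.55, 10.35, 6.50],
    "GrD38": [8.85, 8.25, 9.40, 5.15, 7.65, 5.30],
    "GrD39": [8.25, 8.95, 6.25, 7.95, 7.70, 4.95],
}


def augment_rows(table):
    """Return a copy of the table with GOOD (column maxima) and POOR (column minima)."""
    if "GOOD" in table or "POOR" in table:
        raise ValueError("GOOD and POOR are reserved row labels")
    columns = list(zip(*table.values()))  # transpose rows into columns
    augmented = dict(table)
    augmented["GOOD"] = [max(column) for column in columns]
    augmented["POOR"] = [min(column) for column in columns]
    return augmented


augmented = augment_rows(table)
print("GOOD", augmented["GOOD"])
print("POOR", augmented["POOR"])
```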

Once dummy entries are obtained, we can proceed to perform data-driven column augmentation. One possible way to do this is by adding a DIST dummy site column that holds the Euclidean distance of each entry from the POOR dummy variety (we are looking for entries performing far away from the POOR dummy variety, and in the direction of the GOOD dummy variety).
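The DIST values can be reproduced with a plain Euclidean distance across the six site columns. Here is a short Python sketch using a few rows from the Example 1B appendix; rounding to two decimals is my assumption, made to match how the table is printed.

```python
import math

# POOR and GOOD profiles as listed in the Example 1B raw data (LocA to LocF).
poor = [7.80, 6.20, 6.25, 5.15, 4.70, 3.75]
rows = {
    "GrC12": [13.75, 10.70, 11.10, 9.30, 9.30, 7.75],
    "GrC24": [9.70, 6.20, 7.35, 5.20, 4.70, 3.75],
    "GOOD": [14.45, 11.70, 12.05, 10.10, 10.40, 7.95],
    "POOR": poor,
}

# Euclidean distance from each entry to the POOR dummy variety.
dist = {label: round(math.dist(values, poor), 2) for label, values in rows.items()}
for label, value in dist.items():
    print(f"{label}: {value:.2f}")
```

GrC24 sits almost on top of POOR because it holds the minimum in three of the six sites and is close to the minimum in a fourth.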

Note that the GOOD and POOR ID labels are reserved and cannot be used as RowName for other samples. It is also worth emphasizing that in other analytical situations, defining the profile for extreme performers may involve a combination of MIN(), MAX(), AVERAGE(), or other statistics. For example, we may be interested in maximizing some columns (like yield and yield components or quality indicators), minimizing others (like disease incidence or any undesired characteristic), while keeping a few at their average level (like average plant height). In other words, your creativity plays a huge role when performing meaningful data-driven augmentation.
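A mixed rule set like this one can be written down as one statistic per column. The Python sketch below uses a made-up genotype-by-characteristic table; the column names, values, and the `target_profile` helper are all hypothetical and serve only to illustrate the idea.

```python
from statistics import mean

# Hypothetical table: yield (maximize), disease incidence (minimize), height (average).
columns = ["Yield", "Disease", "Height"]
table = {
    "V1": [9.5, 12.0, 95.0],
    "V2": [10.2, 20.0, 105.0],
    "V3": [8.8, 8.0, 100.0],
}

good_rules = {"Yield": max, "Disease": min, "Height": mean}
# The POOR profile flips max and min and keeps the average columns unchanged.
opposite = {max: min, min: max, mean: mean}
poor_rules = {name: opposite[rule] for name, rule in good_rules.items()}


def target_profile(table, columns, rules):
    """Apply the statistic chosen for each column to that column's values."""
    by_column = dict(zip(columns, zip(*table.values())))
    return [rules[name](by_column[name]) for name in columns]


good = target_profile(table, columns, good_rules)
poor = target_profile(table, columns, poor_rules)
print("GOOD", good)
print("POOR", poor)
```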

Once data-driven augmentation has been completed, we should save the results as a new Excel file. Just remember to close your new data file before attempting to upload it into the app.

Column-focused biplots for data-driven augmented tables

After uploading the augmented data and performing PCA, we are ready to go back to interpreting some of the biplots that can be generated. Let’s start with the CF-WWW biplot view reproduced below.

We can see that the amount of variability accounted for by the first two PCs jumped to ~80%, with PC1 accounting for ~71%. Another important change is that all sites now belong to the same mega-environment. Note also the star symbols that represent the POOR and GOOD dummy varieties, located in opposite directions as expected.

Next, let me illustrate how to use the [Annotation design] menu item to change the default colors so that they highlight the variety type (GrA, GrB, GrC, or GrD) and the dummy DIST site segment.

Upon loading custom annotation files, the CF-WWW biplot is updated.

Row-focused biplots for data-driven augmented tables

The last thing I want to share is how one of the Row-focused biplots is constructed. The most typical visualization for ranking row entries is called the Row-focused Row Evaluation View (RF-REV) biplot.

The construction of this biplot view is very similar to that of the CF-CEV biplot, but this time solid projection lines are drawn from each entry marker onto the ACA. Depending on the number of row entries being visualized, it might become challenging to rank your entries visually with this biplot; in those situations it is better to go back to the [Perform PCA] menu item and take a look at the [Ranking of rows] results table, as illustrated below.

Further learning

To learn more about PCA biplot analysis, I recommend taking a look at the presentation (file: “myIntroduction to Augmented PCA Biplots.pdf”) shared in the [Welcome] menu item. It is also a very good idea to read the publications referenced in that introductory presentation.

I encourage you to share and take advantage of these learnings.

Enjoy the 𝖇𝖎𝖕𝖑𝖔🆃𝖆𝖇𝖑𝖊 basic Shiny app!

Appendices

Example 1A Files

Raw data

RowName	LocA	LocB	LocC	LocD	LocE	LocF
GrA01	10.80	9.95	8.10	8.70	10.20	6.15
GrA02	13.30	9.05	11.05	8.00	8.90	7.95
GrA03	12.50	8.60	8.45	6.80	8.90	7.45
GrA04	12.90	8.75	10.75	7.55	9.05	6.75
GrA05	12.50	8.95	8.35	7.10	8.95	7.05
GrB06	9.70	8.55	8.65	8.55	8.30	6.70
GrC07	12.75	9.15	10.75	7.35	9.30	6.75
GrC08	10.60	9.45	9.30	8.85	10.40	7.00
GrC09	11.25	8.90	11.40	9.95	8.30	6.95
GrC10	10.95	7.70	9.75	7.00	8.30	7.40
GrC11	14.10	11.70	10.75	10.10	7.55	5.80
GrC12	13.75	10.70	11.10	9.30	9.30	7.75
GrC13	10.85	9.45	12.05	9.75	8.65	6.95
GrC14	13.05	8.50	11.00	8.60	9.15	7.65
GrC15	12.05	8.35	9.95	9.00	8.40	7.50
GrC16	9.10	9.55	8.30	6.90	9.10	6.90
GrC17	10.20	9.60	11.05	9.75	6.65	5.60
GrC18	10.95	8.05	9.85	6.05	7.10	5.10
GrC19	12.15	9.30	8.35	6.70	6.80	5.20
GrC20	9.75	9.15	7.85	5.55	5.90	4.70
GrC21	12.40	9.65	9.90	6.70	9.30	5.05
GrC22	9.65	8.45	7.65	6.10	8.10	6.25
GrC23	8.75	7.25	6.75	6.25	5.70	4.20
GrC24	9.70	6.20	7.35	5.20	4.70	3.75
GrC25	7.80	9.20	7.55	6.35	6.45	4.35
GrC26	9.55	8.35	7.35	5.95	7.75	5.50
GrC27	13.30	8.35	9.10	8.85	7.40	5.25
GrC28	12.85	9.75	10.40	9.60	9.95	6.65
GrC29	9.90	10.35	10.10	6.85	8.35	5.85
GrC30	12.90	11.25	9.95	9.95	10.00	6.85
GrC31	14.45	11.20	10.65	8.55	10.35	6.50
GrC32	10.95	9.25	9.90	6.55	9.15	6.25
GrC33	11.35	9.35	9.85	7.35	7.70	5.90
GrD34	10.45	7.45	11.20	8.95	8.10	6.20
GrD35	12.25	9.25	10.45	9.70	8.85	6.85
GrD36	11.25	8.65	10.30	9.10	8.40	6.65
GrD37	10.20	8.30	8.95	6.75	6.80	5.35
GrD38	8.85	8.25	9.40	5.15	7.65	5.30
GrD39	8.25	8.95	6.25	7.95	7.70	4.95
GrD40	10.35	9.25	8.70	5.45	9.75	6.30

Row annotation

RowName	colorCode	PointShape
GrA01	d6	-10125
GrA02	d6	-10126
GrA03	d6	-10124
GrA04	d6	-10125
GrA05	d6	-10124
GrB06	d7	-10123
GrC07	s8	-10125
GrC08	s8	-10125
GrC09	s8	-10125
GrC10	s8	-10124
GrC11	s8	-10126
GrC12	s8	-10126
GrC13	s8	-10126
GrC14	s8	-10126
GrC15	s8	-10125
GrC16	s8	-10123
GrC17	s8	-10124
GrC18	s8	-10123
GrC19	s8	-10123
GrC20	s8	-10122
GrC21	s8	-10124
GrC22	s8	-10123
GrC23	s8	-10122
GrC24	s8	-10122
GrC25	s8	-10122
GrC26	s8	-10122
GrC27	s8	-10123
GrC28	s8	-10126
GrC29	s8	-10124
GrC30	s8	-10126
GrC31	s8	-10126
GrC32	s8	-10124
GrC33	s8	-10123
GrD34	p2	-10124
GrD35	p2	-10125
GrD36	p2	-10125
GrD37	p2	-10122
GrD38	p2	-10122
GrD39	p2	-10122
GrD40	p2	-10123

Column annotation

ColName	colorCode	PointShape
LocF	d3	-10112
LocE	d4	-10113
LocA	d3	-10114
LocC	p12	-10115
LocD	d3	-10116
LocB	d4	-10117

Example 1B Files

Raw data

RowName	LocA	LocB	LocC	LocD	LocE	LocF	DIST
GrA01	10.80	9.95	8.10	8.70	10.20	6.15	8.67
GrA02	13.30	9.05	11.05	8.00	8.90	7.95	10.24
GrA03	12.50	8.60	8.45	6.80	8.90	7.45	8.17
GrA04	12.90	8.75	10.75	7.55	9.05	6.75	9.30
GrA05	12.50	8.95	8.35	7.10	8.95	7.05	8.17
GrB06	9.70	8.55	8.65	8.55	8.30	6.70	6.94
GrC07	12.75	9.15	10.75	7.35	9.30	6.75	9.41
GrC08	10.60	9.45	9.30	8.85	10.40	7.00	9.19
GrC09	11.25	8.90	11.40	9.95	8.30	6.95	9.59
GrC10	10.95	7.70	9.75	7.00	8.30	7.40	7.36
GrC11	14.10	11.70	10.75	10.10	7.55	5.80	11.27
GrC12	13.75	10.70	11.10	9.30	9.30	7.75	11.56
GrC13	10.85	9.45	12.05	9.75	8.65	6.95	10.03
GrC14	13.05	8.50	11.00	8.60	9.15	7.65	10.12
GrC15	12.05	8.35	9.95	9.00	8.40	7.50	8.89
GrC16	9.10	9.55	8.30	6.90	9.10	6.90	7.03
GrC17	10.20	9.60	11.05	9.75	6.65	5.60	8.29
GrC18	10.95	8.05	9.85	6.05	7.10	5.10	5.89
GrC19	12.15	9.30	8.35	6.70	6.80	5.20	6.47
GrC20	9.75	9.15	7.85	5.55	5.90	4.70	4.19
GrC21	12.40	9.65	9.90	6.70	9.30	5.05	8.46
GrC22	9.65	8.45	7.65	6.10	8.10	6.25	5.40
GrC23	8.75	7.25	6.75	6.25	5.70	4.20	2.16
GrC24	9.70	6.20	7.35	5.20	4.70	3.75	2.20
GrC25	7.80	9.20	7.55	6.35	6.45	4.35	3.94
GrC26	9.55	8.35	7.35	5.95	7.75	5.50	4.68
GrC27	13.30	8.35	9.10	8.85	7.40	5.25	8.14
GrC28	12.85	9.75	10.40	9.60	9.95	6.65	10.54
GrC29	9.90	10.35	10.10	6.85	8.35	5.85	7.55
GrC30	12.90	11.25	9.95	9.95	10.00	6.85	11.22
GrC31	14.45	11.20	10.65	8.55	10.35	6.50	11.82
GrC32	10.95	9.25	9.90	6.55	9.15	6.25	7.78
GrC33	11.35	9.35	9.85	7.35	7.70	5.90	7.34
GrD34	10.45	7.45	11.20	8.95	8.10	6.20	8.07
GrD35	12.25	9.25	10.45	9.70	8.85	6.85	9.71
GrD36	11.25	8.65	10.30	9.10	8.40	6.65	8.49
GrD37	10.20	8.30	8.95	6.75	6.80	5.35	5.20
GrD38	8.85	8.25	9.40	5.15	7.65	5.30	5.13
GrD39	8.25	8.95	6.25	7.95	7.70	4.95	5.10
GrD40	10.35	9.25	8.70	5.45	9.75	6.30	7.34
GOOD	14.45	11.70	12.05	10.10	10.40	7.95	13.52
POOR	7.80	6.20	6.25	5.15	4.70	3.75	0.00

Row annotation

RowName	colorCode	PointShape
GrA01	d6	-10125
GrA02	d6	-10126
GrA03	d6	-10124
GrA04	d6	-10125
GrA05	d6	-10124
GrB06	d7	-10123
GrC07	s8	-10125
GrC08	s8	-10125
GrC09	s8	-10125
GrC10	s8	-10124
GrC11	s8	-10126
GrC12	s8	-10126
GrC13	s8	-10126
GrC14	s8	-10126
GrC15	s8	-10125
GrC16	s8	-10123
GrC17	s8	-10124
GrC18	s8	-10123
GrC19	s8	-10123
GrC20	s8	-10122
GrC21	s8	-10124
GrC22	s8	-10123
GrC23	s8	-10122
GrC24	s8	-10122
GrC25	s8	-10122
GrC26	s8	-10122
GrC27	s8	-10123
GrC28	s8	-10126
GrC29	s8	-10124
GrC30	s8	-10126
GrC31	s8	-10126
GrC32	s8	-10124
GrC33	s8	-10123
GrD34	p2	-10124
GrD35	p2	-10125
GrD36	p2	-10125
GrD37	p2	-10122
GrD38	p2	-10122
GrD39	p2	-10122
GrD40	p2	-10123

Column annotation

ColName	colorCode	PointShape
LocA	d3	-10114
LocB	d3	-10117
LocC	d3	-10115
LocD	d3	-10116
LocE	d3	-10113
LocF	d3	-10112
DIST	p10	-9679

--- title: "`r paste0('Sm', intToUtf8(127280),'rt PCA biplots with ',intToUtf8(c(120199,120206,120213,120209,120212,127363,120198,120199,120209,120202)),' basic Shiny app')`" author: James Silva Garcia date: "`r Sys.Date()`" format: html: code-tools: true code-fold: show code-link: true code-summary: "Hide code" code-copy: true code-line-numbers: true link-external-newwindow: true link-citations: true execute: eval: true include: true warning: false toc: true toc-depth: 2 toc-location: body toc-title: Contents --- ```{r} #| include: false library(tidyverse) ``` ## Welcome I am glad you are reading this post. Today I want to offer you a practical intro to the world of Principal Component Analysis (PCA) biplots (a.k.a. GGE-biplots in the context of plant breeding). The goal is to obtain meaningful two-way (bi) visualizations (plots) of a numerical data table that originally contained three or more numerical columns. In typical applications, [**rows**]{style="color:#33A02C"} in the original data table correspond to [**samples**]{style="color:#33A02C"} or [**subjects**]{style="color:#33A02C"}, with [**columns**]{style="color:#7570B3"} used to capture the [**average response**]{style="color:#7570B3"} profile for each [**subject**]{style="color:#33A02C"}. Classical applications include analyzing the average yield performance of a collection of [**varieties**]{style="color:#33A02C"} tested across [**experimental sites**]{style="color:#7570B3"} (useful for interpreting [**genotype**]{style="color:#33A02C"}-by-[**environment**]{style="color:#7570B3"} interaction) or describing the average performance profile of several [**agronomic characteristics**]{style="color:#7570B3"} for a collection of [**varieties**]{style="color:#33A02C"} (i.e., a [**genotype**]{style="color:#33A02C"}-by-[**characteristic**]{style="color:#7570B3"} analysis). 
I designed the [**`r intToUtf8(c(120199,120206,120213,120209,120212,127363,120198,120199,120209,120202))`**]{style="color:#B15928"} basic Shiny app to provide a friendly user interface to carry out PCA computations and deliver neat biplot visualizations along the way. All you need to do when putting together your input data table is to remember what each [**row**]{style="color:#33A02C"} and each [**column**]{style="color:#7570B3"} represents for your analysis, making sure the header of the first [**column**]{style="color:#7570B3"} is entered as the case-sensitive text [**RowName**]{style="color:#7570B3"}; as generated visualizations may include a bunch of overlapping label annotations, it is also recommended that you abbreviate in a meaningful way your [**sample**]{style="color:#33A02C"} or [**subject**]{style="color:#33A02C"} ID labels and [**column**]{style="color:#7570B3"} headers (using no more than 12 characters for each [**row ID label**]{style="color:#33A02C"} or [**column header**]{style="color:#7570B3"}). The input data table used in this post corresponds to a fictitious [**genotype**]{style="color:#33A02C"}-by-[**environment**]{style="color:#7570B3"} table of average yield performance for [**40 varieties**]{style="color:#33A02C"} across [**six sites**]{style="color:#7570B3"} as depicted below ![](images/biploTable-basic-01-InputTable.png) All used input data tables used are shared as Appendices at the end of this post (you can copy/paste each table into a corresponding Excel file and save them accordingly). You are welcome to visit the [**`r intToUtf8(c(120199,120206,120213,120209,120212,127363,120198,120199,120209,120202))`**]{style="color:#B15928"} basic Shiny app at any time by clicking [here](https://jamessilva0070.shinyapps.io/biploTable-basic/?_ga=2.157332282.1893158060.1681831120-315715406.1681410224). The following sections provide basic guidelines on how to use this app. 
## Input data upload To feed the app with data, go to the \[**Upload data**\] menu item, and then use the \[**Browse...**\] button to navigate your files system and chose an Excel input data file (e.g., "biploTable-Example1A.xlsx") ![](images/biploTable-basic-02-InputDataUpload.png) ## Perform PCA Once your input data table has been uploaded, go to the \[**Perform PCA**\] menu item and click the \[**Run PCA**\] button to trigger computations. The \[**PCA summary**\] results table is displayed by default. For this data, the first two principal componentes (PC) account for about 72% of the total variability (reducing from 6 original dimensions down to just two, preserves \~72 percent of the total original variability), with nearly 59% accounted for by the first PC. The information ratio statistic shown in the last column of the PCA summary can be used to determine how many PC to use for dimensionality reduction: keep PC having InformatioRatio of 1 or beyond; for this sample data, however, for obvious reasons we will use the first two PC to generate biplots. ![](images/biploTable-basic-03-OriginalPCAsummary.png) ## Annotation design options You may use the \[**Annotation design**\] menu item to customize the look of the different biplot visualization options. As displayed in the following image, several customizations can be made, but general features are described next. ![](images/biploTable-basic-04-AnnotationDesign.png) ### Row annotation It is not uncommon that a first PC is typically associated with a weighted average performance. To capitalize on this fact, the app calculates an average performance value across the response columns for each entry and uses quantiles estimated from average performance to group subjects into five color-coded average performance categories ([**80-100%**]{style="color:#33A02C"}, [**60-80%**]{style="color:#1F78B4"}, [**40-60%**]{style="color:#666666"}, [**20-40%**]{style="color:#FF7F00"}, and [**0-20%**]{style="color:#E31A1C"}). 
Likewise, a Unicode "PointShape" character is designated to each performance category (solid circled numbers, by default); however, a \[**Custom file**\] option is provided to enable users to customize the way colors and plotting shapes are assigned to build a proper row annotation design. Default row annotation options were kept to generate our first biplot view (shown later). ### Column annotation The app also generates a default annotation for columns as illustrated below. ![](images/biploTable-basic-05-ColumnAnnotation.png) ## Annotation cheatsheet The annotation cheatsheet (reproduced below) has been provided to helping users designing proper annotations. ![](images/biploTable-basic-06-AnnotationCheatsheet.png) Here is a brief explanation on how to use it. By default, the variety [**GrC12**]{style="color:#33A02C"} would be represented with a [**green**]{style="color:#33A02C"} (i.e., the [**p4**]{style="color:#33A02C"} colorCode) solid circled number 5 (passed on to the graphical interface using the **PointShape** Unicode value **-10126**). In other words, the allocation of the variety [**GrC12**]{style="color:#33A02C"} would be represented with the point shape [**`r intToUtf8(10126)`**]{style="color:#33A02C"}. ## Column-focused biplots The first set of biplot views is called \[**Column-focused biplots**\] and, as indicated by its name, is designed for preserving metrics across columns from the original input data ([**sites**]{style="color:#7570B3"}-metric preserving for our current analysis). The first view is called the Column-focused Which-Won-Where (CF-WWW) view biplot and it can be used to study relationships across [**sites**]{style="color:#7570B3"}. ![](images/biploTable-basic-07-WWW.png) Using the default column annotation, [**sites**]{style="color:#7570B3"} are represented by empty circled numbers (1 to 6) and connected with [**purple segments**]{style="color:#7570B3"} to the biplot origin. 
The narrower the angle between two segments, the higher the correlation between the corresponding [**sites**]{style="color:#7570B3"}; according to this, [**sites**]{style="color:#7570B3"} [**LocA**]{style="color:#7570B3"}, [**LocD**]{style="color:#7570B3"}, and [**LocF**]{style="color:#7570B3"} are strongly correlated. [**Site**]{style="color:#7570B3"} segments at an angle close to 90° are non-correlated (like [**LocB**]{style="color:#7570B3"} and [**LocC**]{style="color:#7570B3"}), while [**site**]{style="color:#7570B3"} segments at an angle approaching 180° would have high negative correlation. Likewise, the default row annotation indicates that **Varieties** are represented by **color-coded solid circled numbers** to show average performance categories. The out-most varieties are connected with an irregular polygon that encloses all tested varieties; perpendicular dashed lines are drawn from the biplot origin to each of the sides of the polygon and are used to partition the biplot area into several sections. [**Sites**]{style="color:#7570B3"} falling within a same section (like [**sites**]{style="color:#7570B3"} [**LocA**]{style="color:#7570B3"}, [**LocD**]{style="color:#7570B3"}, and [**LocF**]{style="color:#7570B3"}) share similar characteristics and conform a mega-environment. Although this is not an appropriate view to perform ranking of varieties, this view also indicates that, for example, varieties [**GrC30**]{style="color:#33A02C"} and [**GrC31**]{style="color:#33A02C"} where the top performers in [**LocB**]{style="color:#7570B3"} (a fact revealed by markers at a far distance from the origin, with narrow angle with the [**LocB**]{style="color:#7570B3"} segment); additionally, [**GrC24**]{style="color:#E31A1C"} was a poor performer variety in [**LocB**]{style="color:#7570B3"} (far away marker point in the opposite direction of the [**LocB**]{style="color:#7570B3"} segment). 
As the six [**sites**]{style="color:#7570B3"} fell into three different biplot sections, ranking of varieties should be performed (using row-focused biplots, for mega-environments with 3 or more [**sites**]{style="color:#7570B3"}) independently. To generate row-focused biplots for the mega-environment containing [**sites**]{style="color:#7570B3"} [**LocA**]{style="color:#7570B3"}, [**LocD**]{style="color:#7570B3"}, and [**LocF**]{style="color:#7570B3"}, you are encouraged to go back to the \[**Upload data**\] menu item, click the \[**Table modifications**\] tab and use the \[**Select columns to exclude**\] user input to eliminate [**sites**]{style="color:#7570B3"} [**LocB**]{style="color:#7570B3"}, [**LocC**]{style="color:#7570B3"}, and [**LocE**]{style="color:#7570B3"} from the analysis, and then take look on the \[**Row-focused biplots**\] menu item (results not discussed here). However, an interesting data-driven approach to enhance our analysis will be proposed later. The second view is called the Column-focused Column Evaluation View (CF-CEV) biplot and it can be used to rank [**sites**]{style="color:#7570B3"}. ![](images/biploTable-basic-08-CEV.png) The default annotation design is the same than the one used for the CF-WWW view. One additional solid diamond marker is drawn and represents the [**average column coordinates**]{style="color:#7570B3"} (ACC, or [**AEC = average environment coordinates**]{style="color:#7570B3"} for our current case); a solid black segment across the biplot area that connects the ACC and the origin is drawn. This is called the [**average column axis**]{style="color:#7570B3"} (ACA, or [**AEA = average environment axis**]{style="color:#7570B3"} for our current case). Another black segment, perpendicular to the ACA is also drawn. 
Projections from [**site segments**]{style="color:#7570B3"} onto the ACC are useful for ranking [**sites**]{style="color:#7570B3"} by their variety discriminating ability (outcome summarized using the table below). ![](images/biploTable-basic-09-ColRanking.png) ## Data-driven augmentation approach PCA biplots are used to try to identify samples with extreme performance. For example, in our multi-environmental variety trial data we would be interested in detecting varieties that outperform their competitors. As our intention is to maximize yield performance, we are interested in detecting varieties that perform well in all locations. One clever way to guesstimate what the profile of such variety would be is to augment our input data with a dummy variety (referred to as the [**GOOD**]{style="color:#33A02C"} dummy variety) and defining its yield performance profile using the estimated [**maximum yield**]{style="color:#33A02C"} within each site. Additionally, we can also augment our yield data with one more [**POOR**]{style="color:#E31A1C"} dummy variety using [**minimum yield**]{style="color:#E31A1C"} within each site. The proposed data-driven row-augmentation process is illustrated below. ![](images/biploTable-basic-10-RowAugmentation.png) Once dummy entries are obtained, we can proceed to perform data-driven column augmentation. One possible way to do this is by adding a [**DIST**]{style="color:#6A3D9A"} dummy [**site column**]{style="color:#6A3D9A"} to accommodate Euclidian distances from the [**POOR**]{style="color:#E31A1C"} dummy variety (we are looking for entries performing far away from the [**POOR**]{style="color:#E31A1C"} dummy variety, and in the direction of the [**GOOD**]{style="color:#33A02C"} dummy variety). ![](images/biploTable-basic-11-ColAugmentation.png) Note that the [**GOOD**]{style="color:#33A02C"} and [**POOR**]{style="color:#E31A1C"} ID labels are forbidden labels and cannot be used as RowName for other samples. 
It is also good to emphasize that in other analytical situations, defining the profile for extreme performers may involve the use of a combination of MIN(), MAX(), AVERAGE(), or other statistics. For example, we may be interested in maximizing some columns (like yield and yield components or quality indicators), minimizing others (like disease incidence or any undesired characteristic), while keeping a few at their average level (like average plant height). In other words, your creativity plays a huge role when performing meaningful data-driven augmentation. Once data-driven augmentation has been completed, we should save the results as a new Excel file. Just remember to close your new data file, before attempting to upload it into our app. ## Column-focused biplots for data-driven augmented tables After uploading the augmented data and performing PCA, we are ready to go back to interpreting some of the biplots that can be generated. Let's start with the CF-WWW biplot view reproduced below. ![](images/biploTable-basic-12-CFWWWaug.png) We can see that the amount of variability accounted for by the first two PC jumped up to \~80%, with PC1 accounting for \~71%. Yet another important change is that now all [**sites**]{style="color:#7570B3"} belong to a same mega-environment. Note also the inclusion of star symbols to represent the [**POOR**]{style="color:#E31A1C"} and [**GOOD**]{style="color:#33A02C"} dummy varieties, located on opposite directions as expected. Next, let me illustrate how to use the \[**Annotation design**\] menu item to change default colors to rather highlight the variety type ([**GrA**]{style="color:#E6AB02"}, [**GrB**]{style="color:#A6761D"}, [**GrC**]{style="color:#B3B3B3"}, or [**GrD**]{style="color:#1F78B4"}) and the dummy [**DIST site**]{style="color:#6A3D9A"} segment. ![](images/biploTable-basic-13-CustomAnnotations.png) Upon loading custom annotation files, the CF-WWW biplot is updated. 
![](images/biploTable-basic-14-CFWWWaugCustom.png)

## Row-focused biplots for data-driven augmented tables

The last thing I want to share is how one of the row-focused biplots is constructed. The most typical visualization for ranking row entries is called the **Row-focused Row Evaluation View** (or RF-REV) biplot.

![](images/biploTable-basic-15-RFREVaugCustom.png)

The construction of this biplot view is very similar to the one used for building the CF-CEV biplot, but this time solid projection lines are drawn from each entry marker point onto the ACA. Depending on the number of row entries being visualized, it might become challenging to use this biplot for visually ranking your entries; in those situations it is better to go back to the \[**Perform PCA**\] menu item and take a look at the \[**Ranking of rows**\] results table, as illustrated below.

![](images/biploTable-basic-16-RFREVaugRanking.png)

## Further learning

To learn more about PCA biplot analysis, I recommend taking a look at the presentation (file: "myIntroduction to Augmented PCA Biplots.pdf") shared in the \[**Welcome**\] menu item. It is also a very good idea to read the publications referenced in my introductory presentation.

I encourage you to share and take advantage of these learnings. Enjoy the [**`r intToUtf8(c(120199,120206,120213,120209,120212,127363,120198,120199,120209,120202))`**]{style="color:#B15928"} basic Shiny app!

## Appendices

### Example 1A Files

Raw data

| RowName | LocA | LocB | LocC | LocD | LocE | LocF |
|---------|------:|------:|------:|------:|------:|-----:|
| GrA01 | 10.80 | 9.95 | 8.10 | 8.70 | 10.20 | 6.15 |
| GrA02 | 13.30 | 9.05 | 11.05 | 8.00 | 8.90 | 7.95 |
| GrA03 | 12.50 | 8.60 | 8.45 | 6.80 | 8.90 | 7.45 |
| GrA04 | 12.90 | 8.75 | 10.75 | 7.55 | 9.05 | 6.75 |
| GrA05 | 12.50 | 8.95 | 8.35 | 7.10 | 8.95 | 7.05 |
| GrB06 | 9.70 | 8.55 | 8.65 | 8.55 | 8.30 | 6.70 |
| GrC07 | 12.75 | 9.15 | 10.75 | 7.35 | 9.30 | 6.75 |
| GrC08 | 10.60 | 9.45 | 9.30 | 8.85 | 10.40 | 7.00 |
| GrC09 | 11.25 | 8.90 | 11.40 | 9.95 | 8.30 | 6.95 |
| GrC10 | 10.95 | 7.70 | 9.75 | 7.00 | 8.30 | 7.40 |
| GrC11 | 14.10 | 11.70 | 10.75 | 10.10 | 7.55 | 5.80 |
| GrC12 | 13.75 | 10.70 | 11.10 | 9.30 | 9.30 | 7.75 |
| GrC13 | 10.85 | 9.45 | 12.05 | 9.75 | 8.65 | 6.95 |
| GrC14 | 13.05 | 8.50 | 11.00 | 8.60 | 9.15 | 7.65 |
| GrC15 | 12.05 | 8.35 | 9.95 | 9.00 | 8.40 | 7.50 |
| GrC16 | 9.10 | 9.55 | 8.30 | 6.90 | 9.10 | 6.90 |
| GrC17 | 10.20 | 9.60 | 11.05 | 9.75 | 6.65 | 5.60 |
| GrC18 | 10.95 | 8.05 | 9.85 | 6.05 | 7.10 | 5.10 |
| GrC19 | 12.15 | 9.30 | 8.35 | 6.70 | 6.80 | 5.20 |
| GrC20 | 9.75 | 9.15 | 7.85 | 5.55 | 5.90 | 4.70 |
| GrC21 | 12.40 | 9.65 | 9.90 | 6.70 | 9.30 | 5.05 |
| GrC22 | 9.65 | 8.45 | 7.65 | 6.10 | 8.10 | 6.25 |
| GrC23 | 8.75 | 7.25 | 6.75 | 6.25 | 5.70 | 4.20 |
| GrC24 | 9.70 | 6.20 | 7.35 | 5.20 | 4.70 | 3.75 |
| GrC25 | 7.80 | 9.20 | 7.55 | 6.35 | 6.45 | 4.35 |
| GrC26 | 9.55 | 8.35 | 7.35 | 5.95 | 7.75 | 5.50 |
| GrC27 | 13.30 | 8.35 | 9.10 | 8.85 | 7.40 | 5.25 |
| GrC28 | 12.85 | 9.75 | 10.40 | 9.60 | 9.95 | 6.65 |
| GrC29 | 9.90 | 10.35 | 10.10 | 6.85 | 8.35 | 5.85 |
| GrC30 | 12.90 | 11.25 | 9.95 | 9.95 | 10.00 | 6.85 |
| GrC31 | 14.45 | 11.20 | 10.65 | 8.55 | 10.35 | 6.50 |
| GrC32 | 10.95 | 9.25 | 9.90 | 6.55 | 9.15 | 6.25 |
| GrC33 | 11.35 | 9.35 | 9.85 | 7.35 | 7.70 | 5.90 |
| GrD34 | 10.45 | 7.45 | 11.20 | 8.95 | 8.10 | 6.20 |
| GrD35 | 12.25 | 9.25 | 10.45 | 9.70 | 8.85 | 6.85 |
| GrD36 | 11.25 | 8.65 | 10.30 | 9.10 | 8.40 | 6.65 |
| GrD37 | 10.20 | 8.30 | 8.95 | 6.75 | 6.80 | 5.35 |
| GrD38 | 8.85 | 8.25 | 9.40 | 5.15 | 7.65 | 5.30 |
| GrD39 | 8.25 | 8.95 | 6.25 | 7.95 | 7.70 | 4.95 |
| GrD40 | 10.35 | 9.25 | 8.70 | 5.45 | 9.75 | 6.30 |

Row annotation

| RowName | colorCode | PointShape |
|---------|-----------|-----------:|
| GrA01 | d6 | -10125 |
| GrA02 | d6 | -10126 |
| GrA03 | d6 | -10124 |
| GrA04 | d6 | -10125 |
| GrA05 | d6 | -10124 |
| GrB06 | d7 | -10123 |
| GrC07 | s8 | -10125 |
| GrC08 | s8 | -10125 |
| GrC09 | s8 | -10125 |
| GrC10 | s8 | -10124 |
| GrC11 | s8 | -10126 |
| GrC12 | s8 | -10126 |
| GrC13 | s8 | -10126 |
| GrC14 | s8 | -10126 |
| GrC15 | s8 | -10125 |
| GrC16 | s8 | -10123 |
| GrC17 | s8 | -10124 |
| GrC18 | s8 | -10123 |
| GrC19 | s8 | -10123 |
| GrC20 | s8 | -10122 |
| GrC21 | s8 | -10124 |
| GrC22 | s8 | -10123 |
| GrC23 | s8 | -10122 |
| GrC24 | s8 | -10122 |
| GrC25 | s8 | -10122 |
| GrC26 | s8 | -10122 |
| GrC27 | s8 | -10123 |
| GrC28 | s8 | -10126 |
| GrC29 | s8 | -10124 |
| GrC30 | s8 | -10126 |
| GrC31 | s8 | -10126 |
| GrC32 | s8 | -10124 |
| GrC33 | s8 | -10123 |
| GrD34 | p2 | -10124 |
| GrD35 | p2 | -10125 |
| GrD36 | p2 | -10125 |
| GrD37 | p2 | -10122 |
| GrD38 | p2 | -10122 |
| GrD39 | p2 | -10122 |
| GrD40 | p2 | -10123 |

Column annotation

| ColName | colorCode | PointShape |
|---------|-----------|-----------:|
| LocF | d3 | -10112 |
| LocE | d4 | -10113 |
| LocA | d3 | -10114 |
| LocC | p12 | -10115 |
| LocD | d3 | -10116 |
| LocB | d4 | -10117 |

### Example 1B Files

Raw data

| RowName | LocA | LocB | LocC | LocD | LocE | LocF | DIST |
|---------|------:|------:|------:|------:|------:|-----:|------:|
| GrA01 | 10.80 | 9.95 | 8.10 | 8.70 | 10.20 | 6.15 | 8.67 |
| GrA02 | 13.30 | 9.05 | 11.05 | 8.00 | 8.90 | 7.95 | 10.24 |
| GrA03 | 12.50 | 8.60 | 8.45 | 6.80 | 8.90 | 7.45 | 8.17 |
| GrA04 | 12.90 | 8.75 | 10.75 | 7.55 | 9.05 | 6.75 | 9.30 |
| GrA05 | 12.50 | 8.95 | 8.35 | 7.10 | 8.95 | 7.05 | 8.17 |
| GrB06 | 9.70 | 8.55 | 8.65 | 8.55 | 8.30 | 6.70 | 6.94 |
| GrC07 | 12.75 | 9.15 | 10.75 | 7.35 | 9.30 | 6.75 | 9.41 |
| GrC08 | 10.60 | 9.45 | 9.30 | 8.85 | 10.40 | 7.00 | 9.19 |
| GrC09 | 11.25 | 8.90 | 11.40 | 9.95 | 8.30 | 6.95 | 9.59 |
| GrC10 | 10.95 | 7.70 | 9.75 | 7.00 | 8.30 | 7.40 | 7.36 |
| GrC11 | 14.10 | 11.70 | 10.75 | 10.10 | 7.55 | 5.80 | 11.27 |
| GrC12 | 13.75 | 10.70 | 11.10 | 9.30 | 9.30 | 7.75 | 11.56 |
| GrC13 | 10.85 | 9.45 | 12.05 | 9.75 | 8.65 | 6.95 | 10.03 |
| GrC14 | 13.05 | 8.50 | 11.00 | 8.60 | 9.15 | 7.65 | 10.12 |
| GrC15 | 12.05 | 8.35 | 9.95 | 9.00 | 8.40 | 7.50 | 8.89 |
| GrC16 | 9.10 | 9.55 | 8.30 | 6.90 | 9.10 | 6.90 | 7.03 |
| GrC17 | 10.20 | 9.60 | 11.05 | 9.75 | 6.65 | 5.60 | 8.29 |
| GrC18 | 10.95 | 8.05 | 9.85 | 6.05 | 7.10 | 5.10 | 5.89 |
| GrC19 | 12.15 | 9.30 | 8.35 | 6.70 | 6.80 | 5.20 | 6.47 |
| GrC20 | 9.75 | 9.15 | 7.85 | 5.55 | 5.90 | 4.70 | 4.19 |
| GrC21 | 12.40 | 9.65 | 9.90 | 6.70 | 9.30 | 5.05 | 8.46 |
| GrC22 | 9.65 | 8.45 | 7.65 | 6.10 | 8.10 | 6.25 | 5.40 |
| GrC23 | 8.75 | 7.25 | 6.75 | 6.25 | 5.70 | 4.20 | 2.16 |
| GrC24 | 9.70 | 6.20 | 7.35 | 5.20 | 4.70 | 3.75 | 2.20 |
| GrC25 | 7.80 | 9.20 | 7.55 | 6.35 | 6.45 | 4.35 | 3.94 |
| GrC26 | 9.55 | 8.35 | 7.35 | 5.95 | 7.75 | 5.50 | 4.68 |
| GrC27 | 13.30 | 8.35 | 9.10 | 8.85 | 7.40 | 5.25 | 8.14 |
| GrC28 | 12.85 | 9.75 | 10.40 | 9.60 | 9.95 | 6.65 | 10.54 |
| GrC29 | 9.90 | 10.35 | 10.10 | 6.85 | 8.35 | 5.85 | 7.55 |
| GrC30 | 12.90 | 11.25 | 9.95 | 9.95 | 10.00 | 6.85 | 11.22 |
| GrC31 | 14.45 | 11.20 | 10.65 | 8.55 | 10.35 | 6.50 | 11.82 |
| GrC32 | 10.95 | 9.25 | 9.90 | 6.55 | 9.15 | 6.25 | 7.78 |
| GrC33 | 11.35 | 9.35 | 9.85 | 7.35 | 7.70 | 5.90 | 7.34 |
| GrD34 | 10.45 | 7.45 | 11.20 | 8.95 | 8.10 | 6.20 | 8.07 |
| GrD35 | 12.25 | 9.25 | 10.45 | 9.70 | 8.85 | 6.85 | 9.71 |
| GrD36 | 11.25 | 8.65 | 10.30 | 9.10 | 8.40 | 6.65 | 8.49 |
| GrD37 | 10.20 | 8.30 | 8.95 | 6.75 | 6.80 | 5.35 | 5.20 |
| GrD38 | 8.85 | 8.25 | 9.40 | 5.15 | 7.65 | 5.30 | 5.13 |
| GrD39 | 8.25 | 8.95 | 6.25 | 7.95 | 7.70 | 4.95 | 5.10 |
| GrD40 | 10.35 | 9.25 | 8.70 | 5.45 | 9.75 | 6.30 | 7.34 |
| GOOD | 14.45 | 11.70 | 12.05 | 10.10 | 10.40 | 7.95 | 13.52 |
| POOR | 7.80 | 6.20 | 6.25 | 5.15 | 4.70 | 3.75 | 0.00 |

Row annotation

| RowName | colorCode | PointShape |
|---------|-----------|-----------:|
| GrA01 | d6 | -10125 |
| GrA02 | d6 | -10126 |
| GrA03 | d6 | -10124 |
| GrA04 | d6 | -10125 |
| GrA05 | d6 | -10124 |
| GrB06 | d7 | -10123 |
| GrC07 | s8 | -10125 |
| GrC08 | s8 | -10125 |
| GrC09 | s8 | -10125 |
| GrC10 | s8 | -10124 |
| GrC11 | s8 | -10126 |
| GrC12 | s8 | -10126 |
| GrC13 | s8 | -10126 |
| GrC14 | s8 | -10126 |
| GrC15 | s8 | -10125 |
| GrC16 | s8 | -10123 |
| GrC17 | s8 | -10124 |
| GrC18 | s8 | -10123 |
| GrC19 | s8 | -10123 |
| GrC20 | s8 | -10122 |
| GrC21 | s8 | -10124 |
| GrC22 | s8 | -10123 |
| GrC23 | s8 | -10122 |
| GrC24 | s8 | -10122 |
| GrC25 | s8 | -10122 |
| GrC26 | s8 | -10122 |
| GrC27 | s8 | -10123 |
| GrC28 | s8 | -10126 |
| GrC29 | s8 | -10124 |
| GrC30 | s8 | -10126 |
| GrC31 | s8 | -10126 |
| GrC32 | s8 | -10124 |
| GrC33 | s8 | -10123 |
| GrD34 | p2 | -10124 |
| GrD35 | p2 | -10125 |
| GrD36 | p2 | -10125 |
| GrD37 | p2 | -10122 |
| GrD38 | p2 | -10122 |
| GrD39 | p2 | -10122 |
| GrD40 | p2 | -10123 |

Column annotation

| ColName | colorCode | PointShape |
|---------|-----------|-----------:|
| LocA | d3 | -10114 |
| LocB | d3 | -10117 |
| LocC | d3 | -10115 |
| LocD | d3 | -10116 |
| LocE | d3 | -10113 |
| LocF | d3 | -10112 |
| DIST | p10 | -9679 |