Matthew Peterson’s Code Blog

Code Seen in Videos

Welcome to my running code blog where I will provide code seen in the videos below!

I will try to adapt each script to universally available data, such as the iris dataset that ships with R. When that is not possible, I will provide the code I actually used, with specific data names and file paths omitted. The code itself stays intact so you can adapt it to your own data.

This will be a living, breathing document that I intend to update as needed. If you have questions or comments feel free to ask them below!

Pairwise Comparisons

(As seen in the video “Make a grouped boxplot with pairwise comparisons (ggplot2)!”)

You will need the ggplot2 package for the plots and the ggpubr package for stat_compare_means()

library(ggplot2)

library(ggpubr)

Step 1, clear your global environment

rm(list = ls())

Step 2, load the data set you intend to analyze

Data <- read.csv("Direct/Path/To/Your/Data/Data_name.csv")
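If you do not have a CSV handy, the round trip below is a minimal sketch of Step 2 using the built-in ToothGrowth data. The temporary file path and the choice of ToothGrowth are my stand-ins here, not the data from the video.

```r
# Write a built-in data set to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(ToothGrowth, csv_path, row.names = FALSE)

# Read it back the same way you would read your own file
Data <- read.csv(csv_path)

# Grouping columns should be factors so ggplot2 treats them as discrete groups
Data$dose <- factor(Data$dose)
Data$supp <- factor(Data$supp)

# Check column types before plotting
str(Data)
```

A numeric grouping column such as dose would otherwise be read as a continuous variable, which is why it is converted with factor() here.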

First let’s form our grouped boxplot

ggplot(data = Data, aes(x=________, y=_________, fill = _________))+

geom_boxplot(aes(x=_______, y=___________, fill=____________))+

scale_fill_viridis_d()+

theme_classic()
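To see the template above with the blanks filled in, here is a sketch on the built-in ToothGrowth data. The column choices (dose on the x-axis, len on the y-axis, supp as the fill) are my assumption for illustration, not the data from the video.

```r
library(ggplot2)

# ToothGrowth ships with R: len (numeric), supp (factor), dose (numeric)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)  # treat dose as a discrete x-axis group

# One box per dose/supp combination, colored by supp
p <- ggplot(data = tg, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +
  scale_fill_viridis_d() +
  theme_classic()

print(p)
```

Because the aesthetics are set in ggplot(), geom_boxplot() inherits them and does not need its own aes() call.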

Now let’s add the anova

ggplot(data = Data, aes(x=________, y=_________, fill = _________))+

geom_boxplot(aes(x=_______, y=___________, fill=____________))+

scale_fill_viridis_d()+

theme_classic()+

stat_compare_means(method = "anova")

Now we create the comparisons we want

my_comparisons<- list(c("Afghan", "China"), c("Afghan", "India"), c("Afghan", "Viridis"))

NOTE: You should list your comparisons for your specific data as shown above
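If you want every possible pair rather than a hand-picked list, base R's combn() can build the list for you. This is a small sketch that uses the dose levels of the built-in ToothGrowth data as stand-in group labels; swap in the levels of your own x variable.

```r
# Stand-in group labels taken from a built-in data set
groups <- levels(factor(ToothGrowth$dose))

# simplify = FALSE returns a list of length-2 character vectors,
# which is the shape stat_compare_means(comparisons = ...) expects
all_pairs <- combn(groups, 2, simplify = FALSE)

print(all_pairs)
```

With many groups the number of pairs grows quickly, so a hand-picked list like the one above often makes for a cleaner figure.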

Finally, let’s add some comparisons

ggplot(data = Dry, aes(x=___________, y=_____________, fill = _________))+

geom_boxplot(aes(x=Accession, y=AVG_Dry_Wt., fill=Treatment))+

scale_fill_viridis_d()+

theme_classic()+

stat_compare_means(method = "anova", label.y = 750)+

stat_compare_means(comparisons = my_comparisons, method = "t.test")
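Here is a runnable sketch of the same idea on the built-in ToothGrowth data, assuming the ggpubr package is installed. The data set, the label.y value, and the comparison pairs are my choices for illustration. I put fill inside geom_boxplot() only, so the comparisons are made between doses with both supp groups pooled.

```r
library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Pairs of x-axis groups to compare, written as the factor levels
my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))

p <- ggplot(data = tg, aes(x = dose, y = len)) +
  geom_boxplot(aes(fill = supp)) +
  scale_fill_viridis_d() +
  theme_classic() +
  # Overall ANOVA p-value, placed above the pairwise brackets
  stat_compare_means(method = "anova", label.y = 50) +
  # One bracket with a t-test p-value per pair
  stat_compare_means(comparisons = my_comparisons, method = "t.test")

print(p)
```

Set label.y above the tallest bracket so the ANOVA label does not overlap the pairwise results.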

For clean figures it may be best to simply create an anova table to exhibit statistical comparisons

anova(lm(Y_variable ~ X_variable * Treatment, data = Dry))
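As a concrete sketch of the ANOVA table, here is the same call on the built-in ToothGrowth data, with len as the response and dose and supp as the two factors (my stand-ins for your own columns).

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# dose * supp expands to both main effects plus their interaction
fit <- lm(len ~ dose * supp, data = tg)
tab <- anova(fit)

print(tab)
```

The table has one row per main effect, one for the interaction, and one for the residuals, which you can report alongside the figure.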

Principal Component Analysis in R

In this tutorial we will follow step by step how to make a PCA plot based on the R-bloggers webpage that will be linked below.

View(iris)

PCA in R
PCA is run on the numeric measurement variables only, so the grouping variable (Species, column 5 of iris) must be removed

pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

The scale argument standardizes each variable to unit variance before the PCA is run; pc$scale holds the standard deviations that were used

pc$scale
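To check what centering and scaling actually did, you can compare the values stored in the prcomp object against the column means and standard deviations computed directly. This is a small sanity-check sketch, not part of the original tutorial.

```r
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# The centering values are the column means
centers_match <- all.equal(pc$center, colMeans(iris[-5]))

# The scaling values are the column standard deviations
scales_match <- all.equal(pc$scale, sapply(iris[-5], sd))

# With both options on, the component variances equal the
# eigenvalues of the correlation matrix
eigen_match <- all.equal(pc$sdev^2, eigen(cor(iris[-5]))$values)

print(c(centers_match, scales_match, eigen_match))
```

In other words, PCA on centered and scaled data is PCA on the correlation matrix, which keeps variables with large units from dominating.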

Printing the result shows the standard deviation and the loadings of each component; summary() adds the proportion of variance each component explains

print(pc)

summary(pc)

The first two principal components together explain most of the variability in the data
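The proportions reported by summary(pc) can also be computed by hand from the component standard deviations. The sketch below assumes the same prcomp call as above.

```r
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# Variance of each component is its standard deviation squared
var_explained <- pc$sdev^2 / sum(pc$sdev^2)

# Share of the total variance per component, then the running total
print(round(var_explained, 3))
print(round(cumsum(var_explained), 3))
```

The running total is a handy way to decide how many components are worth keeping.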

Making The Bi-Plot
You will need the devtools package to install ggbiplot from GitHub

library(devtools)

install_github("vqv/ggbiplot")

library(ggbiplot)

g <- ggbiplot(pc, obs.scale = 1, var.scale = 1, groups = iris$Species, ellipse = TRUE, circle = TRUE, ellipse.prob = 0.68)

pc is the prcomp object we made, which contains our principal components
obs.scale and var.scale are the scale factors applied to the observations and to the variables; both are set to 1 here
We group by species so we can observe the differences within the iris dataset
ellipse is TRUE so it draws an ellipse around each group of points in the plot, which represents the probability density
circle is TRUE so it draws a unit circle in the plot, which represents the correlation structure of the variables
ellipse.prob sets the size of the ellipse so that it encloses about 68% of the observations in each group

g <- g + scale_color_discrete(name = '')

g <- g + theme(legend.direction = 'horizontal',

               legend.position = 'top')

print(g)

As we can see, PC1 explains about 73.0% of the variance and PC2 explains about 22.9%
Arrows that point in nearly the same direction indicate variables that are highly correlated
A biplot is an important tool in PCA for understanding what is going on in the dataset
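If installing ggbiplot from GitHub does not work on your machine, a similar score plot can be sketched with ggplot2 alone. This is my alternative rather than the method from the video: it plots the raw scores from pc$x, draws a 68% normal ellipse per species with stat_ellipse(), and leaves out the loading arrows and the unit circle.

```r
library(ggplot2)

pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# pc$x holds the scores (one row per flower, one column per component)
scores <- data.frame(pc$x, Species = iris$Species)

p <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point() +
  # Normal-theory ellipse covering about 68% of each group
  stat_ellipse(type = "norm", level = 0.68) +
  scale_color_discrete(name = "") +
  theme(legend.direction = "horizontal",
        legend.position = "top")

print(p)
```

The axes may be scaled differently from the ggbiplot version, so compare the overall pattern of the groups rather than the exact coordinates.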
