Matthew Peterson’s Code Blog

Code Seen in Videos

Welcome to my running code blog where I will provide code seen in the videos below!

I will try to adapt each script to universally available data, such as the iris dataset in R. If that cannot be accomplished, I will provide the code I used but omit specific data names and file paths, keeping the code intact so that you can adapt it to your own data.

This will be a living, breathing document that I intend to update as needed. If you have questions or comments feel free to ask them below!

Pairwise Comparisons

(As seen in the video “Make a grouped boxplot with pairwise comparisons (ggplot2)!”)

  • You will need the ggplot2 package, plus the ggpubr package, which provides stat_compare_means()
library(ggplot2)
library(ggpubr)
  • Step 1, clear your global environment
rm(list = ls())
  • Step 2, load the data set you intend to analyze
Data <- read.csv("Direct/Path/To/Your/Data/Data_name.csv")
  • First let’s form our grouped boxplot
ggplot(data = Data, aes(x=________, y=_________, fill = _________))+

geom_boxplot(aes(x=_______, y=___________, fill=____________))+

scale_fill_viridis_d()+

theme_classic()
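To try the template above on data that ships with R, here is a minimal sketch using the built-in ToothGrowth dataset. The choice of ToothGrowth and of the columns dose, len, and supp is my assumption for illustration (iris has only one grouping column, so it cannot show a grouped boxplot). Swap in your own data and column names.

```r
library(ggplot2)

# ToothGrowth ships with R: len (numeric), supp (factor), dose (numeric)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)  # treat dose as a category so it forms the x-axis groups

p <- ggplot(data = tg, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +          # inherits x, y and fill from ggplot()
  scale_fill_viridis_d() +
  theme_classic()

print(p)
```

Because the aesthetics are set once in ggplot(), geom_boxplot() does not need them repeated here.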
  • Now let’s add the ANOVA
ggplot(data = Data, aes(x=________, y=_________, fill = _________))+

geom_boxplot(aes(x=_______, y=___________, fill=____________))+

scale_fill_viridis_d()+

theme_classic()+

stat_compare_means(method = "anova")
  • Now we create the comparisons we want
my_comparisons<- list(c("Afghan", "China"), c("Afghan", "India"), c("Afghan", "Viridis"))
  • NOTE: You should list your comparisons for your specific data as shown above
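Each entry in the comparison list is just a pair of group names. As a rough sketch of what one pairwise test per entry looks like, the base R code below runs t.test() on each pair, using iris and its Species column as stand-in data (my assumption, not the data from the video). I believe stat_compare_means() runs comparable tests internally, but treat that as an assumption rather than a guarantee of identical p-values.

```r
# Assumption: iris stands in for your data; Species is the grouping column
iris_comparisons <- list(c("setosa", "versicolor"),
                         c("setosa", "virginica"),
                         c("versicolor", "virginica"))

# Run one two-sample t-test per pair and collect the p-values
p_values <- sapply(iris_comparisons, function(pair) {
  a <- iris$Sepal.Length[iris$Species == pair[1]]
  b <- iris$Sepal.Length[iris$Species == pair[2]]
  t.test(a, b)$p.value
})
names(p_values) <- sapply(iris_comparisons, paste, collapse = " vs ")

print(p_values)
```

The group names in each pair must match the values in your grouping column exactly, including capitalization.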
  • Finally, let’s add some comparisons
ggplot(data = Dry, aes(x=___________, y=_____________, fill = _________))+

geom_boxplot(aes(x=Accession, y=AVG_Dry_Wt., fill=Treatment))+

scale_fill_viridis_d()+

theme_classic()+

stat_compare_means(method = "anova", label.y = 750)+

stat_compare_means(comparisons = my_comparisons, method = "t.test")
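For a version you can run without my data, here is a sketch on the built-in ToothGrowth dataset. The dataset, the column names, and the label.y value of 50 are my assumptions for illustration. To keep the example simple, the fill is set to the same variable as the x-axis rather than a second grouping variable, so it shows the comparison layers but not the grouped fill.

```r
library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Comparisons must use the x-axis labels exactly as they appear in the data
dose_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))

p <- ggplot(data = tg, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot() +
  scale_fill_viridis_d() +
  theme_classic() +
  stat_compare_means(method = "anova", label.y = 50) +  # overall test label
  stat_compare_means(comparisons = dose_comparisons, method = "t.test")  # pairwise brackets

print(p)
```

Adjust label.y so the overall ANOVA label sits above the pairwise brackets for your own y-axis range.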
  • For clean figures, it may be best to simply create an ANOVA table to report the statistical comparisons (replace Y_variable and X_variable with your own column names)
anova(lm(Y_variable ~ X_variable * Treatment, data = Dry))
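As a runnable sketch of the same idea, the code below builds an ANOVA table from the built-in ToothGrowth dataset. The dataset and the columns len, dose, and supp are my stand-ins for your response, x-variable, and treatment.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)  # treat dose as a categorical factor

# len is the response; dose * supp expands to dose + supp + dose:supp
fit <- lm(len ~ dose * supp, data = tg)
anova_table <- anova(fit)

print(anova_table)
```

The dose:supp row is the interaction term, which tests whether the effect of one factor depends on the level of the other.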

Principal Component Analysis in R

  • In this tutorial we will follow step by step how to make a PCA plot based on the R-bloggers webpage that will be linked below.
View(iris)
  • PCA in R

  • PCA is run on the numeric measurement variables only, so the dependent (grouping) variable, here the Species column in position 5, must be removed

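pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)
  • Setting scale. = TRUE standardizes each variable to unit variance (normalization); pc$scale shows the standard deviations used for that scaling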
pc$scale
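If you want to confirm what centering and scaling did, a quick check is to compare the stored values against the column means and standard deviations computed directly. This is a small sketch in base R; the names col_sds and col_means are mine.

```r
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# With scale. = TRUE, pc$scale holds each column's standard deviation
# and pc$center holds each column's mean
col_sds   <- sapply(iris[-5], sd)
col_means <- colMeans(iris[-5])

print(all.equal(pc$scale, col_sds))
print(all.equal(pc$center, col_means))
```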
  • Here we print the results, which show the standard deviations and loadings; summary() adds the proportion of variance explained by each component
print(pc)
summary(pc)
  • The first two principal components explain most of the variability
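The proportions reported by summary(pc) can also be computed by hand from the component standard deviations. Here is a minimal sketch in base R; the variable names are mine.

```r
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# Variance of each component is its standard deviation squared
variances <- pc$sdev^2
prop_var  <- variances / sum(variances)  # share of total variance per component
cum_var   <- cumsum(prop_var)            # running total across components

print(round(prop_var, 3))
print(round(cum_var, 3))
```

The cumulative values show how much of the total variance you keep if you retain only the first one, two, or three components.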

    Making The Bi-Plot

  • You will need the devtools package to install ggbiplot from GitHub

library(devtools)

install_github("vqv/ggbiplot")

library(ggbiplot)

g <- ggbiplot(pc, obs.scale = 1, var.scale = 1, groups = iris$Species, ellipse = TRUE, circle = TRUE, ellipse.prob = 0.68)
  • pc is the prcomp object we made, which contains our principal components

  • obs.scale and var.scale are set to 1 so no scale is applied

  • we group by species so we can observe the differences within the iris dataset

  • ellipse is TRUE so it draws an ellipse around each group of points in the plot, which represents the probability density

  • circle is TRUE so it plots a unit circle, which represents the correlation structure of the variables

  • ellipse.prob sets the size of each ellipse; 0.68 makes it encompass around 68% of the observations in each group

g <- g + scale_color_discrete(name = '')

g <- g + theme(legend.direction = 'horizontal',

               legend.position = 'top')

print(g)
  • As we can see, PC1 explains about 73.0% of the variance and PC2 explains about 22.9%

  • Arrows that are closer to each other indicate a higher correlation between those variables

  • A biplot is an important tool in PCA to understand what is going on in the dataset
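One way to sanity-check the arrow interpretation is to look at the correlation matrix directly. Here is a minimal sketch using base R's cor() on the same numeric columns; variables whose arrows point in nearly the same direction should show a high positive correlation.

```r
# Pairwise correlations between the four numeric iris columns
cors <- cor(iris[-5])
print(round(cors, 2))

# Petal.Length and Petal.Width are strongly correlated...
print(cors["Petal.Length", "Petal.Width"])
# ...while Sepal.Length and Sepal.Width are only weakly (negatively) correlated
print(cors["Sepal.Length", "Sepal.Width"])
```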