Matthew Peterson’s Code Blog
Code Seen in Videos
Welcome to my running code blog where I will provide code seen in the videos below!
I will try to adapt each script to universally available data, such as the iris data set in R. Where that is not possible, I will provide the code I used with the specific data names and file paths omitted, while keeping the code intact so that you can adapt it to your own data.
This will be a living, breathing document that I intend to update as needed. If you have questions or comments feel free to ask them below!
Pairwise Comparisons
(As seen in the video “Make a grouped boxplot with pairwise comparisons (ggplot2)!”)
- You will need the ggplot2 package, plus the ggpubr package, which provides stat_compare_means()
library(ggplot2)
library(ggpubr)
- Step 1, clear your global environment
rm(list = ls())
- Step 2, load the data set you intend to analyze
Data <- read.csv("Direct/Path/To/Your/Data/Data_name.csv")
- First let’s form our grouped boxplot
ggplot(data = Data, aes(x=________, y=_________, fill = _________))+
geom_boxplot(aes(x=_______, y=___________, fill=____________))+
scale_fill_viridis_d()+
theme_classic()
- Now let’s add the anova
ggplot(data = Data, aes(x=________, y=_________, fill = _________))+
geom_boxplot(aes(x=_______, y=___________, fill=____________))+
scale_fill_viridis_d()+
theme_classic()+
stat_compare_means(method = "anova")
- Now we create the comparisons we want
my_comparisons <- list(c("Afghan", "China"), c("Afghan", "India"), c("Afghan", "Viridis"))
- NOTE: You should list the comparisons for your own data in the same way as shown above
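If you want every possible pair rather than a hand-picked list, base R's combn() can build the list for you. This is only a minimal sketch: it uses the built-in iris data as a stand-in, with iris$Species playing the role of your own x-axis grouping column (that substitution is my assumption, not the data from the video).

```r
# Build every pairwise combination of the group levels.
# iris$Species is a stand-in for your own grouping column (assumption).
groups <- levels(iris$Species)
my_comparisons <- combn(groups, 2, simplify = FALSE)

# Each list element is a character vector holding two group names,
# the same shape as the hand-written list of comparisons.
print(my_comparisons)
```

With many groups the number of pairs grows quickly, so a hand-picked list is often the cleaner choice for a figure.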
- Finally, let’s add some comparisons
ggplot(data = Data, aes(x=___________, y=_____________, fill = _________))+
geom_boxplot(aes(x=Accession, y=AVG_Dry_Wt., fill=Treatment))+
scale_fill_viridis_d()+
theme_classic()+
stat_compare_means(method = "anova", label.y = 750)+
stat_compare_means(comparisons = my_comparisons, method = "t.test")
- For clean figures it may be best to simply create an anova table to exhibit statistical comparisons
anova(lm(Y_variable ~ X_variable * Treatment, data = Data))
Principal Component Analysis in R
- In this tutorial we will follow, step by step, how to make a PCA plot, based on the R-bloggers webpage that will be linked below.
View(iris)
PCA in R
PCA works on the numeric measurement variables only, so the Species column (column 5) must be removed
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)
- The scale. argument is used for normalization: each variable is scaled to unit variance
pc$scale
- Here we will print the results, which gives the standard deviations and loadings
print(pc)
summary(pc)
The first principal components explain most of the variability
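To see where the proportions reported by summary(pc) come from, you can compute them by hand from the standard deviations stored in the prcomp object. This is a minimal sketch on the same iris data; the names pc_var, var_explained and cum_explained are just illustrative.

```r
# Same PCA as above: drop the Species column, then center and scale.
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# pc$scale holds the standard deviation used to scale each variable.
all.equal(unname(pc$scale), unname(sapply(iris[-5], sd)))

# The variance of each component is its squared standard deviation.
pc_var <- pc$sdev^2

# Proportion and cumulative proportion of variance explained.
var_explained <- pc_var / sum(pc_var)
cum_explained <- cumsum(var_explained)

data.frame(component  = paste0("PC", seq_along(var_explained)),
           proportion = round(var_explained, 3),
           cumulative = round(cum_explained, 3))
```

The proportion and cumulative columns should match the "Proportion of Variance" and "Cumulative Proportion" rows printed by summary(pc).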
Making The Bi-Plot
- You will need the devtools package to install ggbiplot from GitHub
library(devtools)
install_github("vqv/ggbiplot")
library(ggbiplot)
g <- ggbiplot(pc, obs.scale = 1, var.scale = 1, groups = iris$Species, ellipse = TRUE, circle = TRUE, ellipse.prob = 0.68)
pc is the prcomp object we made, which contains our principal components
obs.scale and var.scale are set to 1 so no scale is applied
We group by species so we can observe the differences within the iris dataset
ellipse is TRUE so it draws an ellipse around each group of points in the plot, which represents the probability density
circle is TRUE to plot a unit circle, which represents the correlation structure of the variables
ellipse.prob is set to 0.68 so that each ellipse encompasses around 68% of the observations in its group
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
legend.position = 'top')
print(g)
As we can see from the axis labels, with the full iris data set used above PC1 explains about 73.0% of the variance and PC2 about 22.9%
Arrows that are closer to each other indicate stronger correlation between those variables
A biplot is an important tool in PCA to understand what is going on in the dataset
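One way to sanity-check what the arrows are telling you is to look at the correlation matrix of the same variables. This is a minimal sketch in base R on the iris data; cor_mat is just an illustrative name.

```r
# Correlation matrix of the four measurements (Species column dropped).
cor_mat <- cor(iris[-5])
round(cor_mat, 2)

# The two petal variables are strongly positively correlated,
# so their arrows point in nearly the same direction.
cor_mat["Petal.Length", "Petal.Width"]

# Sepal.Width is negatively correlated with Petal.Length,
# so its arrow points away from the petal arrows.
cor_mat["Sepal.Width", "Petal.Length"]
```

Keep in mind that the biplot shows only the first two components, so arrow angles are an approximation of these correlations, not an exact copy.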