Matthew Peterson’s Code Blog
Code Seen in Videos
Welcome to my running code blog where I will provide code seen in the videos below!
I will try to adapt the scripts to universally available data, such as the built-in iris dataset in R. If that cannot be done, I will provide the code I used but omit specific data names and file paths, while keeping the code intact so you can use and adapt it to your own data.
This will be a living, breathing document that I intend to update as needed. If you have questions or comments, feel free to ask them below!
Pairwise Comparisons
(As seen in the video “Make a grouped boxplot with pairwise comparisons (ggplot2)!”)
- You will need the ggplot2 and ggpubr packages (stat_compare_means() comes from ggpubr)
library(ggplot2)
library(ggpubr)
- Step 1, clear your global environment
rm(list = ls())
- Step 2, load the data set you intend to analyze
Data <- read.csv("Direct/Path/To/Your/Data/Data_name.csv")
- First let’s form our grouped boxplot
ggplot(data = Data, aes(x=________, y=_________, fill = _________))+
geom_boxplot(aes(x=_______, y=___________, fill=____________))+
scale_fill_viridis_d()+
theme_classic()
- Now let’s add the ANOVA
ggplot(data = Data, aes(x=________, y=_________, fill = _________))+
geom_boxplot(aes(x=_______, y=___________, fill=____________))+
scale_fill_viridis_d()+
theme_classic()+
stat_compare_means(method = "anova")
- Now we create the comparisons we want
my_comparisons <- list(c("Afghan", "China"), c("Afghan", "India"), c("Afghan", "Viridis"))
- NOTE: Replace these with the groups (levels of your x variable) you want to compare, spelled exactly as they appear in your data
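If your x variable has many levels, typing every pair by hand gets tedious. The sketch below uses iris as a stand-in for your data (Petal.Length, Species, and the reference level "setosa" are my choices for illustration). It builds the list of pairs with base R's combn(), builds an alternative "every group versus one reference" list, and runs base R's t.test() on each pair so you can see the p-values outside the plot. t.test() uses Welch's test by default; I'd expect stat_compare_means(method = "t.test") to report matching p-values, but check the ggpubr documentation for your version.

```r
# Stand-in data: the built-in iris dataset (swap in your own data frame and columns)
grp <- iris$Species
y <- iris$Petal.Length
lv <- levels(grp)

# Every pair of groups, in the same list-of-pairs shape as my_comparisons
all_pairs <- combn(lv, 2, simplify = FALSE)

# Alternative: compare each group against one reference group (here "setosa")
ref <- "setosa"
vs_ref <- lapply(setdiff(lv, ref), function(g) c(ref, g))

# Two-sample t-test for each pair (t.test() uses Welch's test by default)
pair_p <- sapply(all_pairs, function(pair) {
  t.test(y[grp == pair[1]], y[grp == pair[2]])$p.value
})
names(pair_p) <- sapply(all_pairs, paste, collapse = " vs ")

# Holm adjustment, since several tests are run on the same data
pair_p_holm <- p.adjust(pair_p, method = "holm")
print(signif(cbind(raw = pair_p, holm = pair_p_holm), 3))
```

Either all_pairs or vs_ref can be passed to stat_compare_means(comparisons = ...) in place of a hand-typed list.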
- Finally, let’s add some comparisons
ggplot(data = Data, aes(x = Accession, y = AVG_Dry_Wt., fill = Treatment))+
geom_boxplot()+
scale_fill_viridis_d()+
theme_classic()+
stat_compare_means(method = "anova", label.y = 750)+
stat_compare_means(comparisons = my_comparisons, method = "t.test")
- NOTE: Accession, AVG_Dry_Wt., and Treatment are the column names from my data; replace them with yours, and set label.y to a height that fits your y axis
- For clean figures, it may be best to leave the statistics off the plot and report them in an ANOVA table instead
anova(lm(AVG_Dry_Wt. ~ Accession * Treatment, data = Data))
Principal Component Analysis in R
(As seen in the video “Principal Component Analysis in R”)
- In this tutorial we will walk step by step through making a PCA plot, following the R-bloggers post that will be linked below.
View(iris)
PCA in R
PCA is computed on the numeric measurement variables, so the categorical Species column (column 5) must be removed; iris[-5] drops it
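pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)
- center = TRUE shifts each variable to a mean of zero and scale. = TRUE divides each by its standard deviation, so all four measurements are standardized (normalized) before the PCA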
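To see what center = TRUE and scale. = TRUE actually do, this sketch standardizes the four iris measurements by hand with base R's scale() and runs an unscaled PCA on the result. The components come out the same, and pc$scale turns out to be the column standard deviations. The names z and pc_manual are just for illustration.

```r
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# Standardize by hand: subtract each column's mean and divide by its standard deviation
z <- scale(iris[-5], center = TRUE, scale = TRUE)
pc_manual <- prcomp(z, center = FALSE, scale. = FALSE)

# The component standard deviations match
print(all.equal(pc$sdev, pc_manual$sdev))

# pc$scale holds the standard deviation of each original column
print(all.equal(pc$scale, sapply(iris[-5], sd)))
```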
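pc$scale
- pc$scale holds the standard deviation of each original variable (the values each column was divided by). Next we print the results, which show the standard deviations of the components and the loadings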
print(pc)
summary(pc)
summary(pc) reports how much of the total variance each component explains; the first two principal components explain most of the variability.
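The proportions that summary(pc) prints can be computed directly from pc$sdev: each component's variance is its standard deviation squared, divided by the total variance. A short sketch (var_explained is just an illustrative name):

```r
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# Variance of each component = its standard deviation squared
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
names(var_explained) <- colnames(pc$rotation)   # "PC1", "PC2", ...

print(round(100 * var_explained, 1))          # percent of variance per component
print(round(100 * cumsum(var_explained), 1))  # cumulative percent
```

summary(pc) rounds these proportions to five decimal places, so tiny differences from this calculation are expected.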
Making the Biplot
You will need the devtools package to install ggbiplot from GitHub
library(devtools)
install_github("vqv/ggbiplot")
library(ggbiplot)
g <- ggbiplot(pc, obs.scale = 1, var.scale = 1, groups = iris$Species, ellipse = TRUE, circle = TRUE, ellipse.prob = 0.68)
pc is the prcomp object we made, which contains our principal components
obs.scale and var.scale are the scale factors applied to the observations (points) and the variables (arrows); setting both to 1 puts them on the same scale
groups = iris$Species colors the points by species so we can see how the three species differ
ellipse = TRUE draws an ellipse around each group of points, based on a normal approximation to that group's spread
circle = TRUE draws a correlation circle, which helps show the correlation structure of the variables
ellipse.prob = 0.68 sets the size of each ellipse so that it encloses about 68% of that group's observations (assuming the data are roughly normal)
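ellipse.prob works in terms of the normal distribution: for bivariate normal data, the region holding a fraction p of the points is the ellipse where the squared Mahalanobis distance from the group mean is at most qchisq(p, df = 2). The sketch below applies that rule to each species' PC1/PC2 scores with base R's mahalanobis() to check how many points fall inside a 68% ellipse. It illustrates the idea behind the parameter and is not ggbiplot's own code; because real data are not exactly normal, the fractions will only be near 0.68.

```r
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)
scores <- as.data.frame(pc$x[, 1:2])   # PC1/PC2 coordinates shown in the biplot

# Squared Mahalanobis radius of a 68% normal-theory ellipse in two dimensions
r2 <- qchisq(0.68, df = 2)

# Fraction of each species' points inside its own 68% ellipse
inside_frac <- sapply(split(scores, iris$Species), function(s) {
  d2 <- mahalanobis(s, center = colMeans(s), cov = cov(s))
  mean(d2 <= r2)
})
print(round(inside_frac, 2))
```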
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
legend.position = 'top')
print(g)
As we can see, PC1 explains about 73.0% of the variance and PC2 about 22.9%, so together they capture roughly 96% of the variability.
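Arrows that point in similar directions (small angles between them) indicate variables that are strongly positively correlated, while arrows pointing in opposite directions indicate negative correlation
A biplot is an important tool in PCA because it shows the observations (points) and the variables (arrows) in the same plot, which makes it easier to understand what is going on in the dataset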
Making a Box and Whisker Plot
(As seen in the video “RStudio Tutorial (Geom Boxplot with Kruskal Wallis)”)
Using ggplot and the built-in iris dataset in RStudio, choose the two variables you wish to compare: a grouping variable for the x axis and a numeric variable for the y axis. In this example we compare petal length across species.
The first thing we need to do is make sure we can plot a box and whisker plot before jumping into our comparisons.
library(ggplot2)
library(ggpubr)
View(iris)
ggplot(iris, aes(x=Species, y=Petal.Length, fill=Species))+
geom_boxplot()+
theme_classic()+
labs(y="Petal Length")
Next we need to create our comparison list
my_comparisons_iris <- list(c("setosa", "versicolor"), c("setosa", "virginica"))
Next we need to add our comparisons to the box and whisker plot and tell stat_compare_means() which test to run on each pair in our comparison list.
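ggplot(iris, aes(x=Species, y=Petal.Length, fill=Species))+
geom_boxplot()+
theme_classic()+
labs(y="Petal Length")+
stat_compare_means(comparisons = my_comparisons_iris, method = "wilcox.test")+
stat_compare_means()
Note: set your comparison method to whichever statistical test you need; in this example we use the Wilcoxon rank-sum test for each pair. The final stat_compare_means(), with no method given, adds a global p-value across all three species (Kruskal-Wallis, ggpubr's default for more than two groups).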
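The p-values on the plot can be reproduced with base R, which is handy if you want to report them in text or a table. This sketch runs wilcox.test() on each pair in my_comparisons_iris and kruskal.test() across all three species. I set exact = FALSE because Petal.Length has ties, which rules out an exact p-value; I'm assuming ggpubr's defaults give essentially the same numbers, so check its documentation if they differ slightly.

```r
my_comparisons_iris <- list(c("setosa", "versicolor"), c("setosa", "virginica"))

# Wilcoxon rank-sum test for each pair in the comparison list
wilcox_p <- sapply(my_comparisons_iris, function(pair) {
  a <- iris$Petal.Length[iris$Species == pair[1]]
  b <- iris$Petal.Length[iris$Species == pair[2]]
  # Normal approximation, since ties prevent an exact p-value
  wilcox.test(a, b, exact = FALSE)$p.value
})
names(wilcox_p) <- sapply(my_comparisons_iris, paste, collapse = " vs ")
print(signif(wilcox_p, 3))

# Kruskal-Wallis test across all three species (the global comparison)
kw <- kruskal.test(Petal.Length ~ Species, data = iris)
print(kw)
```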
Making a Boxplot with an ANOVA
(As seen in the video “RStudio ggplot tutorial (Geom Boxplot with ANOVA)”)
Using ggplot and the built-in iris dataset, use the following code to create a boxplot with an ANOVA comparison
ggplot(data = iris, aes(x=Species, y=Petal.Length, fill =Species))+
geom_boxplot()+
theme_classic()+
scale_fill_viridis_d()+
labs(x = "Species", y= "Petal Length")+
theme(axis.title.x = element_text(size = 16), axis.title.y = element_text(size = 16))+
stat_compare_means(method = "anova")
Adjust the axis title size with the theme() line and choose your comparison test with stat_compare_means(). Because the boxes are colored with fill, use scale_fill_viridis_d() (not scale_color_viridis_d()) to change their palette.
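If you want the numbers behind the label, the same one-way ANOVA can be run in base R with the anova(lm()) pattern from the pairwise comparisons section. I'm assuming ggpubr's method = "anova" fits an ordinary (equal-variance) one-way ANOVA, so the p-value should agree. TukeyHSD() then adds adjusted pairwise comparisons, which the plot label does not show.

```r
# One-way ANOVA of petal length across species
fit <- lm(Petal.Length ~ Species, data = iris)
aov_table <- anova(fit)   # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(aov_table)

# Tukey's honest significant difference for every pair of species
tukey <- TukeyHSD(aov(Petal.Length ~ Species, data = iris))
print(tukey)
```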
Making an Animated Plot
(As seen in the video “Animated Plot (make a .gif in R!)”)
You will need to install and load the following packages:
install.packages("ggplot2")
install.packages("gganimate")
install.packages("gapminder")
install.packages("animation")
install.packages("gifski")
install.packages("magick")
library(ggplot2)
library(gganimate)
library(gapminder)
library(magick)
library(animation)
library(gifski)
This tutorial uses the preexisting data set “gapminder”, which records life expectancy, population, and GDP per capita for countries every 5 years between 1952 and 2007.
p <- ggplot(YOUR_DATASET, aes(x = X_VARIABLE, y = Y_VARIABLE, color = continent, size = pop)) +
geom_point(alpha = 0.7) + # Add scatter plot with transparency
scale_x_log10() + # Log scale for the x axis
transition_time(year) + # Animate over time
scale_color_viridis_d() + # Color scale by continent
scale_size_continuous(range = c(1, 10)) + # Adjust size range for population
labs(title = "Year: {frame_time}") + # Title with frame time
theme_minimal() + # Minimal theme for better visibility
guides(size = "none") # Remove legend for size
The plot was put into the variable “p” to condense the code. You will need to replace YOUR_DATASET, X_VARIABLE, and Y_VARIABLE with your own data and variables, and set the time variable inside transition_time(). If you need to refer to my video on this, click the link at the start of this section.
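As one possible filled-in version, the sketch below plots GDP per capita against life expectancy from gapminder. Those column choices (gdpPercap, lifeExp) are my assumption for illustration, not necessarily the variables used in the video. It only builds the animation object, named p_example here so it doesn't overwrite p; rendering happens with animate() in the next step.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

# Assumed axes for illustration: GDP per capita (x, log scale) vs life expectancy (y)
p_example <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point(alpha = 0.7) +                      # semi-transparent points
  scale_x_log10() +                              # GDP per capita is very skewed
  scale_color_viridis_d() +                      # one color per continent
  scale_size_continuous(range = c(1, 10)) +      # point size by population
  transition_time(year) +                        # one animation state per year
  labs(title = "Year: {frame_time}", x = "GDP per capita (log scale)", y = "Life expectancy") +
  theme_minimal() +
  guides(size = "none")                          # hide the size legend
```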
Now we can move on to animating the plot and saving it.
anim <- animate(p, renderer = gifski_renderer(), width = 800, height = 600, res = 100)
anim
anim_save("Create your own file path here/animation3.gif", anim)
Storing the rendered animation in “anim” lets you preview it and save it without rendering it twice. This is how you can successfully create a .gif using RStudio!