Matthew Peterson’s Code Blog

Code Seen in Videos

Welcome to my running code blog where I will provide code seen in the videos below!

Where possible I will adapt each script to universally available data, such as the iris dataset built into R. When that is not possible, I will share the code I used with specific data names and file paths omitted, keeping the code intact so you can adapt it to your own data.

This will be a living, breathing document that I intend to update as needed. If you have questions or comments feel free to ask them below!

Pairwise Comparisons

(As seen in the video “Make a grouped boxplot with pairwise comparisons (ggplot2)!”)

  • You will need the ggplot2 package, plus ggpubr, which provides the stat_compare_means() function used below
library(ggplot2)
library(ggpubr)
  • Step 1, clear your global environment
rm(list = ls())
  • Step 2, load the data set you intend to analyze
Data <- read.csv("Direct/Path/To/Your/Data/Data_name.csv")
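It is worth a quick check that the file loaded the way you expect before plotting. The sketch below is self-contained: it writes the built-in iris data to a temporary CSV, which is only a stand-in for your own file, and reads it back with read.csv. The names csv_path and Data are placeholders chosen for illustration.

```r
# Stand-in for your own CSV: write iris to a temporary file (illustration only)
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = TRUE turns text columns into grouping factors
Data <- read.csv(csv_path, stringsAsFactors = TRUE)

str(Data)   # column names and types
head(Data)  # first six rows
```

If a grouping column shows up as chr instead of Factor, convert it with factor() before plotting.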
  • First let’s form our grouped boxplot
ggplot(data = Data, aes(x=________, y=_________, fill = _________))+

geom_boxplot(aes(x=_______, y=___________, fill=____________))+

scale_fill_viridis_d()+

theme_classic()
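To see the template above with the blanks filled in, here is a minimal sketch that uses ToothGrowth, a dataset that ships with R. The choice of dataset and the names tg and p_grouped are my own assumptions for illustration, not the data from the video.

```r
library(ggplot2)

# ToothGrowth ships with R: len (numeric), supp (factor), dose (numeric)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)  # treat dose as a grouping variable on the x axis

# One box per dose, split and coloured by supplement type
p_grouped <- ggplot(data = tg, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +
  scale_fill_viridis_d() +
  theme_classic()

print(p_grouped)
```

Because the aesthetics are set once in ggplot(), geom_boxplot() inherits them and does not need its own aes() call.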
  • Now let’s add the anova
ggplot(data = Data, aes(x=________, y=_________, fill = _________))+

geom_boxplot(aes(x=_______, y=___________, fill=____________))+

scale_fill_viridis_d()+

theme_classic()+

stat_compare_means(method = "anova")
  • Now we create the comparisons we want
my_comparisons<- list(c("Afghan", "China"), c("Afghan", "India"), c("Afghan", "Viridis"))
  • NOTE: You should list your comparisons for your specific data as shown above
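If you want every possible pair instead of typing each one by hand, base R's combn() can build the list for you. This is a small sketch that uses iris as a stand-in dataset; group_names and all_comparisons are hypothetical names.

```r
# All group names of the x-axis variable (iris used as a stand-in dataset)
group_names <- levels(iris$Species)

# Every unordered pair, returned as a list of length-2 character vectors
all_comparisons <- combn(group_names, 2, simplify = FALSE)

print(all_comparisons)
```

The result has the same shape as the hand-written list above, so it can be passed straight to the comparisons argument.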
  • Finally, let’s add some comparisons
ggplot(data = Dry, aes(x=___________, y=_____________, fill = _________))+

geom_boxplot(aes(x=Accession, y=AVG_Dry_Wt., fill=Treatment))+

scale_fill_viridis_d()+

theme_classic()+

stat_compare_means(method = "anova", label.y = 750)+

stat_compare_means(comparisons = my_comparisons, method = "t.test")
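The brackets drawn on the plot are easier to trust once you have seen the numbers behind them. The sketch below runs a t-test for each listed pair in base R, using iris as a stand-in dataset. As far as I know, the brackets use the same default Welch t-test with no p-value adjustment, but treat that as an assumption and compare against your own plot.

```r
# Pairs to compare (iris used as a stand-in for your own groups)
my_comparisons_iris <- list(c("setosa", "versicolor"), c("setosa", "virginica"))

# Run a Welch t-test on each pair and keep only the p-value
pair_p_values <- sapply(my_comparisons_iris, function(pair) {
  subset_data <- droplevels(iris[iris$Species %in% pair, ])
  t.test(Petal.Length ~ Species, data = subset_data)$p.value
})

# Label each p-value with the pair it belongs to
names(pair_p_values) <- sapply(my_comparisons_iris, paste, collapse = " vs ")
print(pair_p_values)
```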
  • For clean figures it may be best to simply create an anova table to exhibit statistical comparisons
anova(lm(Y_variable ~ X_variable * Treatment, data = Dry))
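Here is a runnable version of that ANOVA table. It is a sketch that uses the built-in ToothGrowth data, with dose standing in for the x variable and supp for the treatment. Those substitutions are assumptions for illustration.

```r
# ToothGrowth ships with R; dose is converted to a factor so it acts as a group
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Two-way model with interaction: main effects of dose and supp plus dose:supp
fit <- lm(len ~ dose * supp, data = tg)
anova_table <- anova(fit)

print(anova_table)
```

Each row of the table is one term in the model, and the Pr(>F) column holds the p-value for that term.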

Principal Component Analysis in R

(As seen in the video “Principal Component Analysis in R”)

  • In this tutorial we will follow step by step how to make a PCA plot based on the R-bloggers webpage that will be linked below.
View(iris)
  • PCA in R

  • PCA needs numeric input, so the categorical Species column (column 5 of iris) must be removed

pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)
  • Setting scale. = TRUE standardizes each variable to unit variance; pc$scale shows the standard deviations used for that scaling
pc$scale
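To make the centering and scaling less of a black box, this short sketch checks that the values stored in the prcomp object are simply the column means and standard deviations of the input. The name column_sds is a placeholder of mine.

```r
# Same PCA as above, on the four numeric iris columns
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# Standard deviation of each input column, computed directly
column_sds <- sapply(iris[-5], sd)

print(pc$scale)
print(column_sds)

# TRUE when the stored scaling matches the column standard deviations
print(all.equal(pc$scale, column_sds))
```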
  • Here we will print the results which will get standard deviations and loadings
print(pc)
summary(pc)
  • The first two principal components explain most of the variability (see the Proportion of Variance row of the summary)
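The proportions that summary(pc) reports can also be computed by hand from the standard deviations stored in the object, which is handy if you want them as numbers for axis labels. A minimal sketch, with variance_explained as a hypothetical name:

```r
pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# Variance of each component is its standard deviation squared
variance_explained <- pc$sdev^2 / sum(pc$sdev^2)
names(variance_explained) <- paste0("PC", seq_along(variance_explained))

print(round(variance_explained, 3))          # share per component
print(round(cumsum(variance_explained), 3))  # running total
```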

    Making The Bi-Plot

  • You will need the devtools package to install ggbiplot from GitHub

library(devtools)

install_github("vqv/ggbiplot")

library(ggbiplot)

g <- ggbiplot(pc, obs.scale = 1, var.scale = 1, groups = iris$Species, ellipse = TRUE, circle = TRUE, ellipse.prob = 0.68)
  • pc is the prcomp object we created, which contains our principal components

  • obs.scale and var.scale are set to 1 so no scale is applied

  • we group by species so we can observe the differences within the iris dataset

  • ellipse is TRUE so it draws an ellipse around each group of points in the plot, which represents the probability density

  • circle is TRUE for plotting a unit circle in the plot, which will represent the correlation structure of the variables

  • ellipse.prob is a set parameter to make an ellipse that encompasses around 68% of the observations for each group

g <- g + scale_color_discrete(name = '')

g <- g + theme(legend.direction = 'horizontal',

               legend.position = 'top')

print(g)
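If ggbiplot will not install from GitHub, the scores and group ellipses can be approximated with ggplot2 alone. This is a swapped-in technique, not what ggbiplot does internally: it leaves out the variable arrows and the circle, and it uses stat_ellipse(type = "norm", level = 0.68) as a stand-in for ellipse.prob = 0.68. The names scores, pct and score_plot are hypothetical.

```r
library(ggplot2)

pc <- prcomp(iris[-5], center = TRUE, scale. = TRUE)

# Scores of each flower on the first two components, with its species label
scores <- data.frame(pc$x[, 1:2], Species = iris$Species)

# Percent of variance per component, used in the axis labels
pct <- round(100 * pc$sdev^2 / sum(pc$sdev^2), 1)

score_plot <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.68) +  # roughly 68% ellipse per group
  labs(x = paste0("PC1 (", pct[1], "% explained var.)"),
       y = paste0("PC2 (", pct[2], "% explained var.)")) +
  theme(legend.direction = "horizontal", legend.position = "top")

print(score_plot)
```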
  • As we can see from the axis labels, PC1 explains about 73% of the variance and PC2 about 23% (summary(pc) gives the exact values)

  • Arrows that are closer to each other indicate a higher correlation between those variables

  • A biplot is an important tool in PCA to understand what is going on in the dataset
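The link between arrow direction and correlation can be checked against the raw numbers. This sketch prints the correlation matrix of the four measurements next to their loadings on the first two components. The names measurements and correlations are placeholders.

```r
# The four numeric iris columns
measurements <- iris[-5]

# Pairwise correlations between the original variables
correlations <- cor(measurements)
print(round(correlations, 2))

# Loadings: the direction of each variable's arrow on PC1 and PC2
pc <- prcomp(measurements, center = TRUE, scale. = TRUE)
print(round(pc$rotation[, 1:2], 2))
```

Petal.Length and Petal.Width are strongly correlated, so their loadings are similar and their arrows point in nearly the same direction. Sepal.Width is negatively correlated with both, so its arrow points away from them.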

Making a Box and Whisker Plot

(As seen in the video “RStudio Tutorial (Geom Boxplot with Kruskal Wallis)”)

  • Using ggplot and the existing iris dataset within RStudio choose which 2 values you wish to compare. In this example we are looking to compare species and petal length.

  • The first thing we need to do is to ensure we can plot a box and whisker plot before jumping into our comparisons

    View(iris)
    
    ggplot(iris, aes(x=Species, y=Petal.Length, fill=Species))+
      geom_boxplot()+
      theme_classic()+
      labs(y="Petal Length")
  • Next we need to create our comparison list

    my_comparisons_iris<- list(c("setosa", "versicolor"), c("setosa", "virginica"))
  • Next we need to integrate our comparisons into our box and whisker plot and tell ggplot to test each pair in our comparison list.

    ggplot(iris, aes(x=Species, y=Petal.Length, fill=Species))+
      geom_boxplot()+
      theme_classic()+
      labs(y="Petal Length")+
      stat_compare_means(comparisons = my_comparisons_iris, method = "wilcox.test")+
      stat_compare_means()
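  • Note: set your comparison method to whichever statistical test you need. In this example we use a Wilcoxon test for the pairs, and the final stat_compare_means() call with no arguments adds the overall Kruskal-Wallis p-value.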
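The same tests can be run in base R if you want the numbers without the plot. This is a sketch, and the p-values may differ slightly from the plot labels depending on how ties and exact p-values are handled. Here exact = FALSE is set to avoid warnings about tied values, and no p-value adjustment is applied.

```r
# Overall test across all three species
overall <- kruskal.test(Petal.Length ~ Species, data = iris)
print(overall)

# Pairwise Wilcoxon tests, unadjusted, using the normal approximation
pairwise <- pairwise.wilcox.test(iris$Petal.Length, iris$Species,
                                 p.adjust.method = "none", exact = FALSE)
print(pairwise$p.value)
```

Switch p.adjust.method to "holm" or "bonferroni" if you need to correct for multiple comparisons.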

Making a Boxplot with an ANOVA

(As seen in the video “RStudio ggplot tutorial (Geom Boxplot with ANOVA)”)

  • Using ggplot and the existing dataset iris use the following code to create a boxplot with an ANOVA comparison

    ggplot(data = iris, aes(x=Species, y=Petal.Length, fill =Species))+
      geom_boxplot()+
      theme_classic()+
      scale_fill_viridis_d()+
      labs(x = "Species", y= "Petal.Length")+ 
      theme(axis.title.x = element_text(size = 16), axis.title.y = element_text(size = 16))+
      stat_compare_means(method = "anova")
  • Adjust the text size using the theme line and choose your comparison test with stat_compare_means
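The p-value on the plot comes from a one-way ANOVA, which you can reproduce in base R to get the full table. The Tukey HSD step afterwards is my own addition and is not shown in the video. It is one common way to see which groups differ once the overall test is significant.

```r
# One-way ANOVA: does mean petal length differ between species?
fit <- aov(Petal.Length ~ Species, data = iris)
print(summary(fit))

# Follow-up (an addition, not from the video): all pairwise differences
tukey <- TukeyHSD(fit)
print(tukey)
```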

Making an Animated Plot

(As seen in the video “Animated Plot (make a .gif in R!)”)

  • You will need to install the following packages

    install.packages("ggplot2") 
    install.packages("gganimate")
    install.packages("gapminder")
    devtools::install_github("dgrtwo/gganimate")
    install.packages("animation")
    install.packages("gifski")
    install.packages("magick")
    
    
    library(ggplot2)
    library(gganimate)
    library(gapminder)
    library(magick)
    library(animation)
    library(gifski)
  • This tutorial uses the preexisting data set “gapminder” which looks at countries, life expectancy, population, and GDP every 5 years between 1952 and 2007.

    p <- ggplot(INSERTDATASET HERE, aes(X variable, Y variable, frame = motion capture variable, color = continent, size = pop)) +
      geom_point(alpha = 0.7) +  # Add scatter plot with transparency
      transition_time(year) +  # Animate over time
      scale_x_log10() +  # Log scale on the x axis
      scale_color_viridis_d() +  # Color scale by continent
      scale_size_continuous(range = c(1, 10)) +  # Adjust size range for population
      labs(title = "Year: {frame_time}") +  # Title with frame time
      theme_minimal() +  # Minimal theme for better visibility
      guides(size = "none")  # Remove legend for size
  • The plot was put into variable “p” to condense the code. The code above will require you to select your own x and y variables as well as the frame variable. If you need to refer to my video on this click the link at the start of this section.
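For reference, here is one way the template might look with the gapminder columns filled in. The choice of gdpPercap for x and lifeExp for y is an assumption on my part. I have also left out the frame aesthetic, because in current versions of gganimate the animation is driven by transition_time().

```r
library(ggplot2)
library(gganimate)
library(gapminder)

# GDP per capita vs life expectancy, one frame per year
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                           color = continent, size = pop)) +
  geom_point(alpha = 0.7) +                    # scatter plot with transparency
  scale_x_log10() +                            # log scale on the x axis
  scale_color_viridis_d() +                    # colour scale by continent
  scale_size_continuous(range = c(1, 10)) +    # size range for population
  labs(title = "Year: {frame_time}") +         # title shows the current year
  theme_minimal() +
  guides(size = "none") +                      # remove the size legend
  transition_time(year)                        # animate over the year column
```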

  • Now we can move on to animating the plot and saving it.

    animate(p, renderer = gifski_renderer(), width = 800, height = 600, res = 100)
    
    
    anim_save("Create your own file path here/animation3.gif", animate(p, renderer = gifski_renderer(), width = 800, height = 600, res = 100))
  • This is how you can successfully create a .gif using RStudio!
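If rendering is slow or you want to test the save step on its own, a tiny made-up dataset renders in a few seconds. Everything in this sketch (the toy data frame, the 20-frame setting, and the temporary output path) is an assumption for illustration. It renders once, stores the result, and then saves it, so the animation is not rendered twice.

```r
library(ggplot2)
library(gganimate)
library(gifski)

# Made-up data: one point moving along a curve over ten steps
toy <- data.frame(step = 1:10, x = 1:10, y = (1:10)^2)

p_toy <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(size = 4) +
  labs(title = "Step: {frame_time}") +
  transition_time(step)

# Render once with a small frame count, then save the stored result
anim <- animate(p_toy, nframes = 20, fps = 10,
                renderer = gifski_renderer(), width = 400, height = 300)

gif_path <- file.path(tempdir(), "toy_animation.gif")  # temporary location
anim_save(gif_path, animation = anim)
```

Swap gif_path for your own file path once the output looks right.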