Today is the last day of our basic R introduction. We will be plotting our data using ggplot2!
As in the last few weeks, let's begin by loading the packages that we will need today.
library(tidyverse)
library(here)
Great. Now I will read in the data the same way we’ve done before.
#list files
files <- list.files(here::here(),full.names = TRUE)[1:3]
#read in data from files
worms <- purrr::map_dfr(files, ~readr::read_csv(.x))
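To see what purrr::map_dfr() does in the read step, here is a minimal sketch using two small hypothetical CSV files written to a temporary folder. The folder name, file names and values are made up for illustration, and show_col_types = FALSE only silences readr's column-type message. Each file is read with readr::read_csv() and the results are stacked row-wise into a single tibble, which is how the real files end up in worms.

```r
library(purrr)
library(readr)

# two small hypothetical CSV files written to a temporary folder
demo_dir <- file.path(tempdir(), "worms_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(Label = "a.TIF", Length = c(80.435, 3.875)), file.path(demo_dir, "a.csv"))
write_csv(data.frame(Label = "b.TIF", Length = c(76.811, 5.101)), file.path(demo_dir, "b.csv"))

files <- list.files(demo_dir, full.names = TRUE)

# read each file with read_csv() and stack the results row-wise into one tibble
combined <- map_dfr(files, ~ read_csv(.x, show_col_types = FALSE))
combined  # 4 rows: 2 from a.csv followed by 2 from b.csv
```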
| X1 | Label | Area | Angle | Length |
|---|---|---|---|---|
| 1 | p01-growth-H01-2X_B01.TIF | 81 | 0.000 | 80.435 |
| 2 | p01-growth-H01-2X_B01.TIF | 5 | 0.000 | 3.875 |
| 3 | p01-growth-H01-2X_B01.TIF | 77 | 0.000 | 76.811 |
| 4 | p01-growth-H01-2X_B01.TIF | 6 | 36.027 | 5.101 |
And now I'm going to tidy and process the data into an appropriate format. Remember that last week we did this in six separate steps; as I mentioned, we can use pipes (%>%) to chain those steps into one block of code, which is what I do below.
tidydata <- worms %>%
dplyr::mutate(row_num = row_number()) %>%
dplyr::select(row_num, Label, Length) %>%
tidyr::separate(Label, into = c("Plate", "Experiment", "Hour", "Magnification", "Well"), sep = "[[:punct:]]", extra = "drop") %>% # drop the trailing "TIF" piece
dplyr::mutate(Hour = stringr::str_extract(Hour, pattern = "[:digit:]{2}")) %>%
dplyr::group_by(Animal = rep(row_number(), length.out = n(), each = 2)) %>% # every two rows belong to one animal
dplyr::mutate(row_num = dplyr::case_when(Length < 60 ~ "Width",
Length >= 60 ~ "Length")) %>%
tidyr::pivot_wider(names_from = row_num, values_from = Length) %>%
dplyr::ungroup() %>%
tidyr::separate(Well, into = c("Row", "Column"), sep = "(?<=[A-Za-z])(?=[0-9])") %>%
dplyr::mutate(Length = 3.2937*Length, # convert pixels to um before deriving anything else
Width = 3.2937*Width,
Radius = Width/2,
Volume = pi*Radius^2*Length, # cylinder volume, um^3
Area = 2*pi*Radius*Length + 2*pi*Radius^2) # cylinder surface area (side + two ends), um^2
| Plate | Experiment | Hour | Magnification | Row | Column | Animal | Length | Width | Radius | Volume | Area |
|---|---|---|---|---|---|---|---|---|---|---|---|
| p01 | growth | 01 | 2X | B | 01 | 1 | 264.9288 | 12.76309 | 6.381544 | 33894.60 | 10878.57 |
| p01 | growth | 01 | 2X | B | 01 | 2 | 252.9924 | 16.80116 | 8.400582 | 56088.79 | 13796.95 |
| p01 | growth | 01 | 2X | B | 01 | 3 | 276.7070 | 16.73200 | 8.365998 | 60842.29 | 14984.90 |
| p01 | growth | 01 | 2X | B | 01 | 4 | 231.9127 | 13.76108 | 6.880539 | 34492.08 | 10323.44 |
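The parsing and pairing steps in the pipeline are the least obvious part, so here is a minimal sketch that runs them on four raw measurements copied from the first table (one image, pixel units). The column name Measure is a hypothetical stand-in for row_num, and the sketch assumes, as the pipeline does, that each animal contributes a length followed by a width.

```r
library(dplyr)
library(tidyr)
library(stringr)

# four raw measurements (pixels) from one image, copied from the raw table
raw <- tibble(Label  = "p01-growth-H01-2X_B01.TIF",
              Length = c(80.435, 3.875, 76.811, 5.101))

parsed <- raw %>%
  # "[[:punct:]]" splits on "-", "_" and "."; extra = "drop" discards the "TIF" piece
  separate(Label, into = c("Plate", "Experiment", "Hour", "Magnification", "Well"),
           sep = "[[:punct:]]", extra = "drop") %>%
  mutate(Hour    = str_extract(Hour, "[[:digit:]]{2}"),             # "H01" -> "01"
         Animal  = rep(row_number(), length.out = n(), each = 2),   # 1, 1, 2, 2
         Measure = case_when(Length < 60 ~ "Width",                 # short = width
                             Length >= 60 ~ "Length")) %>%          # long = length
  # one row per animal, with the two measurements side by side
  pivot_wider(names_from = Measure, values_from = Length) %>%
  # split "B01" at the boundary between the letter and the digits
  separate(Well, into = c("Row", "Column"), sep = "(?<=[A-Za-z])(?=[0-9])")

parsed
```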
Doesn’t using pipes make things so much more streamlined?
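To make the comparison concrete, here is a small sketch on a hypothetical data frame df: the same two steps (keep the long measurements, then convert them to um) written once as nested calls and once with pipes. The nested version has to be read from the inside out, while the piped version reads top to bottom; both give the same result.

```r
library(dplyr)

# hypothetical raw measurements in pixels
df <- data.frame(Length = c(80.435, 3.875, 76.811, 5.101))

# without pipes: each step is wrapped inside the next one
nested <- mutate(filter(df, Length >= 60), Length_um = 3.2937 * Length)

# with pipes: the same steps, in the order they happen
piped <- df %>%
  filter(Length >= 60) %>%
  mutate(Length_um = 3.2937 * Length)

identical(nested, piped)  # TRUE
```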
Now we will start plotting. This is a relatively straightforward process.
The first thing we need to do is tell R that we are setting up a plot. We do this by calling ggplot2::ggplot(), which opens a blank plot in the Plots tab of RStudio.
ggplot2::ggplot()
Now we will add components to this blank canvas by using +.
The first component I want to add is the aesthetics, which tell R what information to plot. But before that, let's tell R which data we want to plot. I want to plot the data held in the variable tidydata, which we pass directly to ggplot2::ggplot().
ggplot2::ggplot(tidydata)
I will not execute this block here, but give it a try yourself. Notice that it still produces a blank plot: R knows which data to use but not which variables to put on the axes. We must first add aesthetics.
Let's start by plotting Length over time, so we want x to be Hour and y to be Length.
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length)
Notice that we now have axes and axis labels!
The last thing we need to do is tell R what type of geometric object to plot. Let's try simple points, using geom_point().
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length) + geom_point()
And there we have it. Now that we have the basics, we can play around with the aesthetics and the geometric object.
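Because every + returns a new plot object, the same plot can also be built one step at a time and stored in a variable. Here is a minimal sketch with a small hypothetical data frame df that borrows two column names from tidydata; the variable name p is arbitrary.

```r
library(ggplot2)

# hypothetical data with two of the column names used in tidydata
df <- data.frame(Hour = c("01", "01", "02", "02"),
                 Length = c(264.9, 253.0, 276.7, 231.9))

p <- ggplot(df)                     # data only: still a blank canvas
p <- p + aes(x = Hour, y = Length)  # map Hour to x and Length to y: axes appear
p <- p + geom_point()               # add a layer that draws one point per row
p                                   # printing the object draws the plot
```

Keeping the plot in a variable is handy when you want to try several variations on the same base plot.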
Let's start by adding to the aesthetics, beginning with the size of the points.
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length, size = 4) + geom_point()
Because we put size inside aes(), R treats 4 as if it were a data value and adds it to the plot legend. To avoid this, we can set size directly in geom_point() instead.
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length) + geom_point(size = 4)
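The difference between mapping and setting can be checked with ggplot2::layer_data(), which returns the values ggplot2 actually uses to draw a layer. A minimal sketch on a hypothetical data frame df: when size = 4 is set in geom_point(), every point gets size 4; when it is mapped in aes(), the 4 is passed through a size scale first, which is also what creates the legend.

```r
library(ggplot2)

# hypothetical data with two of the column names used in tidydata
df <- data.frame(Hour = c("01", "01", "02", "02"),
                 Length = c(264.9, 253.0, 276.7, 231.9))

# mapped inside aes(): the constant 4 goes through a size scale and gets a legend
p_mapped <- ggplot(df) + aes(x = Hour, y = Length, size = 4) + geom_point()

# set in geom_point(): every point is drawn at size 4 and no legend is created
p_set <- ggplot(df) + aes(x = Hour, y = Length) + geom_point(size = 4)

layer_data(p_set)$size     # 4 4 4 4
layer_data(p_mapped)$size  # values chosen by the size scale, not the literal 4
```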
Typically we place information in aes() when we want to use a variable from our data frame (tidydata). For example, let's say we want to color the points based on the Column they came from. In this case we place color in aes(). If we placed color = Column in geom_point() instead, R would throw an error, because it would look for an R object called Column rather than the column in tidydata; feel free to try it out.
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length, color = Column) + geom_point(size = 4)
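Here is a short sketch of the same rule for color, using a hypothetical data frame df with a Column variable. Mapping color = Column in aes() gives one color per value of Column; a fixed color such as "red" is not a variable, so it goes in geom_point(). Note that ggplot2 stores the color as a column named colour in layer_data().

```r
library(ggplot2)

# hypothetical data: Column records which plate column each animal came from
df <- data.frame(Hour = c("01", "01", "02", "02"),
                 Length = c(264.9, 253.0, 276.7, 231.9),
                 Column = c("01", "02", "01", "02"))

# mapping: the colour is looked up from the Column variable, one colour per value
p_mapped <- ggplot(df) + aes(x = Hour, y = Length, color = Column) + geom_point(size = 4)

# setting: a fixed colour (not a variable) belongs in geom_point()
p_set <- ggplot(df) + aes(x = Hour, y = Length) + geom_point(size = 4, color = "red")

unique(layer_data(p_mapped)$colour)  # two colours, one per Column value
unique(layer_data(p_set)$colour)     # "red"
```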
Not only can we change the aesthetics, we can also change the geometric object we are plotting. Let's try making a boxplot rather than points. This is as simple as changing the last component from geom_point() to geom_boxplot().
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length, color = Column) + geom_boxplot()
Because we still have the color defined in aes() notice that there is a separate boxplot for each Column. Try removing the color from aes() and see what happens.
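To check what removing color does without looking at the plot, you can count the boxes with layer_data(), which returns one row per box for a boxplot layer. A minimal sketch on hypothetical data (2 hours x 2 plate columns x 5 animals, with made-up random lengths):

```r
library(ggplot2)

set.seed(1)
# hypothetical data: 2 hours x 2 plate columns x 5 animals
df <- expand.grid(Animal = 1:5, Column = c("01", "02"), Hour = c("01", "02"))
df$Length <- rnorm(nrow(df), mean = 260, sd = 15)

p_color <- ggplot(df) + aes(x = Hour, y = Length, color = Column) + geom_boxplot()
p_plain <- ggplot(df) + aes(x = Hour, y = Length) + geom_boxplot()

nrow(layer_data(p_color))  # 4 boxes: one per Hour x Column combination
nrow(layer_data(p_plain))  # 2 boxes: one per Hour
```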
For our purposes, I like to use a geometric object similar to geom_point() that keeps points from lying directly on top of each other: geom_jitter().
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length) + geom_jitter()
With geom_jitter() I also like to specify how much wiggle (or jitter) the points have, and I keep it fairly narrow. This is controlled with the width argument:
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length) + geom_jitter(width = 0.2)
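One detail worth knowing: geom_jitter() also jitters points vertically by default (by 40% of the resolution of the data), which slightly moves the measured values. A minimal sketch on hypothetical data that adds height = 0 to keep every y value exactly where it was, and checks with layer_data() that the horizontal offsets stay within the chosen width:

```r
library(ggplot2)

set.seed(1)
# hypothetical data with two of the column names used in tidydata
df <- data.frame(Hour = rep(c("01", "02"), each = 10),
                 Length = c(rnorm(10, 250, 15), rnorm(10, 300, 15)))

# width = 0.2: points move at most 0.2 to either side of their Hour position
# height = 0: no vertical jitter, so Length values are drawn exactly
p <- ggplot(df) + aes(x = Hour, y = Length) +
  geom_jitter(width = 0.2, height = 0)

ld <- layer_data(p)
range(ld$x - round(ld$x))  # horizontal offsets, all within [-0.2, 0.2]
```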
We can also layer two geometric shapes on top of each other! Let’s try adding geom_boxplot() to geom_jitter(). This is as simple as tacking it on the end:
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length) + geom_boxplot() + geom_jitter(width = 0.2)
Note that layers are drawn in the order you add them, so each new layer sits on top of the ones before it. As a personal preference I like having points in front of boxplots, which is why I add geom_boxplot() first and geom_jitter() second.
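The layer order can be inspected with layer_data(plot, i), where i is the position of the layer. A minimal sketch on hypothetical data; it also sets outlier.shape = NA in geom_boxplot(), an optional tweak (not used in the plots above) that stops the boxplot from drawing its own outlier points, since those observations are already shown by the jittered points.

```r
library(ggplot2)

set.seed(1)
# hypothetical data with two of the column names used in tidydata
df <- data.frame(Hour = rep(c("01", "02"), each = 10),
                 Length = c(rnorm(10, 250, 15), rnorm(10, 300, 15)))

p <- ggplot(df) + aes(x = Hour, y = Length) +
  geom_boxplot(outlier.shape = NA) +    # layer 1: drawn first, underneath
  geom_jitter(width = 0.2, height = 0)  # layer 2: drawn last, on top

nrow(layer_data(p, 1))  # 2: one summary row per box
nrow(layer_data(p, 2))  # 20: one row per observation
```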
The last thing I want to talk about is adding axis labels and a title to the plot. For this we add the labs() function.
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Length) + geom_boxplot() + geom_jitter(width = 0.2) +
labs(x = "Time (Hours)", y = "Animal Length (um)", title = "Animal Length over Time")
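A labelled plot is an ordinary R object, so once it looks right it can also be written to a file. Here is a minimal sketch using ggplot2::ggsave() on hypothetical data; the file name and the 5 x 4 inch size are arbitrary choices, and the .pdf extension tells ggsave() which graphics device to use.

```r
library(ggplot2)

set.seed(1)
# hypothetical data with two of the column names used in tidydata
df <- data.frame(Hour = rep(c("01", "02"), each = 5),
                 Length = c(rnorm(5, 250, 15), rnorm(5, 300, 15)))

p <- ggplot(df) + aes(x = Hour, y = Length) +
  geom_boxplot() + geom_jitter(width = 0.2) +
  labs(x = "Time (Hours)", y = "Animal Length (um)", title = "Animal Length over Time")

# write the finished plot to a PDF in the session's temporary folder
out <- file.path(tempdir(), "length_over_time.pdf")
ggsave(out, plot = p, width = 5, height = 4)
```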
And that’s really about it. You can play around with what you want to plot in x and y, as well as aesthetics like size and color.
The next plot also uses alpha, which controls transparency. It ranges from 0 (fully transparent) to 1 (fully opaque), so the smaller the alpha, the more transparent the object.
ggplot2::ggplot(tidydata) + aes(x = Hour, y = Volume) + geom_boxplot(linewidth = 0.5) + geom_jitter(size = 0.6, alpha = 0.8, width = 0.2) + # linewidth replaced size for line widths in ggplot2 3.4.0
labs(x = "Time (Hours)", y = "Animal Volume", title = "Animal Volume over Time")
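To confirm which layer receives the transparency, layer_data() can be used again: alpha set in geom_jitter() applies only to the points (layer 2), not to the boxplot. A minimal sketch on hypothetical volume data; where semi-transparent points overlap they look more opaque, which helps show where the data are dense.

```r
library(ggplot2)

set.seed(1)
# hypothetical data with two of the column names used in tidydata
df <- data.frame(Hour = rep(c("01", "02"), each = 20),
                 Volume = c(rnorm(20, 3000, 400), rnorm(20, 5000, 400)))

p <- ggplot(df) + aes(x = Hour, y = Volume) +
  geom_boxplot() +
  geom_jitter(size = 0.6, alpha = 0.8, width = 0.2)

unique(layer_data(p, 2)$alpha)  # 0.8 for every jittered point
```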
With these basics and a few extra touches, you can make plots like this one, which puts all of your data together.