Today we are going to tidy and process our data into a suitable format for future plotting.
I will break down the steps for manipulating and tidying the data into 6 main steps below.
Similar to last week, let’s begin by loading the packages that we will need today.
```r
library(tidyverse)
library(here)
```
Great. Now I will read in the data the same way we did last week.
This step only needs to be done once.
```r
# list the first three data files in the project folder
files <- list.files(here::here(), full.names = TRUE)[1:3]
files
```

```
## [1] "/Users/grad/Box/Joy_worm_images/Joy/20200519_jordan_H01.csv"
## [2] "/Users/grad/Box/Joy_worm_images/Joy/20200521_jordan_H02.csv"
## [3] "/Users/grad/Box/Joy_worm_images/Joy/20200526_jordan_H03.csv"
```

```r
# read in every file and stack the results into one dataframe
worms <- purrr::map_dfr(files, ~ readr::read_csv(.x))
```
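To see what purrr::map_dfr does with several files, here is a small self-contained sketch. The temporary folder, file names and values are made up for illustration. It also uses pattern = "\\.csv$" in list.files to keep only CSV files; that is an assumed alternative to picking files by position with [1:3], not part of the original setup.

```r
library(purrr)
library(readr)

# Hypothetical example data: two tiny CSV files in a temporary folder
tmp <- tempfile("worms_demo_")
dir.create(tmp)
readr::write_csv(data.frame(Label = "a.TIF", Length = 80.4), file.path(tmp, "H01.csv"))
readr::write_csv(data.frame(Label = "b.TIF", Length = 76.8), file.path(tmp, "H02.csv"))

# Keep only files ending in .csv (assumption: an alternative to [1:3])
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# map_dfr() calls read_csv() on each file and row-binds the results
combined <- purrr::map_dfr(files, ~ readr::read_csv(.x, show_col_types = FALSE))
combined
```

Each file contributes its rows in turn, so combined has one row per line of data across both files.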
| X1 | Label | Area | Angle | Length |
|---|---|---|---|---|
| 1 | p01-growth-H01-2X_B01.TIF | 81 | 0.000 | 80.435 |
| 2 | p01-growth-H01-2X_B01.TIF | 5 | 0.000 | 3.875 |
| 3 | p01-growth-H01-2X_B01.TIF | 77 | 0.000 | 76.811 |
| 4 | p01-growth-H01-2X_B01.TIF | 6 | 36.027 | 5.101 |
Step 1 has two parts:

1. We use dplyr::mutate to add a column called Row. For now, this column simply holds each row's number: row 5 has the value 5, row 50 has 50, row 100 has 100, and so on. We will use Row for other things later on.
2. We use dplyr::select to keep only the columns Row, Label, and Length. These are the only three we need for the next steps.
Notice that we are using pipes (%>%) for the first time! A pipe takes the result on its left and passes it as the first argument to the function on its right, so we can do these two steps back to back. R knows what to use as the input for each function, and you only need to name the dataframe worms once.
```r
step1 <- worms %>%
  dplyr::mutate(Row = row_number()) %>%   # number every row
  dplyr::select(Row, Label, Length)       # keep only these three columns
```
| Row | Label | Length |
|---|---|---|
| 1 | p01-growth-H01-2X_B01.TIF | 80.435 |
| 2 | p01-growth-H01-2X_B01.TIF | 3.875 |
| 3 | p01-growth-H01-2X_B01.TIF | 76.811 |
| 4 | p01-growth-H01-2X_B01.TIF | 5.101 |
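To make the pipe less mysterious, this sketch runs the same two calls with and without %>% on a tiny made-up tibble (the name toy is hypothetical; its values are copied from the tables above) and confirms the results are identical.

```r
library(dplyr)

# Hypothetical stand-in for worms, using values from the tables above
toy <- tibble::tibble(
  X1 = 1:4,
  Label = "p01-growth-H01-2X_B01.TIF",
  Area = c(81, 5, 77, 6),
  Length = c(80.435, 3.875, 76.811, 5.101)
)

# With pipes: toy becomes the first argument of mutate(),
# and the result of mutate() becomes the first argument of select()
piped <- toy %>%
  dplyr::mutate(Row = dplyr::row_number()) %>%
  dplyr::select(Row, Label, Length)

# Without pipes: the same calls nested inside each other
nested <- dplyr::select(dplyr::mutate(toy, Row = dplyr::row_number()), Row, Label, Length)

identical(piped, nested)  # TRUE
```

The piped version reads top to bottom in the order the steps happen, which is why we use it for the rest of this tutorial.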
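We talked about how the column Label holds a lot of information that we would like to split into multiple columns. We will do this in two steps: first, tidyr::separate splits Label at every punctuation character into the new columns Plate, Experiment, Hour, Magnification, and Well; then dplyr::mutate with stringr::str_extract keeps only the two digits of Hour, so H01 becomes 01.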
```r
step2 <- step1 %>%
  # split Label at every punctuation character (-, _ and .)
  tidyr::separate(Label, into = c("Plate", "Experiment", "Hour", "Magnification", "Well"),
                  sep = "[[:punct:]]") %>%
  # keep only the two digits of Hour ("H01" becomes "01")
  dplyr::mutate(Hour = stringr::str_extract(Hour, pattern = "[:digit:]{2}"))
```

```
## Warning: Expected 5 pieces. Additional pieces discarded in 150 rows [1, 2, 3, 4,
## 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
```
| Row | Plate | Experiment | Hour | Magnification | Well | Length |
|---|---|---|---|---|---|---|
| 1 | p01 | growth | 01 | 2X | B01 | 80.435 |
| 2 | p01 | growth | 01 | 2X | B01 | 3.875 |
| 3 | p01 | growth | 01 | 2X | B01 | 76.811 |
| 4 | p01 | growth | 01 | 2X | B01 | 5.101 |
Also notice that we get a warning from this step. Because we gave tidyr::separate five new column names, R expected Label to split into 5 pieces, but each label actually splits into 6: the file extension TIF becomes a sixth piece. R is telling us that it discarded this extra piece. In our case that is perfectly okay. However, you can imagine that in other cases this warning would be helpful in alerting you that you forgot to include enough columns to hold all of your information.
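If you already know the extra piece is safe to lose, tidyr::separate has an extra argument ("warn" is the default; "drop" and "merge" are the other options). Here is a minimal sketch on two made-up rows in the same format as Label; toy and step2_demo are hypothetical names. It uses extra = "drop" so the TIF piece is discarded without a warning.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical rows in the same format as the real Label column
toy <- tibble::tibble(
  Row = 1:2,
  Label = "p01-growth-H01-2X_B01.TIF",
  Length = c(80.435, 3.875)
)

step2_demo <- toy %>%
  tidyr::separate(Label,
                  into = c("Plate", "Experiment", "Hour", "Magnification", "Well"),
                  sep = "[[:punct:]]",
                  extra = "drop") %>%   # drop the 6th piece ("TIF") without warning
  dplyr::mutate(Hour = stringr::str_extract(Hour, pattern = "[[:digit:]]{2}"))

step2_demo
```

Keeping the default ("warn") is still a reasonable choice while you are exploring new data, because the warning is what tells you when a label has more pieces than you expected.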
When you guys were collecting measurements, you were careful to always measure the length of an animal first and the width of the same animal second. So in our dataframe, every two consecutive rows correspond to a single animal. We now need to mark that in the data, which we can do in a single step.
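We will use dplyr::group_by to both group the data and create a new column, Animal, that records these groupings. The expression inside group_by works like this: row_number() gives the numbers 1, 2, 3, and so on; rep(..., each = 2) repeats each of those numbers twice (1, 1, 2, 2, 3, 3, ...); and length.out = n() cuts that sequence back to the number of rows in the dataframe. So rows 1 and 2 become Animal 1, rows 3 and 4 become Animal 2, and so on.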
```r
step3 <- step2 %>%
  # Animal = 1, 1, 2, 2, 3, 3, ...: every two consecutive rows are one animal
  dplyr::group_by(Animal = rep(row_number(), length.out = n(), each = 2))
```
| Row | Plate | Experiment | Hour | Magnification | Well | Length | Animal |
|---|---|---|---|---|---|---|---|
| 1 | p01 | growth | 01 | 2X | B01 | 80.435 | 1 |
| 2 | p01 | growth | 01 | 2X | B01 | 3.875 | 1 |
| 3 | p01 | growth | 01 | 2X | B01 | 76.811 | 2 |
| 4 | p01 | growth | 01 | 2X | B01 | 5.101 | 2 |
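To see what the expression inside group_by produces, here is a sketch on a plain vector standing in for a 6-row dataframe. The integer-division version at the end is an assumed alternative that gives the same numbering; it is not the method used above.

```r
# Stand-in for a dataframe with 6 rows
n_rows <- 6
row_ids <- seq_len(n_rows)   # what row_number() returns: 1 2 3 4 5 6

# Repeat each id twice (1 1 2 2 ... 6 6), then cut back to n_rows values
animal <- rep(row_ids, length.out = n_rows, each = 2)
animal
#> [1] 1 1 2 2 3 3

# Assumed alternative: integer division pairs up rows the same way
animal_alt <- (row_ids + 1L) %/% 2L
identical(animal, animal_alt)  # TRUE
```

Both versions rely on the measurements always coming in Length, Width order, which is exactly the care you took while measuring.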
Now we should have a dataframe with a new column called Animal, where each animal covers two rows: its Length measurement and its Width measurement.
So now that we have grouped the data by Animal we want to actually designate which value is Length and which is Width. To do this we are going to use dplyr::mutate again. But now we are going to change the column Row that we created in step 1.
We need a simple way to tell R which measurements are Lengths and which are Widths. My solution is to tell R that there are two possible scenarios (or cases):

- The value in the Length column is less than 60, so it is a Width measurement; or
- The value in the Length column is greater than or equal to 60, so it is a Length measurement.

To apply these two conditions in code, we use the function dplyr::case_when.
```r
step4 <- step3 %>%
  dplyr::mutate(Row = dplyr::case_when(Length < 60 ~ "Width",     # short values are widths
                                       Length >= 60 ~ "Length"))  # long values are lengths
```
| Row | Plate | Experiment | Hour | Magnification | Well | Length | Animal |
|---|---|---|---|---|---|---|---|
| Length | p01 | growth | 01 | 2X | B01 | 80.435 | 1 |
| Width | p01 | growth | 01 | 2X | B01 | 3.875 | 1 |
| Length | p01 | growth | 01 | 2X | B01 | 76.811 | 2 |
| Width | p01 | growth | 01 | 2X | B01 | 5.101 | 2 |
And just like that, we know which rows hold Length measurements and which hold Width measurements. I tested this cutoff of 60 with everyone's data, and it should work for all of you.
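If you ever apply the cutoff of 60 to new images, a quick check is worthwhile. This sketch (an addition, not one of the original steps; toy, labelled and check are hypothetical names, and the values come from the tables above) labels the rows with case_when and then counts, per Animal, how many rows were called Length and how many were called Width.

```r
library(dplyr)

# Hypothetical rows for two animals, values from the tables above
toy <- tibble::tibble(
  Animal = c(1, 1, 2, 2),
  Length = c(80.435, 3.875, 76.811, 5.101)
)

labelled <- toy %>%
  dplyr::mutate(Row = dplyr::case_when(Length < 60 ~ "Width",
                                       Length >= 60 ~ "Length"))

# Every animal should end up with exactly one Length and one Width
check <- labelled %>%
  dplyr::group_by(Animal) %>%
  dplyr::summarise(n_length = sum(Row == "Length"),
                   n_width  = sum(Row == "Width"))
check
```

Any animal with a count other than 1 in either column points to a measurement that the cutoff misclassified.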
Now we are getting towards the end of the “tidy & manipulate” section. The final thing we want to do here is spread out the data to make it wider. We would like to have two separate columns for Length and Width. To do so we will use the function tidyr::pivot_wider.
```r
step5 <- step4 %>%
  # one new column per value of Row ("Length", "Width"), filled from Length
  tidyr::pivot_wider(names_from = Row, values_from = Length)
```
| Plate | Experiment | Hour | Magnification | Well | Animal | Length | Width |
|---|---|---|---|---|---|---|---|
| p01 | growth | 01 | 2X | B01 | 1 | 80.435 | 3.875 |
| p01 | growth | 01 | 2X | B01 | 2 | 76.811 | 5.101 |
| p01 | growth | 01 | 2X | B01 | 3 | 84.011 | 5.080 |
| p01 | growth | 01 | 2X | B01 | 4 | 70.411 | 4.178 |
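Here is a minimal sketch of what tidyr::pivot_wider does, on four made-up rows (long and wide are hypothetical names; the values come from the tables above). The columns that are neither names_from nor values_from (here Well and Animal) identify each output row, so the two rows for each animal collapse into one.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: one row per measurement
long <- tibble::tibble(
  Well = "B01",
  Animal = c(1, 1, 2, 2),
  Row = c("Length", "Width", "Length", "Width"),
  Length = c(80.435, 3.875, 76.811, 5.101)
)

# New column names come from Row, their values come from Length
wide <- long %>%
  tidyr::pivot_wider(names_from = Row, values_from = Length)
wide
```

The result has one row per animal, with the measurements in separate Length and Width columns.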
The last thing we are going to do is a bit of data processing. We talked about wanting to plot not only how Length and Width change over time but also how Volume changes. To do that, we first need to create a Volume column.
Again we are using dplyr::mutate to add new columns; in fact, we will create 2 of them. The first is Radius. This time, instead of filling the new column with simple values, we calculate it from an equation that uses other columns.

The Radius of each animal is simply its Width divided by 2, so all we need to do is tell R to do this calculation. We do the same for the Volume column, except that its equation (pi * Radius^2 * Length) is slightly longer.

Note: R already knows that pi stands for the mathematical constant 3.14159…
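Written out as formulas, with W for Width, L for Length and r for Radius, the two new columns are below. V = πr²L is the volume of a cylinder, so this calculation treats each worm as a cylinder. As a worked check, animal 1 in this step's output table has L = 80.435 and W = 3.875 pixels:

$$
r = \frac{W}{2}, \qquad V = \pi r^2 L
$$

$$
r = \frac{3.875}{2} = 1.9375, \qquad V = \pi \times 1.9375^2 \times 80.435 \approx 948.59 \ \text{pixels}^3
$$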
```r
step6 <- step5 %>%
  dplyr::mutate(Radius = Width / 2,                 # radius is half the width
                Volume = pi * Radius^2 * Length)    # volume of a cylinder
```
| Plate | Experiment | Hour | Magnification | Well | Animal | Length | Width | Radius | Volume |
|---|---|---|---|---|---|---|---|---|---|
| p01 | growth | 01 | 2X | B01 | 1 | 80.435 | 3.875 | 1.9375 | 948.5896 |
| p01 | growth | 01 | 2X | B01 | 2 | 76.811 | 5.101 | 2.5505 | 1569.7263 |
| p01 | growth | 01 | 2X | B01 | 3 | 84.011 | 5.080 | 2.5400 | 1702.7601 |
| p01 | growth | 01 | 2X | B01 | 4 | 70.411 | 4.178 | 2.0890 | 965.3110 |
Great! So we could basically be done now. But I want to do one last thing. The measurements you took from my images were all in pixels, so we will now convert them to microns. This is again done with dplyr::mutate (a function with extreme versatility and application, if you haven't realized already). Each pixel corresponds to 3.2937 microns, so we multiply Length, Width, and Radius by 3.2937. Volume is a product of three lengths (Radius × Radius × Length), so it is measured in cubic units and has to be multiplied by 3.2937 cubed (3.2937^3) instead.

Essentially, we are replacing the values already held in these columns.

```r
tidydata <- step6 %>%
  # Length, Width and Radius are lengths: pixels -> microns
  dplyr::mutate(Length = 3.2937 * Length,
                Width  = 3.2937 * Width,
                Radius = 3.2937 * Radius,
                # Volume is in pixels^3, so it needs the factor cubed to get microns^3
                Volume = 3.2937^3 * Volume)
```

| Plate | Experiment | Hour | Magnification | Well | Animal | Length | Width | Radius | Volume |
|---|---|---|---|---|---|---|---|---|---|
| p01 | growth | 01 | 2X | B01 | 1 | 264.9288 | 12.76309 | 6.381544 | 33894.60 |
| p01 | growth | 01 | 2X | B01 | 2 | 252.9924 | 16.80116 | 8.400582 | 56088.79 |
| p01 | growth | 01 | 2X | B01 | 3 | 276.7070 | 16.73200 | 8.365998 | 60842.29 |
| p01 | growth | 01 | 2X | B01 | 4 | 231.9127 | 13.76108 | 6.880539 | 34492.08 |
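Because Volume is a product of three lengths, its pixel-to-micron conversion factor is the length factor cubed. This sketch checks that on animal 1 (values from the tables above; px_to_um and the other names are hypothetical). Converting the lengths to microns first and then computing the volume gives the same number as computing the volume in pixels and multiplying by 3.2937^3.

```r
px_to_um <- 3.2937      # microns per pixel, the factor used in the conversion step

length_px <- 80.435     # animal 1
radius_px <- 3.875 / 2

# Route 1: volume in pixels^3, then convert with the factor cubed
vol_um_a <- pi * radius_px^2 * length_px * px_to_um^3

# Route 2: convert the lengths to microns first, then compute the volume
vol_um_b <- pi * (px_to_um * radius_px)^2 * (px_to_um * length_px)

all.equal(vol_um_a, vol_um_b)  # TRUE
round(vol_um_a, 2)
```

Multiplying Volume by 3.2937 only once would leave it far too small, by a factor of 3.2937^2 (about 10.8).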
Great job! Now we have a tidy dataframe that is ready for us to plot. We will learn plotting basics next week.
Try these steps out on your own this week. Now that you have been introduced to piping (%>%), can you figure out how to string all these steps together into a single block of code?

For example:

```
tidydata <- worms %>%
  (step 1 code) %>%
  (step 2 code) %>%
  (step 3 code) %>%
  etc...
```