File Download & Packages

Before we can work with data, we must import the data we wish to use and install the packages needed to run our code. To do this, follow the instructions in “File Download Instructions.”

Ensure that the tidyverse is installed in RStudio so that the functions used here are available. The example below installs it and then loads it.

install.packages("tidyverse")
library(tidyverse)
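Running install.packages every time a script is executed re-downloads the package. As a minimal sketch, assuming a CRAN mirror is already configured in your R session, the install step can be skipped when the tidyverse is already present:

```r
# Install the tidyverse only if it is not already available.
# Assumption: a CRAN mirror is configured (RStudio sets one by default).
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load the tidyverse, which attaches tidyr (pivot_longer, pivot_wider) and ggplot2.
library(tidyverse)

# Confirm that the pivoting functions used later in this document can be found.
has_pivot_functions <- exists("pivot_longer") && exists("pivot_wider")
print(has_pivot_functions)
```

This is only a convenience; the two-line version above does the same job on a first run.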

To read in the data table you wish to work with, follow instructions in “File Download Instructions” section 2, “Reading Data”.

Creating a Data Frame

Once you have your data table that you plan to use, you can skip to section 3, Pivoting a Data Set. However, for educational purposes, I will create a sample data set to be used for the remainder of this document.

Gene_ID <- c("2401A01", "2401A03", "2401A04")
Condition_1 <- c(0.01, 0.05, 0.1)
Condition_2 <- c(0.5, 0.1, 1)

Data <- data.frame(Gene_ID, Condition_1, Condition_2)

print(Data)
##   Gene_ID Condition_1 Condition_2
## 1 2401A01        0.01         0.5
## 2 2401A03        0.05         0.1
## 3 2401A04        0.10         1.0

Pivoting a Data Set

Now we can start to manipulate our data set. In some instances, a package may require a data frame to be in a vertical (long) or horizontal (wide) orientation. NCBI data is generally formatted as a wide matrix, which ggplot often cannot plot directly. We can fix this using a method called “pivoting,” which changes the matrix into a long format that works with ggplot.

To start, use either the pivot_longer or the pivot_wider function to manipulate your data. Pay attention to which one is being used: ‘longer’ essentially rotates wide, horizontal data into a vertical format, whilst ‘wider’ turns vertical data into a horizontal format.

Pivot Longer

pivot_longer is a function that takes the columns you select and turns them into rows. When working with pivot_longer, set three arguments: cols selects the existing columns to be pivoted, names_to names the new column that will hold the old column names, and values_to names the new column that will hold the table values.

Two small things to remember. When pivoting large data structures, use the cols = argument to specify exactly which columns should be pivoted. If you want every column except one to be pivoted, put ! in front of the name of the column that should remain unchanged.
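The cols argument accepts several selection styles. The sketch below rebuilds the sample data from this document and shows three ways of selecting the same columns; the object names by_negation, by_prefix, and by_name are made up for illustration.

```r
library(tidyverse)

# Same sample data as used throughout this document.
Data <- data.frame(
  Gene_ID     = c("2401A01", "2401A03", "2401A04"),
  Condition_1 = c(0.01, 0.05, 0.1),
  Condition_2 = c(0.5, 0.1, 1)
)

# Option 1: pivot every column except Gene_ID.
by_negation <- pivot_longer(Data,
                            cols = !Gene_ID,
                            names_to = "Conditions",
                            values_to = "Expression")

# Option 2: pivot the columns whose names share a prefix.
by_prefix <- pivot_longer(Data,
                          cols = starts_with("Condition"),
                          names_to = "Conditions",
                          values_to = "Expression")

# Option 3: list the columns to pivot explicitly.
by_name <- pivot_longer(Data,
                        cols = c(Condition_1, Condition_2),
                        names_to = "Conditions",
                        values_to = "Expression")

# All three selections should produce the same long table.
print(all.equal(by_negation, by_prefix))
print(all.equal(by_negation, by_name))
```

Negation tends to be the most convenient when a table has one identifier column and many measurement columns, as is common with expression matrices.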

In the code below, we show the original table and the output after pivoting. The result may be labeled as a tibble, which is a modified type of data frame in R; it can still be used with most functions that accept a data frame. Notice that Gene_ID remains a single column (its values repeat once per condition), and that the new columns created by pivot_longer are named Conditions and Expression.

b <- Data

print(b)
##   Gene_ID Condition_1 Condition_2
## 1 2401A01        0.01         0.5
## 2 2401A03        0.05         0.1
## 3 2401A04        0.10         1.0

f <- pivot_longer(b,
                  cols = !"Gene_ID",
                  names_to = "Conditions",
                  values_to = "Expression")
print(f)
## # A tibble: 6 × 3
##   Gene_ID Conditions  Expression
##   <chr>   <chr>            <dbl>
## 1 2401A01 Condition_1       0.01
## 2 2401A01 Condition_2       0.5 
## 3 2401A03 Condition_1       0.05
## 4 2401A03 Condition_2       0.1 
## 5 2401A04 Condition_1       0.1 
## 6 2401A04 Condition_2       1

Object f, shown above, is a long-format version of the data set. It will be used again in section 3.B, “Pivot Wider.”
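To illustrate why the long format suits ggplot, here is a minimal sketch that plots a table with the same shape as f. The choice of a dodged bar chart and the axis labels are assumptions made for illustration, not part of the original workflow.

```r
library(tidyverse)

# Rebuild the long table so this example runs on its own.
f <- tibble(
  Gene_ID    = rep(c("2401A01", "2401A03", "2401A04"), each = 2),
  Conditions = rep(c("Condition_1", "Condition_2"), times = 3),
  Expression = c(0.01, 0.5, 0.05, 0.1, 0.1, 1)
)

# Each aesthetic maps to exactly one column, which is what the long format provides:
# Gene_ID on the x axis, Expression on the y axis, and Conditions as the fill colour.
p <- ggplot(f, aes(x = Gene_ID, y = Expression, fill = Conditions)) +
  geom_col(position = "dodge") +
  labs(x = "Gene ID", y = "Expression")

print(p)
```

In the wide table there is no single column holding all expression values, so there would be nothing to map to the y axis without pivoting first.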

Pivot Wider

If we have a data set that is longer than it is wide, we can use pivot_wider. It's important to note that the arguments for pivot_wider differ from those for pivot_longer.

By default, pivot_wider looks for columns named name and value, which are the default column names that pivot_longer creates. Because our columns are named Conditions and Expression, we must specify them. Use names_from to choose the column whose values become the new column names: each distinct value in that column is turned into its own column. To fill those new columns with data, use the values_from argument to choose the column that supplies the cell values. We are working with object f again, and here we set up object g as the wide-format table.

print(f)
## # A tibble: 6 × 3
##   Gene_ID Conditions  Expression
##   <chr>   <chr>            <dbl>
## 1 2401A01 Condition_1       0.01
## 2 2401A01 Condition_2       0.5 
## 3 2401A03 Condition_1       0.05
## 4 2401A03 Condition_2       0.1 
## 5 2401A04 Condition_1       0.1 
## 6 2401A04 Condition_2       1

g <- pivot_wider(f,
                 names_from = "Conditions",
                 values_from = "Expression"
                 )

print(g)
## # A tibble: 3 × 3
##   Gene_ID Condition_1 Condition_2
##   <chr>         <dbl>       <dbl>
## 1 2401A01        0.01         0.5
## 2 2401A03        0.05         0.1
## 3 2401A04        0.1          1
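As a quick check that the two functions undo each other, the sketch below pivots the sample data longer and then wider using only the default column names (name and value), and compares the result with the original. The object names long_default and wide_default are made up for illustration.

```r
library(tidyverse)

# Same sample data as used throughout this document.
Data <- data.frame(
  Gene_ID     = c("2401A01", "2401A03", "2401A04"),
  Condition_1 = c(0.01, 0.05, 0.1),
  Condition_2 = c(0.5, 0.1, 1)
)

# Without names_to and values_to, pivot_longer names the new columns "name" and "value".
long_default <- pivot_longer(Data, cols = !Gene_ID)
print(names(long_default))

# pivot_wider reads from "name" and "value" by default, so no extra arguments are needed.
wide_default <- pivot_wider(long_default)

# Convert the tibble back to a plain data frame and compare it with the original.
round_trip_ok <- isTRUE(all.equal(as.data.frame(wide_default), Data))
print(round_trip_ok)
```

Naming the columns explicitly, as done with Conditions and Expression above, is usually clearer in real analyses; the defaults are shown here only to explain what pivot_wider expects when names_from and values_from are omitted.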