Heat map tutorial with ggplot2

Background: heat maps are another common and useful visual for genetic data. There are many ways to create, extend, and enhance them, but here we work through the basics with the following tutorial: https://jcoliver.github.io/learn-r/009-expression-heatmaps.html

Objective: complete all of the tutorial steps. NOTE: some of the steps will produce errors or problems in the heat map. These are INTENTIONAL; they introduce common errors you will run into when making this kind of figure. Before you upload the completed file, comment out (put a “#” in front of) any lines that are no longer useful. Place your code below.

IMPORTANT: I have provided the data you need for this tutorial on Canvas as “expression.csv”. When you copy the tutorial code, you will need to rewrite the line that reads in the file and names it “exp_data”: use the path to your own copy of the file, and use the file name “expression.csv”.
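One way to build that path, as a minimal sketch: the folder name "data" below is a hypothetical placeholder for wherever you saved expression.csv, and `csv_path` is an illustrative variable name that is not part of the tutorial.

```r
# Hypothetical location: replace "data" with the folder that holds
# expression.csv; file.path() joins the pieces with "/"
csv_path <- file.path("data", "expression.csv")

# Check that the file is where R expects before calling read.csv();
# relative paths are resolved against the working directory from getwd()
if (!file.exists(csv_path)) {
  message("expression.csv not found; working directory is ", getwd())
}
```

If the check passes, `read.csv(file = csv_path, stringsAsFactors = FALSE)` reads the same file. Using file.path() rather than typing separators by hand keeps the separator as "/", which R also accepts on Windows.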

library("tidyr")
library("ggplot2")
exp_data <- read.csv(file = "expression.csv", 
                     stringsAsFactors = FALSE)
str(exp_data)
## 'data.frame':    10 obs. of  12 variables:
##  $ subject  : chr  "GSM1684095" "GSM1684096" "GSM1684097" "GSM1684098" ...
##  $ treatment: chr  "control" "influenza" "control" "influenza" ...
##  $ IFNA5    : num  83.1 10096.5 97.8 8181 81.7 ...
##  $ IFNA13   : num  107 18974 128 15647 103 ...
##  $ IFNA2    : num  195 24029 129 23060 101 ...
##  $ SPIN1    : num  121 108 127 124 104 ...
##  $ ZNF451   : num  569 432 304 320 271 ...
##  $ IFNA16   : num  190 23060 170 21248 101 ...
##  $ RASSF1   : num  353 353 308 267 309 ...
##  $ IFNW1    : num  95.4 8665.9 97 6903.5 94.5 ...
##  $ MSR1     : num  107 109 95 126 105 ...
##  $ MIR1976  : num  104 106.3 82.8 108.9 91.4 ...
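# First attempt (kept commented out because it errors): everything() also
# selects the character columns subject and treatment, which cannot be
# combined with the numeric gene columns into one "expression" column
#exp_long <- pivot_longer(data = exp_data,
#                         cols = everything(),
#                         names_to = "gene",
#                         values_to = "expression")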
# Keep subject and treatment as identifier columns; pivot only the gene columns
exp_long <- pivot_longer(data = exp_data, 
                         cols = -c(subject, treatment),
                         names_to = "gene", 
                         values_to = "expression")
head(exp_long)
## # A tibble: 6 × 4
##   subject    treatment gene   expression
##   <chr>      <chr>     <chr>       <dbl>
## 1 GSM1684095 control   IFNA5        83.1
## 2 GSM1684095 control   IFNA13      107. 
## 3 GSM1684095 control   IFNA2       195. 
## 4 GSM1684095 control   SPIN1       121. 
## 5 GSM1684095 control   ZNF451      569. 
## 6 GSM1684095 control   IFNA16      190.
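A small sketch of why the commented-out `cols = everything()` attempt fails. The two-subject data frame `toy` is hypothetical, built from values printed (and rounded) by str() above; it is not the full data set.

```r
library(tidyr)

# Hypothetical toy data: two subjects, two genes, copied from str(exp_data)
toy <- data.frame(
  subject   = c("GSM1684095", "GSM1684096"),
  treatment = c("control", "influenza"),
  IFNA5     = c(83.1, 10096.5),
  SPIN1     = c(121, 108)
)

# everything() tries to stack character and numeric columns into a single
# "expression" column, which pivot_longer() refuses to do
attempt <- tryCatch(
  pivot_longer(toy, cols = everything(),
               names_to = "gene", values_to = "expression"),
  error = function(e) e
)
inherits(attempt, "error")

# Excluding the identifier columns leaves only numeric gene columns to stack
toy_long <- pivot_longer(toy, cols = -c(subject, treatment),
                         names_to = "gene", values_to = "expression")
toy_long

# Each subject-gene pair should appear once: one tile per heat map cell
anyDuplicated(toy_long[c("subject", "gene")]) == 0
```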
# First heat map: fill by raw expression (a few interferon genes dominate the color scale)
exp_heatmap <- ggplot(data = exp_long, mapping = aes(x = subject,
                                                     y = gene,
                                                     fill = expression)) +
  geom_tile()
exp_heatmap
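The first heat map looks almost uniform because a few interferon genes are far larger than everything else. As a rough sketch, assuming the default continuous fill scale, which maps values linearly from the smallest to the largest value in the data, the code below rescales a handful of values from the str() output (rounded) to their position along the color bar. `scale_position()` is a hypothetical helper written for this illustration.

```r
# First five IFNA5 and SPIN1 values as printed (rounded) by str(exp_data)
ifna5 <- c(83.1, 10096.5, 97.8, 8181, 81.7)
spin1 <- c(121, 108, 127, 124, 104)

# A continuous fill runs from the smallest to the largest value present,
# so rescale each value to its 0-1 position along that color bar
all_values <- c(ifna5, spin1)
scale_position <- function(x) {
  (x - min(all_values)) / (max(all_values) - min(all_values))
}

# Every SPIN1 value lands in the bottom half-percent of the color bar,
# so differences between subjects are invisible
round(scale_position(spin1), 4)
```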

# Natural-log transform to compress the very large range of expression values
exp_long$log.expression <- log(exp_long$expression)

# Same heat map, now filled by log expression
exp_heatmap <- ggplot(data = exp_long, mapping = aes(x = subject,
                                                     y = gene,
                                                     fill = log.expression)) +
  geom_tile()

exp_heatmap
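A sketch of what the log transform changes, reusing the same rounded IFNA5 and SPIN1 values from str(). It also checks that the choice of log base does not change the colors, only the numbers printed on the legend. `rescale01()` is a hypothetical helper written for this example.

```r
# First five IFNA5 and SPIN1 values as printed (rounded) by str(exp_data)
ifna5 <- c(83.1, 10096.5, 97.8, 8181, 81.7)
spin1 <- c(121, 108, 127, 124, 104)
all_values <- c(ifna5, spin1)

# Position of each value along a color bar running from min to max
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# log() is the natural log; log2() and log10() differ from it only by a
# constant factor, so every tile lands at the same point on the color bar
pos_ln    <- rescale01(log(all_values))
pos_log2  <- rescale01(log2(all_values))
pos_log10 <- rescale01(log10(all_values))
isTRUE(all.equal(pos_ln, pos_log2)) && isTRUE(all.equal(pos_ln, pos_log10))

# SPIN1 (elements 6 to 10) now spans about 5% to 9% of the color bar,
# instead of under 0.5% on the raw scale
round(pos_ln[6:10], 3)

# Caveat: log(0) is -Inf, so zero expression values would need handling first
log(0)
```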

exp_heatmap <- ggplot(data = exp_long, mapping = aes(x = subject,
                                                     y = gene,
                                                     fill = log.expression)) +
  geom_tile() +
  xlab(label = "Subject") + # Add a nicer x-axis title
  theme(axis.title.y = element_blank(), # Remove the y-axis title
        axis.text.x = element_text(angle = 45, vjust = 0.5)) # Rotate the x-axis labels

exp_heatmap

exp_heatmap <- ggplot(data = exp_long, mapping = aes(x = subject,
                                                     y = gene,
                                                     fill = log.expression)) +
  geom_tile() +
  xlab(label = "Subject") +
  # facet_grid makes two panels, one for control, one for flu:
  facet_grid(~ treatment, switch = "x", scales = "free_x", space = "free_x") + 
  theme(axis.title.y = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 0.5))

exp_heatmap
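The `scales = "free_x"` and `space = "free_x"` arguments matter because each subject belongs to only one treatment. With fixed scales, each panel would keep a column for every subject and leave the other group's columns empty; `scales = "free_x"` keeps only the subjects present in each panel, and `space = "free_x"` sizes each panel by its number of subjects. The toy long table below is hypothetical and uses only the first four subjects and two genes from str().

```r
# Hypothetical toy long table: subject/treatment pairs from str(exp_data),
# with one row per gene for each subject
toy_long <- data.frame(
  subject   = rep(c("GSM1684095", "GSM1684096", "GSM1684097", "GSM1684098"), each = 2),
  treatment = rep(c("control", "influenza", "control", "influenza"), each = 2),
  gene      = rep(c("IFNA5", "SPIN1"), times = 4)
)

# Cross-tabulate subjects against treatments
tab <- table(toy_long$subject, toy_long$treatment)
tab

# Each subject has rows under exactly one treatment, so a shared x-axis
# would leave half of every panel empty
rowSums(tab > 0)
```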

Questions:

1. How does clustering improve the heat map?
Assuming that “clustering” refers to separating the control cells from the flu cells: this lets the reader focus on the patterns in the infected samples while keeping the controls in view for comparison. The separation also makes the data easier to analyze and helps avoid misinterpretation.

2. Interpret the final figure you make! Tell me what it shows.
The final figure shows gene expression in both uninfected (control) and influenza-infected cells. The subjects are on the x-axis and the genes of interest are on the y-axis. The color of each tile shows the log expression of that gene in that subject, read against the log expression color scale. The heat map shows higher log expression of IFNW1, IFNA5, IFNA2, IFNA16, and IFNA13 in the influenza-infected cells than in the uninfected control cells.
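To put a rough number on “higher log expression”, here is a sketch comparing group means for the first four subjects printed by str() (two control, two influenza). The values are rounded by str() and only four subjects are used, so this is an illustration rather than a result for the full data set; `group_means()` and `fold` are hypothetical names.

```r
# First four subjects as printed (rounded) by str(exp_data)
treatment <- c("control", "influenza", "control", "influenza")
ifna5 <- c(83.1, 10096.5, 97.8, 8181)
spin1 <- c(121, 108, 127, 124)

# Mean expression per treatment group
group_means <- function(x) tapply(x, treatment, mean)

# Influenza mean divided by control mean for each gene
fold <- c(
  IFNA5 = unname(group_means(ifna5)["influenza"] / group_means(ifna5)["control"]),
  SPIN1 = unname(group_means(spin1)["influenza"] / group_means(spin1)["control"])
)
round(fold, 2)
```

IFNA5 comes out roughly 100 times higher in the influenza group, while SPIN1 stays close to a ratio of 1, which matches the pattern in the final heat map.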