Background: Heat maps are a common and useful way to visualize genetic data. There are many ways to create, extend, and enhance them; here we work through the basics using the following tutorial: https://jcoliver.github.io/learn-r/009-expression-heatmaps.html
IMPORTANT: The data you need for this tutorial is on Canvas as “expression.csv”. When you copy the code from the tutorial, rewrite the line that reads in the file and stores it as “exp_data” so that it uses your own file path and the file name “expression.csv”.
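One way to point R at your own copy of the file is sketched below. The folder name is a hypothetical placeholder, not something given in the assignment; replace it with wherever you saved the Canvas download. Checking that the file exists first gives a clearer message than the error read.csv() raises for a wrong path.

```r
# Hypothetical folder: replace with the folder that holds your Canvas download
data_dir <- "~/Downloads"

# Build the full path from the folder and the file name given in the assignment
csv_path <- file.path(data_dir, "expression.csv")

if (file.exists(csv_path)) {
  exp_data <- read.csv(file = csv_path, stringsAsFactors = FALSE)
} else {
  message("File not found at: ", csv_path)
}
```

If the file sits in your R working directory (see getwd()), the bare file name "expression.csv" is enough, which is what the code below uses.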
library("tidyr")
library("ggplot2")
exp_data <- read.csv(file = "expression.csv",
                     stringsAsFactors = FALSE)
str(exp_data)
## 'data.frame': 10 obs. of 12 variables:
## $ subject : chr "GSM1684095" "GSM1684096" "GSM1684097" "GSM1684098" ...
## $ treatment: chr "control" "influenza" "control" "influenza" ...
## $ IFNA5 : num 83.1 10096.5 97.8 8181 81.7 ...
## $ IFNA13 : num 107 18974 128 15647 103 ...
## $ IFNA2 : num 195 24029 129 23060 101 ...
## $ SPIN1 : num 121 108 127 124 104 ...
## $ ZNF451 : num 569 432 304 320 271 ...
## $ IFNA16 : num 190 23060 170 21248 101 ...
## $ RASSF1 : num 353 353 308 267 309 ...
## $ IFNW1 : num 95.4 8665.9 97 6903.5 94.5 ...
## $ MSR1 : num 107 109 95 126 105 ...
## $ MIR1976 : num 104 106.3 82.8 108.9 91.4 ...
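# First attempt, kept for reference: cols = everything() also pivots the
# character columns subject and treatment together with the numeric gene
# columns, and pivot_longer() cannot combine the two types.
#exp_long <- pivot_longer(data = exp_data,
#                         cols = everything(),
#                         names_to = "gene",
#                         values_to = "expression")
# Pivot only the gene columns; subject and treatment stay as identifiers
exp_long <- pivot_longer(data = exp_data,
                         cols = -c(subject, treatment),
                         names_to = "gene",
                         values_to = "expression")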
head(exp_long)
## # A tibble: 6 × 4
## subject treatment gene expression
## <chr> <chr> <chr> <dbl>
## 1 GSM1684095 control IFNA5 83.1
## 2 GSM1684095 control IFNA13 107.
## 3 GSM1684095 control IFNA2 195.
## 4 GSM1684095 control SPIN1 121.
## 5 GSM1684095 control ZNF451 569.
## 6 GSM1684095 control IFNA16 190.
exp_heatmap <- ggplot(data = exp_long, mapping = aes(x = subject,
                                                     y = gene,
                                                     fill = expression)) +
  geom_tile()
exp_heatmap
# Natural-log transform so the very large interferon values do not dominate the color scale
exp_long$log.expression <- log(exp_long$expression)
exp_heatmap <- ggplot(data = exp_long, mapping = aes(x = subject,
                                                     y = gene,
                                                     fill = log.expression)) +
  geom_tile()
exp_heatmap
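A small sketch of why the log transform helps, using only the first two IFNA5 values printed in the str() output above (one control, one influenza subject). On the raw scale the influenza value is more than 100 times the control value, so a linear color scale spends almost all of its range on a few extreme tiles; on the log scale the two values are only a few units apart. log2() is shown as a common alternative in expression work; the tutorial code above uses the natural log.

```r
# First two IFNA5 values from the str() output: one control, one influenza subject
ifna5 <- c(control = 83.1, influenza = 10096.5)

# Ratio on the raw scale (more than a 100-fold difference)
ifna5["influenza"] / ifna5["control"]

# Natural log, as used for log.expression above
log(ifna5)

# Base-2 log: each unit is one doubling of expression
log2(ifna5)
```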
exp_heatmap <- ggplot(data = exp_long, mapping = aes(x = subject,
                                                     y = gene,
                                                     fill = log.expression)) +
  geom_tile() +
  xlab(label = "Subject") + # Add a nicer x-axis title
  theme(axis.title.y = element_blank(), # Remove the y-axis title
        axis.text.x = element_text(angle = 45, vjust = 0.5)) # Rotate the x-axis labels
exp_heatmap
exp_heatmap <- ggplot(data = exp_long, mapping = aes(x = subject,
                                                     y = gene,
                                                     fill = log.expression)) +
  geom_tile() +
  xlab(label = "Subject") +
  # facet_grid makes two panels, one for control, one for flu:
  facet_grid(~ treatment, switch = "x", scales = "free_x", space = "free_x") +
  theme(axis.title.y = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 0.5))
exp_heatmap
Questions:
1. How does clustering improve the heatmap?
Assuming that “clustering” refers to separating the control cells from the flu cells: grouping the subjects by treatment lets the reader compare the two groups directly instead of hunting for alternating columns, while the controls remain visible as a baseline. The separation also makes the data easier to analyze and makes it harder to mistake a treatment difference for ordinary subject-to-subject variation.
2. Interpret the final figure you make! Tell me what it shows.
The final figure shows gene expression in uninfected (control) cells and in cells infected with influenza. The subjects are on the x-axis and the genes of interest are on the y-axis. Each tile shows the log expression of one gene in one subject, indicated by the hue of the tile relative to the log-expression color scale. In this heatmap, IFNW1, IFNA5, IFNA2, IFNA16, and IFNA13 show increased log expression in the influenza-infected cells compared to the uninfected control cells.
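The answer to question 1 reads “clustering” as grouping subjects by treatment. Another common reading is hierarchical clustering, which reorders rows so that genes with similar expression profiles sit next to each other. The sketch below is an illustration under that assumption, not part of the tutorial code above: it uses base R's dist() and hclust() on a small matrix built from the first four subjects' values for four genes, as printed (rounded) in the str() output, and then shows how the same ordering could be applied to the full data if exp_data and exp_long are in the workspace.

```r
# Small matrix from the str() output above: rows = genes, columns = first four subjects
toy <- matrix(c( 83.1, 10096.5,  97.8,  8181,
                107,   18974,   128,   15647,
                121,     108,   127,     124,
                353,     353,   308,     267),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("IFNA5", "IFNA13", "SPIN1", "RASSF1"),
                              c("GSM1684095", "GSM1684096",
                                "GSM1684097", "GSM1684098")))

# Cluster genes on log expression (defaults: Euclidean distance, complete linkage)
gene_clust <- hclust(dist(log(toy)))

# Gene names in dendrogram order; similar genes end up adjacent
gene_order <- rownames(toy)[gene_clust$order]
gene_order

# Applying the same idea to the full data, if it has been loaded and pivoted
if (exists("exp_data") && exists("exp_long")) {
  expr_mat <- t(as.matrix(exp_data[, -(1:2)]))   # drop subject, treatment; genes in rows
  full_order <- rownames(expr_mat)[hclust(dist(log(expr_mat)))$order]
  # ggplot2 draws a discrete y-axis in factor-level order
  exp_long$gene <- factor(exp_long$gene, levels = full_order)
}
```

After the factor levels are set, rerunning the final ggplot() call would draw the genes in clustered order, which should place the interferon genes that respond to influenza in one block.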