Notebook setup

Below, we load the two libraries used to make our plots: ggplot2 (plotting) and gridExtra (arranging several plots on one page). If you do not already have them installed, uncomment the `install.packages()` line and run it once.

We then save a list of ggplot2 theme settings in the variable `ggbar`. These are cosmetic settings (bold, colored, angled axis text and styled titles) that are not strictly necessary but look pretty to me. Feel free to change them to whatever looks better! I am not a graphic designer :)

We will use this list of features later when we create our plots.

#install.packages(c("ggplot2","gridExtra"))
library(ggplot2)
library(gridExtra)

# shared cosmetic theme settings, added to every plot below with `+ ggbar`
ggbar <- list(
  theme(axis.text.x = element_text(face="bold", color="gray18", size=16, angle=20),
        axis.text.y = element_text(face="bold", color="royalblue4", size=16, angle=25),
        axis.title = element_text(size=14, face="italic"),
        plot.title = element_text(size=16, face="bold.italic"))
)
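Because `ggbar` is an ordinary R list, `+ ggbar` adds each of its elements to the plot in turn. A minimal sketch, assuming ggplot2 3.4.0 or later (for the `linewidth` argument): the same mechanism can bundle the layers that every plot in this notebook repeats. `fc_layers` and `demo` are hypothetical names, and `demo` simply reuses the c-myc low-dose values from Example 1.

```r
library(ggplot2)

# Hypothetical helper list: the layers every fold-change barplot below repeats.
# When a list is added with `+`, ggplot2 adds each element in order.
fc_layers <- list(
  geom_col(),                                  # equivalent to geom_bar(stat = "identity")
  theme(legend.position = "none"),             # the fill legend is redundant here
  ylab("FC in Gene Expression"),
  geom_hline(yintercept = 0, linetype = "dashed",
             color = "red", linewidth = 1.4)   # dashed reference line at zero fold change
)

# Hypothetical demo data: c-myc fold changes at the low TAA dose
demo <- data.frame(
  days = factor(c(4, 8, 15, 29)),              # numeric input keeps levels in numeric order
  FC   = c(-0.50946, -0.244988, 0.197505, 0.734166)
)

p <- ggplot(demo, aes(x = days, y = FC, fill = FC)) +
  fc_layers +
  xlab("Days of Exposure")
```

With such a list, each plot reduces to a `ggplot()` call, `+ fc_layers`, its own x-axis label and title, and `+ ggbar` for the cosmetic theme.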

Example 1

Here, we plot how the expression of two genes changes after exposure to the chemical Thioacetamide (TAA), across multiple doses and multiple time points.

We create data frames of fold-change values by time point and dose for the genes c-myc and ccnd1. The four time points are 4, 8, 15, and 29 days, and the three dose levels are low, medium, and high. After the data frames are created, we rename their columns and convert the time points (days) to a factor: they were entered as numbers, so ggplot2 would otherwise treat them as a continuous variable even though they are discrete categories.

Next, we use ggplot2 to make barplots. Each plot shows the fold change in gene expression at each of the four time points for a single dose. For example, the first plot shows the fold change of c-myc on each exposure day at the low dose of TAA only.

# create first data frame, with all data for gene c-myc: 4 time points (rows) and 3 dose levels (columns)
TAA_cmyc <- as.data.frame(cbind(c(4, 8, 15, 29),
                                c(-0.50946, -0.244988, 0.197505, 0.734166),
                                c(-0.783383, 0.540850, 0.266845, 0.613256),
                                c(-0.197337, 0.581566, 0.905205, 1.304628)))

#second data frame has the same layout for gene ccnd1
TAA_ccnd1 <- as.data.frame(cbind(c(4, 8, 15, 29),
                                 c(0.124177, -0.046817, 0.209758, 0.132733),
                                 c(-0.108983, 0.543465, 0.660554, 0.907896),
                                 c(0.62012, 0.997977, 1.506773, 1.820049)))

#rename columns of each dataframe
colnames(TAA_cmyc) <-  c("days","low","medium","high")
colnames(TAA_ccnd1) <-  c("days","low","medium","high")

#below, we convert the days of exposure to a factor. I entered the time points as numbers, but they are not continuous:
# they are discrete days at which gene expression was evaluated. Converting to character and then to a factor with
# levels=unique(...) keeps the days in the order they appear in the data (4, 8, 15, 29) instead of sorting them as text.
# In short, this was a hacky shortcut that skips some other re-factoring steps while keeping the exposure days in order as discrete categories.
TAA_cmyc$days <- as.character(TAA_cmyc$days)
TAA_cmyc$days <- factor(TAA_cmyc$days, levels=unique(TAA_cmyc$days))

TAA_ccnd1$days <- as.character(TAA_ccnd1$days)
TAA_ccnd1$days <- factor(TAA_ccnd1$days, levels=unique(TAA_ccnd1$days))
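Why `levels=unique(...)` is needed can be checked on the day values alone. This base-R sketch (the variable names are hypothetical) compares three ways of building the factor. Note that `unique()` returns values in order of appearance, so the notebook's approach yields chronological order only because the rows were entered chronologically; calling `factor()` directly on the numeric column sorts numerically and would be a simpler alternative here.

```r
# Exposure days as originally entered (numeric)
days <- c(4, 8, 15, 29)

# 1. Character first, default levels: sorted as text, so "15" and "29" come before "4"
lv_text <- levels(factor(as.character(days)))

# 2. The notebook's approach: levels fixed to the order of appearance
lv_unique <- levels(factor(as.character(days), levels = unique(as.character(days))))

# 3. factor() on the numeric vector: values are sorted numerically before becoming labels
lv_numeric <- levels(factor(days))

print(lv_text)     # levels: "15" "29" "4" "8"
print(lv_unique)   # levels: "4" "8" "15" "29"
print(lv_numeric)  # levels: "4" "8" "15" "29"
```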

#below, we use ggplot2 to create barplots visualizing the relative fold change in expression for these genes. We actually have data for 6 plots (2 genes x 3 doses), but only 4 are shown here as an example.

#taa1: This plot shows the fold change for gene c-myc, commonly implicated in cancer pathways, at each of the four time points (on the x-axis),
# and only at the low dose exposure to Thioacetamide. The x-axis is days, and the y-axis is the fold change at each time point for the low dose.
# `fill=low` inside `aes()` colors each bar by its fold-change value, which gives the pretty blue gradient (ggplot2's default continuous fill scale).
# `legend.position = "none"` removes the legend, which is not necessary here because the x-axis is already labeled.
# `geom_hline` adds a dashed red horizontal line at y = 0 as a reference point. `linewidth` sets its thickness (ggplot2 versions before 3.4.0 call this argument `size`).
taa1 <- ggplot(data=TAA_cmyc, aes(x=days, y=low, fill=low)) +
  geom_bar(stat="identity") + ggbar + theme(legend.position="none") + xlab("Days of Exposure") +
  ylab("FC in Gene Expression") +
  ggtitle("TAA Significant Gene Expression \n in c-myc following LOW dose") +
  geom_hline(yintercept=0, linetype="dashed", color="red", linewidth=1.4)

taa2 <- ggplot(data=TAA_cmyc, aes(x=days, y=medium, fill=medium)) +
  geom_bar(stat="identity") + ggbar + theme(legend.position="none") + xlab("Days of Exposure") +
  ylab("FC in Gene Expression") +
  ggtitle("TAA Significant Gene Expression \n in c-myc following MED dose") +
  geom_hline(yintercept=0, linetype="dashed", color="red", linewidth=1.4)

taa3 <- ggplot(data=TAA_cmyc, aes(x=days, y=high, fill=high)) +
  geom_bar(stat="identity") + ggbar + theme(legend.position="none") + xlab("Days of Exposure") +
  ylab("FC in Gene Expression") +
  ggtitle("TAA Significant Gene Expression \n in c-myc following HIGH dose") +
  geom_hline(yintercept=0, linetype="dashed", color="red", linewidth=1.4)

taa4 <- ggplot(data=TAA_ccnd1, aes(x=days, y=low, fill=low)) +
  geom_bar(stat="identity") + ggbar + theme(legend.position="none") + xlab("Days of Exposure") +
  ylab("FC in Gene Expression") +
  ggtitle("TAA Significant Gene Expression \n in ccnd1 following LOW dose") +
  geom_hline(yintercept=0, linetype="dashed", color="red", linewidth=1.4)
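In these plots `fill` is mapped to the fold-change value itself, so the bar color encodes the size of the change, not the dose. If you would rather have decreases and increases stand apart visually, one option (a presentation choice of ours, not part of the original analysis) is a diverging scale centered at zero with `scale_fill_gradient2()`. The sketch below assumes ggplot2 3.4.0 or later; `cmyc_low` is a hypothetical name and the colors are arbitrary.

```r
library(ggplot2)

# c-myc fold changes at the low TAA dose (same values as TAA_cmyc$low)
cmyc_low <- data.frame(
  days = factor(c(4, 8, 15, 29)),           # numeric input keeps numeric level order
  FC   = c(-0.50946, -0.244988, 0.197505, 0.734166)
)

p <- ggplot(cmyc_low, aes(x = days, y = FC, fill = FC)) +
  geom_col() +
  # Diverging fill: negative values shade toward orange, positive toward blue,
  # with the neutral midpoint pinned at zero fold change
  scale_fill_gradient2(low = "darkorange3", mid = "grey90", high = "royalblue4",
                       midpoint = 0) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", linewidth = 1.4) +
  theme(legend.position = "none") +
  xlab("Days of Exposure") + ylab("FC in Gene Expression")
```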


#grid.arrange() (from gridExtra) puts the plots together neatly on a single page
# `ncol` sets how many columns are on the page and `nrow` sets the number of rows. The layout is applied even if there are not enough plots to fill it: here, 4 plots in a 2 x 3 grid leave the last row empty
grid.arrange(taa1, taa2, taa3, taa4, ncol=2, nrow=3)
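Since both data frames share the same layout, facets can draw all six gene-by-dose panels (including the two ccnd1 panels not drawn above) from a single long table instead of building each plot by hand. A minimal sketch, assuming ggplot2 3.4.0 or later: `to_long()`, `taa_long`, `cmyc`, and `ccnd1` are hypothetical names, and the data are rebuilt with `data.frame()` so the block runs on its own.

```r
library(ggplot2)

# Same values as TAA_cmyc and TAA_ccnd1: one row per day, one column per dose
cmyc  <- data.frame(days = c(4, 8, 15, 29),
                    low = c(-0.50946, -0.244988, 0.197505, 0.734166),
                    medium = c(-0.783383, 0.540850, 0.266845, 0.613256),
                    high = c(-0.197337, 0.581566, 0.905205, 1.304628))
ccnd1 <- data.frame(days = c(4, 8, 15, 29),
                    low = c(0.124177, -0.046817, 0.209758, 0.132733),
                    medium = c(-0.108983, 0.543465, 0.660554, 0.907896),
                    high = c(0.62012, 0.997977, 1.506773, 1.820049))

# Hypothetical helper: stack the three dose columns into long format (one row per bar)
to_long <- function(wide, gene) {
  doses <- c("low", "medium", "high")
  data.frame(gene = gene,
             days = rep(wide$days, times = length(doses)),
             dose = rep(doses, each = nrow(wide)),
             FC   = unlist(wide[doses], use.names = FALSE))  # column by column
}

taa_long <- rbind(to_long(cmyc, "c-myc"), to_long(ccnd1, "ccnd1"))
taa_long$days <- factor(taa_long$days)   # numeric sort gives 4, 8, 15, 29
taa_long$dose <- factor(taa_long$dose, levels = c("low", "medium", "high"))

p <- ggplot(taa_long, aes(x = days, y = FC, fill = FC)) +
  geom_col() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", linewidth = 1.4) +
  facet_grid(gene ~ dose) +              # rows: genes, columns: doses
  theme(legend.position = "none") +
  xlab("Days of Exposure") + ylab("FC in Gene Expression")
```

By default the facets share one y-axis, which makes the doses directly comparable; this differs from the separate plots above, where each panel sets its own range.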

Example 2

Here, we create four plots showing the change in expression of four genes following a 24-hour in vivo exposure to Aflatoxin B1 (AFB1).

For ggplot, using `scale_x_continuous` was a hacky quick fix for re-labeling the x-axis. I coded the three dose categories (low, medium, high) as 1, 2, and 3, which ggplot treats as actual numeric values; this just made things easier for me. I then used `scale_x_continuous` to place axis breaks at those three values and put the dose labels on them. The `geom_hline` function adds a horizontal line to the plot; I set its intercept to zero and made it dashed and red.

#below: create 4 data frames for the 4 different genes affected by aflatoxin exposure
av <- as.data.frame(cbind(c(1, 2, 3), c(0.001768, 0.148331, 0.681379)))
ap <- as.data.frame(cbind(c(1, 2, 3), c(-0.437336, -0.487386, -0.695991)))
al <- as.data.frame(cbind(c(1, 2, 3), c(-0.181970, -0.248476, -0.780721)))
ac <- as.data.frame(cbind(c(1, 2, 3), c(0.318290, 1.221728, 1.852082)))

#below: rename the columns for each of the dataframes (FC = fold change)
colnames(av) <-  c("dose","FC")
colnames(ap) <-  c("dose","FC")
colnames(al) <-  c("dose","FC")
colnames(ac) <-  c("dose","FC")

#av1 = plot of how gene vhl has a significant fold change in gene expression following 24-hour exposure to aflatoxin B1
av1 <- ggplot(data=av, aes(x=dose, y=FC, fill=FC)) +
  geom_bar(stat="identity") + ggbar + theme(legend.position="none") + xlab("Dose Level") +
  ylab("FC in Gene Expression") +
  ggtitle("AFB1 Sig Gene Expression \nin vhl following 24 hr exposure") +
  scale_x_continuous(breaks=c(1, 2, 3), labels=c("Low", "Medium", "High")) +
  geom_hline(yintercept=0, linetype="dashed", color="red", linewidth=1.4)
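A less hacky alternative to `scale_x_continuous` is to store the dose as a factor whose levels are already the labels. ggplot2 then uses a discrete x scale and reads the tick labels and their order straight from the factor levels. A minimal sketch for the vhl data, assuming ggplot2 3.4.0 or later; `vhl` is a hypothetical name.

```r
library(ggplot2)

# vhl fold changes from `av`, with dose stored as labelled categories in a fixed order
vhl <- data.frame(
  dose = factor(c("Low", "Medium", "High"), levels = c("Low", "Medium", "High")),
  FC   = c(0.001768, 0.148331, 0.681379)
)

# A factor on the x-axis gives a discrete scale, so no breaks/labels are needed
p <- ggplot(vhl, aes(x = dose, y = FC, fill = FC)) +
  geom_col() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", linewidth = 1.4) +
  theme(legend.position = "none") +
  xlab("Dose Level") + ylab("FC in Gene Expression")
```

Setting `levels` explicitly matters here: without it, `factor()` would sort the labels alphabetically as "High", "Low", "Medium".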

#ap1 = plot of how gene pRb has a significant fold change in gene expression following 24-hour exposure to aflatoxin B1
# ylim() fixes the y-axis range; bars with values outside that range would be dropped, so the limits must cover all the data
ap1 <- ggplot(data=ap, aes(x=dose, y=FC, fill=FC)) +
  geom_bar(stat="identity") + ggbar + theme(legend.position="none") + xlab("Dose Level") +
  ylab("FC in Gene Expression") +
  ggtitle("AFB1 Sig Gene Expression \nin pRb following 24 hr exposure") +
  geom_hline(yintercept=0, linetype="dashed", color="red", linewidth=1.4) + ylim(-.75, .1) +
  scale_x_continuous(breaks=c(1, 2, 3), labels=c("Low", "Medium", "High"))

#al1 = plot of how gene lrp is affected by aflatoxin B1
al1 <- ggplot(data=al, aes(x=dose, y=FC, fill=FC)) +
  geom_bar(stat="identity") + ggbar + theme(legend.position="none") + xlab("Dose Level") +
  ylab("FC in Gene Expression") +
  ggtitle("AFB1 Sig Gene Expression \nin lrp following 24 hr exposure") +
  geom_hline(yintercept=0, linetype="dashed", color="red", linewidth=1.4) + ylim(-.8, .1) +
  scale_x_continuous(breaks=c(1, 2, 3), labels=c("Low", "Medium", "High"))

#ac1 = plot of how gene ccnd1 is affected by aflatoxin B1
ac1 <- ggplot(data=ac, aes(x=dose, y=FC, fill=FC)) +
  geom_bar(stat="identity") + ggbar + theme(legend.position="none") + xlab("Dose Level") +
  ylab("FC in Gene Expression") +
  ggtitle("AFB1 Sig Gene Expression \nin ccnd1 following 24 hr exposure") +
  geom_hline(yintercept=0, linetype="dashed", color="red", linewidth=1.4) +
  scale_x_continuous(breaks=c(1, 2, 3), labels=c("Low", "Medium", "High"))

#arrange the four plots in a 2-column grid on a single page (with only ncol given, gridExtra works out the 2 rows)
grid.arrange(ac1, av1, al1, ap1, ncol=2)
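As in Example 1, the four gene plots can also come from one long table and `facet_wrap()`. A minimal sketch, assuming ggplot2 3.4.0 or later; `afb1` and `dose_levels` are hypothetical names, and the values are copied from `av`, `ap`, `al`, and `ac`. `scales = "free_y"` gives each gene its own y-range, similar in spirit to the per-plot `ylim()` calls above.

```r
library(ggplot2)

dose_levels <- c("Low", "Medium", "High")

# One long table: 4 genes x 3 doses, same fold changes as av, ap, al and ac
afb1 <- data.frame(
  gene = rep(c("vhl", "pRb", "lrp", "ccnd1"), each = 3),
  dose = factor(rep(dose_levels, times = 4), levels = dose_levels),
  FC   = c(0.001768, 0.148331, 0.681379,      # vhl
           -0.437336, -0.487386, -0.695991,   # pRb
           -0.181970, -0.248476, -0.780721,   # lrp
           0.318290, 1.221728, 1.852082)      # ccnd1
)
# Panel order matching grid.arrange(ac1, av1, al1, ap1)
afb1$gene <- factor(afb1$gene, levels = c("ccnd1", "vhl", "lrp", "pRb"))

p <- ggplot(afb1, aes(x = dose, y = FC, fill = FC)) +
  geom_col() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", linewidth = 1.4) +
  # each gene gets its own y-range; drop scales = "free_y" to share one axis
  facet_wrap(~ gene, ncol = 2, scales = "free_y") +
  theme(legend.position = "none") +
  xlab("Dose Level") + ylab("FC in Gene Expression") +
  ggtitle("AFB1 gene expression following 24 hr exposure")
```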