This post explains how to make balloon plots in R. The data are the expression of a single gene in different strains of bacteria (user-generated data, not from an original experiment). They could just as well be colony counts; the same approach works for many kinds of biological or non-biological data, and the resulting plots are very informative.
We will be using the reshape2 and ggplot2 libraries. Later, while using reshape2, I will also explain the difference between the long and wide forms of data.
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.1.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
Next, we load the file and print it to the console to see what the data looks like.
balloon<-read.csv("D:\\Balloon\\balloon.csv")
balloon
##              Genera S1   S3    S4 S5    S6    S2    S7    S8  S9   S10
## 1        Prevotella 97 5370  7892 23  8310  7379  6238 20019 153 20103
## 2         Treponema 28 3760  8004 79  3379 14662 18977 18674  12 29234
## 3     Fusobacterium 11 5551 11017 30 38058 10085  2674 15306  23 10857
## 4       Selenomonas 40 2087 19712  7  1133   148  2198  1472  36  2869
## 5       Veillonella  5  533  5115  0  2506  1502    27  1898  15  4923
## 6     Porphyromonas 13  873  2695 34 17811  5222  2999  9600  15 14206
## 7     Streptococcus 10 1330  7451  0 12103  1010   174  1683   6  1415
## 8      Leptotrichia 24 5877 13611  2   403  2463  1197  2221  41  4574
## 9   Aggregatibacter  0 1213   301  2   668  4790  5268  3435   0   649
## 10 Succiniclasticum 16   44  2557  0     3    28     5  1109  30  2160
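If you do not have the CSV file at hand, a small stand-in can be typed directly into R. The sketch below rebuilds only the first three rows and three sample columns of the table printed above; it is a convenience for following along, not a replacement for the full file.

```r
# Rebuild part of the table printed above (first three genera, samples S1, S3, S4).
# stringsAsFactors = FALSE keeps Genera as plain text regardless of the R version.
balloon <- data.frame(
  Genera = c("Prevotella", "Treponema", "Fusobacterium"),
  S1 = c(97, 28, 11),
  S3 = c(5370, 3760, 5551),
  S4 = c(7892, 8004, 11017),
  stringsAsFactors = FALSE
)

# Inspect the structure: one text column and three numeric columns
str(balloon)
```

Any of the later steps should work on this reduced data frame in the same way as on the full one.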
This is the wide form of the data: each sample has its own column. In the long form, each row holds a single Genera, sample, and value combination. To convert the data to the long form, we use the melt command from the reshape2 library, which here uses Genera as the id variable.
balloon_melted<-melt(balloon)
## Using Genera as id variables
head(balloon_melted)
##          Genera variable value
## 1    Prevotella       S1    97
## 2     Treponema       S1    28
## 3 Fusobacterium       S1    11
## 4   Selenomonas       S1    40
## 5   Veillonella       S1     5
## 6 Porphyromonas       S1    13
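To make the wide versus long distinction concrete, here is a minimal sketch on a two-row excerpt of the data. It names the id column explicitly instead of letting melt guess it. The column names `Sample` and `Count` are my own choices for illustration, and dcast is shown only to demonstrate that the conversion can be reversed.

```r
library(reshape2)

# Wide form: one row per genus, one column per sample
wide <- data.frame(
  Genera = c("Prevotella", "Treponema"),
  S1 = c(97, 28),
  S3 = c(5370, 3760),
  stringsAsFactors = FALSE
)

# Long form: one row per genus/sample pair.
# "Sample" and "Count" are illustrative names, not from the original post.
long <- melt(wide, id.vars = "Genera",
             variable.name = "Sample", value.name = "Count")

# dcast() goes the other way, from long back to wide
wide_again <- dcast(long, Genera ~ Sample, value.var = "Count")

long
wide_again
```

Naming `id.vars` explicitly also avoids the "Using Genera as id variables" message, because melt no longer has to guess which column identifies the rows.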
In the final part, we use the ggplot2 library to make the balloon plot. We also want the size of the balloons to be proportional to the values. We use geom_point because each balloon is essentially a data point on a scatter plot.
p <- ggplot(balloon_melted, aes(x = variable, y = Genera))
p + geom_point(aes(size = value)) +
  theme(panel.background = element_blank(),
        panel.border = element_rect(colour = "blue", fill = NA, size = 1))
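One caveat about proportionality: with the default size scale, the smallest value is still drawn as a visible point, so balloon area is not strictly proportional to the value. As far as I know, `scale_size_area()` (used in the final code of this post) is what maps a value of 0 to a point of zero area and the largest value to `max_size`. The sketch below checks this on a four-row excerpt of the data; the `toy` data frame and its `Sample` column name are made up for illustration.

```r
library(ggplot2)

# Four values taken from the table above (Veillonella and Prevotella, samples S5 and S3)
toy <- data.frame(
  Sample = c("S5", "S3", "S5", "S3"),
  Genera = c("Veillonella", "Veillonella", "Prevotella", "Prevotella"),
  value  = c(0, 533, 23, 5370)
)

p_area <- ggplot(toy, aes(x = Sample, y = Genera)) +
  geom_point(aes(size = value)) +
  scale_size_area(max_size = 20)

# layer_data() returns the computed aesthetics for a layer,
# including the point size each value was mapped to
sizes <- layer_data(p_area, 1)$size
range(sizes)
```

If the reasoning above holds, the zero count gets size 0 and the largest count gets size 20, with the other balloons in between.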
Now, I want to make three changes to the plot: arrange the genera alphabetically, change the color of the balloons, and put labels on the x and y axes. Here is what the final code with all the changes looks like.
library(reshape2)
library(ggplot2)
#Load file
balloon<-read.csv("D:\\Balloon\\balloon.csv")
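#Arrange the genera in alphabetical order, reading from top to bottom
#(ggplot2 draws the first factor level at the bottom of the y axis, hence rev)
balloon$Genera <- factor(balloon$Genera, levels = rev(sort(unique(as.character(balloon$Genera)))))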
#Melt the data from wide to long format
balloon_melted<-melt(balloon)
## Using Genera as id variables
p <- ggplot(balloon_melted, aes(x = variable, y = Genera))
p + geom_point(aes(size = value), shape = 21, colour = "black", fill = "skyblue") +
  theme(panel.background = element_blank(),
        panel.border = element_rect(colour = "blue", fill = NA, size = 1)) +
  scale_size_area(max_size = 20) +
  #Add labels to axes
  labs(x = "Expression", y = "Genera")
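The order of the genera on the y axis is controlled by the order of the factor levels, and ggplot2 places the first level at the bottom of a discrete y axis. The following sketch, using four genus names from the data, compares three ways of setting the levels so you can pick the ordering you want. The variable names are illustrative only.

```r
# Four genus names in the order they appear in the file
genera <- c("Prevotella", "Treponema", "Fusobacterium", "Selenomonas")

# Keep the order in which the genera appear in the file
file_order <- factor(genera, levels = unique(genera))

# Alphabetical levels: A ends up at the bottom of the y axis
alpha_order <- factor(genera, levels = sort(unique(genera)))

# Reversed alphabetical levels: A ends up at the top of the y axis
top_down <- factor(genera, levels = rev(sort(unique(genera))))

levels(file_order)
levels(alpha_order)
levels(top_down)
```

Note that `levels = unique(x)` on its own only preserves the file order; it sorts nothing unless the file is already sorted.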