This post explains how to make balloon plots in R. The data represent the expression of a single gene in different strains of bacteria (user-generated data, not from an original experiment). The values could just as well be colony counts; the same approach works for many kinds of biological or non-biological data, and the resulting plot shows the relative magnitude of every value at a glance.

We will use the reshape2 and ggplot2 libraries. Later, when using reshape2, I will also explain the difference between the long and wide forms of data.

library(reshape2)
## Warning: package 'reshape2' was built under R version 3.1.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3

Next we will load the file and print it to the console to see what the data looks like.

balloon<-read.csv("D:\\Balloon\\balloon.csv")
balloon
##              Genera S1   S3    S4 S5    S6    S2    S7    S8  S9   S10
## 1        Prevotella 97 5370  7892 23  8310  7379  6238 20019 153 20103
## 2         Treponema 28 3760  8004 79  3379 14662 18977 18674  12 29234
## 3     Fusobacterium 11 5551 11017 30 38058 10085  2674 15306  23 10857
## 4       Selenomonas 40 2087 19712  7  1133   148  2198  1472  36  2869
## 5       Veillonella  5  533  5115  0  2506  1502    27  1898  15  4923
## 6     Porphyromonas 13  873  2695 34 17811  5222  2999  9600  15 14206
## 7     Streptococcus 10 1330  7451  0 12103  1010   174  1683   6  1415
## 8      Leptotrichia 24 5877 13611  2   403  2463  1197  2221  41  4574
## 9   Aggregatibacter  0 1213   301  2   668  4790  5268  3435   0   649
## 10 Succiniclasticum 16   44  2557  0     3    28     5  1109  30  2160
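
If you do not have the CSV file at hand, a small subset of the table above can be typed in directly, which is enough to follow the remaining steps. This is only a sketch: the name `balloon_small` is made up for illustration, and the values are copied from the first three rows and the S1, S3 and S4 columns of the printed output.

```r
# Hypothetical stand-in for the CSV: first three genera, columns S1, S3 and S4
balloon_small <- data.frame(
  Genera = c("Prevotella", "Treponema", "Fusobacterium"),
  S1 = c(97, 28, 11),
  S3 = c(5370, 3760, 5551),
  S4 = c(7892, 8004, 11017),
  stringsAsFactors = FALSE
)

# Inspect the structure: one character column and three numeric columns
str(balloon_small)
```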

This is the wide form of the data: one row per genus and one column for each of S1 to S10. To convert it to the long form, where each row holds a single Genera/variable/value combination, we will use the melt command from the reshape2 library, with Genera as the id variable.

balloon_melted<-melt(balloon)
## Using Genera as id variables
head(balloon_melted)
##          Genera variable value
## 1    Prevotella       S1    97
## 2     Treponema       S1    28
## 3 Fusobacterium       S1    11
## 4   Selenomonas       S1    40
## 5   Veillonella       S1     5
## 6 Porphyromonas       S1    13
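
The wide/long distinction may be easier to see on a toy table. The sketch below assumes a made-up three-row data frame (`wide_toy`, with values taken from the table above) and names the output columns explicitly; the column names `Sample` and `Count` are illustrative choices, not part of the original data. `dcast()` then converts the long form back to wide.

```r
library(reshape2)

# Toy wide table: one row per genus, one column per S column
wide_toy <- data.frame(
  Genera = c("Prevotella", "Treponema", "Fusobacterium"),
  S1 = c(97, 28, 11),
  S3 = c(5370, 3760, 5551),
  stringsAsFactors = FALSE
)

# Wide -> long: every (Genera, column) pair becomes its own row
long_toy <- melt(wide_toy, id.vars = "Genera",
                 variable.name = "Sample", value.name = "Count")
print(long_toy)

# Long -> wide again: the Sample values are spread back into columns
wide_again <- dcast(long_toy, Genera ~ Sample, value.var = "Count")
print(wide_again)
```

The long form has one row per measurement (3 genera x 2 columns = 6 rows here), which is the shape ggplot2 expects when a column such as `Sample` is mapped to an axis.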

In the final part, we use the ggplot2 library to make the balloon plot. We want the size of each balloon to be proportional to its value. We use geom_point because each balloon is essentially a data point on a scatter plot.

p <- ggplot(balloon_melted, aes(x =variable, y = Genera)) 
p+geom_point( aes(size=value))+theme(panel.background=element_blank(), panel.border = element_rect(colour = "blue", fill=NA, size=1))
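
One caveat on proportionality: with ggplot2's default size scale the smallest value still gets a visible point, whereas `scale_size_area()` scales point area to the value and maps 0 to size 0. A minimal sketch on made-up data (`toy` and `max_size = 15` are arbitrary choices for illustration):

```r
library(ggplot2)

# Made-up data including a zero, to show how the two scales treat it
toy <- data.frame(x = c("A", "B", "C"), y = "g", value = c(0, 50, 100))

# Default size scale: the zero still gets a small visible point
p_default <- ggplot(toy, aes(x = x, y = y)) + geom_point(aes(size = value))

# Area scale: point area is proportional to value, and 0 maps to size 0
p_area <- p_default + scale_size_area(max_size = 15)

print(p_area)
```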

Now I want to make three changes to the plot: arrange the genera alphabetically, change the color of the balloons, and put labels on the x and y axes. Here is what the final code looks like with all the changes.

library(reshape2)
library(ggplot2)
#Load file
balloon<-read.csv("D:\\Balloon\\balloon.csv")
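#Arrange the genera in alphabetical order, A at the top
#(ggplot2 draws the first factor level at the bottom of the y axis, so the sorted levels are reversed)
balloon$Genera <- factor(balloon$Genera, levels = rev(sort(unique(as.character(balloon$Genera)))))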
#Melt the data from wide to long format; the factor levels set above are kept
balloon_melted<-melt(balloon)
## Using Genera as id variables
p <- ggplot(balloon_melted, aes(x =variable, y = Genera)) 
p+geom_point( aes(size=value),shape=21, colour="black", fill="skyblue")+
  theme(panel.background=element_blank(), panel.border = element_rect(colour = "blue", fill=NA, size=1))+
  scale_size_area(max_size=20)+
  #Add labels to axes
  labs(x="Expression", y="Genera")
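
The order of the genera on the y axis follows the factor levels, with the first level drawn at the bottom. A small sketch comparing two orderings, using a few genera from the table (the vector name `g` is just for illustration):

```r
# A few genera in the order they appear in the file
g <- c("Prevotella", "Treponema", "Fusobacterium", "Selenomonas")

# File order: levels appear as first encountered
file_order <- factor(g, levels = unique(g))
levels(file_order)

# Alphabetical from top to bottom: reverse the sorted levels,
# because the first level is placed at the bottom of the axis
alpha_top_down <- factor(g, levels = rev(sort(unique(g))))
levels(alpha_top_down)
```

Checking `levels()` before plotting is a quick way to confirm which genus will end up at the bottom of the axis.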