Introduction to ggplot2
ggplot2 is now one of the most popular graphics systems used in R. It was created by Hadley Wickham and is based on the grammar of graphics, which lets users build a plot from multiple layers.
These tutorials are loosely based on the structure of the ggplot2 visualization course on the DataCamp website.
As you may know, ggplot2 is designed to build a plot from separate layers, much like spatial map layers stacked on top of one another, if you are familiar with geospatial data. We first call ggplot() to define a base layer, then place geometric layers on top of it using the geom_ or stat_ functions.
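To make the layering idea concrete, here is a minimal sketch that builds a plot one layer at a time and counts the layers stored in the plot object. It assumes the built-in iris data; the object names base and layered are only illustrative.

```r
library(ggplot2)

# Base layer: data and aesthetic mappings only, nothing is drawn yet
base <- ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width))

# Each `+` stacks one more layer on top of the base
layered <- base + geom_point() + geom_smooth(method = "lm", se = FALSE)

length(base$layers)     # number of layers on the bare base plot
length(layered$layers)  # number of layers after adding two geoms
```

Printing layered draws the points first and the fitted line on top, in the order the layers were added.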
Now it is time to get some hands-on experience with ggplot2.
Getting to know the dataset
These tutorials use various datasets such as diamonds (from ggplot2), mtcars (from datasets) and Cars93 (from MASS). The examples in this part use iris and mtcars, both of which ship with base R.
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
# Constructing a scatter plot Petal length vs Petal width
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
p <- ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width))
p + geom_point() + geom_smooth(se = FALSE) + theme_bw()
## `geom_smooth()` using method = 'loess'
# se = FALSE removes the 95% confidence band around the smooth line
The scatter plot can be enhanced by adding color, shape, size and a title.
p + geom_point(aes(color=Species)) + xlab("Petal Length") + ylab("Petal Width") + ggtitle("Petal Length vs Petal Width by Species")
# A smooth line can be added with geom_smooth()
p + geom_point(aes(color = Species)) + xlab("Petal Length") + ylab("Petal Width") + ggtitle("Petal Length vs Petal Width by Species") + geom_smooth(se = FALSE, method = "lm")
# method = "lm" fits a linear regression model
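Since method = "lm" fits an ordinary linear regression, the straight line drawn by geom_smooth() should correspond to a model fitted directly with lm(). A small sketch under that assumption, using base R only; fit is an illustrative name.

```r
# Fit the same kind of model that geom_smooth(method = "lm") uses by default (y ~ x)
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Intercept and slope of the straight line shown in the plot
coef(fit)
```

Inspecting the coefficients this way is handy when the slope needs to be reported alongside the figure.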
Building a plot that shows several regression lines
p + geom_point(aes(color = Species), size = 1) + xlab("Petal Length") + ylab("Petal Width") + ggtitle("Petal Length vs Petal Width by Species") + geom_smooth(method = "lm", aes(group = 1, color = "All"), se = FALSE) + geom_smooth(method = "loess", aes(color = Species), se = FALSE)
Stats outside Geom_
# stat_summary() function
p1<-ggplot(data=iris,aes(x=Species,y=Petal.Length))
p1 + xlab("Species") + ylab("Petal Length") + ggtitle("Errorbar Plot") + stat_summary(fun.data=mean_sdl,fun.args = list(mult=1),col=4) + theme(plot.title = element_text(hjust=0.5))
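# adding the y mean and an error bar (mean and sd); mean_sdl() requires the Hmisc package
p2 <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species))
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
p2 + stat_summary(fun = mean, geom = "point") + stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.1)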
# geom="point" can be replaced by "bar"
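To see what stat_summary() is plotting here, the same numbers can be computed by hand. The sketch below reproduces the mean and the mean plus or minus one standard deviation per species in base R; summ and the column names y, ymin and ymax are illustrative choices that mirror the aesthetics an error bar expects.

```r
# Per-species mean and mean +/- 1 sd of Petal.Length (what mult = 1 asks for)
summ <- do.call(rbind, lapply(
  split(iris$Petal.Length, iris$Species),
  function(v) data.frame(y = mean(v), ymin = mean(v) - sd(v), ymax = mean(v) + sd(v))
))
summ
```

Each row corresponds to one point and one error bar in the plot above.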
We sometimes want to zoom in on a certain portion of the plot. ggplot2 provides several functions for this purpose.
p3<-ggplot(data=iris,aes(y=Petal.Length,x=Petal.Width))
p3 +geom_point(aes(color=Species))
# To zoom the x axis in on setosa only, use scale_x_continuous(limits=..), xlim() or coord_cartesian()
# The first two drop points outside the limits; coord_cartesian() only changes the visible region
p3 +geom_point(aes(color=Species)) + coord_cartesian(xlim=c(0,0.8)) + ggtitle("Zoom in Example")
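The zooming options are not interchangeable: setting scale limits discards values outside the range, whereas coord_cartesian() keeps all the data and only changes the viewport. The sketch below illustrates this by counting the x values that remain in the built layer data; clipped and zoomed are illustrative names, and layer_data() is used here only as a convenient way to inspect what ggplot2 computed.

```r
library(ggplot2)

p3 <- ggplot(data = iris, aes(y = Petal.Length, x = Petal.Width))

# xlim() sets scale limits: values outside them are turned into NA
clipped <- layer_data(p3 + geom_point() + xlim(0, 0.8))

# coord_cartesian() only changes the viewport: all values are kept
zoomed <- layer_data(p3 + geom_point() + coord_cartesian(xlim = c(0, 0.8)))

sum(!is.na(clipped$x))  # points that survive the scale limits
sum(!is.na(zoomed$x))   # points still present when only zooming
```

This matters most for layers that compute statistics, such as geom_smooth(), because points removed by scale limits no longer enter the calculation.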
Facet
# facet with columns
p3 +geom_point(aes(color=Species)) + facet_grid(~Species)
# facet with rows
p3 +geom_point(aes(color=Species)) + facet_grid(Species~.)
# In this case, faceting by columns may be more appropriate
library(RColorBrewer)
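# rbind() stacks the two palettes into a 2 x 3 matrix of six colors; the three species need only three of them
myCol <- rbind(brewer.pal(9, "Blues")[c(3, 6, 8)],
               brewer.pal(9, "Reds")[c(3, 6, 8)])
p3 + geom_point(aes(color = Species)) + scale_color_manual(values = myCol)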
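When there is exactly one color per group, a named vector can make the mapping explicit. The following is a sketch of that alternative, not the approach used above; species_cols and p_col are illustrative names, and the RColorBrewer package is assumed to be installed.

```r
library(ggplot2)
library(RColorBrewer)

# Three shades of blue, one per species; naming the vector ties each
# color to a specific level instead of relying on order
species_cols <- setNames(brewer.pal(9, "Blues")[c(3, 6, 8)], levels(iris$Species))

p_col <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
  geom_point() +
  scale_color_manual(values = species_cols)
```

Printing p_col draws the plot; the named vector keeps the colors attached to the right species even if the factor levels are reordered later.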
Adding or modifying text, line and rectangle elements using the theme() function
theme(
  title = element_text(),
  text = element_text(),
  plot.title = element_text(),
  legend.text = element_text(),
  legend.title = element_text(),
  axis.title = element_text(),
  axis.text.x = element_text(),
  axis.text = element_text(),
  axis.title.x = element_text(),
  strip.text.x = element_text(),
  strip.text = element_text(),
  legend.position = "bottom",
  …
)
theme(
  line = element_line(),
  axis.ticks = element_line(),
  axis.ticks.x = element_line(),
  axis.line.x = element_line(),
  …
)
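As a concrete illustration of these arguments, the sketch below applies a handful of them to the iris scatter plot. The specific values (bold centered title, size 10 axis text, grey ticks) are arbitrary choices for demonstration, and p_themed is an illustrative name.

```r
library(ggplot2)

p_themed <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point() +
  ggtitle("Petal Length vs Petal Width by Species") +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),  # centered, bold title
    axis.text = element_text(size = 10),                    # tick labels
    axis.ticks = element_line(color = "grey40"),            # tick marks
    legend.position = "bottom"                              # legend under the plot
  )
```

More specific elements such as axis.text.x inherit from their parents (axis.text, then text), so it is usually enough to set the general element and override only where needed.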
geom_errorbar
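p4 <- ggplot(data = mtcars, aes(x = cyl, y = hp))
# fill and alpha belong in the geom; ggplot() silently ignores them
# Note: the bars stack (sum) hp within each cyl, and min(hp), max(hp) and sd(hp) are computed over the whole dataset, so every bar gets the same error bar
p4 + geom_bar(stat = "identity", fill = "blue", alpha = 0.4) + geom_errorbar(aes(ymin = min(hp) - sd(hp), ymax = max(hp) + sd(hp)), width = 0.1)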
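The error bars above use a single min, max and sd taken over all of mtcars. If the goal is one error bar per cylinder group, a common pattern is to summarise the data first and then pass the summary to geom_errorbar(). The sketch below assumes mean plus or minus one standard deviation is the desired summary; hp_summary, mean_hp, sd_hp and p_err are illustrative names.

```r
library(ggplot2)

# Mean and sd of hp within each cyl group
hp_summary <- do.call(rbind, lapply(
  split(mtcars$hp, mtcars$cyl),
  function(v) data.frame(mean_hp = mean(v), sd_hp = sd(v))
))
hp_summary$cyl <- factor(rownames(hp_summary))

# One bar (group mean) and one error bar (mean +/- 1 sd) per cyl value
p_err <- ggplot(hp_summary, aes(x = cyl, y = mean_hp)) +
  geom_bar(stat = "identity", fill = "blue", alpha = 0.4) +
  geom_errorbar(aes(ymin = mean_hp - sd_hp, ymax = mean_hp + sd_hp), width = 0.1)
```

Treating cyl as a factor gives three evenly spaced bars instead of positions at 4, 6 and 8 on a continuous axis.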
Conclusion
ggplot2 is a very powerful tool for data visualization.