Preamble
The aim of these notes is to get familiar with the grammar of graphics in order to make use of ggplot2. We shall look at the following aspects:
- Obtaining graph
- Modifying Aesthetics
- Faceting
ggplot2
ggplot2 is one of the packages that come with the tidyverse (a collection of packages for data science). The purpose of visualization is to gain insight into data, and ggplot2 makes data visualization simple.
ggplot2 is based on an idea called the grammar of graphics, which means we have to consider the following components while creating plots.
- Data: the dataset we use
- Aesthetics: how variables are mapped to the look and feel of the plot (position, colour, size, shape)
- Geometry: the shape used to draw the data (points, bars, histograms), chosen so that the plot displays the data well
- Facet: grouping the data by various characteristics so that the effect of each group can be seen separately
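As a minimal sketch of how the four components fit together in a single call, the example below uses the diamonds dataset bundled with ggplot2. The choice of carat and price as variables, and of cut as the faceting variable, is an assumption made only for illustration.

```r
library(ggplot2)

# Data:       the diamonds dataset bundled with ggplot2
# Aesthetics: carat on the x axis, price on the y axis
# Geometry:   points
# Facet:      one panel per cut category
p <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_point() +
  facet_wrap(~ cut)
p
```

Each component is added with `+`, so any one of them can be changed without touching the others.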
We shall look at how to visualize data using examples. For this, let us consider two broad data types:
- Metric data
- Categorical data
We shall see how to make beautiful visualizations based on data.
We need to install and load the ggplot2 package.
# install.packages("ggplot2")  # run once if the package is not yet installed
library(ggplot2)
Dataset used
The ggplot2 package includes a dataset called diamonds; let us use it for visualization.
data("diamonds", package = "ggplot2")
The data provides the prices of over 50,000 round cut diamonds.
| Variable | Description | Type |
|---|---|---|
| price | price of diamond in US dollars ($326–$18,823) | Numeric |
| carat | weight of the diamond (0.2–5.01) | Numeric |
| cut | quality of the cut (Fair, Good, Very Good, Premium, Ideal) | Ordered factor |
| color | diamond colour, from D (best) to J (worst) | Ordered factor |
| clarity | a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) | Ordered factor |
| x | length in mm (0–10.74) | Numeric |
| y | width in mm (0–58.9) | Numeric |
| z | depth in mm (0–31.8) | Numeric |
| depth | total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79) | Numeric |
| table | width of top of diamond relative to widest point (43–95) | Numeric |
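A quick way to check this table against the data is to inspect the data frame directly. The sketch below uses only base R functions on the diamonds dataset; the particular checks chosen are just examples.

```r
library(ggplot2)
data("diamonds", package = "ggplot2")

dim(diamonds)           # number of rows and columns
str(diamonds)           # type of every column
range(diamonds$price)   # smallest and largest price
levels(diamonds$cut)    # cut categories, from lowest to highest quality
```

`str()` is the most useful of these before plotting, because the column type decides which geometries make sense for a variable.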
Metric variable
Initial ggplot
To create a plot with ggplot2, we pass the data and the aesthetic mapping to ggplot(data, aes(x = x_variable, y = y_variable)), where x_variable and y_variable are the variables we wish to display on the x axis and the y axis.
Then we specify the geometry of the plot.
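The two steps can be seen by counting the layers of the plot object. This is a small sketch rather than part of the original notes; the variable names p0 and p1 and the choice of carat and price are assumptions for illustration.

```r
library(ggplot2)

# Step 1: data and aesthetic mapping only; this draws the axes but no data
p0 <- ggplot(diamonds, aes(x = carat, y = price))
length(p0$layers)

# Step 2: adding a geometry creates the first layer, which draws the data
p1 <- p0 + geom_point()
length(p1$layers)
```

Printing p0 gives an empty panel with labelled axes, which is why a geometry always has to follow the initial ggplot() call.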
Scatter plot
We wish to plot a single metric variable, for example “price”. For this we need an index set to plot against, so we use the sequence 1, 2, ..., n, where n is the length of the variable considered (53,940 for diamonds$price).
x = seq(1, length(diamonds$price))
Step 1
ggplot(diamonds, aes(x=x,y=price))
Step 2
Choosing a geometry to plot; here we use points.
ggplot(diamonds, aes(x=x,y=price))+geom_point()
Customising aesthetics to plot
Three options are available for us to modify:
- size
- shape
- color
ggplot(diamonds, aes(x=x,y=price))+
  geom_point(size=4,shape=2,color='blue')
A plot can be saved in a variable so that it can be retrieved and extended later; let us see how to use this.
#saving as a variable g
g=ggplot(diamonds, aes(x=depth,y=price))+
  geom_point(size=4,shape=4,color='blue')
Title and axes specification
g1=g+ labs(title="Diamond data",
  subtitle = "cut depth vs price",
  x="depth",
  y="Price",caption="Scatter plot")
g1
Aesthetics for axis labels and plot titles can be customised using theme()
- face sets the font face (“plain”, “italic”, “bold”, “bold.italic”)
g3=g1+theme(
  plot.title = element_text(color="red", size=14, face="bold.italic"),
  plot.subtitle = element_text(color="green4", size=14, face="italic"),
  axis.title.x = element_text(color="blue", size=14, face="bold"),
  axis.title.y = element_text(color="brown", size=14, face="bold"),
  plot.caption = element_text(color="steelblue", size=14, face="italic"))
g3
Title positioning and tick mark specification can be done using the following arguments in theme():
hjust and vjust are used for horizontal and vertical justification.
vjust controls the vertical spacing between the title (or label) and the plot.
hjust controls the horizontal spacing. Setting it to 0.5 centers the title.
angle is used to change the orientation of the text.
g1+
  theme(plot.title=element_text(hjust=0.5,vjust=0.5,
      color="tomato",size=14, face="bold.italic"),
    plot.subtitle=element_text(size=15, face="bold",hjust=0.5),
    plot.caption=element_text(size=15,face="bold.italic",color="blue3"),
    axis.title.x=element_text(vjust=10, hjust=0.1,size=10),
    axis.title.y=element_text(size=15,angle = 270),
    axis.text.x=element_text(size=10, angle = 30,vjust=.5),
    axis.text.y=element_text(size=10))
Axes limits can be modified using coord_cartesian(), which zooms into the given range without dropping any data.
g3+
  coord_cartesian(xlim=c(55, 80),ylim=c(400, 500))
We can customise axis break points using scale_x_continuous() and scale_y_continuous().
g3+scale_x_continuous(breaks=seq(40,80,2))
g3+scale_y_continuous(breaks=seq(0,20000,1000))
Facet
In some scenarios we may wish to see how the data differ across groups; for this we use the facet option in ggplot2. Here we wish to know how price is distributed for each combination of color, cut and clarity. We use a histogram for this, with one panel per clarity (rows) and cut (columns) pair and the bars filled by color.
facet_plot=ggplot(diamonds,aes(price,fill=color))+
  geom_histogram()+facet_grid(clarity~cut)
facet_plot
Categorical variable
If we wish to learn about a single categorical variable, the quantity of interest is the number of times each category occurs. To visualize this we use a bar plot.
ggplot(diamonds,aes(x=cut))+
  geom_bar()
Changing the colours, displaying the count on each bar, and adding a title, subtitle and axis labels can be done using
# after_stat(count) replaces the deprecated ..count.. notation
ggplot(diamonds,aes(x=cut))+
  geom_bar(col="red2",fill="green3")+
  geom_text(aes(label = after_stat(count)), stat = "count",
    vjust=1, size=3.5,color='yellow')+
  labs(title="Distribution of cut quality",
    subtitle = "",
    x="Cut category",
    y="Count")
A horizontal bar plot can also be obtained
ggplot(diamonds,aes(x=cut))+
  geom_bar(col="red2",fill="green3")+
  coord_flip()
A stacked bar chart can be obtained using the following command; here we use ‘color’ for grouping, so each bar is split by colour grade
ggplot(diamonds,aes(x=cut,fill=as.factor(color)))+
  geom_bar() +
  labs(title="Distribution of Cut quality",
    subtitle = "Group: Color",
    x="Cut category",
    y="Count")
To compare data over two categories side by side we can use a grouped bar chart; it can be obtained using position_dodge()
ggplot(diamonds,aes(x=cut,fill=as.factor(color)))+
  geom_bar(position=position_dodge()) +
  labs(title="Distribution of Cut quality",
    subtitle = "Group: Color",
    x="Cut category",
    y="Count")
A pie diagram can be obtained using
g<-ggplot(diamonds,
    aes(x = factor(""), fill = cut) ) +
  geom_bar() +
  coord_polar(theta = "y") +
  scale_x_discrete("")
g
Colors can be customised using
g+scale_fill_brewer(palette = "Set2")
Removal of the white circle and axis titles
ggplot(diamonds,
    aes(x = factor(""), fill = cut) ) +
  geom_bar() +
  coord_polar(theta = "y") +
  scale_x_discrete("")+
  scale_y_discrete("")+
  theme(axis.ticks=element_blank(),
    axis.title=element_blank(),
    axis.text.y=element_blank(),
    axis.text.x=element_blank(),
    panel.grid = element_blank(),
    legend.position = "bottom")+
  labs(title = "Cut category",
    caption="Diamonds dataset")
Final note
These notes present the fundamental structure of ggplot2, providing examples for
- Initializing plot
- Customising aesthetics
- Title and Axes labeling
- Faceting
The same rules can be applied to produce other graphs of interest. Depending on the requirement we may choose a suitable visualization; above all, our aim is to gain insight into the data.
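As one possible illustration of applying the same rules to a different graph, the sketch below combines a metric variable (price) with a categorical one (cut) in a box plot. The choice of geom_boxplot(), the fill colour and the label texts are assumptions made for this example, not part of the notes above.

```r
library(ggplot2)

# Data and aesthetics: price for each cut category
p <- ggplot(diamonds, aes(x = cut, y = price)) +
  # Geometry: a box plot instead of points or bars
  geom_boxplot(fill = "steelblue") +
  # Facet: one panel per colour grade
  facet_wrap(~ color) +
  # Title and axis labels
  labs(title = "Price by cut quality",
       subtitle = "One panel per colour grade",
       x = "Cut category",
       y = "Price (US dollars)") +
  # Theme: centre the title and tilt the x-axis tick labels
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.text.x = element_text(angle = 30, vjust = 0.5))
p
```

Only the geometry line differs from the earlier scatter and bar plots; the labelling, theming and faceting steps carry over unchanged.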