ggplot2 uses the Grammar of Graphics (Wilkinson 2005) to build a plot incrementally, layer by layer, to your specification. Aesthetic mappings, combined with ggplot2's other capabilities, offer a number of powerful and easy-to-use formatting options. In this vignette, we create scatter plots with a focus on adding visual sophistication.
The following libraries are used in the examples.
library(dplyr)
library(ggthemes)
library(ggplot2)
data("mpg")
glimpse(mpg)
## Observations: 234
## Variables: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "...
## $ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 qua...
## $ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0,...
## $ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1...
## $ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6...
## $ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)...
## $ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4",...
## $ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 1...
## $ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 2...
## $ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p",...
## $ class <chr> "compact", "compact", "compact", "compact", "comp...
In these examples, we use the mpg data set, which contains fuel economy data from 1999 and 2008 for 38 popular models of cars.
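A quick way to check the model count and to see how many cylinder types the later plots will colour by is a short dplyr sketch. It uses only `n_distinct()` and `count()` on the `mpg` data shipped with ggplot2.

```r
library(dplyr)
library(ggplot2)

# Number of distinct car models in the data set
n_distinct(mpg$model)

# Number of cars for each cylinder count (cyl is stored as an integer)
count(mpg, cyl)
```

Because `cyl` takes only a handful of values, it works well as a categorical variable for colour and shape in the plots below.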
Now let’s use ggplot to create a scatter plot of city miles per gallon (cty) against engine displacement (displ).
# city miles depending on the engine capacity
ggplot(data=mpg, aes(x=displ, y=cty)) +
geom_point()
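Because a ggplot is built incrementally, the base plot can be stored in a variable and layers added later. The sketch below shows this; the variable names `p` and `p_points` are illustrative only.

```r
library(ggplot2)

# Base plot: data and the x/y aesthetic mapping only, no layers yet
p <- ggplot(data = mpg, aes(x = displ, y = cty))

# Adding a point layer produces the same scatter plot as above
p_points <- p + geom_point()
p_points
```

Storing the base plot this way makes it easy to try several formatting variations without repeating the `ggplot()` call.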
As you can see, the plot is not very visually appealing. Let’s take a closer look at the properties of aes. The x and y properties, which give the coordinates of each point, are required; the other properties are optional. They include colour, shape, size, fill, alpha and stroke, all of which are used below.
In the plot below, the colour property changes dynamically with the number of cylinders.
# city miles depending on the engine capacity categorised by no. of cylinders
ggplot(data=mpg, aes(x=displ, y=cty)) +
geom_point(aes(colour = factor(cyl)))
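The `factor()` call matters here. `cyl` is stored as an integer, so mapping it directly gives a continuous colour gradient, whereas `factor(cyl)` gives one distinct colour per cylinder type. The following sketch compares the two and uses ggplot2's `layer_data()` to inspect the colours actually assigned.

```r
library(ggplot2)

# Integer variable: ggplot2 uses a continuous colour gradient
p_cont <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = cyl))

# Factor: each cylinder count becomes a category with its own colour
p_disc <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = factor(cyl)))

# Distinct colours assigned to the points in the discrete version
unique(layer_data(p_disc)$colour)
```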
R provides numbered point shapes from 0 to 25. The following R code displays shapes 1 to 25.
# 25 shapes
ggplot(data.frame(x = 1:5 , y = 1:25, z = 1:25), aes(x, y)) +
geom_point(aes(shape = z), size = 4, colour = "Black", fill = "Green") +
scale_shape_identity()
To use fill, you must use shapes 21-25, as shown in green above. We will use the fill property in the next section.
Now let’s use the shape property to distinguish the cylinder types. We will also use the size property to make the data points larger.
ggplot(data=mpg, aes(x=displ, y=cty)) +
geom_point(aes(colour = factor(cyl), shape=factor(cyl)),size = 3)
To map a property to a variable, the property must be inside the aes statement, as shown above. Fixed properties must be outside the aes statement.
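A common mistake is to put a fixed value inside `aes()`. ggplot2 then treats the value as data: it colours the points with its default palette and adds a legend entry, rather than using the colour you asked for. This sketch contrasts the two cases; the colour "blue" is an arbitrary choice.

```r
library(ggplot2)

# Setting: a fixed value outside aes() colours every point blue
p_set <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(colour = "blue")

# Mapping a constant inside aes(): "blue" is treated as a data value,
# coloured by the default palette and shown in a legend
p_mapped <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = "blue"))

unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```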
Now let’s combine the shape and fill properties. Because we are using fixed values, these properties go outside the aes statement. We do not map shape to a variable here because fill is only supported by shapes 21-25. When shape is mapped to a variable, the default shape scale assigns solid shapes such as 16 and 17, none of which are in the 21-25 range, so the fill would be ignored unless the shapes are set explicitly with scale_shape_manual().
ggplot(data=mpg, aes(x=displ, y=cty)) +
geom_point(aes(colour = factor(cyl)), shape=21, size = 3, fill = "Black" )
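If you do want shape to vary with the cylinder count while still using fill, one option is to override the default shape palette with `scale_shape_manual()`. The following sketch assigns the fillable shapes 21-24, one per cylinder type (4, 5, 6 and 8). Colour and shape map the same variable, so ggplot2 merges them into a single legend.

```r
library(ggplot2)

# Map shape to cylinder count, but replace the default shape palette
# with the fillable shapes 21-24 so that fill = "Black" takes effect
p_fill_shapes <- ggplot(data = mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = factor(cyl), shape = factor(cyl)),
             size = 3, fill = "Black") +
  scale_shape_manual(values = 21:24)
p_fill_shapes
```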
The outline colour of the data points is hard to see, so we can make the outline thicker with the stroke property.
ggplot(data=mpg, aes(x=displ, y=cty)) +
geom_point(aes(colour = factor(cyl)),size = 3, shape=21, fill = "Black", stroke = 2)
Setting transparency can be useful for large data sets. Here we map the alpha property to vehicle class.
ggplot(data=mpg, aes(x=displ, y=cty)) +
geom_point(aes(colour = factor(cyl), alpha =class),size = 3, stroke = 1)
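For large data sets, transparency is often applied as a fixed value outside `aes()`, so that overlapping points build up into darker regions. Note that mapping alpha to a discrete variable such as class, as above, may trigger a ggplot2 warning that this is not advised. The sketch below uses the `diamonds` data shipped with ggplot2 (about 54,000 rows); the value 0.1 is an arbitrary choice.

```r
library(ggplot2)

# With tens of thousands of points, fully opaque markers overlap heavily;
# a fixed alpha of 0.1 lets dense regions show up as darker areas
p_alpha <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1)
p_alpha
```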
Now that our scatter plot is formatted, let’s add a smoothed trend line.
ggplot(data=mpg, aes(x=displ, y=cty)) +
geom_point(aes(colour = factor(cyl), alpha =class),size = 3, stroke = 1) +
geom_smooth(method = "loess")
To hide the standard error band around the line, use geom_smooth(method = "loess", se = FALSE).
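The loess method draws a locally smoothed curve. If you want a straight least-squares best-fit line instead, `method = "lm"` can be used. This sketch also sets `se = FALSE` and states the formula explicitly, then computes the same coefficients directly with `lm()` for comparison.

```r
library(ggplot2)

# method = "lm" fits an ordinary least-squares line instead of a loess curve;
# se = FALSE hides the standard error band
p_lm <- ggplot(data = mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = factor(cyl)), size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p_lm

# Intercept and slope of the plotted line, fitted directly
coef(lm(cty ~ displ, data = mpg))
```

Because colour is mapped only in `geom_point()`, the smoother sees a single group and draws one line through all the data.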
Using ggplot themes, you can quickly enhance the overall appearance of your plot. Below we use theme_dark. Let’s also add a title and rename the x and y axes and the legends.
ggplot(data=mpg, aes(x=displ, y=cty)) +
geom_point(aes(colour = factor(cyl), alpha =class),size = 3, stroke = 1) +
geom_smooth(method= "loess") +
ggtitle("City Miles Based on Vehicle Cylinder Type\n") +
labs(x="Engine displacement (litres)",y="City miles per gallon\n", alpha="Vehicle Class", colour="No. of cylinders") +
theme_dark()
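theme_dark() is one of ggplot2's complete themes. A complete theme can also be combined with `theme()` to override individual elements. This sketch assumes a lighter look with the legend below the plot is wanted, and its title and label text are illustrative. `theme()` is added after the complete theme, since a complete theme added later would replace the override. The ggthemes package loaded at the start provides further complete themes, such as theme_economist().

```r
library(ggplot2)

# Complete theme first, then theme() to adjust a single element
p_theme <- ggplot(data = mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = factor(cyl)), size = 3) +
  labs(title = "City Miles Based on Engine Displacement",
       x = "Engine displacement (litres)", y = "City miles per gallon",
       colour = "No. of cylinders") +
  theme_minimal() +
  theme(legend.position = "bottom")
p_theme
```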