The ggplot2 package implements Hadley Wickham's layered grammar of graphics, which builds each plot from data, aesthetic mappings, and layers of geometric objects, to produce clean visualizations for exploring data. This code-through outlines the basics of making a ggplot2 visualization, using the mpg dataset that ships with the ggplot2 package.
—
1. Defining the Data
For this example, I examine the relationship between highway fuel mileage (hwy, in miles per gallon) and engine displacement (displ, in liters) for vehicles of different types (class).
# Load ggplot2, which also provides the mpg dataset
library(ggplot2)
# Preview the first six rows of mpg
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans  drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(~ f        18    29 p     comp~
## 2 audi         a4      1.8  1999     4 manua~ f        21    29 p     comp~
## 3 audi         a4      2    2008     4 manua~ f        20    31 p     comp~
## 4 audi         a4      2    2008     4 auto(~ f        21    30 p     comp~
## 5 audi         a4      2.8  1999     6 auto(~ f        16    26 p     comp~
## 6 audi         a4      2.8  1999     6 manua~ f        18    26 p     comp~
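Only three of the eleven columns are used below, so it can help to look at just those variables before plotting. A minimal sketch using base R subsetting; the object name `mpg_sub` is hypothetical:

```r
library(ggplot2)

# Keep only the variables used in this code-through
mpg_sub <- mpg[, c("hwy", "displ", "class")]

# Ranges of the two numeric variables (hwy in miles per gallon, displ in liters)
summary(mpg_sub[, c("hwy", "displ")])

# Number of vehicles in each class; these classes become the point colors later
table(mpg_sub$class)
```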
# Create the plot: pass mpg to ggplot() and map variables to aesthetics with aes()
ggplot(mpg, aes(x = hwy, y = displ, color = class))
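You will notice that only an empty panel with axes appears; no data are plotted. The code defines the data and aesthetic mappings but has no geoms, the geometric-object layers (points, lines, bars, and so on) that actually draw the data and make the ggplot2 package so flexible.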
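Because ggplot() returns a plot object, the empty base plot can be stored and layers added to it later. A minimal sketch; the names `base` and `with_points` are hypothetical. The added layer inherits the x, y, and color mappings given in ggplot():

```r
library(ggplot2)

# Base plot: data and aesthetic mappings only, so no data are drawn yet
base <- ggplot(mpg, aes(x = hwy, y = displ, color = class))

# Adding a geom layer draws the data; geom_point() inherits x, y, and color from base
with_points <- base + geom_point()
print(with_points)
```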
2. Adding geoms
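geom_point() and geom_smooth() are the two layers added here: geom_point() draws each vehicle as a point at its x and y values, and geom_smooth() fits a trend line (with a confidence band) through those points. The color = class mapping appears only in geom_point(), not in geom_smooth(), so the smoother treats all classes as one group and draws a single line instead of one line per class.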
library(ggplot2)
ggplot(mpg) +
  geom_point(aes(x = hwy, y = displ, color = class)) +  # points colored by class
  geom_smooth(aes(x = hwy, y = displ))                   # one trend line for all classes
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
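The same plot can also be written with the shared x and y mappings moved into ggplot(), so both layers inherit them and only color is set per layer. For contrast, mapping color globally splits geom_smooth() into one line per class. A sketch; the object names are hypothetical, and `method = "lm"` in the second plot is my own choice, since small classes such as 2seater have too few points for a stable loess fit:

```r
library(ggplot2)

# Shared x/y in ggplot(); color only in geom_point(), so the smooth is fitted once.
# method = "loess" with y ~ x matches the default reported in the message above.
one_line <- ggplot(mpg, aes(x = hwy, y = displ)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "loess", formula = y ~ x)

# Contrast: color mapped globally, so geom_smooth() fits a separate line per class
per_class <- ggplot(mpg, aes(x = hwy, y = displ, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(one_line)
print(per_class)
```

The second plot is usually harder to read with seven overlapping lines, which is why keeping color out of geom_smooth() suits a single overall trend.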
3. Labels
Use the labs() function to add descriptive detail to the plot with a custom title and axis labels.
library(ggplot2)
mpg.plot <- ggplot(mpg) +
  geom_point(aes(x = hwy, y = displ, color = class)) +
  geom_smooth(aes(x = hwy, y = displ)) +
  labs(title = "Engine Size v. Fuel Mileage by Vehicle Type",  # labels added
       x = "Highway Fuel Mileage (MPG)",
       y = "Engine Size (L)")
print(mpg.plot)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
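labs() can also name the color legend, using the aesthetic name as the argument. A minimal sketch; the object name `labeled` is hypothetical, and the explicit `method`/`formula` in geom_smooth() simply restate the default reported in the message above:

```r
library(ggplot2)

labeled <- ggplot(mpg) +
  geom_point(aes(x = hwy, y = displ, color = class)) +
  geom_smooth(aes(x = hwy, y = displ), method = "loess", formula = y ~ x) +
  labs(title = "Engine Size v. Fuel Mileage by Vehicle Type",
       x = "Highway Fuel Mileage (MPG)",
       y = "Engine Size (L)",
       color = "Vehicle Type")  # legend title, named after the color aesthetic
print(labeled)
```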
But it could still look better.
4. Make it look better with a Theme
Build on the mpg.plot object created in the last block by adding the theme() function, which controls non-data elements such as the size and weight of the title and axis text. This allows for more customization and a sharper-looking product. scale_color_discrete(name = ...) sets the title of the color legend, which otherwise defaults to the mapped variable name, class.
mpg.plot1 <- mpg.plot +
  theme(plot.title = element_text(size = 20, face = "bold"),  # title text
        axis.text.x = element_text(size = 12),                # x-axis tick labels
        axis.text.y = element_text(size = 12),                # y-axis tick labels
        axis.title.x = element_text(size = 16),               # x-axis title
        axis.title.y = element_text(size = 16)) +             # y-axis title
  scale_color_discrete(name = "Vehicle Type")                 # legend title
print(mpg.plot1)                                              # print the plot
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
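Another option is to start from one of ggplot2's complete themes and then override individual elements with theme(), and to save the result with ggsave(). A sketch; the names `p`, `p_min`, and `out_file`, the choice of theme_minimal(), and the bottom legend are my assumptions, not part of the original walkthrough:

```r
library(ggplot2)

# Rebuild the labeled plot so this block runs on its own
p <- ggplot(mpg) +
  geom_point(aes(x = hwy, y = displ, color = class)) +
  geom_smooth(aes(x = hwy, y = displ), method = "loess", formula = y ~ x) +
  labs(title = "Engine Size v. Fuel Mileage by Vehicle Type",
       x = "Highway Fuel Mileage (MPG)", y = "Engine Size (L)",
       color = "Vehicle Type")

# Complete theme first; theme() added afterwards overrides individual elements
p_min <- p +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(size = 20, face = "bold"),
        legend.position = "bottom")

# Save to a temporary PDF; width and height are in inches by default
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p_min, width = 8, height = 6)
```

Order matters here: adding theme_minimal() after theme() would replace the custom title settings.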
Now, that is a good-looking output, and the trend is easy to read: generally, as engine size decreases, highway fuel mileage increases. Compact and subcompact cars are among the most fuel efficient, while pickups and SUVs are among the least. Makes sense.
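The visual reading can be cross-checked with a couple of numbers: the correlation between highway mileage and displacement, and the mean highway MPG per class. A minimal sketch in base R; the names `r` and `class_means` are hypothetical:

```r
library(ggplot2)

# Correlation between highway mileage and engine displacement (negative = inverse trend)
r <- cor(mpg$hwy, mpg$displ)
print(r)

# Mean highway MPG per vehicle class, sorted from least to most fuel efficient
class_means <- aggregate(hwy ~ class, data = mpg, FUN = mean)
class_means <- class_means[order(class_means$hwy), ]
print(class_means)
```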
—
I hope you enjoyed this code-through. If you have any questions, please reach out at my email