The ggplot2 package implements Hadley Wickham’s Layered Grammar of Graphics, which builds a plot from data, aesthetic mappings, and layers. This code-through outlines the basics of making a ggplot2 visualization, using the mpg dataset included with the ggplot2 package.


1. Defining the Data

For this example, I examine the relationship between highway fuel mileage (hwy, in miles per gallon) and engine displacement (displ, in liters) for vehicles of different types (class).

# Load the ggplot2 library, which also provides the mpg dataset

library(ggplot2)

# Confirm data in mpg dataset
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans  drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(~ f        18    29 p     comp~
## 2 audi         a4      1.8  1999     4 manua~ f        21    29 p     comp~
## 3 audi         a4      2    2008     4 manua~ f        20    31 p     comp~
## 4 audi         a4      2    2008     4 auto(~ f        21    30 p     comp~
## 5 audi         a4      2.8  1999     6 auto(~ f        16    26 p     comp~
## 6 audi         a4      2.8  1999     6 manua~ f        18    26 p     comp~
# Call ggplot() with the mpg dataset and use aes() to map variables to aesthetics

ggplot(mpg, aes(x=hwy, y=displ, color=class))


You will notice that no data is plotted: the call draws only an empty panel with axes. This code lacks geoms, the layers that actually draw the data and make the ggplot2 package so excellent.
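One way to confirm this, as a minimal sketch: a ggplot object keeps its layers in a list, and that list stays empty until a geom is added. The names p and p_points are illustrative, and reading p$layers directly relies on how the object happens to be stored, so treat it as an inspection aid rather than a stable interface.

```r
library(ggplot2)

# Illustrative name `p`: the same call as above, stored instead of printed
p <- ggplot(mpg, aes(x = hwy, y = displ, color = class))

# No geoms have been added yet, so the layer list is empty
length(p$layers)

# Adding a geom appends one layer to the list
p_points <- p + geom_point()
length(p_points$layers)
```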

2. Adding geoms

geom_point and geom_smooth are the two layers used here, and both map hwy to the x aesthetic and displ to the y aesthetic. Only geom_point maps color to class. geom_smooth deliberately leaves color out, so it fits a single trend line across all classes instead of a separate line for each class.

library(ggplot2)
ggplot(mpg) +
  geom_point(aes(x=hwy, y=displ, color=class)) +  # points, colored by class
  geom_smooth(aes(x=hwy, y=displ))  # one smooth line across all classes
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
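To see why the placement of the color mapping matters, here is a sketch contrasting the two options. The object names are made up for illustration. When color = class sits in ggplot(), every layer inherits it and geom_smooth() fits a separate curve for each class. When it sits only in geom_point(), the smoother treats all rows as one group.

```r
library(ggplot2)

# color mapped globally: geom_smooth() inherits it and fits one curve per class
smooth_per_class <- ggplot(mpg, aes(x = hwy, y = displ, color = class)) +
  geom_point() +
  geom_smooth()

# color mapped only in geom_point(): one curve across all classes.
# This matches the plot above, with the shared x and y written once.
smooth_overall <- ggplot(mpg, aes(x = hwy, y = displ)) +
  geom_point(aes(color = class)) +
  geom_smooth()

print(smooth_overall)
```

Printing smooth_per_class instead shows the per-class version for comparison.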



3. Labels

Use the labs() function to add descriptive detail to the plot with a custom title and axis labels.

library(ggplot2)
mpg.plot <- ggplot(mpg) +
  geom_point(aes(x=hwy, y=displ, color=class)) +
  geom_smooth(aes(x=hwy, y=displ)) +
  labs(title="Engine Size v. Fuel Mileage by Vehicle Type", x="MPG", y="Engine Size (L)")  # labels added
print(mpg.plot)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
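labs() also accepts subtitle and caption arguments. The sketch below shows them on a points-only version of the plot. The object name labelled and the subtitle and caption wording are illustrative, not part of the original example.

```r
library(ggplot2)

labelled <- ggplot(mpg) +
  geom_point(aes(x = hwy, y = displ, color = class)) +
  labs(
    title = "Engine Size v. Fuel Mileage by Vehicle Type",
    subtitle = "Highway mileage, 1999 and 2008 models",  # illustrative wording
    caption = "Data: mpg dataset from ggplot2",          # illustrative wording
    x = "MPG",
    y = "Engine Size (L)"
  )

print(labelled)
```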


But it could still look better.

4. Make it look better with a Theme

Take the plot object created in the last block, mpg.plot, and build on it with the theme() function, which controls non-data elements such as text size and weight. This allows for more customization and a sharper-looking product. Note that scale_color_discrete() is used here to set the title of the color legend.

mpg.plot1 <- mpg.plot + theme(plot.title=element_text(size=20, face="bold"),  # title text
                  axis.text.x=element_text(size=12),  # x-axis text
                  axis.text.y=element_text(size=12),  # y-axis text
                  axis.title.x=element_text(size=16),  # x title text
                  axis.title.y=element_text(size=16)) +  # y title text
  scale_color_discrete(name="Vehicle Type")  # legend title
print(mpg.plot1)  # print the plot
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
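A more compact variant of the same styling, as a sketch: theme elements inherit from their parents, so setting axis.text and axis.title once covers both the x and y axes, and labs(color = ...) is another way to title the color legend. The names base_plot, compact_theme, and styled are illustrative.

```r
library(ggplot2)

# Rebuild the unlabeled plot so this block runs on its own
base_plot <- ggplot(mpg) +
  geom_point(aes(x = hwy, y = displ, color = class)) +
  geom_smooth(aes(x = hwy, y = displ))

compact_theme <- theme(
  plot.title = element_text(size = 20, face = "bold"),  # title text
  axis.text  = element_text(size = 12),  # tick labels on both axes
  axis.title = element_text(size = 16)   # titles on both axes
)

styled <- base_plot +
  labs(
    title = "Engine Size v. Fuel Mileage by Vehicle Type",
    x = "MPG",
    y = "Engine Size (L)",
    color = "Vehicle Type"  # legend title, set through labs()
  ) +
  compact_theme

print(styled)
```

This should render the same as the plot above.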


Now, that is a good-looking output. The trend is easy to see: generally, as engine size decreases, highway fuel mileage increases. Compact and subcompact cars are some of the most fuel efficient, while pickups and SUVs are not. Makes sense.
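The reading of the plot can be checked numerically with base R. This is a rough sketch: the correlation only summarizes the linear part of the trend, and the name hwy_by_class is illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Correlation between engine size and highway mileage;
# a negative value means larger engines go with lower mileage
cor(mpg$displ, mpg$hwy)

# Mean highway mileage per vehicle class, sorted from highest to lowest
hwy_by_class <- aggregate(hwy ~ class, data = mpg, FUN = mean)
hwy_by_class[order(-hwy_by_class$hwy), ]
```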

I hope you enjoyed this code-through. If you have any questions, please reach out at my email.