Introduction

This is a basic demonstration of graphs built with ggplot2, an R implementation of the Grammar of Graphics.

Dataset

I will be using two datasets that ship with ggplot2:
- MPG data (mpg)
- Diamond data (diamonds)

Graphs

Scatter plot showing engine displacement’s impact on highway consumption, with linear regression line and confidence intervals

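library(ggplot2)  # provides ggplot() and the mpg and diamonds datasets

# Base plot: engine displacement against highway miles per gallon
g <- ggplot(mpg, aes(displ, hwy))
g + geom_point(color = "darkblue", size = 4, alpha = 1/2) + 
    geom_smooth(method = "lm") +   # linear fit with its default confidence band
    ggtitle("Highway Consumption vs. Displacement") + 
    labs(x = "Displacement", y = "Highway Mileage")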
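
To see what the regression line in this plot represents, the same straight-line fit can be reproduced directly with lm(). This is a minimal sketch, assuming geom_smooth(method = "lm") is left at its default formula of y ~ x; the object name fit is introduced here for illustration only.

```r
library(ggplot2)  # for the mpg dataset

# Fit highway mileage as a linear function of engine displacement,
# the same kind of model geom_smooth(method = "lm") draws by default
fit <- lm(hwy ~ displ, data = mpg)

coef(fit)                   # intercept and slope of the fitted line
confint(fit, level = 0.95)  # 95% confidence intervals for the two coefficients
```

A negative slope for displ means that cars with larger engines tend to get fewer highway miles per gallon. Note that the shaded band in the plot is a confidence band around the fitted line, which is related to, but not the same as, the coefficient intervals printed by confint().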

Scatter plot showing engine displacement’s impact on highway consumption, broken down by drive type, with linear regression line and no confidence intervals

g+geom_point(color = "darkgreen", size = 4, alpha = 1/2) + 
    geom_smooth(method = "lm", se = FALSE) + 
    ggtitle("Highway Consumption vs. Displacement, by drive type") + 
    labs(x = "Displacement", y = "Highway Mileage") +
    facet_grid(.~drv)

Scatter plot showing engine displacement’s impact on highway consumption, with drive type indicated by different colours, and using the black-and-white theme with the Times font family

g+geom_point(aes(color = drv), size = 4) + 
    theme_bw(base_family = "Times")

Scatter plot showing engine displacement’s impact on highway consumption, with year of data collection indicated by different colours, and facets added for number of cylinders and drive type

g <- ggplot(mpg, aes(displ, hwy, color = factor(year)))
g + geom_point(size = 4, alpha = 1/2) + 
    facet_grid(drv ~ cyl, margins = TRUE) +   # margins = TRUE adds "(all)" panels
    ggtitle("Highway Consumption vs. Displacement, by drive type\nand dataset year") + 
    labs(x = "Displacement", y = "Highway Mileage")

Exploring the diamond dataset

We look at the top rows, summary statistics, and structure of the dataset to see which variables are available and how they are stored:

head(diamonds, 5)
## # A tibble: 5 x 10
##   carat     cut color clarity depth table price     x     y     z
##   <dbl>   <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23   Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3  0.23    Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5  0.31    Good     J     SI2  63.3    58   335  4.34  4.35  2.75
summary(diamonds)
##      carat               cut        color        clarity     
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655  
##                                     J: 2808   (Other): 2531  
##      depth           table           price             x         
##  Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
##  1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
##  Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
##  Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
##  3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
##  Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  
##                                                                  
##        y                z         
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.720   1st Qu.: 2.910  
##  Median : 5.710   Median : 3.530  
##  Mean   : 5.735   Mean   : 3.539  
##  3rd Qu.: 6.540   3rd Qu.: 4.040  
##  Max.   :58.900   Max.   :31.800  
## 
str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

We start with a basic scatter plot of diamond depth against price to see what the raw data looks like:

g <- ggplot(diamonds, aes(depth, price))
g + geom_point(alpha = 1/3, color = "blue", size = 3) + 
    ggtitle("Diamond Price by Depth") + 
    labs(x = "Total Depth (%)", y = "Price")
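
For reference, the ggplot2 documentation describes depth as the total depth percentage, computed from the diamond's dimensions as z / mean(x, y), rather than anything to do with colour. The sketch below recomputes it as a check on that reading. The names valid and depth_calc are hypothetical, and dropping rows with a zero dimension is an assumption about how to treat what look like missing measurements.

```r
library(ggplot2)  # for the diamonds dataset

# Drop rows where a dimension is recorded as 0 (assumed to be missing values)
valid <- subset(diamonds, x > 0 & y > 0 & z > 0)

# Total depth percentage: height (z) relative to the mean of length (x) and width (y)
depth_calc <- 100 * 2 * valid$z / (valid$x + valid$y)

# Typical gap between the recomputed and the recorded depth, in percentage points
median(abs(depth_calc - valid$depth))
```

A small median gap suggests the recorded depth column follows this definition, with differences coming from rounding of the stored dimensions.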

Faceted scatter plot showing diamond depth vs. price, faceted by cut and by quantile grouping of carat

First, cut points are calculated from quantiles of the carat variable, and then a second, categorical carat variable is created using the cut points. This splits the diamonds into three weight groups of roughly equal size.

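# Cut points at the 0, 1/3, 2/3 and 1 quantiles give three carat groups
cutpoints <- quantile(diamonds$carat, seq(0, 1, length.out = 4), na.rm = TRUE)
# include.lowest = TRUE keeps the lightest diamonds in the first group instead of NA
diamonds$car2 <- cut(diamonds$carat, cutpoints, include.lowest = TRUE)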
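
One detail worth checking in this step: cut() builds intervals that are open on the left by default, so whether include.lowest = TRUE is passed decides whether the diamonds sitting exactly on the lowest cut point land in a bin or become NA. The sketch below compares both behaviours without modifying the dataset; open_bins and closed_bins are hypothetical names used only for this illustration.

```r
library(ggplot2)  # for the diamonds dataset

cutpoints <- quantile(diamonds$carat, seq(0, 1, length.out = 4), na.rm = TRUE)

# Default behaviour: the first interval excludes its lower bound (the minimum carat)
open_bins <- cut(diamonds$carat, cutpoints)

# With include.lowest = TRUE the first interval also contains the minimum
closed_bins <- cut(diamonds$carat, cutpoints, include.lowest = TRUE)

sum(is.na(open_bins))    # diamonds left without a group by the default call
sum(is.na(closed_bins))  # diamonds left without a group when the minimum is included
table(closed_bins)       # size of each of the three carat groups
```

Any NA values in the grouping variable would appear as an extra NA row of panels in a facet_grid() built on it, so this check is useful before plotting.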

We then build a faceted scatter plot of depth vs. price, with one panel for each combination of carat group and cut.

g <- ggplot(diamonds, aes(depth, price))
g + geom_point(alpha = 1/4, color = "darkgreen", size = 2) + 
    facet_grid(car2 ~ cut, margins = TRUE) + 
    ggtitle("Diamond Price by Depth,\nfaceted by cut (horizontal) and carat quantiles (vertical)") + 
    labs(x = "Total Depth (%)", y = "Price")

Boxplots of diamond price by weight in carats, faceted by diamond cut

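# carat is continuous, so bin it into 0.5-carat groups to get one box per weight range
ggplot(diamonds, aes(carat, price, color = cut)) + 
    geom_boxplot(aes(group = cut_width(carat, 0.5))) + 
    facet_grid(. ~ cut) +
    ggtitle("Diamond price by weight, and weight grouped by cut quality") + 
    labs(x = "Weight (carat)", y = "Price")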