This is a basic demonstration of ggplot2 (Grammar of Graphics) graphs.
I will be using two datasets, both of which ship with ggplot2:
- MPG data (mpg)
- Diamond data (diamonds)
Scatter plot showing the effect of engine displacement on highway fuel economy (hwy, in miles per gallon), with a linear regression line and its confidence band
library(ggplot2)  # provides ggplot() and the mpg and diamonds datasets
g <- ggplot(mpg, aes(displ, hwy))
g + geom_point(color = "darkblue", size = 4, alpha = 1/2) +
  geom_smooth(method = "lm") +
  ggtitle("Highway Mileage vs. Displacement") +
  labs(x = "Displacement", y = "Highway Mileage")
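The geom_smooth(method = "lm") layer fits an ordinary least-squares line of hwy on displ. As a rough sketch of what that layer computes, the same model can be fitted directly with lm(). The names fit, new_displ and band are illustrative only, and the three displacement values are arbitrary.

```r
library(ggplot2)  # for the mpg dataset

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)  # intercept and slope of the regression line

# Confidence band for the fitted mean at a few (arbitrary) displacements;
# 0.95 is also the default level used by geom_smooth()
new_displ <- data.frame(displ = c(2, 4, 6))
band <- predict(fit, newdata = new_displ, interval = "confidence", level = 0.95)
band  # columns: fit, lwr, upr
```

A negative slope here corresponds to the downward-sloping line in the plot: larger engines tend to travel fewer highway miles per gallon.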
Scatter plot showing the effect of engine displacement on highway fuel economy, with one panel per drive type, each with its own linear regression line and no confidence band
g + geom_point(color = "darkgreen", size = 4, alpha = 1/2) +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Highway Mileage vs. Displacement, by drive type") +
  labs(x = "Displacement", y = "Highway Mileage") +
  facet_grid(. ~ drv)
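Because the plot is faceted, the regression line is fitted separately within each drive-type panel. A minimal sketch of the equivalent per-group fits, using base R only (the names fits and slopes are illustrative):

```r
library(ggplot2)  # for the mpg dataset

# One linear model per drive type, mirroring one panel each
fits <- lapply(split(mpg, mpg$drv), function(d) lm(hwy ~ displ, data = d))

# Slope of hwy on displ within each drive type ("4", "f", "r")
slopes <- sapply(fits, function(f) coef(f)[["displ"]])
slopes
```

Comparing the slopes shows whether the displacement effect is similar across drive types or differs between panels.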
Scatter plot showing the effect of engine displacement on highway fuel economy, with drive type indicated by different colours, and using the black-and-white theme (theme_bw) with Times as the base font family
g + geom_point(aes(color = drv), size = 4) +
  theme_bw(base_family = "Times")
Scatter plot showing the effect of engine displacement on highway fuel economy, with year of manufacture indicated by different colours, and faceted by drive type (rows) and number of cylinders (columns). Setting margins = TRUE adds "(all)" panels that pool the data across each row and column
g <- ggplot(mpg, aes(displ, hwy, color = factor(year)))
g + geom_point(size = 4, alpha = 1/2) +
  facet_grid(drv ~ cyl, margins = TRUE) +
  ggtitle("Highway Mileage vs. Displacement, by drive type\nand dataset year") +
  labs(x = "Displacement", y = "Highway Mileage")
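To read the facet grid, it helps to know how many cars fall into each drive-type and cylinder combination. The sketch below tabulates those counts. addmargins() adds row and column totals, which play roughly the same role as the pooled "(all)" panels produced by margins = TRUE. The name counts is illustrative.

```r
library(ggplot2)  # for the mpg dataset

# Number of cars per drive type (rows) and cylinder count (columns)
counts <- table(drv = mpg$drv, cyl = mpg$cyl)

# Add a "Sum" row and column, analogous to the "(all)" margin panels
addmargins(counts)
```

Any cell with a count of zero corresponds to an empty panel in the faceted plot.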
Exploring the diamonds dataset
We inspect the first rows, the summary statistics and the structure of the dataset to see which variables are available and how they are typed:
head(diamonds, 5)
## # A tibble: 5 x 10
##   carat cut     color clarity depth table price     x     y     z
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good    E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium I     VS2      62.4    58   334  4.20  4.23  2.63
## 5  0.31 Good    J     SI2      63.3    58   335  4.34  4.35  2.75
summary(diamonds)
##      carat               cut        color        clarity
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066
##  Max.   :5.0100                     I: 5422   VVS1   : 3655
##                                     J: 2808   (Other): 2531
##      depth           table           price             x
##  Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000
##  1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710
##  Median :61.80   Median :57.00   Median : 2401   Median : 5.700
##  Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731
##  3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540
##  Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740
##
##        y                z
##  Min.   : 0.000   Min.   : 0.000
##  1st Qu.: 4.720   1st Qu.: 2.910
##  Median : 5.710   Median : 3.530
##  Mean   : 5.735   Mean   : 3.539
##  3rd Qu.: 6.540   3rd Qu.: 4.040
##  Max.   :58.900   Max.   :31.800
##
str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
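The str() output shows that cut, color and clarity are ordered factors, which matters later when they are used for faceting and colouring. A small sketch that pulls out the same information programmatically (the names var_class and ordered_vars are illustrative):

```r
library(ggplot2)  # for the diamonds dataset

# First class of each column, e.g. "numeric", "integer" or "ordered"
var_class <- sapply(diamonds, function(col) class(col)[1])
var_class

# Columns stored as ordered factors, and the level order of one of them
ordered_vars <- names(var_class)[var_class == "ordered"]
ordered_vars
levels(diamonds$cut)
```

The level order of an ordered factor determines the order of facet panels and legend entries.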
We start with a basic scatter plot of diamond depth (the total depth percentage) against price to get a first impression of the data:
g <- ggplot(diamonds, aes(depth, price))
g + geom_point(alpha = 1/3, color = "blue", size = 3) +
  ggtitle("Diamond Price by Depth") +
  labs(x = "Depth (%)", y = "Price")
Faceted scatter plot showing diamond depth vs. price, faceted by cut and by tertile groups of carat
First, cut points are calculated from the tertiles of the carat variable, and then a second, categorical carat variable is created from those cut points. Setting include.lowest = TRUE keeps the lightest diamonds in the first group; without it they would become NA.
cutpoints <- quantile(diamonds$carat, seq(0, 1, length.out = 4), na.rm = TRUE)
diamonds$car2 <- cut(diamonds$carat, cutpoints, include.lowest = TRUE)
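The intervals produced by cut() are open on the left by default, so values equal to the lowest cut point fall outside every interval and are returned as NA. The sketch below compares the default behaviour with include.lowest = TRUE on the same cut points. The names grp_default and grp_inclusive are illustrative.

```r
library(ggplot2)  # for the diamonds dataset

# Tertile cut points of carat: minimum, 1/3 quantile, 2/3 quantile, maximum
cutpoints <- quantile(diamonds$carat, seq(0, 1, length.out = 4), na.rm = TRUE)

# Default: intervals are (a, b], so the minimum carat value is excluded
grp_default <- cut(diamonds$carat, cutpoints)

# include.lowest = TRUE closes the first interval on the left: [a, b]
grp_inclusive <- cut(diamonds$carat, cutpoints, include.lowest = TRUE)

sum(is.na(grp_default))    # diamonds dropped by the default behaviour
sum(is.na(grp_inclusive))  # no diamonds dropped
table(grp_inclusive)       # group sizes for the three carat groups
```

In a faceted plot, rows left as NA show up as an extra NA facet, so it is worth checking for them before plotting.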
We then draw a scatter plot of depth vs. price, faceted by cut (columns) and carat group (rows), with margins = TRUE adding pooled "(all)" panels.
g <- ggplot(diamonds, aes(depth, price))
g + geom_point(alpha = 1/4, color = "darkgreen", size = 2) +
  facet_grid(car2 ~ cut, margins = TRUE) +
  ggtitle("Diamond Price by Depth,\nfaceted by cut (horizontal) and carat quantiles (vertical)") +
  labs(x = "Depth (%)", y = "Price")
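Boxplot of diamond price by weight in carats, faceted by diamond cut. Because carat is continuous, it is binned into 0.5-carat intervals so that each bin gets its own box
ggplot(diamonds, aes(carat, price, color = cut)) +
  # the 0.5-carat bin width is an arbitrary choice; adjust as needed
  geom_boxplot(aes(group = cut_width(carat, 0.5))) +
  facet_grid(. ~ cut) +
  ggtitle("Diamond price by weight, faceted by cut quality") +
  labs(x = "Weight (carat)", y = "Price")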
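A boxplot summarises each group by its median and quartiles. As a numeric cross-check of the panels, the median price per cut can be computed directly with aggregate() from base R. The name price_by_cut is illustrative.

```r
library(ggplot2)  # for the diamonds dataset

# Median price within each cut category, one row per cut level
price_by_cut <- aggregate(price ~ cut, data = diamonds, FUN = median)
price_by_cut
```

These medians ignore weight, so they are best read alongside the plot rather than as a ranking of cut quality.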