#install.packages("ggplot2")
#install.packages('ggrepel')
#install.packages('ggthemes')
#install.packages('scales')
#install.packages('plotly')
#install.packages('lattice')
#install.packages('GGally')
#install.packages("dplyr")
#install.packages("tidyverse")
#install.packages('ggtext')
#install.packages("glue")
#install.packages("tibble")
#install.packages("skimr")
#install.packages("gapminder")
library(ggplot2)   # visualization
library(ggrepel)   # non-overlapping text labels
library(ggthemes)  # collection of additional themes
library(scales)    # formatting of axis and legend scales
library(plotly)    # interactive charts
library(GGally)    # correlation and pairs plots
library(dplyr)     # data transformation
library(tidyverse) # meta-package that attaches the core tidyverse packages
library(ggtext)    # rich text rendering in plots
library(glue)      # string interpolation
library(gapminder) # example dataset
library(tibble)    # modern data frames
library(skimr)     # summary statistics
Module 3-1-Principle - Data Visualization with ggplot2 in R
Expected Learning Outcomes
After taking this workshop, participants should be able to do the following:
Explain the concept of the grammar of graphics when visualizing data with the ggplot2 package.
Be familiar with various types of charts.
Visualize data in counts and proportions.
Select appropriate charts based on strategic considerations (e.g., the characteristics of the data and audience).
Create a chart that involves one or two variables with either categorical or continuous data.
Create a chart by adding a categorical moderator (3rd variable) to the chart involving two or three variables.
Create correlation charts.
Read charts and generate insights.
Describe three popular packages that allow one to visualize data.
Loading Packages
1. Understand mtcars data
1.1 Using Help
A data frame with 32 observations on 11 (numeric) variables.
| Column | Variable | Description |
|---|---|---|
| [, 1] | mpg | Miles/(US) gallon |
| [, 2] | cyl | Number of cylinders |
| [, 3] | disp | Displacement (cu.in.) |
| [, 4] | hp | Gross horsepower |
| [, 5] | drat | Rear axle ratio |
| [, 6] | wt | Weight (1000 lbs) |
| [, 7] | qsec | 1/4 mile time |
| [, 8] | vs | Engine (0 = V-shaped, 1 = straight) |
| [, 9] | am | Transmission (0 = automatic, 1 = manual) |
| [,10] | gear | Number of forward gears |
| [,11] | carb | Number of carburetors |
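The variable descriptions above are the ones shown on the dataset's help page. A minimal sketch, using only base R, of how to open that page and check the dimensions it reports:

```r
# Open the documentation page for the built-in mtcars dataset
?mtcars            # equivalent to help("mtcars")

# Check the dimensions quoted on the help page (rows, then columns)
dim(mtcars)

# Inspect the column types; every column in mtcars is numeric
str(mtcars)
```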
1.2 Reading data and converting to a tibble (cars)
[1] "data.frame"
# A tibble: 32 × 12
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
11 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
12 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
13 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
14 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
15 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
16 10.4 8 460 215 3 5.42 17.8 0 0 3 4
17 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
18 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
19 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
20 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
model
<chr>
1 Mazda RX4
2 Mazda RX4 Wag
3 Datsun 710
4 Hornet 4 Drive
5 Hornet Sportabout
6 Valiant
7 Duster 360
8 Merc 240D
9 Merc 230
10 Merc 280
11 Merc 280C
12 Merc 450SE
13 Merc 450SL
14 Merc 450SLC
15 Cadillac Fleetwood
16 Lincoln Continental
17 Chrysler Imperial
18 Fiat 128
19 Honda Civic
20 Toyota Corolla
# ℹ 12 more rows
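The code that produced this output is not shown in the rendered page. One way to reproduce it is sketched below. It assumes the car names, which mtcars stores as row names, were copied into a `model` column before the conversion. The object name `cars` comes from the heading; the exact calls are an assumption.

```r
library(dplyr)
library(tibble)

class(mtcars)   # "data.frame"

# as_tibble() drops row names, so copy them into a regular column first
cars <- mtcars |>
  mutate(model = rownames(mtcars)) |>
  as_tibble()

# Print the first 20 rows, as in the output above
print(cars, n = 20)
```

Note that this `cars` masks the built-in `datasets::cars` for the rest of the session.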
1.3 Simple Descriptive Statistics
Shortcut for inserting a code chunk in RStudio: Ctrl + Alt + I
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb model
Min. :0.0000 Min. :3.000 Min. :1.000 Length:32
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000 Class :character
Median :0.0000 Median :4.000 Median :2.000 Mode :character
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
Rows: 32
Columns: 12
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8…
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 1…
$ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 18…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92…
$ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 1…
$ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0…
$ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2…
$ model <chr> "Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "H…
| Name | cars |
| Number of rows | 32 |
| Number of columns | 12 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 11 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| model | 0 | 1 | 7 | 19 | 0 | 32 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| mpg | 0 | 1 | 20.09 | 6.03 | 10.40 | 15.43 | 19.20 | 22.80 | 33.90 | ▃▇▅▁▂ |
| cyl | 0 | 1 | 6.19 | 1.79 | 4.00 | 4.00 | 6.00 | 8.00 | 8.00 | ▆▁▃▁▇ |
| disp | 0 | 1 | 230.72 | 123.94 | 71.10 | 120.83 | 196.30 | 326.00 | 472.00 | ▇▃▃▃▂ |
| hp | 0 | 1 | 146.69 | 68.56 | 52.00 | 96.50 | 123.00 | 180.00 | 335.00 | ▇▇▆▃▁ |
| drat | 0 | 1 | 3.60 | 0.53 | 2.76 | 3.08 | 3.70 | 3.92 | 4.93 | ▇▃▇▅▁ |
| wt | 0 | 1 | 3.22 | 0.98 | 1.51 | 2.58 | 3.33 | 3.61 | 5.42 | ▃▃▇▁▂ |
| qsec | 0 | 1 | 17.85 | 1.79 | 14.50 | 16.89 | 17.71 | 18.90 | 22.90 | ▃▇▇▂▁ |
| vs | 0 | 1 | 0.44 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| am | 0 | 1 | 0.41 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| gear | 0 | 1 | 3.69 | 0.74 | 3.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▇▁▆▁▂ |
| carb | 0 | 1 | 2.81 | 1.62 | 1.00 | 2.00 | 2.00 | 4.00 | 8.00 | ▇▂▅▁▁ |
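The three blocks of output above have the shapes produced by `summary()`, `glimpse()`, and `skim()`. The sketch below would generate comparable summaries. It rebuilds `cars` first so that it runs on its own; how `cars` was actually constructed is an assumption.

```r
library(dplyr)
library(tibble)
library(skimr)

# Rebuild the 12-column tibble (assumed construction)
cars <- mtcars |>
  mutate(model = rownames(mtcars)) |>
  as_tibble()

summary(cars)   # base R: quartiles and mean for each numeric column
glimpse(cars)   # one line per column with its type and first values
skim(cars)      # skimr: missingness, quantiles, and inline histograms
```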
2. Basic Plotting Methods in Base R
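The rendered page shows no content for this section. As an illustration only, here are three common base R plotting calls applied to mtcars; the choice of variables is an assumption.

```r
# Scatter plot of two continuous variables
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs. weight", pch = 19)

# Histogram of one continuous variable
hist(mtcars$mpg, xlab = "Miles per gallon", main = "Distribution of mpg")

# Bar chart of counts for one categorical variable
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")
```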
3. Lattice package
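No lattice code survives in the rendered page. A minimal sketch of the package's formula interface, with the variables and the object name `p_lattice` chosen for illustration:

```r
library(lattice)

# mpg against weight, drawn in one panel per cylinder count
p_lattice <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                    xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
                    layout = c(3, 1))

# lattice plots are objects; print() draws them
print(p_lattice)
```

The `| factor(cyl)` term is what splits the plot into conditioned panels.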
4. ggplot2
- From here on we use ggplot2, one of the most widely used data visualization packages in R.
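To connect this to the grammar of graphics named in the learning outcomes, here is a minimal sketch of a plot built layer by layer. The variables and the object name `p` are chosen for illustration.

```r
library(ggplot2)

p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +  # data and aesthetic mapping
  geom_point() +                                              # geometry layer
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +     # labels
  theme_minimal()                                             # theme

p  # printing the object draws the plot
```

Each `+` adds one component, so a geometry or theme can be swapped without touching the rest.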
4.1 Elaborate Examples
4.1.1 x & y are both continuous with moderator & labeller()
Wrangling
Error in mtcars: The pipe operator requires a function call as RHS (<input>:4:9)
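R's native pipe `|>` reports this message when its right-hand side is not a function call, for example `mtcars |> head` or a bare object name. The original chunk is not shown, so the following is only a sketch of what the wrangling and plot for this subsection might look like. The choice of `wt`, `mpg`, `am`, and `cyl`, and the object names `cars_plot` and `p`, are assumptions.

```r
library(dplyr)
library(ggplot2)

# Every step to the right of |> must be a call, written with parentheses
cars_plot <- mtcars |>
  mutate(
    am  = factor(am, levels = c(0, 1), labels = c("automatic", "manual")),
    cyl = factor(cyl)
  )

# Two continuous variables (wt, mpg), a categorical moderator (am),
# and labeller() to control the facet strip text
p <- ggplot(cars_plot, aes(x = wt, y = mpg, colour = am)) +
  geom_point() +
  facet_wrap(~ cyl, labeller = labeller(cyl = label_both)) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Transmission")

p
```

With `label_both`, each facet strip shows the variable name together with its value rather than the value alone.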