Import your data

library(tidyverse)  # provides %>%, as_tibble(), ggplot2, and forcats
library(Lahman)
data(Batting)
Batting <- as_tibble(Batting)

Chapter 15

Create a factor

skimr::skim(Batting)
Data summary

Name                      Batting
Number of rows            110495
Number of columns         22
_______________________
Column type frequency:
  character               1
  factor                  2
  numeric                 19
________________________
Group variables           None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
playerID               0              1    5    9      0     20166           0

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
teamID                 0              1  FALSE         149  CHN: 5129, PHI: 5026, PIT: 4984, SLN: 4904
lgID                   0              1  FALSE           7  NL: 55945, AL: 50965, AA: 1893, NA: 737

Variable type: numeric

skim_variable  n_missing  complete_rate     mean      sd    p0   p25   p50   p75  p100  hist
yearID                 0           1.00  1968.05   39.99  1871  1938  1977  2002  2021  ▂▃▃▆▇
stint                  0           1.00     1.08    0.29     1     1     1     1     5  ▇▁▁▁▁
G                      0           1.00    50.61   46.83     1    12    34    78   165  ▇▃▂▂▂
AB                     0           1.00   138.56  183.32     0     3    45   222   716  ▇▁▁▁▁
R                      0           1.00    18.40   27.99     0     0     4    26   198  ▇▁▁▁▁
H                      0           1.00    36.18   52.07     0     0     8    55   262  ▇▁▁▁▁
X2B                    0           1.00     6.18    9.61     0     0     1     9    67  ▇▁▁▁▁
X3B                    0           1.00     1.23    2.58     0     0     0     1    36  ▇▁▁▁▁
HR                     0           1.00     2.86    6.39     0     0     0     2    73  ▇▁▁▁▁
RBI                  756           0.99    16.72   26.19     0     0     3    24   191  ▇▁▁▁▁
SB                  2368           0.98     2.89    7.56     0     0     0     2   138  ▇▁▁▁▁
CS                 23541           0.79     1.16    2.66     0     0     0     1    42  ▇▁▁▁▁
BB                     0           1.00    12.79   20.56     0     0     2    18   232  ▇▁▁▁▁
SO                  2100           0.98    20.63   28.72     0     1     9    29   223  ▇▁▁▁▁
IBB                36650           0.67     1.04    2.69     0     0     0     1   120  ▇▁▁▁▁
HBP                 2816           0.97     1.06    2.30     0     0     0     1    51  ▇▁▁▁▁
SH                  6068           0.95     2.17    4.13     0     0     0     3    67  ▇▁▁▁▁
SF                 36103           0.67     1.01    1.92     0     0     0     1    19  ▇▁▁▁▁
GIDP               25441           0.77     2.87    4.66     0     0     0     4    36  ▇▁▁▁▁
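
The summary shows that teamID and lgID are already stored as factors, while playerID is character. As a minimal sketch of creating a factor by hand, the base R function factor() can convert a character vector using an explicit set of levels. The vector lg_chr and the level order in lg_levels are hypothetical values chosen for illustration, not taken from Batting.

```r
# Hypothetical character vector of league codes (not taken from Batting)
lg_chr <- c("NL", "AL", "AA", "NL", "AL")

# Assumed level order, chosen only for illustration
lg_levels <- c("NL", "AL", "AA")

# factor() converts the character vector; any value missing from
# `levels` would be turned into NA
lg_fct <- factor(lg_chr, levels = lg_levels)

levels(lg_fct)  # level order follows lg_levels, not the alphabet
table(lg_fct)   # counts per level
```

Supplying levels explicitly fixes the order used later in tables and plots, instead of relying on the default alphabetical order.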

Modify factor order

Make two bar charts here: one before ordering and one after.

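# Average hits per row of Batting (one player, season, and stint), by league
hit_summary <- Batting %>%
    group_by(lgID) %>%
    summarise(avg_H = mean(H, na.rm = TRUE))

# Before ordering: leagues appear in their default level order
ggplot(hit_summary, aes(avg_H, lgID)) + geom_col()

# After ordering: leagues sorted by average hits
ggplot(hit_summary, aes(avg_H, fct_reorder(lgID, avg_H))) + geom_col()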

Modify factor levels

Show examples of three functions:

  • fct_recode
  • fct_collapse
  • fct_lump
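
The three functions above can be sketched on a small toy factor. The vector lg below is a hypothetical sample of league codes, not the real Batting data, and the new level names ("National", "American", "Modern", "Defunct") are labels chosen for illustration. Note that fct_lump() still works but is superseded in recent forcats versions by more specific variants such as fct_lump_n().

```r
library(forcats)

# Toy factor of league codes (hypothetical sample, not the real Batting data)
lg <- factor(c("NL", "AL", "NL", "AA", "AL", "NL", "UA", "FL"))

# fct_recode: rename individual levels, written as "new name" = "old name"
lg_named <- fct_recode(lg, "National" = "NL", "American" = "AL")

# fct_collapse: merge several existing levels into one new level
lg_grouped <- fct_collapse(lg,
    Modern  = c("NL", "AL"),
    Defunct = c("AA", "UA", "FL")
)

# fct_lump: keep the n most frequent levels and lump the rest into "Other"
lg_lumped <- fct_lump(lg, n = 2)

levels(lg_named)
levels(lg_grouped)
levels(lg_lumped)
```

The same calls should apply to Batting$lgID inside mutate(), for example mutate(lgID = fct_lump(lgID, n = 2)), although the choice of which leagues to rename or merge is up to the analysis.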

Chapter 14

No need to do anything here.