Import Data

library(tidyverse)

results <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-07/results.csv')
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 25220 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (8): position, positionText, time, milliseconds, fastestLap, rank, fast...
## dbl (10): resultId, raceId, driverId, constructorId, number, grid, positionO...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(results)
Data summary
Name                   results
Number of rows         25220
Number of columns      18
_______________________
Column type frequency:
  character            8
  numeric              10
________________________
Group variables        None

Variable type: character

skim_variable    n_missing complete_rate min max empty n_unique whitespace
position                 0             1   1   2     0       34          0
positionText             0             1   1   2     0       39          0
time                     0             1   2  11     0     6488          0
milliseconds             0             1   2   8     0     6687          0
fastestLap               0             1   1   2     0       80          0
rank                     0             1   1   2     0       26          0
fastestLapTime           0             1   2   8     0     6266          0
fastestLapSpeed          0             1   2   7     0     6395          0

Variable type: numeric

skim_variable    n_missing complete_rate     mean      sd p0     p25     p50      p75  p100 hist
resultId                 0             1 12611.23 7281.58  1 6305.75 12610.5 18915.25 25225 ▇▇▇▇▇
raceId                   0             1   517.95  290.34  1  287.00   503.0   762.00  1064 ▆▇▇▆▆
driverId                 0             1   250.84  258.25  1   56.00   158.0   347.00   854 ▇▃▂▁▂
constructorId            0             1    47.48   58.39  1    6.00    25.0    57.00   214 ▇▂▁▁▁
number                   6             1    17.59   14.80  0    7.00    15.0    23.00   208 ▇▁▁▁▁
grid                     0             1    11.21    7.27  0    5.00    11.0    17.00    34 ▇▇▇▃▁
positionOrder            0             1    12.93    7.74  1    6.00    12.0    19.00    39 ▇▇▆▂▁
points                   0             1     1.80    4.03  0    0.00     0.0     2.00    50 ▇▁▁▁▁
laps                     0             1    45.79   30.04  0   21.00    52.0    66.00   200 ▅▇▁▁▁
statusId                 0             1    17.72   26.10  1    1.00    11.0    14.00   139 ▇▁▁▁▁
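# Keep only classified finishers ("\N" marks a missing position) and
# order the factor levels numerically (1, 2, 3, ...) instead of by first appearance
results <- results %>%
    filter(position != "\\N") %>%
    mutate(position = fct_inseq(as_factor(position)))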
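The `"\\N"` filter above suggests these files mark missing values with the literal string `\N`. That would also explain why columns such as `milliseconds` and `fastestLap` come in as character, and it is probably the source of the parsing warning on `number`. A minimal sketch, assuming that convention holds: readr's `na` argument can convert `\N` to `NA` at import time. It is shown here on a tiny hypothetical inline CSV (`toy_csv` is made up for illustration), not on the real file.

```r
library(readr)

# Hypothetical three-row extract shaped like results.csv; the literal \N
# stands for a missing value (assumed convention, for illustration only)
toy_csv <- "resultId,position,fastestLap
1,1,39
2,\\N,\\N
3,3,41"

# na = "\\N" makes readr read the literal \N as NA, so the columns
# can be guessed as numbers instead of falling back to character
toy <- read_csv(I(toy_csv), na = "\\N", show_col_types = FALSE)

toy
```

Applied to the real URL, the same argument would let the position filter become `filter(!is.na(position))`.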

Introduction

Questions

Variation

Visualizing distributions

results %>%
    ggplot(aes(x = position)) +
    geom_bar()

Typical values

results %>%
    # Optionally restrict to a subset of car numbers, e.g.
    # filter(number < 3) %>%

    # Histogram of car numbers; binwidth = 1 because car numbers are integers
    ggplot(aes(x = number)) +
    geom_histogram(binwidth = 1)
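To read typical values as numbers instead of estimating them from bar heights, one option is to count observations in fixed-width bins with ggplot2's `cut_width()` and dplyr's `count()`. The sketch below runs on a hypothetical vector of car numbers. On the real data, the same pipeline would start from `results`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical car numbers standing in for results$number
toy <- tibble(number = c(1, 2, 3, 12, 15, 208))

# Count how many values fall into each 10-wide bin; empty bins are dropped
bins <- toy %>%
  count(bin = cut_width(number, width = 10))

bins
```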

Unusual values
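In the skim summary, `number` has a 75th percentile of 23 but a maximum of 208, so a few car numbers sit far from the rest. One common way to flag such values is the boxplot rule, which treats anything above Q3 + 1.5 × IQR as unusual. Below is a minimal sketch on a hypothetical vector of car numbers; substitute `results` to apply it to the real data.

```r
library(dplyr)

# Hypothetical car numbers; the real column is results$number
toy <- tibble(number = c(1, 3, 5, 7, 8, 12, 15, 22, 23, 208))

# Upper fence of the boxplot rule: Q3 + 1.5 * IQR (missing values ignored)
upper_fence <- quantile(toy$number, 0.75, na.rm = TRUE) +
  1.5 * IQR(toy$number, na.rm = TRUE)

# Rows above the fence, largest first
unusual <- toy %>%
  filter(number > upper_fence) %>%
  arrange(desc(number))

unusual
```

With the unfiltered quantiles reported by `skim()` (p25 = 7, p75 = 23), the fence is 23 + 1.5 × 16 = 47, so car numbers above 47 would be flagged. The cut-off after the position filter may differ slightly.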

Missing values
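`skim()` reports `n_missing` of 0 for every character column. The most likely reason is that missing entries are stored as the string `\N` rather than as `NA`. The hedged sketch below counts missing values per column once that sentinel is converted. It runs on a hypothetical three-row tibble (`toy` is illustrative only).

```r
library(dplyr)

# Hypothetical rows mimicking results.csv, with "\N" as the missing-value marker
toy <- tibble(
  raceId     = c(1, 1, 2),
  position   = c("1", "\\N", "2"),
  fastestLap = c("39", "\\N", "\\N")
)

missing_counts <- toy %>%
  # Turn the sentinel into a real NA, in character columns only
  mutate(across(where(is.character), ~ na_if(.x, "\\N"))) %>%
  # Count NAs in every column
  summarise(across(everything(), ~ sum(is.na(.x))))

missing_counts
```

`na_if()` is restricted to character columns because comparing a numeric column such as `number` against the string `"\\N"` would fail. Numeric columns keep whatever `NA`s readr already produced.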

Covariation

A categorical and continuous variable

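results %>%
    # fastestLap is read as character with "\N" for missing; convert it to a number
    filter(fastestLap != "\\N") %>%
    mutate(fastestLap = as.numeric(fastestLap)) %>%
    ggplot(aes(x = position, y = fastestLap)) +
    geom_boxplot()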

Two categorical variables

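results %>%
    # Drop rows without a fastest-lap rank and order ranks numerically
    filter(rank != "\\N") %>%
    mutate(rank = fct_inseq(as_factor(rank))) %>%
    count(position, rank) %>%
    # One tile per position/rank combination, shaded by how many results fall in it
    ggplot(aes(x = position, y = rank)) +
    geom_tile(aes(fill = n))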

Two continuous variables

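results %>%
    filter(fastestLap != "\\N") %>%
    # Treat both as numbers: position is a factor, fastestLap is character
    mutate(position = as.integer(as.character(position)), fastestLap = as.numeric(fastestLap)) %>%
    ggplot(aes(x = position, y = fastestLap)) +
    geom_point(alpha = 0.1)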

Patterns and models