Import Data
results <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-07/results.csv')
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 25220 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): position, positionText, time, milliseconds, fastestLap, rank, fast...
## dbl (10): resultId, raceId, driverId, constructorId, number, grid, positionO...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
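Several of the character columns above (`position`, `fastestLap`, `rank`, and others) look like they should be numeric. A likely cause, assuming the file codes missing values as the literal string `\N` (the same value filtered out of `position` further down), is that readr falls back to text when it meets that sentinel. A minimal sketch of declaring the sentinel at import time; the inline CSV `toy_csv` is a made-up three-row extract, not the real file:

```r
library(readr)

# Hypothetical extract; "\\N" in R source is the two characters \N
toy_csv <- "resultId,position,fastestLap\n1,1,39\n2,2,41\n3,\\N,\\N\n"

toy <- read_csv(
  I(toy_csv),                # I() marks the string as literal data (readr >= 2.0)
  na = c("", "NA", "\\N"),   # treat \N as missing in addition to the defaults
  show_col_types = FALSE
)

# position and fastestLap should now be numeric, with NA in the last row
str(toy)
```

If the same `na` argument were passed to the real `read_csv()` call, the later `filter(position != "\\N")` step would need to become a check on `is.na(position)`.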
skimr::skim(results)
Data summary

| | |
|---|---|
| Name | results |
| Number of rows | 25220 |
| Number of columns | 18 |
| Column type frequency: | |
| character | 8 |
| numeric | 10 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| position | 0 | 1 | 1 | 2 | 0 | 34 | 0 |
| positionText | 0 | 1 | 1 | 2 | 0 | 39 | 0 |
| time | 0 | 1 | 2 | 11 | 0 | 6488 | 0 |
| milliseconds | 0 | 1 | 2 | 8 | 0 | 6687 | 0 |
| fastestLap | 0 | 1 | 1 | 2 | 0 | 80 | 0 |
| rank | 0 | 1 | 1 | 2 | 0 | 26 | 0 |
| fastestLapTime | 0 | 1 | 2 | 8 | 0 | 6266 | 0 |
| fastestLapSpeed | 0 | 1 | 2 | 7 | 0 | 6395 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| resultId | 0 | 1 | 12611.23 | 7281.58 | 1 | 6305.75 | 12610.5 | 18915.25 | 25225 | ▇▇▇▇▇ |
| raceId | 0 | 1 | 517.95 | 290.34 | 1 | 287.00 | 503.0 | 762.00 | 1064 | ▆▇▇▆▆ |
| driverId | 0 | 1 | 250.84 | 258.25 | 1 | 56.00 | 158.0 | 347.00 | 854 | ▇▃▂▁▂ |
| constructorId | 0 | 1 | 47.48 | 58.39 | 1 | 6.00 | 25.0 | 57.00 | 214 | ▇▂▁▁▁ |
| number | 6 | 1 | 17.59 | 14.80 | 0 | 7.00 | 15.0 | 23.00 | 208 | ▇▁▁▁▁ |
| grid | 0 | 1 | 11.21 | 7.27 | 0 | 5.00 | 11.0 | 17.00 | 34 | ▇▇▇▃▁ |
| positionOrder | 0 | 1 | 12.93 | 7.74 | 1 | 6.00 | 12.0 | 19.00 | 39 | ▇▇▆▂▁ |
| points | 0 | 1 | 1.80 | 4.03 | 0 | 0.00 | 0.0 | 2.00 | 50 | ▇▁▁▁▁ |
| laps | 0 | 1 | 45.79 | 30.04 | 0 | 21.00 | 52.0 | 66.00 | 200 | ▅▇▁▁▁ |
| statusId | 0 | 1 | 17.72 | 26.10 | 1 | 1.00 | 11.0 | 14.00 | 139 | ▇▁▁▁▁ |
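library(tidyverse)  # provides %>%, filter(), mutate(), as_factor(), ggplot()

# Missing finishing positions are coded as the literal string "\N"; drop them
results <- results %>%
  filter(position != "\\N") %>%
  mutate(position = as_factor(position))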
Introduction
Questions
Variation
Visualizing distributions
results %>%
  ggplot(aes(x = position)) +
  geom_bar()

Typical values
results %>%
  # Optional filter to focus on low car numbers (currently disabled)
  # filter(number < 3) %>%
  # Histogram of car numbers
  ggplot(aes(x = number)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
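The `stat_bin()` message above is ggplot2 asking for an explicit bin width instead of the default 30 bins. A minimal sketch of setting one; the data frame `toy` and the width of 5 are illustrative assumptions, not values taken from this analysis:

```r
library(ggplot2)

# Hypothetical stand-in for results$number (car numbers)
toy <- data.frame(number = c(1, 2, 3, 5, 7, 11, 14, 16, 22, 27, 33, 44, 77))

p <- ggplot(toy, aes(x = number)) +
  # Bins are 5 units wide and aligned to multiples of 5
  geom_histogram(binwidth = 5, boundary = 0)

# Inspect the computed bins without drawing the plot
bins <- layer_data(p)
bins[, c("xmin", "xmax", "count")]
```

Trying a few widths is usually worthwhile, since different bin widths can reveal different patterns in the same column.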

Unusual values
Missing Values
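In this dataset, missing entries in the character columns appear as the string `\N` and not as `NA`, which is why skimr reports `n_missing = 0` for them. A minimal sketch, under that assumption, of converting the sentinel to a real `NA` and counting missing values per column; `toy` is a made-up stand-in for `results`:

```r
library(dplyr)

# Hypothetical stand-in for two character columns of results
toy <- tibble(
  position   = c("1", "2", "\\N", "\\N"),
  fastestLap = c("39", "\\N", "41", "\\N")
)

# Turn the "\N" sentinel into NA, then convert the text to numbers
toy_clean <- toy %>%
  mutate(across(where(is.character), ~ as.numeric(na_if(.x, "\\N"))))

# Count missing values per column
toy_clean %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
```

Applying `where(is.character)` to the full data would also touch columns such as `time`, which are not plain numbers, so on the real data the column selection would need to be narrower.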
Covariation
A categorical and continuous variable
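results %>%
  # fastestLap is read as character because missing values are coded "\N"
  filter(fastestLap != "\\N") %>%
  mutate(fastestLap = as.numeric(fastestLap)) %>%
  ggplot(aes(x = position, y = fastestLap)) +
  geom_boxplot()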

Two categorical variables
results %>%
  # Count each position/rank combination and show the counts as a heatmap
  count(position, rank) %>%
  ggplot(aes(x = position, y = rank, fill = n)) +
  geom_tile()

Two continuous variables
results %>%
  # Starting grid slot vs. finishing order, both numeric columns
  ggplot(aes(x = grid, y = positionOrder)) +
  geom_point(alpha = 0.1)
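With roughly 25,000 rows, plots of two numeric columns can become crowded. One option is to bin one variable and draw a boxplot per bin with ggplot2's `cut_width()`. A minimal sketch; the data frame `toy`, the choice of `grid` and `positionOrder`, and the bin width of 5 are assumptions made for illustration:

```r
library(ggplot2)

# Hypothetical stand-in for two numeric columns of results
toy <- data.frame(
  grid          = c(1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14),
  positionOrder = c(2, 1, 5, 3, 4, 8, 10, 6, 9, 14, 12, 15)
)

p <- ggplot(toy, aes(x = grid, y = positionOrder)) +
  # One boxplot per 5-wide slice of grid
  geom_boxplot(aes(group = cut_width(grid, width = 5, boundary = 0)))

p
```

Each box then summarises the finishing order for one range of starting positions, which is easier to read than thousands of overlapping points.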

Patterns and models