results <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-07/results.csv')
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 25220 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): position, positionText, time, milliseconds, fastestLap, rank, fast...
## dbl (10): resultId, raceId, driverId, constructorId, number, grid, positionO...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(results)
| Name | results |
| Number of rows | 25220 |
| Number of columns | 18 |
| _______________________ | |
| Column type frequency: | |
| character | 8 |
| numeric | 10 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| position | 0 | 1 | 1 | 2 | 0 | 34 | 0 |
| positionText | 0 | 1 | 1 | 2 | 0 | 39 | 0 |
| time | 0 | 1 | 2 | 11 | 0 | 6488 | 0 |
| milliseconds | 0 | 1 | 2 | 8 | 0 | 6687 | 0 |
| fastestLap | 0 | 1 | 1 | 2 | 0 | 80 | 0 |
| rank | 0 | 1 | 1 | 2 | 0 | 26 | 0 |
| fastestLapTime | 0 | 1 | 2 | 8 | 0 | 6266 | 0 |
| fastestLapSpeed | 0 | 1 | 2 | 7 | 0 | 6395 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| resultId | 0 | 1 | 12611.23 | 7281.58 | 1 | 6305.75 | 12610.5 | 18915.25 | 25225 | ▇▇▇▇▇ |
| raceId | 0 | 1 | 517.95 | 290.34 | 1 | 287.00 | 503.0 | 762.00 | 1064 | ▆▇▇▆▆ |
| driverId | 0 | 1 | 250.84 | 258.25 | 1 | 56.00 | 158.0 | 347.00 | 854 | ▇▃▂▁▂ |
| constructorId | 0 | 1 | 47.48 | 58.39 | 1 | 6.00 | 25.0 | 57.00 | 214 | ▇▂▁▁▁ |
| number | 6 | 1 | 17.59 | 14.80 | 0 | 7.00 | 15.0 | 23.00 | 208 | ▇▁▁▁▁ |
| grid | 0 | 1 | 11.21 | 7.27 | 0 | 5.00 | 11.0 | 17.00 | 34 | ▇▇▇▃▁ |
| positionOrder | 0 | 1 | 12.93 | 7.74 | 1 | 6.00 | 12.0 | 19.00 | 39 | ▇▇▆▂▁ |
| points | 0 | 1 | 1.80 | 4.03 | 0 | 0.00 | 0.0 | 2.00 | 50 | ▇▁▁▁▁ |
| laps | 0 | 1 | 45.79 | 30.04 | 0 | 21.00 | 52.0 | 66.00 | 200 | ▅▇▁▁▁ |
| statusId | 0 | 1 | 17.72 | 26.10 | 1 | 1.00 | 11.0 | 14.00 | 139 | ▇▁▁▁▁ |
The Primary keys in my data are raceId, rank, and driverId.
Divide it using dplyr::select in a way the two have a common variable, which you could use to join the two.
data_1 <- results %>% select(raceId:statusId)
data_1
## # A tibble: 25,220 × 17
## raceId driverId construct…¹ number grid posit…² posit…³ posit…⁴ points laps
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 18 1 1 22 1 "1" 1 1 10 58
## 2 18 2 2 3 5 "2" 2 2 8 58
## 3 18 3 3 7 7 "3" 3 3 6 58
## 4 18 4 4 5 11 "4" 4 4 5 58
## 5 18 5 1 23 3 "5" 5 5 4 58
## 6 18 6 3 8 13 "6" 6 6 3 57
## 7 18 7 5 14 17 "7" 7 7 2 55
## 8 18 8 6 1 15 "8" 8 8 1 53
## 9 18 9 2 4 2 "\\N" R 9 0 47
## 10 18 10 7 12 18 "\\N" R 10 0 43
## # … with 25,210 more rows, 7 more variables: time <chr>, milliseconds <chr>,
## # fastestLap <chr>, rank <chr>, fastestLapTime <chr>, fastestLapSpeed <chr>,
## # statusId <dbl>, and abbreviated variable names ¹constructorId, ²position,
## # ³positionText, ⁴positionOrder
data_2 <- results %>% select(raceId, rank, fastestLapTime)
data_2
## # A tibble: 25,220 × 3
## raceId rank fastestLapTime
## <dbl> <chr> <chr>
## 1 18 2 1:27.452
## 2 18 3 1:27.739
## 3 18 5 1:28.090
## 4 18 7 1:28.603
## 5 18 1 1:27.418
## 6 18 14 1:29.639
## 7 18 12 1:29.534
## 8 18 4 1:27.903
## 9 18 9 1:28.753
## 10 18 13 1:29.558
## # … with 25,210 more rows
Use tidyr::left_join or other joining functions.
data_1 %>%
left_join(data_2, by = c("raceId", "fastestLapTime"))
## # A tibble: 488,388 × 18
## raceId driverId construct…¹ number grid posit…² posit…³ posit…⁴ points laps
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 18 1 1 22 1 "1" 1 1 10 58
## 2 18 2 2 3 5 "2" 2 2 8 58
## 3 18 3 3 7 7 "3" 3 3 6 58
## 4 18 4 4 5 11 "4" 4 4 5 58
## 5 18 5 1 23 3 "5" 5 5 4 58
## 6 18 6 3 8 13 "6" 6 6 3 57
## 7 18 7 5 14 17 "7" 7 7 2 55
## 8 18 8 6 1 15 "8" 8 8 1 53
## 9 18 9 2 4 2 "\\N" R 9 0 47
## 10 18 10 7 12 18 "\\N" R 10 0 43
## # … with 488,378 more rows, 8 more variables: time <chr>, milliseconds <chr>,
## # fastestLap <chr>, rank.x <chr>, fastestLapTime <chr>,
## # fastestLapSpeed <chr>, statusId <dbl>, rank.y <chr>, and abbreviated
## # variable names ¹constructorId, ²position, ³positionText, ⁴positionOrder