Import two related datasets from the TidyTuesday project.
loadouts <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-01-24/loadouts.csv')
## Rows: 940 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): version, name, item_detailed, item
## dbl (2): season, item_number
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
survivalists <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-01-24/survivalists.csv')
## Rows: 94 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, gender, city, state, country, reason_tapped_out, reason_cate...
## dbl (5): season, age, result, days_lasted, day_linked_up
## lgl (1): medically_evacuated
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
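The messages above suggest two ways to quiet the column specification output. A minimal sketch of both follows; it reads a small temporary file (a hypothetical stand-in for the downloaded CSV) so it runs without a network connection.

```r
library(readr)

# Hypothetical stand-in for the downloaded file: a tiny CSV in a temp path.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("name,season,item", "Sam Larson,1,Slingshot", "Donny Dust,6,Paracord"),
  tmp
)

# Option 1: keep the type guessing but suppress the message.
quiet <- read_csv(tmp, show_col_types = FALSE)

# Option 2: state the column types explicitly, which also silences the message.
typed <- read_csv(
  tmp,
  col_types = cols(
    name = col_character(),
    season = col_double(),
    item = col_character()
  )
)
```

Stating the types explicitly has the side benefit of failing loudly if a column changes type in a later version of the file.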
library(dplyr) # provides %>%, select(), sample_n() and the join functions
set.seed(1234)
Data1 <- loadouts %>%
select(name, season, item) %>%
sample_n(10)
Data1
## # A tibble: 10 × 3
## name season item
## <chr> <dbl> <chr>
## 1 Dave Nessia 3 Ferro rod
## 2 Juan Pablo Quinonez 9 Sleeping bag
## 3 Benki Hill 9 Paracord
## 4 David McIntyre 2 Knife
## 5 Donny Dust 6 Paracord
## 6 Terry Burns 9 Ferro rod
## 7 Roland Welker 7 Axe
## 8 Jesse Bosdell 4 Rations
## 9 Tom Garstang 9 Saw
## 10 Sam Larson 1 Slingshot
set.seed(1234)
Data2 <- survivalists %>%
select(name, season, medically_evacuated) %>%
sample_n(10)
Data2
## # A tibble: 10 × 3
## name season medically_evacuated
## <chr> <dbl> <lgl>
## 1 Britt Ahart 3 FALSE
## 2 Nate Weber 8 FALSE
## 3 Carleigh Fairchild 3 TRUE
## 4 Chris Weatherman 1 FALSE
## 5 Dustin Feher 1 FALSE
## 6 Brody Wilkes 4 FALSE
## 7 Randy Champagne 2 FALSE
## 8 Lucas Miller 1 FALSE
## 9 Karie Lee Knoke 9 FALSE
## 10 Joe Nicholas 7 FALSE
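Describe the two datasets:
Data1: 10 rows and 3 columns (name, season, item), a random sample of rows from loadouts.
Data2: 10 rows and 3 columns (name, season, medically_evacuated), a random sample of rows from survivalists.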
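Both samples share the key columns name and season, so it may help to check how much those keys overlap before joining. The sketch below copies the key columns by hand from the printed samples; the object names (data1_keys, data2_keys and so on) are made up for illustration.

```r
library(dplyr)

# Key columns copied by hand from the printed Data1 and Data2 samples.
data1_keys <- tibble(
  name = c("Dave Nessia", "Juan Pablo Quinonez", "Benki Hill", "David McIntyre",
           "Donny Dust", "Terry Burns", "Roland Welker", "Jesse Bosdell",
           "Tom Garstang", "Sam Larson"),
  season = c(3, 9, 9, 2, 6, 9, 7, 4, 9, 1)
)
data2_keys <- tibble(
  name = c("Britt Ahart", "Nate Weber", "Carleigh Fairchild", "Chris Weatherman",
           "Dustin Feher", "Brody Wilkes", "Randy Champagne", "Lucas Miller",
           "Karie Lee Knoke", "Joe Nicholas"),
  season = c(3, 8, 3, 1, 1, 4, 2, 1, 9, 7)
)

# Names that appear in both samples (none do).
shared_names <- intersect(data1_keys$name, data2_keys$name)

# Seasons that appear in both samples, and seasons found only in Data1.
shared_seasons <- intersect(data1_keys$season, data2_keys$season)
only_in_data1 <- setdiff(data1_keys$season, data2_keys$season)
```

No name is shared, so any join on both name and season finds no matches, while season 6 is the only Data1 season missing from Data2.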
Describe the resulting data:
How is it different from the original two datasets? It has 0 rows and 4 columns (name, season, item, medically_evacuated). No (name, season) pair appears in both samples, so no rows match.
Data1 %>%
inner_join(Data2, by = c("season", "name"))
## # A tibble: 0 × 4
## # ℹ 4 variables: name <chr>, season <dbl>, item <chr>,
## # medically_evacuated <lgl>
Describe the resulting data:
How is it different from the original two datasets? It keeps all 10 rows of Data2 and adds a fourth column, item. Because no row in Data1 matches, item is NA in every row.
Data2 %>% left_join(Data1)
## Joining with `by = join_by(name, season)`
## # A tibble: 10 × 4
## name season medically_evacuated item
## <chr> <dbl> <lgl> <chr>
## 1 Britt Ahart 3 FALSE <NA>
## 2 Nate Weber 8 FALSE <NA>
## 3 Carleigh Fairchild 3 TRUE <NA>
## 4 Chris Weatherman 1 FALSE <NA>
## 5 Dustin Feher 1 FALSE <NA>
## 6 Brody Wilkes 4 FALSE <NA>
## 7 Randy Champagne 2 FALSE <NA>
## 8 Lucas Miller 1 FALSE <NA>
## 9 Karie Lee Knoke 9 FALSE <NA>
## 10 Joe Nicholas 7 FALSE <NA>
Describe the resulting data:
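How is it different from the original two datasets? It has 4 columns and the 10 rows of Data2, the same rows as the left join. Only the column order differs: item now comes before medically_evacuated, and item is still NA throughout.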
Data1 %>% right_join(Data2)
## Joining with `by = join_by(name, season)`
## # A tibble: 10 × 4
## name season item medically_evacuated
## <chr> <dbl> <chr> <lgl>
## 1 Britt Ahart 3 <NA> FALSE
## 2 Nate Weber 8 <NA> FALSE
## 3 Carleigh Fairchild 3 <NA> TRUE
## 4 Chris Weatherman 1 <NA> FALSE
## 5 Dustin Feher 1 <NA> FALSE
## 6 Brody Wilkes 4 <NA> FALSE
## 7 Randy Champagne 2 <NA> FALSE
## 8 Lucas Miller 1 <NA> FALSE
## 9 Karie Lee Knoke 9 <NA> FALSE
## 10 Joe Nicholas 7 <NA> FALSE
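The left and right joins above appear to hold the same information in a different column order. A small sketch of how that could be checked follows, using three rows copied by hand from each printed sample; d1 and d2 are hypothetical names for those subsets.

```r
library(dplyr)

# Three rows copied from each printed sample; no (name, season) pair is shared.
d1 <- tibble(
  name = c("Dave Nessia", "Donny Dust", "Sam Larson"),
  season = c(3, 6, 1),
  item = c("Ferro rod", "Paracord", "Slingshot")
)
d2 <- tibble(
  name = c("Britt Ahart", "Nate Weber", "Lucas Miller"),
  season = c(3, 8, 1),
  medically_evacuated = c(FALSE, FALSE, FALSE)
)

left <- d2 %>% left_join(d1, by = c("name", "season"))
right <- d1 %>% right_join(d2, by = c("name", "season"))

# Align column order and row order before comparing the two results.
left_sorted <- left %>% arrange(name)
right_sorted <- right %>% select(all_of(names(left))) %>% arrange(name)
same_content <- isTRUE(all.equal(
  as.data.frame(left_sorted),
  as.data.frame(right_sorted)
))
```

Swapping the two tables and switching between left_join() and right_join() changes which table's columns come first, not which rows are kept.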
Describe the resulting data:
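How is it different from the original two datasets? It has 20 rows and 4 columns: every row from both datasets is kept. Since nothing matches, the Data2 rows have NA for item and the Data1 rows have NA for medically_evacuated.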
Data2 %>% full_join(Data1)
## Joining with `by = join_by(name, season)`
## # A tibble: 20 × 4
## name season medically_evacuated item
## <chr> <dbl> <lgl> <chr>
## 1 Britt Ahart 3 FALSE <NA>
## 2 Nate Weber 8 FALSE <NA>
## 3 Carleigh Fairchild 3 TRUE <NA>
## 4 Chris Weatherman 1 FALSE <NA>
## 5 Dustin Feher 1 FALSE <NA>
## 6 Brody Wilkes 4 FALSE <NA>
## 7 Randy Champagne 2 FALSE <NA>
## 8 Lucas Miller 1 FALSE <NA>
## 9 Karie Lee Knoke 9 FALSE <NA>
## 10 Joe Nicholas 7 FALSE <NA>
## 11 Dave Nessia 3 NA Ferro rod
## 12 Juan Pablo Quinonez 9 NA Sleeping bag
## 13 Benki Hill 9 NA Paracord
## 14 David McIntyre 2 NA Knife
## 15 Donny Dust 6 NA Paracord
## 16 Terry Burns 9 NA Ferro rod
## 17 Roland Welker 7 NA Axe
## 18 Jesse Bosdell 4 NA Rations
## 19 Tom Garstang 9 NA Saw
## 20 Sam Larson 1 NA Slingshot
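The row count of the full join can be reasoned about with simple arithmetic. When each key combination occurs at most once per table (an assumption that holds for the printed samples), the full join has as many rows as both tables combined minus the number of matches; here that is 10 + 10 - 0 = 20. The sketch below checks the same rule on three rows copied from each sample, with d1 and d2 as hypothetical names.

```r
library(dplyr)

# Three rows copied from each printed sample; keys are unique within each table.
d1 <- tibble(
  name = c("Dave Nessia", "Donny Dust", "Sam Larson"),
  season = c(3, 6, 1),
  item = c("Ferro rod", "Paracord", "Slingshot")
)
d2 <- tibble(
  name = c("Britt Ahart", "Nate Weber", "Lucas Miller"),
  season = c(3, 8, 1),
  medically_evacuated = c(FALSE, FALSE, FALSE)
)

keys <- c("name", "season")
full <- d2 %>% full_join(d1, by = keys)

# Number of matched key pairs, taken from the inner join.
matches <- nrow(inner_join(d1, d2, by = keys))
expected_rows <- nrow(d1) + nrow(d2) - matches
```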
Describe the resulting data:
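How is it different from the original two datasets? It has 9 rows and only Data1's 3 columns; a semi join never adds columns from Data2, so there is no medically_evacuated column. The join matches on season only, so it keeps every Data1 row whose season appears in Data2. Only Donny Dust (season 6) is dropped.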
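# `by` takes a single vector of keys, so this join matches on season only;
# matching on both keys would need by = c("season", "name").
Data1 %>% semi_join(Data2, by = "season")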
## # A tibble: 9 × 3
## name season item
## <chr> <dbl> <chr>
## 1 Dave Nessia 3 Ferro rod
## 2 Juan Pablo Quinonez 9 Sleeping bag
## 3 Benki Hill 9 Paracord
## 4 David McIntyre 2 Knife
## 5 Terry Burns 9 Ferro rod
## 6 Roland Welker 7 Axe
## 7 Jesse Bosdell 4 Rations
## 8 Tom Garstang 9 Saw
## 9 Sam Larson 1 Slingshot
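To illustrate how the choice of keys changes a semi join, here is a small sketch comparing a season-only match with a match on both season and name. It uses three rows copied by hand from each printed sample (d1 and d2 are hypothetical names), so the row counts are smaller than in the output above.

```r
library(dplyr)

# Three rows copied from each printed sample.
d1 <- tibble(
  name = c("Dave Nessia", "Donny Dust", "Sam Larson"),
  season = c(3, 6, 1),
  item = c("Ferro rod", "Paracord", "Slingshot")
)
d2 <- tibble(
  name = c("Britt Ahart", "Nate Weber", "Lucas Miller"),
  season = c(3, 8, 1),
  medically_evacuated = c(FALSE, FALSE, FALSE)
)

# Season only: keeps d1 rows whose season appears anywhere in d2.
by_season <- d1 %>% semi_join(d2, by = "season")

# Season and name together: a row is kept only if both values match one d2 row.
by_both <- d1 %>% semi_join(d2, by = c("season", "name"))
```

With the season-only key, Dave Nessia and Sam Larson are kept because seasons 3 and 1 occur in d2. With both keys nothing is kept, which is consistent with the empty inner join earlier.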
Describe the resulting data:
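How is it different from the original two datasets? It has 1 row and Data1's 3 columns. The join matches on season only, so it returns the Data1 rows whose season does not appear in Data2: just Donny Dust (season 6).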
# As with the semi join, `by` takes a single vector of keys; this matches on season only.
Data1 %>% anti_join(Data2, by = "season")
## # A tibble: 1 × 3
## name season item
## <chr> <dbl> <chr>
## 1 Donny Dust 6 Paracord
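A semi join and an anti join on the same keys split the first table into two non-overlapping parts: 9 rows and 1 row in the output above, adding up to the 10 rows of Data1. A minimal sketch of that check follows, again on three rows copied by hand from each printed sample (d1 and d2 are hypothetical names).

```r
library(dplyr)

# Three rows copied from each printed sample.
d1 <- tibble(
  name = c("Dave Nessia", "Donny Dust", "Sam Larson"),
  season = c(3, 6, 1),
  item = c("Ferro rod", "Paracord", "Slingshot")
)
d2 <- tibble(
  name = c("Britt Ahart", "Nate Weber", "Lucas Miller"),
  season = c(3, 8, 1),
  medically_evacuated = c(FALSE, FALSE, FALSE)
)

# Rows of d1 with and without a season match in d2.
kept <- d1 %>% semi_join(d2, by = "season")
dropped <- d1 %>% anti_join(d2, by = "season")

# Every d1 row lands in exactly one of the two results.
is_partition <- nrow(kept) + nrow(dropped) == nrow(d1)

# With both keys nothing matches, so the anti join returns all of d1.
dropped_both <- d1 %>% anti_join(d2, by = c("season", "name"))
```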