Import two related datasets from the TidyTuesday project.
survivalists <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-24/survivalists.csv')
## Rows: 94 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, gender, city, state, country, reason_tapped_out, reason_cate...
## dbl (5): season, age, result, days_lasted, day_linked_up
## lgl (1): medically_evacuated
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
loadouts <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-24/loadouts.csv')
## Rows: 940 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): version, name, item_detailed, item
## dbl (2): season, item_number
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
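Both imports print a column specification message that itself names two ways to quiet it. A minimal sketch of each, using a small inline CSV as a hypothetical stand-in for the remote files (the `toy` data and its values are made up for illustration):

```r
library(readr)

# Hypothetical inline CSV; I() tells read_csv() the string is literal data, not a path.
csv_text <- I("season,name\n1,Ann\n2,Cy")

# Option 1: keep type guessing but suppress the message.
toy <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: declare the column types up front, which also suppresses the message.
toy_typed <- read_csv(
  csv_text,
  col_types = cols(season = col_double(), name = col_character())
)
```

Declaring the types is the stricter choice, since a column that stops parsing as expected is reported as a problem rather than being silently re-guessed.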
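Describe the two datasets:
Data 1: `survivalists` has 94 rows and 16 columns describing the participants, including season, name, age, result, days_lasted, and medically_evacuated.
Data 2: `loadouts` has 940 rows and 6 columns (version, season, name, item_number, item_detailed, item) listing the items participants brought.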
library(dplyr) # provides %>%, select(), sample_n(), and the join functions used below
set.seed(2718)
survivalists_small <- survivalists %>% select(season, name, age) %>% sample_n(10)
loadouts_small <- loadouts %>% select(season, name, item_detailed) %>% sample_n(10)
survivalists_small
## # A tibble: 10 × 3
## season name age
## <dbl> <chr> <dbl>
## 1 5 Sam Larson 24
## 2 9 Jessie Krebs 49
## 3 5 Britt Ahart 41
## 4 3 Dave Nessia 49
## 5 9 Tom Garstang 35
## 6 8 Tim Madsen 48
## 7 7 Joe Nicholas 31
## 8 4 Josh Richardson 19
## 9 3 Callie North 27
## 10 8 Nate Weber 47
loadouts_small
## # A tibble: 10 × 3
## season name item_detailed
## <dbl> <chr> <chr>
## 1 9 Benki Hill Trapping wire
## 2 6 Nikki van Schyndel Trapping wire
## 3 6 Barry Karcher Sleeping bag
## 4 5 Brad Richardson Sleeping bag
## 5 4 Pete Brockdorff Gillnet – 12′ x 4′
## 6 7 Amos Rodriguez Fishing line and hooks
## 7 3 Megan Hanacek Gillnet
## 8 4 Dave Whipple Tarp – 12′ x 12′
## 9 9 Juan Pablo Quinonez Axe
## 10 4 Jesse Bosdell Rations
Describe the resulting data:
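How is it different from the original two datasets? - It has 0 rows and 4 columns (season, name, item_detailed, age). An inner join keeps only rows whose (season, name) key appears in both tables, and the two random samples share no such key, so no rows survive.
inner_joined_data <- inner_join(loadouts_small, survivalists_small)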
## Joining with `by = join_by(season, name)`
inner_joined_data
## # A tibble: 0 × 4
## # ℹ 4 variables: season <dbl>, name <chr>, item_detailed <chr>, age <dbl>
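The empty result above comes from the two random samples having no (season, name) pair in common. A minimal sketch with toy tibbles shows what an inner join returns when some keys do match. The `people` and `items` tables and their values are hypothetical, not taken from the TidyTuesday data:

```r
library(dplyr)

# Hypothetical toy data sharing the key columns season and name.
people <- tibble(
  season = c(1, 1, 2),
  name   = c("Ann", "Bob", "Cy"),
  age    = c(30, 41, 25)
)
items <- tibble(
  season = c(1, 1, 3),
  name   = c("Ann", "Ann", "Dee"),
  item   = c("Axe", "Tarp", "Saw")
)

# Naming the keys explicitly avoids the "Joining with `by = ...`" message.
matched <- inner_join(items, people, by = c("season", "name"))

# Only Ann appears in both tables, so both of her items are kept (with her age)
# and Dee's item is dropped.
nrow(matched)  # 2
```

Passing `by` explicitly also guards against a join silently picking up an extra shared column if one is added to either table later.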
Describe the resulting data:
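How is it different from the original two datasets? - It has the same 10 rows as loadouts_small plus an age column. A left join keeps every row of the left table; because no (season, name) key matches a row in survivalists_small, age is NA in every row.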
left_joined_data <- left_join(loadouts_small, survivalists_small)
## Joining with `by = join_by(season, name)`
left_joined_data
## # A tibble: 10 × 4
## season name item_detailed age
## <dbl> <chr> <chr> <dbl>
## 1 9 Benki Hill Trapping wire NA
## 2 6 Nikki van Schyndel Trapping wire NA
## 3 6 Barry Karcher Sleeping bag NA
## 4 5 Brad Richardson Sleeping bag NA
## 5 4 Pete Brockdorff Gillnet – 12′ x 4′ NA
## 6 7 Amos Rodriguez Fishing line and hooks NA
## 7 3 Megan Hanacek Gillnet NA
## 8 4 Dave Whipple Tarp – 12′ x 12′ NA
## 9 9 Juan Pablo Quinonez Axe NA
## 10 4 Jesse Bosdell Rations NA
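Every age above is NA because no row of loadouts_small found a partner. A small sketch with hypothetical toy tibbles (not the TidyTuesday data) shows the mixed case, where some left rows match and some do not, and one way to count the unmatched ones:

```r
library(dplyr)

# Hypothetical toy data: only Ann appears in both tables.
items  <- tibble(season = c(1, 3), name = c("Ann", "Dee"), item = c("Axe", "Saw"))
people <- tibble(season = c(1, 2), name = c("Ann", "Cy"),  age  = c(30, 25))

# Every row of `items` is kept; age is filled in where a match exists.
left <- left_join(items, people, by = c("season", "name"))

# Count left rows without a partner. This assumes age is never NA in `people`,
# so an NA here can only come from a failed match.
n_unmatched <- sum(is.na(left$age))
n_unmatched  # 1
```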
Describe the resulting data:
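How is it different from the original two datasets? - It has the same 10 rows as survivalists_small plus an item_detailed column. A right join keeps every row of the right table; item_detailed is NA in every row because no (season, name) key matches a row in loadouts_small.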
right_joined_data <- right_join(loadouts_small, survivalists_small)
## Joining with `by = join_by(season, name)`
right_joined_data
## # A tibble: 10 × 4
## season name item_detailed age
## <dbl> <chr> <chr> <dbl>
## 1 5 Sam Larson <NA> 24
## 2 9 Jessie Krebs <NA> 49
## 3 5 Britt Ahart <NA> 41
## 4 3 Dave Nessia <NA> 49
## 5 9 Tom Garstang <NA> 35
## 6 8 Tim Madsen <NA> 48
## 7 7 Joe Nicholas <NA> 31
## 8 4 Josh Richardson <NA> 19
## 9 3 Callie North <NA> 27
## 10 8 Nate Weber <NA> 47
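A right join is a left join with the roles of the two tables swapped; only the column order differs. The sketch below checks this on hypothetical toy tibbles (names and values are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data: only Ann appears in both tables.
items  <- tibble(season = c(1, 3), name = c("Ann", "Dee"), item = c("Axe", "Saw"))
people <- tibble(season = c(1, 2), name = c("Ann", "Cy"),  age  = c(30, 25))

right        <- right_join(items, people, by = c("season", "name"))
left_swapped <- left_join(people, items, by = c("season", "name"))

# Put the columns in the same order, then compare the two results as sets of rows.
same_rows <- setequal(select(right, season, name, age, item), left_swapped)
same_rows  # TRUE
```

Because of this equivalence, many analysts write only left joins and choose which table goes first.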
Describe the resulting data:
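How is it different from the original two datasets? - It has 20 rows: all 10 from loadouts_small followed by all 10 from survivalists_small. A full join keeps every row from both tables and fills in NA for the columns of the table in which a key does not appear.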
full_joined_data <- full_join(loadouts_small, survivalists_small)
## Joining with `by = join_by(season, name)`
full_joined_data
## # A tibble: 20 × 4
## season name item_detailed age
## <dbl> <chr> <chr> <dbl>
## 1 9 Benki Hill Trapping wire NA
## 2 6 Nikki van Schyndel Trapping wire NA
## 3 6 Barry Karcher Sleeping bag NA
## 4 5 Brad Richardson Sleeping bag NA
## 5 4 Pete Brockdorff Gillnet – 12′ x 4′ NA
## 6 7 Amos Rodriguez Fishing line and hooks NA
## 7 3 Megan Hanacek Gillnet NA
## 8 4 Dave Whipple Tarp – 12′ x 12′ NA
## 9 9 Juan Pablo Quinonez Axe NA
## 10 4 Jesse Bosdell Rations NA
## 11 5 Sam Larson <NA> 24
## 12 9 Jessie Krebs <NA> 49
## 13 5 Britt Ahart <NA> 41
## 14 3 Dave Nessia <NA> 49
## 15 9 Tom Garstang <NA> 35
## 16 8 Tim Madsen <NA> 48
## 17 7 Joe Nicholas <NA> 31
## 18 4 Josh Richardson <NA> 19
## 19 3 Callie North <NA> 27
## 20 8 Nate Weber <NA> 47
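The 20 rows above are the 0 matched rows plus the 10 unmatched rows from each table. The same accounting can be sketched on hypothetical toy tibbles where one key does match:

```r
library(dplyr)

# Hypothetical toy data: only Ann appears in both tables.
items  <- tibble(season = c(1, 3), name = c("Ann", "Dee"), item = c("Axe", "Saw"))
people <- tibble(season = c(1, 2), name = c("Ann", "Cy"),  age  = c(30, 25))
keys   <- c("season", "name")

full <- full_join(items, people, by = keys)

# A full join is made of three parts: matched rows, rows only in x, rows only in y.
n_matched <- nrow(inner_join(items, people, by = keys))
n_only_x  <- nrow(anti_join(items, people, by = keys))
n_only_y  <- nrow(anti_join(people, items, by = keys))

nrow(full) == n_matched + n_only_x + n_only_y  # TRUE (3 == 1 + 1 + 1)
```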
Describe the resulting data:
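How is it different from the original two datasets? - It has 0 rows and only the 3 columns of loadouts_small. A semi join keeps the rows of the left table that have a match in the right table and adds no columns from the right table; no (season, name) key matches here, so the result is empty.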
semi_joined_data <- semi_join(loadouts_small, survivalists_small)
## Joining with `by = join_by(season, name)`
semi_joined_data
## # A tibble: 0 × 3
## # ℹ 3 variables: season <dbl>, name <chr>, item_detailed <chr>
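A semi join filters the left table rather than combining the two. The sketch below contrasts it with an inner join on hypothetical toy tibbles in which one person has two matching items:

```r
library(dplyr)

# Hypothetical toy data: Ann has two items, Cy has none.
people <- tibble(season = c(1, 2), name = c("Ann", "Cy"), age = c(30, 25))
items  <- tibble(
  season = c(1, 1, 3),
  name   = c("Ann", "Ann", "Dee"),
  item   = c("Axe", "Tarp", "Saw")
)
keys <- c("season", "name")

# semi_join() keeps Ann once and returns only the columns of `people`.
semi <- semi_join(people, items, by = keys)

# inner_join() repeats Ann once per matching item and adds the item column.
inner <- inner_join(people, items, by = keys)

dim(semi)   # 1 row, 3 columns
dim(inner)  # 2 rows, 4 columns
```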
Describe the resulting data:
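How is it different from the original two datasets? - It is identical to loadouts_small (10 rows, 3 columns). An anti join keeps the rows of the left table that have no match in the right table, and here that is every row.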
anti_joined_data <- anti_join(loadouts_small, survivalists_small)
## Joining with `by = join_by(season, name)`
anti_joined_data
## # A tibble: 10 × 3
## season name item_detailed
## <dbl> <chr> <chr>
## 1 9 Benki Hill Trapping wire
## 2 6 Nikki van Schyndel Trapping wire
## 3 6 Barry Karcher Sleeping bag
## 4 5 Brad Richardson Sleeping bag
## 5 4 Pete Brockdorff Gillnet – 12′ x 4′
## 6 7 Amos Rodriguez Fishing line and hooks
## 7 3 Megan Hanacek Gillnet
## 8 4 Dave Whipple Tarp – 12′ x 12′
## 9 9 Juan Pablo Quinonez Axe
## 10 4 Jesse Bosdell Rations
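An anti join is the complement of a semi join: together they split the left table into rows with a match and rows without one. Here that split is 0 and 10. A sketch on hypothetical toy tibbles shows a non-trivial split:

```r
library(dplyr)

# Hypothetical toy data: Ann is in `people`, Dee is not.
items <- tibble(
  season = c(1, 1, 3),
  name   = c("Ann", "Ann", "Dee"),
  item   = c("Axe", "Tarp", "Saw")
)
people <- tibble(season = c(1, 2), name = c("Ann", "Cy"), age = c(30, 25))
keys   <- c("season", "name")

with_match    <- semi_join(items, people, by = keys)  # Ann's two items
without_match <- anti_join(items, people, by = keys)  # Dee's item

# Every row of `items` lands in exactly one of the two results.
nrow(with_match) + nrow(without_match) == nrow(items)  # TRUE (2 + 1 == 3)
```

Running the anti join before a left or inner join is a quick way to see which keys would go unmatched.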