1. Import your data

Import two related datasets from the TidyTuesday project.

loadouts <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-01-24/loadouts.csv')
## Rows: 940 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): version, name, item_detailed, item
## dbl (2): season, item_number
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
survivalists <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-01-24/survivalists.csv')
## Rows: 94 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, gender, city, state, country, reason_tapped_out, reason_cate...
## dbl  (5): season, age, result, days_lasted, day_linked_up
## lgl  (1): medically_evacuated
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
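
The messages above suggest two ways to quiet the column specification: supply the column types yourself or set `show_col_types = FALSE`. The following is a minimal sketch of the first option. It reads a small in-memory CSV rather than the real URL so that it runs offline. The object names `csv_text` and `loadouts_demo` are introduced here for illustration, and the two rows are retyped from the printed sample in this document.

```r
library(readr)

# Two rows retyped from the printed sample; only three of the six columns are used
csv_text <- "name,season,item\nSam Larson,1,Slingshot\nRoland Welker,7,Axe\n"

# I() tells read_csv() to treat the string as literal data, not a file path.
# Declaring col_types up front means no column types need to be guessed or reported.
loadouts_demo <- read_csv(
  I(csv_text),
  col_types = cols(
    name   = col_character(),
    season = col_double(),
    item   = col_character()
  )
)

loadouts_demo
```

The same `col_types` argument can be passed when reading from the URL, provided every column in the file is either declared or left to be guessed.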

2. Make data small

library(dplyr)  # provides %>%, select(), sample_n(), and the join functions

set.seed(1234)

Data1 <- loadouts %>%
    select(name, season, item) %>%
    sample_n(10)
Data1
## # A tibble: 10 × 3
##    name                season item        
##    <chr>                <dbl> <chr>       
##  1 Dave Nessia              3 Ferro rod   
##  2 Juan Pablo Quinonez      9 Sleeping bag
##  3 Benki Hill               9 Paracord    
##  4 David McIntyre           2 Knife       
##  5 Donny Dust               6 Paracord    
##  6 Terry Burns              9 Ferro rod   
##  7 Roland Welker            7 Axe         
##  8 Jesse Bosdell            4 Rations     
##  9 Tom Garstang             9 Saw         
## 10 Sam Larson               1 Slingshot
set.seed(1234)

Data2 <- survivalists %>%
    select(name, season, medically_evacuated) %>%
    sample_n(10)
Data2
## # A tibble: 10 × 3
##    name               season medically_evacuated
##    <chr>               <dbl> <lgl>              
##  1 Britt Ahart             3 FALSE              
##  2 Nate Weber              8 FALSE              
##  3 Carleigh Fairchild      3 TRUE               
##  4 Chris Weatherman        1 FALSE              
##  5 Dustin Feher            1 FALSE              
##  6 Brody Wilkes            4 FALSE              
##  7 Randy Champagne         2 FALSE              
##  8 Lucas Miller            1 FALSE              
##  9 Karie Lee Knoke         9 FALSE              
## 10 Joe Nicholas            7 FALSE

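Describe the two datasets:

Data1: 10 rows sampled from loadouts, with 3 columns: name (chr), season (dbl), and item (chr). Each row is one item in a survivalist's loadout.

Data2: 10 rows sampled from survivalists, with 3 columns: name (chr), season (dbl), and medically_evacuated (lgl). Each row is one survivalist in one season.

The two samples share the name and season columns, but no (name, season) pair appears in both.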
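
Before joining, it can help to check which key values the two samples actually share. The sketch below retypes the name and season columns from the printed output; the object names `data1_keys`, `data2_keys`, and the `shared_*` results are introduced here for illustration.

```r
library(dplyr)

# Key columns retyped from the printed Data1 sample
data1_keys <- tibble::tibble(
  name = c("Dave Nessia", "Juan Pablo Quinonez", "Benki Hill", "David McIntyre",
           "Donny Dust", "Terry Burns", "Roland Welker", "Jesse Bosdell",
           "Tom Garstang", "Sam Larson"),
  season = c(3, 9, 9, 2, 6, 9, 7, 4, 9, 1)
)

# Key columns retyped from the printed Data2 sample
data2_keys <- tibble::tibble(
  name = c("Britt Ahart", "Nate Weber", "Carleigh Fairchild", "Chris Weatherman",
           "Dustin Feher", "Brody Wilkes", "Randy Champagne", "Lucas Miller",
           "Karie Lee Knoke", "Joe Nicholas"),
  season = c(3, 8, 3, 1, 1, 4, 2, 1, 9, 7)
)

# Names present in both samples
shared_names <- base::intersect(data1_keys$name, data2_keys$name)

# Seasons present in both samples
shared_seasons <- base::intersect(data1_keys$season, data2_keys$season)

# (name, season) rows present in both samples
shared_pairs <- dplyr::intersect(data1_keys, data2_keys)

length(shared_names)
sort(shared_seasons)
nrow(shared_pairs)
```

No names are shared, while six seasons (1, 2, 3, 4, 7, and 9) are. Any join keyed on name therefore finds no matches in these samples, whereas a join keyed on season alone does.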

3. inner_join

Describe the resulting data:

How is it different from the original two datasets? The result has 4 columns (name, season, item, medically_evacuated) but 0 rows, because no (name, season) pair appears in both Data1 and Data2.

Data1 %>%
    inner_join(Data2, by = c("season", "name"))
## # A tibble: 0 × 4
## # ℹ 4 variables: name <chr>, season <dbl>, item <chr>,
## #   medically_evacuated <lgl>

4. left_join

Describe the resulting data:

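How is it different from the original two datasets? The result keeps all 10 rows of Data2 and adds the item column from Data1, giving 4 columns. Every item is NA because no row of Data1 matches on name and season.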

Data2 %>% left_join(Data1)
## Joining with `by = join_by(name, season)`
## # A tibble: 10 × 4
##    name               season medically_evacuated item 
##    <chr>               <dbl> <lgl>               <chr>
##  1 Britt Ahart             3 FALSE               <NA> 
##  2 Nate Weber              8 FALSE               <NA> 
##  3 Carleigh Fairchild      3 TRUE                <NA> 
##  4 Chris Weatherman        1 FALSE               <NA> 
##  5 Dustin Feher            1 FALSE               <NA> 
##  6 Brody Wilkes            4 FALSE               <NA> 
##  7 Randy Champagne         2 FALSE               <NA> 
##  8 Lucas Miller            1 FALSE               <NA> 
##  9 Karie Lee Knoke         9 FALSE               <NA> 
## 10 Joe Nicholas            7 FALSE               <NA>

5. right_join

Describe the resulting data:

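How is it different from the original two datasets? The result keeps all 10 rows of Data2 (the right-hand table) and has 4 columns. Data1's columns come first, so item is listed before medically_evacuated. Every item is NA because no row of Data1 matches on name and season.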

Data1 %>% right_join(Data2)
## Joining with `by = join_by(name, season)`
## # A tibble: 10 × 4
##    name               season item  medically_evacuated
##    <chr>               <dbl> <chr> <lgl>              
##  1 Britt Ahart             3 <NA>  FALSE              
##  2 Nate Weber              8 <NA>  FALSE              
##  3 Carleigh Fairchild      3 <NA>  TRUE               
##  4 Chris Weatherman        1 <NA>  FALSE              
##  5 Dustin Feher            1 <NA>  FALSE              
##  6 Brody Wilkes            4 <NA>  FALSE              
##  7 Randy Champagne         2 <NA>  FALSE              
##  8 Lucas Miller            1 <NA>  FALSE              
##  9 Karie Lee Knoke         9 <NA>  FALSE              
## 10 Joe Nicholas            7 <NA>  FALSE

6. full_join

Describe the resulting data:

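How is it different from the original two datasets? The result has 20 rows and 4 columns: all 10 rows of Data2 followed by all 10 rows of Data1. Because no keys match, no rows are merged, and each row has NA in the column that comes from the other table.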

Data2 %>% full_join(Data1)
## Joining with `by = join_by(name, season)`
## # A tibble: 20 × 4
##    name                season medically_evacuated item        
##    <chr>                <dbl> <lgl>               <chr>       
##  1 Britt Ahart              3 FALSE               <NA>        
##  2 Nate Weber               8 FALSE               <NA>        
##  3 Carleigh Fairchild       3 TRUE                <NA>        
##  4 Chris Weatherman         1 FALSE               <NA>        
##  5 Dustin Feher             1 FALSE               <NA>        
##  6 Brody Wilkes             4 FALSE               <NA>        
##  7 Randy Champagne          2 FALSE               <NA>        
##  8 Lucas Miller             1 FALSE               <NA>        
##  9 Karie Lee Knoke          9 FALSE               <NA>        
## 10 Joe Nicholas             7 FALSE               <NA>        
## 11 Dave Nessia              3 NA                  Ferro rod   
## 12 Juan Pablo Quinonez      9 NA                  Sleeping bag
## 13 Benki Hill               9 NA                  Paracord    
## 14 David McIntyre           2 NA                  Knife       
## 15 Donny Dust               6 NA                  Paracord    
## 16 Terry Burns              9 NA                  Ferro rod   
## 17 Roland Welker            7 NA                  Axe         
## 18 Jesse Bosdell            4 NA                  Rations     
## 19 Tom Garstang             9 NA                  Saw         
## 20 Sam Larson               1 NA                  Slingshot
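
Because the two samples share no (name, season) pairs, the outputs above show each join only in its no-match case. The sketch below uses two hypothetical toy tables, `x` and `y` (not taken from the TidyTuesday data), built so that exactly one key pair appears in both, and counts the rows each join returns.

```r
library(dplyr)

# Hypothetical toy tables: only (name = "B", season = 1) appears in both
x <- tibble::tibble(
  name   = c("A", "B"),
  season = c(1, 1),
  item   = c("Axe", "Saw")
)
y <- tibble::tibble(
  name   = c("B", "C"),
  season = c(1, 2),
  medically_evacuated = c(FALSE, TRUE)
)

keys <- c("name", "season")

# Number of rows returned by each join type on the same pair of tables
row_counts <- c(
  inner = nrow(inner_join(x, y, by = keys)),  # matched rows only
  left  = nrow(left_join(x, y, by = keys)),   # every row of x
  right = nrow(right_join(x, y, by = keys)),  # every row of y
  full  = nrow(full_join(x, y, by = keys)),   # every row of x and y
  semi  = nrow(semi_join(x, y, by = keys)),   # rows of x with a match in y
  anti  = nrow(anti_join(x, y, by = keys))    # rows of x without a match in y
)

row_counts
```

With one matching pair, the counts are 1, 2, 2, 3, 1, and 1. The full join returns 3 rows rather than 4 because the matched row from `x` and `y` is merged into one.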

7. semi_join

Describe the resulting data:

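How is it different from the original two datasets? The result keeps 9 of Data1's 10 rows and only Data1's columns, so there is no medically_evacuated column. Rows are matched on season alone. Donny Dust is dropped because season 6 does not appear in Data2.

# Match on season only; use by = c("season", "name") to match on both columns
Data1 %>% semi_join(Data2, by = "season")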
## # A tibble: 9 × 3
##   name                season item        
##   <chr>                <dbl> <chr>       
## 1 Dave Nessia              3 Ferro rod   
## 2 Juan Pablo Quinonez      9 Sleeping bag
## 3 Benki Hill               9 Paracord    
## 4 David McIntyre           2 Knife       
## 5 Terry Burns              9 Ferro rod   
## 6 Roland Welker            7 Axe         
## 7 Jesse Bosdell            4 Rations     
## 8 Tom Garstang             9 Saw         
## 9 Sam Larson               1 Slingshot

8. anti_join

Describe the resulting data:

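How is it different from the original two datasets? The result has 1 row and only Data1's 3 columns. Rows are matched on season alone, so the join returns the rows of Data1 whose season does not appear in Data2. Only Donny Dust (season 6) qualifies.

# Match on season only; use by = c("season", "name") to match on both columns
Data1 %>% anti_join(Data2, by = "season")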
## # A tibble: 1 × 3
##   name       season item    
##   <chr>       <dbl> <chr>   
## 1 Donny Dust      6 Paracord
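
The choice of key changes what the filtering joins return. The sketch below retypes the two printed samples (the names `d1` and `d2` are introduced here for illustration) and compares matching on season alone with matching on both season and name.

```r
library(dplyr)

# Rows retyped from the printed Data1 sample
d1 <- tibble::tibble(
  name = c("Dave Nessia", "Juan Pablo Quinonez", "Benki Hill", "David McIntyre",
           "Donny Dust", "Terry Burns", "Roland Welker", "Jesse Bosdell",
           "Tom Garstang", "Sam Larson"),
  season = c(3, 9, 9, 2, 6, 9, 7, 4, 9, 1),
  item = c("Ferro rod", "Sleeping bag", "Paracord", "Knife", "Paracord",
           "Ferro rod", "Axe", "Rations", "Saw", "Slingshot")
)

# Rows retyped from the printed Data2 sample
d2 <- tibble::tibble(
  name = c("Britt Ahart", "Nate Weber", "Carleigh Fairchild", "Chris Weatherman",
           "Dustin Feher", "Brody Wilkes", "Randy Champagne", "Lucas Miller",
           "Karie Lee Knoke", "Joe Nicholas"),
  season = c(3, 8, 3, 1, 1, 4, 2, 1, 9, 7),
  medically_evacuated = c(FALSE, FALSE, TRUE, FALSE, FALSE,
                          FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Key = season only: a row of d1 matches if its season occurs anywhere in d2
semi_season <- semi_join(d1, d2, by = "season")
anti_season <- anti_join(d1, d2, by = "season")

# Key = season and name: a row of d1 matches only if both values agree
semi_both <- semi_join(d1, d2, by = c("season", "name"))
anti_both <- anti_join(d1, d2, by = c("season", "name"))

c(
  semi_season = nrow(semi_season),
  anti_season = nrow(anti_season),
  semi_both   = nrow(semi_both),
  anti_both   = nrow(anti_both)
)
```

Keyed on season alone, the split is 9 rows kept and 1 row rejected. Keyed on both columns, it is 0 and 10, which is consistent with the empty inner_join result in section 3. In both cases the semi_join and anti_join row counts add up to the 10 rows of the left-hand table.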