1. Import your data

Import two related datasets, colony and stressor, from the TidyTuesday project (the 2022-01-11 release on bee colonies).

colony <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-11/colony.csv')
## Rows: 1222 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): months, state
## dbl (8): year, colony_n, colony_max, colony_lost, colony_lost_pct, colony_ad...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stressor <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-11/stressor.csv')
## Rows: 7332 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): months, state, stressor
## dbl (2): year, stress_pct
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
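The message above suggests declaring the column types. A minimal sketch, assuming readr is installed: it writes a tiny hypothetical stand-in for stressor.csv to a temporary file (the values are made up, so no network access is needed) and reads it back with an explicit `col_types` specification. This records the expected schema and suppresses the column-specification message; passing `show_col_types = FALSE` instead would only silence the message.

```r
library(readr)

# Hypothetical two-row stand-in for stressor.csv (made-up values)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "year,months,state,stressor,stress_pct",
  "2015,January-March,Alabama,Varroa mites,10.0",
  "2015,January-March,Alabama,Pesticides,2.2"
), path)

# Declare the types that read_csv() otherwise guesses and reports
stressor_demo <- read_csv(
  path,
  col_types = cols(
    year       = col_double(),
    months     = col_character(),
    state      = col_character(),
    stressor   = col_character(),
    stress_pct = col_double()
  )
)
stressor_demo
```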

2. Make the data small

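Describe the two datasets:

Data 1: colony. 1,222 rows and 10 columns. Each row describes one state in one survey period (year and months), with honey bee colony counts (colony_n, colony_max) and losses (colony_lost, colony_lost_pct); the printed column specification is cut off after that.

Data 2: stressor. 7,332 rows and 5 columns: year, months, state, stressor and stress_pct. Each row gives, for one state and period, the percentage of colonies affected by one stressor (for example Varroa mites, Pesticides or Disesases, which is spelled that way in the data).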

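library(dplyr)  # for %>%, select() and sample_n()

set.seed(123)  # reproducible sampling; sample_n() is superseded by slice_sample(n = 10)
colony_small <- colony %>% select(state, colony_n, colony_lost_pct) %>% sample_n(10)
stressor_small <- stressor %>% select(state, stressor, stress_pct) %>% sample_n(10)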

colony_small
## # A tibble: 10 × 3
##    state      colony_n colony_lost_pct
##    <chr>         <dbl>           <dbl>
##  1 Utah          16000              15
##  2 Vermont        6000               2
##  3 Texas        125000              10
##  4 Hawaii        15000               1
##  5 Florida      245000              17
##  6 Wyoming       27000              12
##  7 Kansas         5000              22
##  8 California   640000              10
##  9 Florida      197000              14
## 10 Texas        205000               8
stressor_small
## # A tibble: 10 × 3
##    state          stressor              stress_pct
##    <chr>          <chr>                      <dbl>
##  1 Tennessee      Varroa mites                40.9
##  2 Tennessee      Disesases                    1  
##  3 Connecticut    Varroa mites                24.3
##  4 Missouri       Disesases                    0.4
##  5 Maryland       Other                        0.2
##  6 South Dakota   Pesticides                   1.6
##  7 North Carolina Disesases                    0.5
##  8 Florida        Disesases                    7.7
##  9 Michigan       Disesases                    9.2
## 10 Indiana        Other pests/parasites        5.9
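Setting the seed before sampling is what makes the two small tables reproducible. A minimal sketch using the built-in mtcars data as a stand-in for colony, and `slice_sample()`, the current dplyr replacement for `sample_n()`: resetting the same seed yields the same rows. It does not claim that `slice_sample()` picks exactly the same rows as `sample_n()` for a given seed.

```r
library(dplyr)

# mtcars stands in for colony; draw 5 rows twice with the same seed
set.seed(123)
first_draw <- mtcars %>% select(mpg, cyl) %>% slice_sample(n = 5)

set.seed(123)
second_draw <- mtcars %>% select(mpg, cyl) %>% slice_sample(n = 5)

identical(first_draw, second_draw)  # same seed, same rows
```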

3. inner_join

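Describe the resulting data: 2 rows and 5 columns (state, colony_n, colony_lost_pct, stressor, stress_pct). Florida is the only state present in both samples, so only the two Florida colony rows are kept, each paired with the single Florida stressor row (Disesases, 7.7).

How is it different from the original two datasets? It combines the columns of both tables, matched on state (dplyr infers this key because it is the only column name the two tables share), but it has far fewer rows than either input: every row without a match on the other side is dropped.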

colony_small %>% inner_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 2 × 5
##   state   colony_n colony_lost_pct stressor  stress_pct
##   <chr>      <dbl>           <dbl> <chr>          <dbl>
## 1 Florida   245000              17 Disesases        7.7
## 2 Florida   197000              14 Disesases        7.7
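A minimal sketch of the same join on a hand-built miniature of the two samples (a few rows copied from the outputs above), with the key written out as `by = join_by(state)` so dplyr does not have to infer it and prints no "Joining with" message. Because Florida appears twice in the colony table and once in the stressor table, the single stressor row is repeated for each colony row.

```r
library(dplyr)

# Miniature versions of colony_small and stressor_small (rows copied from the outputs above)
colony_mini <- tibble(
  state           = c("Florida", "Florida", "Texas"),
  colony_n        = c(245000, 197000, 125000),
  colony_lost_pct = c(17, 14, 10)
)
stressor_mini <- tibble(
  state      = c("Florida", "Tennessee"),
  stressor   = c("Disesases", "Varroa mites"),
  stress_pct = c(7.7, 40.9)
)

# Explicit key: only states present in both tables survive
inner <- colony_mini %>% inner_join(stressor_mini, by = join_by(state))
inner
```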

4. left_join

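Describe the resulting data: 10 rows and 5 columns. Every row of colony_small is kept, in its original order; only the two Florida rows find a match, and the other eight rows get NA in stressor and stress_pct.

How is it different from the original two datasets? It has exactly the rows of colony_small plus the two stressor columns. The row count stays at 10 because each colony row matches at most one stressor row, and the stressor_small rows for states other than Florida do not appear at all.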

colony_small %>% left_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 10 × 5
##    state      colony_n colony_lost_pct stressor  stress_pct
##    <chr>         <dbl>           <dbl> <chr>          <dbl>
##  1 Utah          16000              15 <NA>            NA  
##  2 Vermont        6000               2 <NA>            NA  
##  3 Texas        125000              10 <NA>            NA  
##  4 Hawaii        15000               1 <NA>            NA  
##  5 Florida      245000              17 Disesases        7.7
##  6 Wyoming       27000              12 <NA>            NA  
##  7 Kansas         5000              22 <NA>            NA  
##  8 California   640000              10 <NA>            NA  
##  9 Florida      197000              14 Disesases        7.7
## 10 Texas        205000               8 <NA>            NA
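After a left join, one quick way to see how many rows found no partner is to count the `NA`s in a column that comes only from the right-hand table. A sketch on a hand-built subset of the rows shown above; it assumes that column has no genuine missing values of its own, otherwise the count would overstate the unmatched rows.

```r
library(dplyr)

colony_mini <- tibble(
  state    = c("Florida", "Florida", "Texas"),
  colony_n = c(245000, 197000, 125000)
)
stressor_mini <- tibble(
  state      = c("Florida", "Tennessee"),
  stress_pct = c(7.7, 40.9)
)

left <- colony_mini %>% left_join(stressor_mini, by = join_by(state))

# Every colony row is kept; unmatched rows get NA in stress_pct
nrow(left)                   # same as nrow(colony_mini)
sum(is.na(left$stress_pct))  # colony rows with no stressor match
```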

5. right_join

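Describe the resulting data: 11 rows and 5 columns. Every row of stressor_small is kept; the Florida stressor row appears twice because it matches two colony rows, and the nine stressor rows for other states get NA in colony_n and colony_lost_pct. The matched rows are listed first, and the colony columns still come first because colony_small is the left-hand table.

How is it different from the original two datasets? It keeps all of stressor_small but has one more row (11 vs. 10) because of the duplicated Florida match, and it drops the eight colony_small rows whose state has no stressor data.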

colony_small %>% right_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 11 × 5
##    state          colony_n colony_lost_pct stressor              stress_pct
##    <chr>             <dbl>           <dbl> <chr>                      <dbl>
##  1 Florida          245000              17 Disesases                    7.7
##  2 Florida          197000              14 Disesases                    7.7
##  3 Tennessee            NA              NA Varroa mites                40.9
##  4 Tennessee            NA              NA Disesases                    1  
##  5 Connecticut          NA              NA Varroa mites                24.3
##  6 Missouri             NA              NA Disesases                    0.4
##  7 Maryland             NA              NA Other                        0.2
##  8 South Dakota         NA              NA Pesticides                   1.6
##  9 North Carolina       NA              NA Disesases                    0.5
## 10 Michigan             NA              NA Disesases                    9.2
## 11 Indiana              NA              NA Other pests/parasites        5.9
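`right_join(x, y)` keeps the same rows as `left_join(y, x)`; only the column order (and possibly the row order) differs. A small sketch on hand-built data that checks this by putting both results in the same column and row order:

```r
library(dplyr)

colony_mini <- tibble(
  state    = c("Florida", "Florida", "Texas"),
  colony_n = c(245000, 197000, 125000)
)
stressor_mini <- tibble(
  state      = c("Florida", "Tennessee"),
  stress_pct = c(7.7, 40.9)
)

right   <- colony_mini %>% right_join(stressor_mini, by = join_by(state))
swapped <- stressor_mini %>% left_join(colony_mini, by = join_by(state))

# Align columns and rows, then compare
right_sorted   <- right %>% arrange(state, colony_n)
swapped_sorted <- swapped %>% select(names(right)) %>% arrange(state, colony_n)
all.equal(right_sorted, swapped_sorted)
```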

6. full_join

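Describe the resulting data: 19 rows and 5 columns: the 2 matched Florida rows, the 8 colony rows with no match (NA in the stressor columns) and the 9 stressor rows with no match (NA in the colony columns).

How is it different from the original two datasets? No row from either table is dropped, so the result is larger than either input: 2 matched + 8 colony-only + 9 stressor-only = 19 rows.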

colony_small %>% full_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 19 × 5
##    state          colony_n colony_lost_pct stressor              stress_pct
##    <chr>             <dbl>           <dbl> <chr>                      <dbl>
##  1 Utah              16000              15 <NA>                        NA  
##  2 Vermont            6000               2 <NA>                        NA  
##  3 Texas            125000              10 <NA>                        NA  
##  4 Hawaii            15000               1 <NA>                        NA  
##  5 Florida          245000              17 Disesases                    7.7
##  6 Wyoming           27000              12 <NA>                        NA  
##  7 Kansas             5000              22 <NA>                        NA  
##  8 California       640000              10 <NA>                        NA  
##  9 Florida          197000              14 Disesases                    7.7
## 10 Texas            205000               8 <NA>                        NA  
## 11 Tennessee            NA              NA Varroa mites                40.9
## 12 Tennessee            NA              NA Disesases                    1  
## 13 Connecticut          NA              NA Varroa mites                24.3
## 14 Missouri             NA              NA Disesases                    0.4
## 15 Maryland             NA              NA Other                        0.2
## 16 South Dakota         NA              NA Pesticides                   1.6
## 17 North Carolina       NA              NA Disesases                    0.5
## 18 Michigan             NA              NA Disesases                    9.2
## 19 Indiana              NA              NA Other pests/parasites        5.9
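The row count of a full join can be checked by hand: it equals the matched rows (the inner join) plus the unmatched rows of each side (the two anti joins), which is 2 + 8 + 9 = 19 in the output above. A sketch verifying the same identity on hand-built data:

```r
library(dplyr)

colony_mini <- tibble(
  state    = c("Florida", "Florida", "Texas"),
  colony_n = c(245000, 197000, 125000)
)
stressor_mini <- tibble(
  state      = c("Florida", "Tennessee"),
  stress_pct = c(7.7, 40.9)
)

full     <- full_join(colony_mini, stressor_mini, by = join_by(state))
matched  <- inner_join(colony_mini, stressor_mini, by = join_by(state))
only_col <- anti_join(colony_mini, stressor_mini, by = join_by(state))
only_str <- anti_join(stressor_mini, colony_mini, by = join_by(state))

# full join = matched pairs + unmatched colony rows + unmatched stressor rows
nrow(full) == nrow(matched) + nrow(only_col) + nrow(only_str)
```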

7. semi_join

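Describe the resulting data: 2 rows and 3 columns: the two Florida rows of colony_small, unchanged.

How is it different from the original two datasets? It is a filter rather than a merge: it keeps only the colony_small rows whose state also appears in stressor_small and adds no columns from stressor_small. Unlike inner_join, it never duplicates a colony_small row, even when the state matches several stressor rows.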

colony_small %>% semi_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 2 × 3
##   state   colony_n colony_lost_pct
##   <chr>      <dbl>           <dbl>
## 1 Florida   245000              17
## 2 Florida   197000              14
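The difference between `semi_join()` and `inner_join()` shows up when a key matches more than one row on the right. In this sketch the hypothetical stressor table lists two stressors for Florida (labels taken from the outputs above, the pairing is invented): the inner join returns one row per match, while the semi join keeps Florida's colony row once and adds no columns, which makes it equivalent to filtering with `state %in% ...`.

```r
library(dplyr)

colony_mini <- tibble(
  state    = c("Florida", "Texas"),
  colony_n = c(245000, 125000)
)
# Hypothetical: two stressor rows for Florida
stressor_mini <- tibble(
  state    = c("Florida", "Florida", "Tennessee"),
  stressor = c("Disesases", "Varroa mites", "Varroa mites")
)

inner    <- inner_join(colony_mini, stressor_mini, by = join_by(state))
semi     <- semi_join(colony_mini, stressor_mini, by = join_by(state))
filtered <- filter(colony_mini, state %in% stressor_mini$state)

nrow(inner)  # Florida row repeated, one per matching stressor
nrow(semi)   # Florida row kept once
names(semi)  # only colony_mini's columns
```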

8. anti_join

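Describe the resulting data: 8 rows and 3 columns: the colony_small rows whose state does not appear in stressor_small, that is, everything except the two Florida rows.

How is it different from the original two datasets? Like semi_join, it keeps only the columns of colony_small and adds nothing from stressor_small, but it keeps the complement: semi_join and anti_join together split the 10 rows of colony_small into 2 + 8.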

colony_small %>% anti_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 8 × 3
##   state      colony_n colony_lost_pct
##   <chr>         <dbl>           <dbl>
## 1 Utah          16000              15
## 2 Vermont        6000               2
## 3 Texas        125000              10
## 4 Hawaii        15000               1
## 5 Wyoming       27000              12
## 6 Kansas         5000              22
## 7 California   640000              10
## 8 Texas        205000               8
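A common use of `anti_join()` is as a diagnostic before a real join: listing the keys in one table that have no partner in the other. The sketch below, again on hand-built data, lists the unmatched states and checks that `semi_join()` and `anti_join()` split the left table without overlap, just as the 2 + 8 rows above add up to the 10 rows of colony_small.

```r
library(dplyr)

colony_mini <- tibble(
  state    = c("Florida", "Florida", "Texas"),
  colony_n = c(245000, 197000, 125000)
)
stressor_mini <- tibble(
  state      = c("Florida", "Tennessee"),
  stress_pct = c(7.7, 40.9)
)

kept    <- semi_join(colony_mini, stressor_mini, by = join_by(state))
dropped <- anti_join(colony_mini, stressor_mini, by = join_by(state))

# States in colony_mini with no stressor data
dropped %>% distinct(state)

# semi_join and anti_join partition the rows of the left table
nrow(kept) + nrow(dropped) == nrow(colony_mini)
```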