1. Import your data

Import two related datasets from the TidyTuesday project.

colony <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-11/colony.csv')
## Rows: 1222 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): months, state
## dbl (8): year, colony_n, colony_max, colony_lost, colony_lost_pct, colony_ad...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stressor <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-11/stressor.csv')
## Rows: 7332 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): months, state, stressor
## dbl (2): year, stress_pct
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

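Data 1: Colony. 1,222 rows and 10 columns: two character columns (months, state) and eight numeric columns, including year, colony_n, colony_max, colony_lost and colony_lost_pct.

Data 2: Stressor. 7,332 rows and 5 columns: three character columns (months, state, stressor) and two numeric columns (year, stress_pct).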

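library(dplyr)  # provides %>%, select(), and sample_n()
# Keep three columns from each dataset and draw 10 random rows from each
colony_small <- colony %>% select(state, colony_n, colony_lost_pct) %>% sample_n(10)
stressor_small <- stressor %>% select(state, stressor, stress_pct) %>% sample_n(10)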

colony_small
## # A tibble: 10 × 3
##    state       colony_n colony_lost_pct
##    <chr>          <dbl>           <dbl>
##  1 Oklahoma        5000               7
##  2 Missouri        7000               3
##  3 Kentucky        8500              13
##  4 Maryland        7500               9
##  5 Oklahoma        3700              NA
##  6 Minnesota      39000               6
##  7 Louisiana      55000               7
##  8 Ohio           15500              25
##  9 Mississippi    26000               6
## 10 Wyoming        27000              12
stressor_small
## # A tibble: 10 × 3
##    state          stressor              stress_pct
##    <chr>          <chr>                      <dbl>
##  1 Illinois       Varroa mites                21.2
##  2 Oklahoma       Other pests/parasites       10.9
##  3 West Virginia  Varroa mites                23.2
##  4 North Carolina Pesticides                   2  
##  5 New Jersey     Unknown                      2.3
##  6 California     Other pests/parasites       15.8
##  7 California     Pesticides                  11.6
##  8 Georgia        Pesticides                   2.6
##  9 Mississippi    Unknown                      5.2
## 10 Ohio           Pesticides                   1.8

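Note: sample_n() draws a different random 10 rows every time the code runs, so the sample I got when I first ran the chunks differs from the one produced when I knit the whole document. Row counts quoted in the descriptions may therefore not match a fresh knit. Calling set.seed() before sampling would make the draw reproducible.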
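The changing sample can be pinned down with a seed. The sketch below is only an illustration: the table `toy` and the seed value 1234 are made up and stand in for the real data. It draws the same 10 rows twice by resetting the seed before each call to sample_n().

```r
library(dplyr)

# Hypothetical stand-in for the colony data (values are made up)
toy <- tibble(
  state    = state.name[1:20],
  colony_n = seq(1000, 20000, by = 1000)
)

set.seed(1234)  # arbitrary seed; any fixed integer works
first_draw <- toy %>% sample_n(10)

set.seed(1234)  # same seed again, so the same rows are drawn
second_draw <- toy %>% sample_n(10)

# Compare the two draws
identical(first_draw, second_draw)
```

In the real document, one set.seed() call placed before the two sample_n() lines should be enough, as long as the chunks always run in the same order.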

3. inner_join

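Describe the resulting data: 4 rows and 5 columns. Only the colony_small rows whose state also appears in stressor_small are kept (Oklahoma twice, Ohio and Mississippi).

How is it different from the original two datasets? It has fewer rows than either 10-row sample, and it combines the columns of both: colony_n and colony_lost_pct from colony_small, stressor and stress_pct from stressor_small.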

colony_small %>% inner_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 4 × 5
##   state       colony_n colony_lost_pct stressor              stress_pct
##   <chr>          <dbl>           <dbl> <chr>                      <dbl>
## 1 Oklahoma        5000               7 Other pests/parasites       10.9
## 2 Oklahoma        3700              NA Other pests/parasites       10.9
## 3 Ohio           15500              25 Pesticides                   1.8
## 4 Mississippi    26000               6 Unknown                      5.2
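The "Joining with" message means dplyr chose the key on its own, because state is the only column name the two tables share. A minimal sketch with made-up tables (the names `x` and `y` and their values are hypothetical, not taken from the bee data) runs the same kind of join with the key spelled out through `by = "state"`, which also silences the message.

```r
library(dplyr)

# Hypothetical toy tables, not rows from the real data
x <- tibble(state = c("Ohio", "Ohio", "Texas"), colony_n = c(100, 200, 300))
y <- tibble(state = c("Ohio", "Utah"), stressor = c("Pesticides", "Unknown"))

# Explicit key: only rows whose state exists in BOTH tables survive
joined <- x %>% inner_join(y, by = "state")
joined
```

Texas (only in `x`) and Utah (only in `y`) drop out, and both Ohio rows from `x` pick up the single Ohio row from `y`.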

4. left_join

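Describe the resulting data: 10 rows and 5 columns. Every row of colony_small is kept in its original order. The six rows with no matching state in stressor_small (Missouri, Kentucky, Maryland, Minnesota, Louisiana, Wyoming) have NA in stressor and stress_pct.

How is it different from the original two datasets? It has the same rows as colony_small plus two extra columns from stressor_small. The stressor_small rows for states that are not in colony_small are dropped.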

colony_small %>% left_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 10 × 5
##    state       colony_n colony_lost_pct stressor              stress_pct
##    <chr>          <dbl>           <dbl> <chr>                      <dbl>
##  1 Oklahoma        5000               7 Other pests/parasites       10.9
##  2 Missouri        7000               3 <NA>                        NA  
##  3 Kentucky        8500              13 <NA>                        NA  
##  4 Maryland        7500               9 <NA>                        NA  
##  5 Oklahoma        3700              NA Other pests/parasites       10.9
##  6 Minnesota      39000               6 <NA>                        NA  
##  7 Louisiana      55000               7 <NA>                        NA  
##  8 Ohio           15500              25 Pesticides                   1.8
##  9 Mississippi    26000               6 Unknown                      5.2
## 10 Wyoming        27000              12 <NA>                        NA
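One way to see which rows found no partner in a left join is to count the NA values in a column that came from the right-hand table. The sketch below uses hypothetical toy tables `x` and `y`, and it assumes the right-hand column has no missing values of its own. That assumption may not hold for the real data.

```r
library(dplyr)

# Hypothetical toy tables, not rows from the real data
x <- tibble(state = c("Ohio", "Texas", "Iowa"), colony_n = c(100, 200, 300))
y <- tibble(state = c("Ohio", "Utah"), stress_pct = c(1.8, 5.2))

left <- x %>% left_join(y, by = "state")

# Rows of x without a match carry NA in the column that came from y
n_unmatched <- sum(is.na(left$stress_pct))
n_unmatched
```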

5. right_join

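Describe the resulting data: 11 rows and 5 columns. Every row of stressor_small is kept. The seven rows with no matching state in colony_small have NA in colony_n and colony_lost_pct.

How is it different from the original two datasets? It has one row more than stressor_small, because Oklahoma appears twice in colony_small and its stressor row is repeated for each match. The colony_small rows for states that are not in stressor_small are dropped.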

colony_small %>% right_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 11 × 5
##    state          colony_n colony_lost_pct stressor              stress_pct
##    <chr>             <dbl>           <dbl> <chr>                      <dbl>
##  1 Oklahoma           5000               7 Other pests/parasites       10.9
##  2 Oklahoma           3700              NA Other pests/parasites       10.9
##  3 Ohio              15500              25 Pesticides                   1.8
##  4 Mississippi       26000               6 Unknown                      5.2
##  5 Illinois             NA              NA Varroa mites                21.2
##  6 West Virginia        NA              NA Varroa mites                23.2
##  7 North Carolina       NA              NA Pesticides                   2  
##  8 New Jersey           NA              NA Unknown                      2.3
##  9 California           NA              NA Other pests/parasites       15.8
## 10 California           NA              NA Pesticides                  11.6
## 11 Georgia              NA              NA Pesticides                   2.6
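The result has 11 rows even though stressor_small has only 10, because a key that occurs more than once on the other side repeats the matching row. A small sketch of that effect follows. The tables `x` and `y` are hypothetical and reuse a few values from the samples printed above for readability.

```r
library(dplyr)

# Small illustrative tables; names x and y are hypothetical
x <- tibble(
  state    = c("Oklahoma", "Oklahoma", "Ohio"),
  colony_n = c(5000, 3700, 15500)
)
y <- tibble(
  state    = c("Oklahoma", "Ohio", "Georgia"),
  stressor = c("Other pests/parasites", "Pesticides", "Pesticides")
)

right <- x %>% right_join(y, by = "state")

# Counting keys in x shows which states will repeat in the result
key_counts <- x %>% count(state)

nrow(y)      # rows in the right-hand table
nrow(right)  # rows after the join
```

Here the Oklahoma row of `y` appears once for each Oklahoma row in `x`, so the join returns one more row than `y` contains.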

6. full_join

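Describe the resulting data: 17 rows and 5 columns. The first 10 rows are the left_join result, followed by the 7 stressor_small rows that have no matching state in colony_small.

How is it different from the original two datasets? No row from either sample is dropped. Rows without a match get NA in the columns that come from the other table.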

colony_small %>% full_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 17 × 5
##    state          colony_n colony_lost_pct stressor              stress_pct
##    <chr>             <dbl>           <dbl> <chr>                      <dbl>
##  1 Oklahoma           5000               7 Other pests/parasites       10.9
##  2 Missouri           7000               3 <NA>                        NA  
##  3 Kentucky           8500              13 <NA>                        NA  
##  4 Maryland           7500               9 <NA>                        NA  
##  5 Oklahoma           3700              NA Other pests/parasites       10.9
##  6 Minnesota         39000               6 <NA>                        NA  
##  7 Louisiana         55000               7 <NA>                        NA  
##  8 Ohio              15500              25 Pesticides                   1.8
##  9 Mississippi       26000               6 Unknown                      5.2
## 10 Wyoming           27000              12 <NA>                        NA  
## 11 Illinois             NA              NA Varroa mites                21.2
## 12 West Virginia        NA              NA Varroa mites                23.2
## 13 North Carolina       NA              NA Pesticides                   2  
## 14 New Jersey           NA              NA Unknown                      2.3
## 15 California           NA              NA Other pests/parasites       15.8
## 16 California           NA              NA Pesticides                  11.6
## 17 Georgia              NA              NA Pesticides                   2.6
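The row counts printed in the outputs fit a simple identity: full_join rows = left_join rows + right_join rows - inner_join rows, which here is 10 + 11 - 4 = 17. The matched rows appear in both the left and the right result, so they are subtracted once. A minimal check on hypothetical toy tables `x` and `y`:

```r
library(dplyr)

# Hypothetical toy tables, not rows from the real data
x <- tibble(state = c("Ohio", "Ohio", "Texas", "Iowa"), colony_n = 1:4)
y <- tibble(state = c("Ohio", "Iowa", "Utah", "Utah"), stress_pct = 5:8)

n_inner <- nrow(inner_join(x, y, by = "state"))
n_left  <- nrow(left_join(x, y, by = "state"))
n_right <- nrow(right_join(x, y, by = "state"))
n_full  <- nrow(full_join(x, y, by = "state"))

# Matched rows are counted in both left and right, so subtract them once
n_full == n_left + n_right - n_inner
```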

7. semi_join

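Describe the resulting data: 4 rows and 3 columns. It contains the colony_small rows whose state appears in stressor_small (Oklahoma twice, Ohio and Mississippi).

How is it different from the original two datasets? It keeps the same rows as the inner_join but only the colony_small columns. Nothing from stressor_small is added.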

colony_small %>% semi_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 4 × 3
##   state       colony_n colony_lost_pct
##   <chr>          <dbl>           <dbl>
## 1 Oklahoma        5000               7
## 2 Oklahoma        3700              NA
## 3 Ohio           15500              25
## 4 Mississippi    26000               6
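semi_join is a filtering join: it uses the second table only to decide which rows of the first to keep. The sketch below, on hypothetical toy tables `x` and `y`, compares it with an equivalent filter() call and with inner_join, which repeats a row when the key occurs more than once in the second table.

```r
library(dplyr)

# Hypothetical toy tables; Ohio occurs twice in y on purpose
x <- tibble(state = c("Ohio", "Texas"), colony_n = c(100, 200))
y <- tibble(
  state    = c("Ohio", "Ohio", "Utah"),
  stressor = c("Pesticides", "Unknown", "Unknown")
)

semi     <- x %>% semi_join(y, by = "state")
filtered <- x %>% filter(state %in% y$state)  # same rows, written as a filter
inner    <- x %>% inner_join(y, by = "state")

nrow(semi)   # each matching row of x is kept once
nrow(inner)  # the Ohio row is repeated for each match in y
```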

8. anti_join

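Describe the resulting data: 6 rows and 3 columns. It contains the colony_small rows whose state does not appear in stressor_small (Missouri, Kentucky, Maryland, Minnesota, Louisiana, Wyoming).

How is it different from the original two datasets? It is the complement of the semi_join result: only colony_small columns, and only the rows that semi_join left out.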

colony_small %>% anti_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 6 × 3
##   state     colony_n colony_lost_pct
##   <chr>        <dbl>           <dbl>
## 1 Missouri      7000               3
## 2 Kentucky      8500              13
## 3 Maryland      7500               9
## 4 Minnesota    39000               6
## 5 Louisiana    55000               7
## 6 Wyoming      27000              12
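Together, semi_join and anti_join split the first table into two non-overlapping parts, which is why the two outputs add up to the size of colony_small: 4 + 6 = 10. A minimal check on hypothetical toy tables `x` and `y`:

```r
library(dplyr)

# Hypothetical toy tables, not rows from the real data
x <- tibble(state = c("Ohio", "Texas", "Iowa", "Ohio"), colony_n = 1:4)
y <- tibble(state = c("Ohio", "Utah"), stress_pct = c(1.8, 5.2))

semi <- x %>% semi_join(y, by = "state")  # rows of x with a match in y
anti <- x %>% anti_join(y, by = "state")  # rows of x without a match in y

# Every row of x lands in exactly one of the two results
nrow(semi) + nrow(anti) == nrow(x)
```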