1. Import your data

Import two related datasets from the TidyTuesday project.

library(tidyverse)
library(readxl)

colony <- read_excel("../00_data/Colony.xlsx")

stressor <- read_excel("../00_data/Stressor.xlsx")

2. Make data small

Describe the two datasets:

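Data 1: Colony. The columns used here are year, state, and colony_lost_pct (percent of colonies lost).

Data 2: Stressor. The columns used here are year, state, and stressor (the type of stress reported for the colonies).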

set.seed(8)
colony_small <- colony %>%
    select(year, state, colony_lost_pct) %>%
    sample_n(20)
stressor_small <- stressor %>%
    select(year, state, stressor) %>%
    sample_n(20)

colony_small
## # A tibble: 20 × 3
##     year state       colony_lost_pct
##    <dbl> <chr>                 <dbl>
##  1  2021 Oklahoma                 21
##  2  2018 New Mexico               52
##  3  2015 Indiana                  22
##  4  2018 Michigan                  9
##  5  2019 Connecticut               8
##  6  2019 Minnesota                16
##  7  2015 Utah                     13
##  8  2020 Georgia                  12
##  9  2016 Washington                7
## 10  2015 Louisiana                 4
## 11  2017 Tennessee                12
## 12  2018 Idaho                    13
## 13  2016 New York                  9
## 14  2021 Maine                     6
## 15  2016 Tennessee                10
## 16  2021 Arkansas                 13
## 17  2016 Arkansas                 19
## 18  2018 Hawaii                    6
## 19  2017 Michigan                 16
## 20  2015 Kentucky                 12
stressor_small
## # A tibble: 20 × 3
##     year state          stressor             
##    <dbl> <chr>          <chr>                
##  1  2020 Other States   Other                
##  2  2015 Georgia        Other pests/parasites
##  3  2015 Arizona        Other                
##  4  2017 Other States   Pesticides           
##  5  2019 New York       Unknown              
##  6  2021 Iowa           Unknown              
##  7  2015 Connecticut    Disesases            
##  8  2018 North Carolina Other pests/parasites
##  9  2016 Nebraska       Pesticides           
## 10  2020 Connecticut    Unknown              
## 11  2020 Connecticut    Pesticides           
## 12  2016 Georgia        Other                
## 13  2020 North Dakota   Other pests/parasites
## 14  2021 New Jersey     Pesticides           
## 15  2016 Arizona        Pesticides           
## 16  2018 South Carolina Varroa mites         
## 17  2019 Georgia        Varroa mites         
## 18  2016 Missouri       Other pests/parasites
## 19  2018 Other States   Pesticides           
## 20  2018 Idaho          Other

3. inner_join

Describe the resulting data:

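How is it different from the original two datasets? The result has 1 row instead of 20 and 4 columns instead of 3. Only 2018 Idaho appears in both samples with the same year and state, so it is the only row inner_join keeps.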

colony_small %>% inner_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 1 × 4
##    year state colony_lost_pct stressor
##   <dbl> <chr>           <dbl> <chr>   
## 1  2018 Idaho              13 Other
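
The "Joining with" message means dplyr chose the keys on its own from the shared column names. A minimal sketch of naming them explicitly is below. The tibbles `colony_toy` and `stressor_toy` are hypothetical stand-ins built from a few rows of the printed samples, not the real data, and `join_by()` assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: a few rows copied from the printed samples
colony_toy <- tibble(
    year            = c(2018, 2018, 2019),
    state           = c("Idaho", "Michigan", "Connecticut"),
    colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
    year     = c(2018, 2020, 2015),
    state    = c("Idaho", "Connecticut", "Connecticut"),
    stressor = c("Other", "Unknown", "Disesases")
)

# Naming the keys documents the intent and silences the "Joining with" message.
# Connecticut is in both tables but never in the same year, so it does not match.
joined <- colony_toy %>%
    inner_join(stressor_toy, by = join_by(year, state))
joined
```

A row is kept only when both year and state agree, which is why a shared state alone is not enough.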

4. left_join

Describe the resulting data:

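How is it different from the original two datasets? The result has 4 columns instead of 3 and keeps all 20 rows of colony_small. stressor is NA for the 19 rows with no match in stressor_small; only 2018 Idaho has a value.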

colony_small %>% left_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 20 × 4
##     year state       colony_lost_pct stressor
##    <dbl> <chr>                 <dbl> <chr>   
##  1  2021 Oklahoma                 21 <NA>    
##  2  2018 New Mexico               52 <NA>    
##  3  2015 Indiana                  22 <NA>    
##  4  2018 Michigan                  9 <NA>    
##  5  2019 Connecticut               8 <NA>    
##  6  2019 Minnesota                16 <NA>    
##  7  2015 Utah                     13 <NA>    
##  8  2020 Georgia                  12 <NA>    
##  9  2016 Washington                7 <NA>    
## 10  2015 Louisiana                 4 <NA>    
## 11  2017 Tennessee                12 <NA>    
## 12  2018 Idaho                    13 Other   
## 13  2016 New York                  9 <NA>    
## 14  2021 Maine                     6 <NA>    
## 15  2016 Tennessee                10 <NA>    
## 16  2021 Arkansas                 13 <NA>    
## 17  2016 Arkansas                 19 <NA>    
## 18  2018 Hawaii                    6 <NA>    
## 19  2017 Michigan                 16 <NA>    
## 20  2015 Kentucky                 12 <NA>
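
One way to check how many rows went unmatched is to count the NA values in the column that came from the right-hand table. This is only a sketch: `colony_toy` and `stressor_toy` are hypothetical stand-ins built from a few printed rows, and the count is reliable only if stressor has no missing values of its own in the right-hand table.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: a few rows copied from the printed samples
colony_toy <- tibble(
    year            = c(2018, 2018, 2019),
    state           = c("Idaho", "Michigan", "Connecticut"),
    colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
    year     = c(2018, 2020),
    state    = c("Idaho", "Connecticut"),
    stressor = c("Other", "Unknown")
)

left <- colony_toy %>%
    left_join(stressor_toy, by = join_by(year, state))

# Rows with no partner in stressor_toy carry NA in the joined-in column
n_unmatched <- sum(is.na(left$stressor))
n_unmatched
```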

5. right_join

Describe the resulting data:

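How is it different from the original two datasets? The result has 4 columns instead of 3 and keeps all 20 rows of stressor_small. colony_lost_pct is NA for the 19 rows with no match in colony_small; only 2018 Idaho has a value.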

colony_small %>% right_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 20 × 4
##     year state          colony_lost_pct stressor             
##    <dbl> <chr>                    <dbl> <chr>                
##  1  2018 Idaho                       13 Other                
##  2  2020 Other States                NA Other                
##  3  2015 Georgia                     NA Other pests/parasites
##  4  2015 Arizona                     NA Other                
##  5  2017 Other States                NA Pesticides           
##  6  2019 New York                    NA Unknown              
##  7  2021 Iowa                        NA Unknown              
##  8  2015 Connecticut                 NA Disesases            
##  9  2018 North Carolina              NA Other pests/parasites
## 10  2016 Nebraska                    NA Pesticides           
## 11  2020 Connecticut                 NA Unknown              
## 12  2020 Connecticut                 NA Pesticides           
## 13  2016 Georgia                     NA Other                
## 14  2020 North Dakota                NA Other pests/parasites
## 15  2021 New Jersey                  NA Pesticides           
## 16  2016 Arizona                     NA Pesticides           
## 17  2018 South Carolina              NA Varroa mites         
## 18  2019 Georgia                     NA Varroa mites         
## 19  2016 Missouri                    NA Other pests/parasites
## 20  2018 Other States                NA Pesticides
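
A right join can also be read as a left join with the two tables swapped; only the column and row order differ. The sketch below checks this on hypothetical stand-in tibbles (`colony_toy`, `stressor_toy`) built from a few printed rows, so it illustrates the idea rather than reproducing the output above.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: a few rows copied from the printed samples
colony_toy <- tibble(
    year            = c(2018, 2018, 2019),
    state           = c("Idaho", "Michigan", "Connecticut"),
    colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
    year     = c(2018, 2020, 2015),
    state    = c("Idaho", "Connecticut", "Arizona"),
    stressor = c("Other", "Unknown", "Other")
)

via_right <- colony_toy %>%
    right_join(stressor_toy, by = join_by(year, state))
via_left <- stressor_toy %>%
    left_join(colony_toy, by = join_by(year, state))

# Align column order and row order before comparing the two results
same <- isTRUE(all.equal(
    via_right %>%
        arrange(year, state, stressor) %>%
        as.data.frame(),
    via_left %>%
        select(all_of(names(via_right))) %>%
        arrange(year, state, stressor) %>%
        as.data.frame()
))
same
```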

6. full_join

Describe the resulting data:

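How is it different from the original two datasets? The result has 4 columns instead of 3 and 39 rows instead of 20. Every row from both samples is kept, and the one matching pair (2018 Idaho) is combined into a single row, so 20 + 20 - 1 = 39.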

colony_small %>% full_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 39 × 4
##     year state       colony_lost_pct stressor
##    <dbl> <chr>                 <dbl> <chr>   
##  1  2021 Oklahoma                 21 <NA>    
##  2  2018 New Mexico               52 <NA>    
##  3  2015 Indiana                  22 <NA>    
##  4  2018 Michigan                  9 <NA>    
##  5  2019 Connecticut               8 <NA>    
##  6  2019 Minnesota                16 <NA>    
##  7  2015 Utah                     13 <NA>    
##  8  2020 Georgia                  12 <NA>    
##  9  2016 Washington                7 <NA>    
## 10  2015 Louisiana                 4 <NA>    
## # ℹ 29 more rows
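
The row count of a full join can be broken into three parts: matched rows, rows found only in the left table, and rows found only in the right table. For the samples above that is 1 + 19 + 19 = 39. The sketch below checks the same identity on hypothetical stand-in tibbles (`colony_toy`, `stressor_toy`) built from a few printed rows; the variable `keys` is also just an illustrative name.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: a few rows copied from the printed samples
colony_toy <- tibble(
    year            = c(2018, 2018, 2019),
    state           = c("Idaho", "Michigan", "Connecticut"),
    colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
    year     = c(2018, 2020, 2015),
    state    = c("Idaho", "Connecticut", "Arizona"),
    stressor = c("Other", "Unknown", "Other")
)

keys <- join_by(year, state)

n_inner  <- nrow(inner_join(colony_toy, stressor_toy, by = keys))  # matched rows
n_only_x <- nrow(anti_join(colony_toy, stressor_toy, by = keys))   # only in colony_toy
n_only_y <- nrow(anti_join(stressor_toy, colony_toy, by = keys))   # only in stressor_toy
n_full   <- nrow(full_join(colony_toy, stressor_toy, by = keys))

c(inner = n_inner, only_x = n_only_x, only_y = n_only_y, full = n_full)
```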

7. semi_join

Describe the resulting data:

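How is it different from the original two datasets? The result has 1 row instead of 20 but still only the 3 columns of colony_small. semi_join keeps the colony_small rows that have a match in stressor_small without adding any columns.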

colony_small %>% semi_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 1 × 3
##    year state colony_lost_pct
##   <dbl> <chr>           <dbl>
## 1  2018 Idaho              13
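
Here semi_join and inner_join return the same single row, but they can differ when a key appears more than once in the right-hand table, as 2020 Connecticut does in stressor_small. The sketch below uses hypothetical stand-in tibbles; the 2020 Connecticut row in `colony_toy` and its colony_lost_pct value are made up purely to force a match.

```r
library(dplyr)
library(tibble)

# Hypothetical left table: the 2020 Connecticut row and its value are invented
colony_toy <- tibble(
    year            = c(2020, 2018),
    state           = c("Connecticut", "Idaho"),
    colony_lost_pct = c(10, 13)
)
# Rows copied from the printed stressor sample; 2020 Connecticut appears twice
stressor_toy <- tibble(
    year     = c(2020, 2020, 2018),
    state    = c("Connecticut", "Connecticut", "Idaho"),
    stressor = c("Unknown", "Pesticides", "Other")
)

inner <- colony_toy %>% inner_join(stressor_toy, by = join_by(year, state))
semi  <- colony_toy %>% semi_join(stressor_toy, by = join_by(year, state))

# inner_join repeats the Connecticut row once per match;
# semi_join only filters colony_toy, so each row appears at most once
c(inner = nrow(inner), semi = nrow(semi))
```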

8. anti_join

Describe the resulting data:

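How is it different from the original two datasets? The result has 19 rows instead of 20 and the same 3 columns as colony_small. anti_join drops the one row with a match in stressor_small (2018 Idaho) and keeps the rest.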

colony_small %>% anti_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 19 × 3
##     year state       colony_lost_pct
##    <dbl> <chr>                 <dbl>
##  1  2021 Oklahoma                 21
##  2  2018 New Mexico               52
##  3  2015 Indiana                  22
##  4  2018 Michigan                  9
##  5  2019 Connecticut               8
##  6  2019 Minnesota                16
##  7  2015 Utah                     13
##  8  2020 Georgia                  12
##  9  2016 Washington                7
## 10  2015 Louisiana                 4
## 11  2017 Tennessee                12
## 12  2016 New York                  9
## 13  2021 Maine                     6
## 14  2016 Tennessee                10
## 15  2021 Arkansas                 13
## 16  2016 Arkansas                 19
## 17  2018 Hawaii                    6
## 18  2017 Michigan                 16
## 19  2015 Kentucky                 12
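
semi_join and anti_join split the left table into two non-overlapping parts: rows with a match and rows without one. For the samples above that is 1 + 19 = 20. A minimal sketch of the same check, using hypothetical stand-in tibbles (`colony_toy`, `stressor_toy`) built from a few printed rows:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: a few rows copied from the printed samples
colony_toy <- tibble(
    year            = c(2018, 2018, 2019),
    state           = c("Idaho", "Michigan", "Connecticut"),
    colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
    year     = c(2018, 2020),
    state    = c("Idaho", "Connecticut"),
    stressor = c("Other", "Unknown")
)

matched   <- colony_toy %>% semi_join(stressor_toy, by = join_by(year, state))
unmatched <- colony_toy %>% anti_join(stressor_toy, by = join_by(year, state))

# Every row of colony_toy lands in exactly one of the two results
nrow(matched) + nrow(unmatched) == nrow(colony_toy)
```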