Import two related datasets from the TidyTuesday project.
library(tidyverse)
library(readxl)
colony <- read_excel("../00_data/Colony.xlsx")
stressor <- read_excel("../00_data/Stressor.xlsx")
Describe the two datasets:
Data 1: Colony (includes year, state, and colony_lost_pct)
Data 2: Stressor (includes year, state, and stressor)
set.seed(8)
colony_small <- colony %>%
  select(year, state, colony_lost_pct) %>%
  sample_n(20)
stressor_small <- stressor %>%
  select(year, state, stressor) %>%
  sample_n(20)
colony_small
## # A tibble: 20 × 3
## year state colony_lost_pct
## <dbl> <chr> <dbl>
## 1 2021 Oklahoma 21
## 2 2018 New Mexico 52
## 3 2015 Indiana 22
## 4 2018 Michigan 9
## 5 2019 Connecticut 8
## 6 2019 Minnesota 16
## 7 2015 Utah 13
## 8 2020 Georgia 12
## 9 2016 Washington 7
## 10 2015 Louisiana 4
## 11 2017 Tennessee 12
## 12 2018 Idaho 13
## 13 2016 New York 9
## 14 2021 Maine 6
## 15 2016 Tennessee 10
## 16 2021 Arkansas 13
## 17 2016 Arkansas 19
## 18 2018 Hawaii 6
## 19 2017 Michigan 16
## 20 2015 Kentucky 12
stressor_small
## # A tibble: 20 × 3
## year state stressor
## <dbl> <chr> <chr>
## 1 2020 Other States Other
## 2 2015 Georgia Other pests/parasites
## 3 2015 Arizona Other
## 4 2017 Other States Pesticides
## 5 2019 New York Unknown
## 6 2021 Iowa Unknown
## 7 2015 Connecticut Disesases
## 8 2018 North Carolina Other pests/parasites
## 9 2016 Nebraska Pesticides
## 10 2020 Connecticut Unknown
## 11 2020 Connecticut Pesticides
## 12 2016 Georgia Other
## 13 2020 North Dakota Other pests/parasites
## 14 2021 New Jersey Pesticides
## 15 2016 Arizona Pesticides
## 16 2018 South Carolina Varroa mites
## 17 2019 Georgia Varroa mites
## 18 2016 Missouri Other pests/parasites
## 19 2018 Other States Pesticides
## 20 2018 Idaho Other
Describe the resulting data:
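How is it different from the original two datasets? The inner join returns 1 row compared to 20 rows in each sample, because 2018 Idaho is the only year and state combination present in both, and 4 columns compared to 3.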
colony_small %>% inner_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 1 × 4
## year state colony_lost_pct stressor
## <dbl> <chr> <dbl> <chr>
## 1 2018 Idaho 13 Other
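The `Joining with` message above shows that dplyr chose `year` and `state` as keys because they are the columns the two tables share. A minimal sketch of naming the keys explicitly, which also silences the message. The tibbles `colony_toy` and `stressor_toy` are hypothetical stand-ins built from a few of the rows printed above, not the full samples.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: three rows copied from each printed sample
colony_toy <- tibble(
  year = c(2018, 2018, 2019),
  state = c("Idaho", "Michigan", "Connecticut"),
  colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
  year = c(2018, 2019, 2020),
  state = c("Idaho", "Georgia", "Connecticut"),
  stressor = c("Other", "Varroa mites", "Unknown")
)

# Both keys must match: 2019 Connecticut and 2020 Connecticut do not pair up
matched <- colony_toy %>%
  inner_join(stressor_toy, by = c("year", "state"))
matched
```

Spelling out `by` is a design choice rather than a requirement; it protects the join from silently changing if a shared column is added to either table later.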
Describe the resulting data:
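How is it different from the original two datasets? The left join has 4 columns compared to 3 and keeps all 20 rows of colony_small; stressor is NA in every row except 2018 Idaho, the only match.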
colony_small %>% left_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 20 × 4
## year state colony_lost_pct stressor
## <dbl> <chr> <dbl> <chr>
## 1 2021 Oklahoma 21 <NA>
## 2 2018 New Mexico 52 <NA>
## 3 2015 Indiana 22 <NA>
## 4 2018 Michigan 9 <NA>
## 5 2019 Connecticut 8 <NA>
## 6 2019 Minnesota 16 <NA>
## 7 2015 Utah 13 <NA>
## 8 2020 Georgia 12 <NA>
## 9 2016 Washington 7 <NA>
## 10 2015 Louisiana 4 <NA>
## 11 2017 Tennessee 12 <NA>
## 12 2018 Idaho 13 Other
## 13 2016 New York 9 <NA>
## 14 2021 Maine 6 <NA>
## 15 2016 Tennessee 10 <NA>
## 16 2021 Arkansas 13 <NA>
## 17 2016 Arkansas 19 <NA>
## 18 2018 Hawaii 6 <NA>
## 19 2017 Michigan 16 <NA>
## 20 2015 Kentucky 12 <NA>
Describe the resulting data:
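How is it different from the original two datasets? The right join has 4 columns compared to 3 and keeps all 20 rows of stressor_small; colony_lost_pct is NA in every row except 2018 Idaho, the only match.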
colony_small %>% right_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 20 × 4
## year state colony_lost_pct stressor
## <dbl> <chr> <dbl> <chr>
## 1 2018 Idaho 13 Other
## 2 2020 Other States NA Other
## 3 2015 Georgia NA Other pests/parasites
## 4 2015 Arizona NA Other
## 5 2017 Other States NA Pesticides
## 6 2019 New York NA Unknown
## 7 2021 Iowa NA Unknown
## 8 2015 Connecticut NA Disesases
## 9 2018 North Carolina NA Other pests/parasites
## 10 2016 Nebraska NA Pesticides
## 11 2020 Connecticut NA Unknown
## 12 2020 Connecticut NA Pesticides
## 13 2016 Georgia NA Other
## 14 2020 North Dakota NA Other pests/parasites
## 15 2021 New Jersey NA Pesticides
## 16 2016 Arizona NA Pesticides
## 17 2018 South Carolina NA Varroa mites
## 18 2019 Georgia NA Varroa mites
## 19 2016 Missouri NA Other pests/parasites
## 20 2018 Other States NA Pesticides
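A right join can be read as a left join with the tables swapped. The sketch below checks this on hypothetical toy tibbles (`colony_toy`, `stressor_toy`) built from a few rows of the printed samples; column order and row order differ between the two results, so both are aligned before comparing.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: three rows copied from each printed sample
colony_toy <- tibble(
  year = c(2018, 2018, 2019),
  state = c("Idaho", "Michigan", "Connecticut"),
  colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
  year = c(2018, 2019, 2020),
  state = c("Idaho", "Georgia", "Connecticut"),
  stressor = c("Other", "Varroa mites", "Unknown")
)

right <- colony_toy %>%
  right_join(stressor_toy, by = c("year", "state"))

left_swapped <- stressor_toy %>%
  left_join(colony_toy, by = c("year", "state"))

# Align column order and row order, then compare as plain data frames
same_rows <- isTRUE(all.equal(
  right %>%
    select(year, state, stressor, colony_lost_pct) %>%
    arrange(year, state) %>%
    as.data.frame(),
  left_swapped %>%
    arrange(year, state) %>%
    as.data.frame()
))
same_rows
```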
Describe the resulting data:
How is it different from the original two datasets? The full join has 4 columns compared to 3 columns, and 39 rows compared to 20 rows, because every row from both samples is kept.
colony_small %>% full_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 39 × 4
## year state colony_lost_pct stressor
## <dbl> <chr> <dbl> <chr>
## 1 2021 Oklahoma 21 <NA>
## 2 2018 New Mexico 52 <NA>
## 3 2015 Indiana 22 <NA>
## 4 2018 Michigan 9 <NA>
## 5 2019 Connecticut 8 <NA>
## 6 2019 Minnesota 16 <NA>
## 7 2015 Utah 13 <NA>
## 8 2020 Georgia 12 <NA>
## 9 2016 Washington 7 <NA>
## 10 2015 Louisiana 4 <NA>
## # ℹ 29 more rows
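The 39 rows break down as 1 matched row, 19 rows found only in colony_small, and 19 rows found only in stressor_small. A minimal sketch of that bookkeeping, using hypothetical toy tibbles (`colony_toy`, `stressor_toy`) built from a few rows of the printed samples:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: three rows copied from each printed sample
colony_toy <- tibble(
  year = c(2018, 2018, 2019),
  state = c("Idaho", "Michigan", "Connecticut"),
  colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
  year = c(2018, 2019, 2020),
  state = c("Idaho", "Georgia", "Connecticut"),
  stressor = c("Other", "Varroa mites", "Unknown")
)

keys <- c("year", "state")

# Count the three kinds of rows that make up a full join
matched       <- nrow(inner_join(colony_toy, stressor_toy, by = keys))
only_colony   <- nrow(anti_join(colony_toy, stressor_toy, by = keys))
only_stressor <- nrow(anti_join(stressor_toy, colony_toy, by = keys))

full <- full_join(colony_toy, stressor_toy, by = keys)

# The full join row count equals the sum of the three pieces
nrow(full) == matched + only_colony + only_stressor
```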
Describe the resulting data:
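How is it different from the original two datasets? The semi join returns 1 row compared to 20 rows; the columns are unchanged (the 3 columns of colony_small), since a semi join only filters rows and adds nothing from stressor_small.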
colony_small %>% semi_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 1 × 3
## year state colony_lost_pct
## <dbl> <chr> <dbl>
## 1 2018 Idaho 13
Describe the resulting data:
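How is it different from the original two datasets? The anti join returns 19 rows compared to 20 rows; it drops the one row of colony_small that has a match in stressor_small (2018 Idaho) and keeps the same 3 columns.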
colony_small %>% anti_join(stressor_small)
## Joining with `by = join_by(year, state)`
## # A tibble: 19 × 3
## year state colony_lost_pct
## <dbl> <chr> <dbl>
## 1 2021 Oklahoma 21
## 2 2018 New Mexico 52
## 3 2015 Indiana 22
## 4 2018 Michigan 9
## 5 2019 Connecticut 8
## 6 2019 Minnesota 16
## 7 2015 Utah 13
## 8 2020 Georgia 12
## 9 2016 Washington 7
## 10 2015 Louisiana 4
## 11 2017 Tennessee 12
## 12 2016 New York 9
## 13 2021 Maine 6
## 14 2016 Tennessee 10
## 15 2021 Arkansas 13
## 16 2016 Arkansas 19
## 17 2018 Hawaii 6
## 18 2017 Michigan 16
## 19 2015 Kentucky 12
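The semi join and the anti join split colony_small into two complementary pieces: 1 row with a match and 19 rows without, which add back up to 20. A minimal sketch of that relationship, using hypothetical toy tibbles (`colony_toy`, `stressor_toy`) built from a few rows of the printed samples:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: three rows copied from each printed sample
colony_toy <- tibble(
  year = c(2018, 2018, 2019),
  state = c("Idaho", "Michigan", "Connecticut"),
  colony_lost_pct = c(13, 9, 8)
)
stressor_toy <- tibble(
  year = c(2018, 2019, 2020),
  state = c("Idaho", "Georgia", "Connecticut"),
  stressor = c("Other", "Varroa mites", "Unknown")
)

# Rows of colony_toy with a match, and rows without one
kept <- colony_toy %>%
  semi_join(stressor_toy, by = c("year", "state"))
dropped <- colony_toy %>%
  anti_join(stressor_toy, by = c("year", "state"))

# Together the two pieces account for every row of colony_toy
nrow(kept) + nrow(dropped) == nrow(colony_toy)
```

Both are filtering joins, so neither adds the `stressor` column; they only decide which rows of the first table survive.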