Import two related datasets from the TidyTuesday project.
colony <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-11/colony.csv')
## Rows: 1222 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): months, state
## dbl (8): year, colony_n, colony_max, colony_lost, colony_lost_pct, colony_ad...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stressor <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-11/stressor.csv')
## Rows: 7332 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): months, state, stressor
## dbl (2): year, stress_pct
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
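The message above says it can be quieted either by declaring the column types or by setting `show_col_types = FALSE`. The following is a minimal sketch of the first option, assuming readr 2.0 or later. The inline CSV text and its two rows are made up so the example runs without a network connection; `csv_text` and `demo` are illustrative names, not part of the worksheet.

```r
library(readr)

# Made-up inline CSV standing in for the remote file (assumption: illustrative rows only)
csv_text <- "year,months,state,colony_n\n2015,January-March,A,100\n2015,April-June,B,200\n"

# Declaring every column type up front means readr has nothing to guess,
# so no column specification message is printed
demo <- read_csv(
  I(csv_text),
  col_types = cols(
    year     = col_double(),
    months   = col_character(),
    state    = col_character(),
    colony_n = col_double()
  )
)

demo
```

Passing `show_col_types = FALSE` instead keeps the type guessing but hides the message, which may be enough for a quick exploration.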
Describe the two datasets:
Data 1: Colony
Data 2: Stressor
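library(dplyr)  # provides %>%, select(), sample_n(), and the *_join() verbs used below
set.seed(123)   # fix the seed so the two random samples are reproducible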
colony_small <- colony %>% select(state, colony_n, colony_lost_pct) %>% sample_n(10)
stressor_small <- stressor %>% select(state, stressor, stress_pct) %>% sample_n(10)
colony_small
## # A tibble: 10 × 3
##    state      colony_n colony_lost_pct
##    <chr>         <dbl>           <dbl>
##  1 Utah          16000              15
##  2 Vermont        6000               2
##  3 Texas        125000              10
##  4 Hawaii        15000               1
##  5 Florida      245000              17
##  6 Wyoming       27000              12
##  7 Kansas         5000              22
##  8 California   640000              10
##  9 Florida      197000              14
## 10 Texas        205000               8
stressor_small
## # A tibble: 10 × 3
##    state          stressor              stress_pct
##    <chr>          <chr>                      <dbl>
##  1 Tennessee      Varroa mites                40.9
##  2 Tennessee      Disesases                    1
##  3 Connecticut    Varroa mites                24.3
##  4 Missouri       Disesases                    0.4
##  5 Maryland       Other                        0.2
##  6 South Dakota   Pesticides                   1.6
##  7 North Carolina Disesases                    0.5
##  8 Florida        Disesases                    7.7
##  9 Michigan       Disesases                    9.2
## 10 Indiana        Other pests/parasites        5.9
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% inner_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 2 × 5
##   state   colony_n colony_lost_pct stressor  stress_pct
##   <chr>      <dbl>           <dbl> <chr>          <dbl>
## 1 Florida   245000              17 Disesases        7.7
## 2 Florida   197000              14 Disesases        7.7
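The `Joining with` message appears because no key was named, so dplyr joined on the only shared column, `state`. A minimal sketch of naming the key explicitly follows. The tibbles `colony_demo` and `stressor_demo` are illustrative subsets of the rows printed above, not the full samples.

```r
library(dplyr)

# Illustrative subsets of the sampled rows shown above
colony_demo <- tribble(
  ~state,    ~colony_n, ~colony_lost_pct,
  "Utah",        16000,               15,
  "Florida",    245000,               17,
  "Florida",    197000,               14
)
stressor_demo <- tribble(
  ~state,      ~stressor,      ~stress_pct,
  "Florida",   "Disesases",            7.7,
  "Tennessee", "Varroa mites",        40.9
)

# Naming the key documents the intent and suppresses the "Joining with" message
inner_demo <- inner_join(colony_demo, stressor_demo, by = "state")

# Only Florida appears in both tables; each Florida colony row is paired
# with the single Florida stressor row
inner_demo
```

Utah and Tennessee drop out because neither has a partner in the other table, which is the same reason the full result above contains only Florida.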
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% left_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 10 × 5
##    state      colony_n colony_lost_pct stressor  stress_pct
##    <chr>         <dbl>           <dbl> <chr>          <dbl>
##  1 Utah          16000              15 <NA>            NA
##  2 Vermont        6000               2 <NA>            NA
##  3 Texas        125000              10 <NA>            NA
##  4 Hawaii        15000               1 <NA>            NA
##  5 Florida      245000              17 Disesases        7.7
##  6 Wyoming       27000              12 <NA>            NA
##  7 Kansas         5000              22 <NA>            NA
##  8 California   640000              10 <NA>            NA
##  9 Florida      197000              14 Disesases        7.7
## 10 Texas        205000               8 <NA>            NA
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% right_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 11 × 5
##    state          colony_n colony_lost_pct stressor              stress_pct
##    <chr>             <dbl>           <dbl> <chr>                      <dbl>
##  1 Florida          245000              17 Disesases                    7.7
##  2 Florida          197000              14 Disesases                    7.7
##  3 Tennessee            NA              NA Varroa mites                40.9
##  4 Tennessee            NA              NA Disesases                    1
##  5 Connecticut          NA              NA Varroa mites                24.3
##  6 Missouri             NA              NA Disesases                    0.4
##  7 Maryland             NA              NA Other                        0.2
##  8 South Dakota         NA              NA Pesticides                   1.6
##  9 North Carolina       NA              NA Disesases                    0.5
## 10 Michigan             NA              NA Disesases                    9.2
## 11 Indiana              NA              NA Other pests/parasites        5.9
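One way to think about `right_join(x, y)` is as `left_join(y, x)` with the columns in a different order. The sketch below checks that on illustrative subsets of the rows printed above. The names `colony_demo`, `stressor_demo`, and `same_content` are assumptions made for this example.

```r
library(dplyr)

# Illustrative subsets of the sampled rows shown above
colony_demo <- tribble(
  ~state,    ~colony_n, ~colony_lost_pct,
  "Utah",        16000,               15,
  "Florida",    245000,               17,
  "Florida",    197000,               14
)
stressor_demo <- tribble(
  ~state,      ~stressor,      ~stress_pct,
  "Florida",   "Disesases",            7.7,
  "Tennessee", "Varroa mites",        40.9
)

right_demo   <- right_join(colony_demo, stressor_demo, by = "state")
left_swapped <- left_join(stressor_demo, colony_demo, by = "state")

# Put the columns in the same order and sort the rows, so that only the
# content is compared and not the column or row order
left_aligned <- left_swapped %>%
  select(all_of(names(right_demo))) %>%
  arrange(state, colony_n)
right_sorted <- right_demo %>%
  arrange(state, colony_n)

same_content <- isTRUE(all.equal(as.data.frame(left_aligned),
                                 as.data.frame(right_sorted)))
same_content
```

In both versions every stressor row is kept, and Tennessee gets `NA` for the colony columns because it has no match.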
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% full_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 19 × 5
##    state          colony_n colony_lost_pct stressor              stress_pct
##    <chr>             <dbl>           <dbl> <chr>                      <dbl>
##  1 Utah              16000              15 <NA>                        NA
##  2 Vermont            6000               2 <NA>                        NA
##  3 Texas            125000              10 <NA>                        NA
##  4 Hawaii            15000               1 <NA>                        NA
##  5 Florida          245000              17 Disesases                    7.7
##  6 Wyoming           27000              12 <NA>                        NA
##  7 Kansas             5000              22 <NA>                        NA
##  8 California       640000              10 <NA>                        NA
##  9 Florida          197000              14 Disesases                    7.7
## 10 Texas            205000               8 <NA>                        NA
## 11 Tennessee            NA              NA Varroa mites                40.9
## 12 Tennessee            NA              NA Disesases                    1
## 13 Connecticut          NA              NA Varroa mites                24.3
## 14 Missouri             NA              NA Disesases                    0.4
## 15 Maryland             NA              NA Other                        0.2
## 16 South Dakota         NA              NA Pesticides                   1.6
## 17 North Carolina       NA              NA Disesases                    0.5
## 18 Michigan             NA              NA Disesases                    9.2
## 19 Indiana              NA              NA Other pests/parasites        5.9
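The 19 rows above can be accounted for as 2 matched rows, 8 colony rows without a match, and 9 stressor rows without a match. The sketch below checks the same bookkeeping on illustrative subsets of the printed rows. All object names are assumptions made for this example.

```r
library(dplyr)

# Illustrative subsets of the sampled rows shown above
colony_demo <- tribble(
  ~state,    ~colony_n, ~colony_lost_pct,
  "Utah",        16000,               15,
  "Florida",    245000,               17,
  "Florida",    197000,               14
)
stressor_demo <- tribble(
  ~state,      ~stressor,      ~stress_pct,
  "Florida",   "Disesases",            7.7,
  "Tennessee", "Varroa mites",        40.9
)

# Rows produced by matching keys
n_matched <- nrow(inner_join(colony_demo, stressor_demo, by = "state"))
# Rows of each table whose key never appears in the other table
n_only_colony   <- nrow(anti_join(colony_demo, stressor_demo, by = "state"))
n_only_stressor <- nrow(anti_join(stressor_demo, colony_demo, by = "state"))

full_demo <- full_join(colony_demo, stressor_demo, by = "state")

# A full join keeps all three groups, so the counts should add up
nrow(full_demo) == n_matched + n_only_colony + n_only_stressor
```

The same arithmetic explains the other results: the left join keeps the matched rows plus the unmatched colony rows, and the right join keeps the matched rows plus the unmatched stressor rows.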
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% semi_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 2 × 3
##   state   colony_n colony_lost_pct
##   <chr>      <dbl>           <dbl>
## 1 Florida   245000              17
## 2 Florida   197000              14
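A semi join filters the first table and never adds columns or duplicates rows, which is where it differs from an inner join. The difference shows up when the second table repeats a key. The sketch below swaps the roles of the two tables to show this, using illustrative subsets of the printed rows. The names `inner_rev` and `semi_rev` are assumptions made for this example.

```r
library(dplyr)

# Illustrative subsets of the sampled rows shown above
stressor_demo <- tribble(
  ~state,      ~stressor,      ~stress_pct,
  "Florida",   "Disesases",            7.7,
  "Tennessee", "Varroa mites",        40.9
)
colony_demo <- tribble(
  ~state,    ~colony_n, ~colony_lost_pct,
  "Utah",        16000,               15,
  "Florida",    245000,               17,
  "Florida",    197000,               14
)

# Inner join: the single Florida stressor row is repeated once per
# matching colony row, and the colony columns are appended
inner_rev <- inner_join(stressor_demo, colony_demo, by = "state")

# Semi join: the Florida stressor row is kept once, with only the
# stressor columns
semi_rev <- semi_join(stressor_demo, colony_demo, by = "state")

nrow(inner_rev)
nrow(semi_rev)
names(semi_rev)
```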
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% anti_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 8 × 3
##   state      colony_n colony_lost_pct
##   <chr>         <dbl>           <dbl>
## 1 Utah          16000              15
## 2 Vermont        6000               2
## 3 Texas        125000              10
## 4 Hawaii        15000               1
## 5 Wyoming       27000              12
## 6 Kansas         5000              22
## 7 California   640000              10
## 8 Texas        205000               8
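An anti join keeps the rows of the first table whose key has no match in the second, so with a single key column it behaves like a `filter()` on `!state %in% ...`. The sketch below compares the two on illustrative subsets of the printed rows. The names `anti_demo` and `filter_demo` are assumptions made for this example.

```r
library(dplyr)

# Illustrative subsets of the sampled rows shown above
colony_demo <- tribble(
  ~state,    ~colony_n, ~colony_lost_pct,
  "Utah",        16000,               15,
  "Florida",    245000,               17,
  "Florida",    197000,               14
)
stressor_demo <- tribble(
  ~state,      ~stressor,      ~stress_pct,
  "Florida",   "Disesases",            7.7,
  "Tennessee", "Varroa mites",        40.9
)

# Colony rows whose state never appears in the stressor table
anti_demo <- anti_join(colony_demo, stressor_demo, by = "state")

# The same selection written as an explicit filter on the key column
filter_demo <- colony_demo %>%
  filter(!state %in% stressor_demo$state)

anti_demo
```

Both versions drop the two Florida rows and keep Utah. The join form tends to scale better to several key columns, since `by` can name more than one.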