Import two related datasets from the TidyTuesday project.
colony <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-11/colony.csv')
## Rows: 1222 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): months, state
## dbl (8): year, colony_n, colony_max, colony_lost, colony_lost_pct, colony_ad...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stressor <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-11/stressor.csv')
## Rows: 7332 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): months, state, stressor
## dbl (2): year, stress_pct
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
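The messages above suggest supplying the column types to quiet the import. A minimal sketch of that idea follows, using an inline CSV as a stand-in for the remote file so that it runs without network access. The single data row is made up for illustration; the column names and types are the ones reported for `stressor` above.

```r
library(readr)

# Inline CSV standing in for the remote stressor file (the data row is hypothetical)
csv_text <- I("year,months,state,stressor,stress_pct\n2015,January-March,Ohio,Pesticides,1.8\n")

# Declaring the types up front means readr has nothing to guess,
# so the column specification message is not printed
stressor_demo <- read_csv(
  csv_text,
  col_types = cols(
    year       = col_double(),
    months     = col_character(),
    state      = col_character(),
    stressor   = col_character(),
    stress_pct = col_double()
  )
)

stressor_demo
```

Passing `show_col_types = FALSE` instead keeps the type guessing but hides the message, which may be enough for a quick exploration.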
Describe the two datasets:
Data 1: Colony (1,222 rows, 10 columns)
Data 2: Stressor (7,332 rows, 5 columns)
library(dplyr)
colony_small <- colony %>% select(state, colony_n, colony_lost_pct) %>% sample_n(10)
stressor_small <- stressor %>% select(state, stressor, stress_pct) %>% sample_n(10)
colony_small
## # A tibble: 10 × 3
##    state       colony_n colony_lost_pct
##    <chr>          <dbl>           <dbl>
##  1 Oklahoma        5000               7
##  2 Missouri        7000               3
##  3 Kentucky        8500              13
##  4 Maryland        7500               9
##  5 Oklahoma        3700              NA
##  6 Minnesota      39000               6
##  7 Louisiana      55000               7
##  8 Ohio           15500              25
##  9 Mississippi    26000               6
## 10 Wyoming        27000              12
stressor_small
## # A tibble: 10 × 3
##    state          stressor              stress_pct
##    <chr>          <chr>                      <dbl>
##  1 Illinois       Varroa mites                21.2
##  2 Oklahoma       Other pests/parasites       10.9
##  3 West Virginia  Varroa mites                23.2
##  4 North Carolina Pesticides                   2
##  5 New Jersey     Unknown                      2.3
##  6 California     Other pests/parasites       15.8
##  7 California     Pesticides                  11.6
##  8 Georgia        Pesticides                   2.6
##  9 Mississippi    Unknown                      5.2
## 10 Ohio           Pesticides                   1.8
The row counts described in this document may not match the output shown. `sample_n(10)` draws a new random sample every time the code runs, so the 10 rows selected when I first ran the code differ from the 10 selected when I knit the whole document.
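One common way to get the same sample on every run is to fix the random seed before sampling. The sketch below uses a small made-up table (`toy`) rather than the real data, and the seed value 2022 is arbitrary.

```r
library(dplyr)

# Toy stand-in for the colony data (hypothetical values, not the real dataset)
toy <- tibble(
  state    = state.name[1:20],
  colony_n = seq(1000, 20000, by = 1000)
)

set.seed(2022)                       # arbitrary seed; any fixed integer works
first_draw <- toy %>% sample_n(10)

set.seed(2022)                       # resetting the seed repeats the same draw
second_draw <- toy %>% sample_n(10)

# TRUE when both draws picked the same rows in the same order
identical(first_draw, second_draw)
```

In a knitted document, a `set.seed()` call placed before the first `sample_n()` should make the interactive run and the knit agree, provided the chunks run in the same order both times.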
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% inner_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 4 × 5
##   state       colony_n colony_lost_pct stressor              stress_pct
##   <chr>          <dbl>           <dbl> <chr>                      <dbl>
## 1 Oklahoma        5000               7 Other pests/parasites       10.9
## 2 Oklahoma        3700              NA Other pests/parasites       10.9
## 3 Ohio           15500              25 Pesticides                   1.8
## 4 Mississippi    26000               6 Unknown                      5.2
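The "Joining with" message above appears because no key was given, so dplyr joined on the only shared column, `state`. A minimal sketch of naming the key explicitly follows. The tables `colony_toy` and `stressor_toy` are illustrative names holding a few rows copied from the samples printed above.

```r
library(dplyr)

# A few rows copied from the sampled tables printed above
colony_toy <- tribble(
  ~state,     ~colony_n, ~colony_lost_pct,
  "Oklahoma",      5000,                7,
  "Missouri",      7000,                3,
  "Oklahoma",      3700,               NA,
  "Ohio",         15500,               25
)
stressor_toy <- tribble(
  ~state,     ~stressor,               ~stress_pct,
  "Oklahoma", "Other pests/parasites",        10.9,
  "Ohio",     "Pesticides",                    1.8,
  "Illinois", "Varroa mites",                 21.2
)

# Naming the key makes the join explicit and silences the message
joined <- colony_toy %>% inner_join(stressor_toy, by = "state")
joined
```

Only states present in both tables survive: Missouri and Illinois drop out, and both Oklahoma rows are kept because each one matches the single Oklahoma row on the stressor side.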
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% left_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 10 × 5
##    state       colony_n colony_lost_pct stressor              stress_pct
##    <chr>          <dbl>           <dbl> <chr>                      <dbl>
##  1 Oklahoma        5000               7 Other pests/parasites       10.9
##  2 Missouri        7000               3 <NA>                        NA
##  3 Kentucky        8500              13 <NA>                        NA
##  4 Maryland        7500               9 <NA>                        NA
##  5 Oklahoma        3700              NA Other pests/parasites       10.9
##  6 Minnesota      39000               6 <NA>                        NA
##  7 Louisiana      55000               7 <NA>                        NA
##  8 Ohio           15500              25 Pesticides                   1.8
##  9 Mississippi    26000               6 Unknown                      5.2
## 10 Wyoming        27000              12 <NA>                        NA
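In the left join above, every row of `colony_small` is kept, and the six states with no partner in `stressor_small` get `NA` in the stressor columns. The sketch below shows one way to list those unmatched states, using small illustrative tables (`colony_toy`, `stressor_toy`) built from rows shown above. It assumes the `stressor` column has no missing values of its own in the right-hand table.

```r
library(dplyr)

# Small tables built from rows shown above (names are illustrative)
colony_toy <- tibble(
  state    = c("Oklahoma", "Missouri", "Ohio"),
  colony_n = c(5000, 7000, 15500)
)
stressor_toy <- tibble(
  state    = c("Oklahoma", "Ohio"),
  stressor = c("Other pests/parasites", "Pesticides")
)

left <- colony_toy %>% left_join(stressor_toy, by = "state")

# Rows where the right-hand column is NA had no match in stressor_toy
unmatched <- left %>% filter(is.na(stressor)) %>% pull(state)
unmatched
```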
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% right_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 11 × 5
##    state          colony_n colony_lost_pct stressor              stress_pct
##    <chr>             <dbl>           <dbl> <chr>                      <dbl>
##  1 Oklahoma           5000               7 Other pests/parasites       10.9
##  2 Oklahoma           3700              NA Other pests/parasites       10.9
##  3 Ohio              15500              25 Pesticides                   1.8
##  4 Mississippi       26000               6 Unknown                      5.2
##  5 Illinois             NA              NA Varroa mites                21.2
##  6 West Virginia        NA              NA Varroa mites                23.2
##  7 North Carolina       NA              NA Pesticides                   2
##  8 New Jersey           NA              NA Unknown                      2.3
##  9 California           NA              NA Other pests/parasites       15.8
## 10 California           NA              NA Pesticides                  11.6
## 11 Georgia              NA              NA Pesticides                   2.6
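The right join above returns 11 rows even though `stressor_small` has only 10. The extra row appears because Oklahoma occurs twice in `colony_small`, so the single Oklahoma stressor row is repeated once per match. A minimal sketch of that effect, again with small illustrative tables:

```r
library(dplyr)

# Oklahoma appears twice on the left, once on the right (names are illustrative)
colony_toy <- tibble(
  state    = c("Oklahoma", "Oklahoma", "Ohio"),
  colony_n = c(5000, 3700, 15500)
)
stressor_toy <- tibble(
  state      = c("Oklahoma", "Ohio", "Illinois"),
  stress_pct = c(10.9, 1.8, 21.2)
)

# Keys that occur more than once in the left table
dup_keys <- colony_toy %>% count(state) %>% filter(n > 1)
dup_keys

# Every stressor row is kept; matched rows repeat once per matching colony row
right <- colony_toy %>% right_join(stressor_toy, by = "state")
nrow(right)
```

Checking for duplicated keys with `count()` before joining is a quick way to anticipate a result that is longer than either input.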
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% full_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 17 × 5
##    state          colony_n colony_lost_pct stressor              stress_pct
##    <chr>             <dbl>           <dbl> <chr>                      <dbl>
##  1 Oklahoma           5000               7 Other pests/parasites       10.9
##  2 Missouri           7000               3 <NA>                        NA
##  3 Kentucky           8500              13 <NA>                        NA
##  4 Maryland           7500               9 <NA>                        NA
##  5 Oklahoma           3700              NA Other pests/parasites       10.9
##  6 Minnesota         39000               6 <NA>                        NA
##  7 Louisiana         55000               7 <NA>                        NA
##  8 Ohio              15500              25 Pesticides                   1.8
##  9 Mississippi       26000               6 Unknown                      5.2
## 10 Wyoming           27000              12 <NA>                        NA
## 11 Illinois             NA              NA Varroa mites                21.2
## 12 West Virginia        NA              NA Varroa mites                23.2
## 13 North Carolina       NA              NA Pesticides                   2
## 14 New Jersey           NA              NA Unknown                      2.3
## 15 California           NA              NA Other pests/parasites       15.8
## 16 California           NA              NA Pesticides                  11.6
## 17 Georgia              NA              NA Pesticides                   2.6
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% semi_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 4 × 3
##   state       colony_n colony_lost_pct
##   <chr>          <dbl>           <dbl>
## 1 Oklahoma        5000               7
## 2 Oklahoma        3700              NA
## 3 Ohio           15500              25
## 4 Mississippi    26000               6
Describe the resulting data:
How is it different from the original two datasets?
colony_small %>% anti_join(stressor_small)
## Joining with `by = join_by(state)`
## # A tibble: 6 × 3
##   state     colony_n colony_lost_pct
##   <chr>        <dbl>           <dbl>
## 1 Missouri      7000               3
## 2 Kentucky      8500              13
## 3 Maryland      7500               9
## 4 Minnesota    39000               6
## 5 Louisiana    55000               7
## 6 Wyoming      27000              12
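The row counts of the six joins above are related: the semi join (4 rows) and anti join (6 rows) together account for all 10 rows of `colony_small`, and the full join (17 rows) equals the left join plus the right join minus the inner join (10 + 11 - 4). The sketch below checks the same two identities on small made-up tables `x` and `y`, which are illustrative and not the sampled data.

```r
library(dplyr)

# Toy tables with partly overlapping keys (values are illustrative only)
x <- tibble(
  state    = c("Oklahoma", "Oklahoma", "Ohio", "Missouri"),
  colony_n = c(5000, 3700, 15500, 7000)
)
y <- tibble(
  state      = c("Oklahoma", "Ohio", "Illinois"),
  stress_pct = c(10.9, 1.8, 21.2)
)

n_inner <- nrow(inner_join(x, y, by = "state"))
n_left  <- nrow(left_join(x, y, by = "state"))
n_right <- nrow(right_join(x, y, by = "state"))
n_full  <- nrow(full_join(x, y, by = "state"))
n_semi  <- nrow(semi_join(x, y, by = "state"))
n_anti  <- nrow(anti_join(x, y, by = "state"))

# Each row of x is returned by exactly one of semi_join and anti_join
n_semi + n_anti == nrow(x)

# A full join is the matched pairs plus the unmatched rows from each side
n_full == n_left + n_right - n_inner
```

Unlike the mutating joins, `semi_join()` and `anti_join()` only filter `x`: they never add columns from `y` and never duplicate rows, which is why their outputs above keep the three original columns.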