bee_colonies <- read_excel("../00_data/MyData3.xlsx")
stressors <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/stressor.csv')
## Rows: 7332 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): months, state, stressor
## dbl (2): year, stress_pct
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
set.seed(123)
bee_colonies_smaller2 <- bee_colonies %>%
select(year, months, state, colony_size, colony_lost) %>%
sample_n(10)
set.seed(123)
stressors_small <- stressors %>%
select(year, months, state, stressor) %>%
sample_n(10)
bee_colonies_smaller2
## # A tibble: 10 × 5
## year months state colony_size colony_lost
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 2017 January-March Utah 16000 2700
## 2 2017 April-June Vermont 6000 170
## 3 2015 October-December Texas 125000 25000
## 4 2017 October-December Hawaii 15000 130
## 5 2016 January-March Florida 245000 45000
## 6 2019 October-December Wyoming 27000 3300
## 7 2021 January-March Kansas 5000 1400
## 8 2020 July-September California 640000 69000
## 9 2018 July-September Florida 197000 30000
## 10 2018 January-March Texas 205000 22000
stressors_small
## # A tibble: 10 × 4
## year months state stressor
## <dbl> <chr> <chr> <chr>
## 1 2017 January-March South Carolina Disesases
## 2 2017 January-March West Virginia Disesases
## 3 2016 October-December West Virginia Varroa mites
## 4 2015 April-June Virginia Pesticides
## 5 2018 October-December Illinois Varroa mites
## 6 2017 July-September New York Pesticides
## 7 2016 July-September Nebraska Unknown
## 8 2016 January-March Arkansas Other pests/parasites
## 9 2017 October-December Wyoming Other
## 10 2019 July-September Other States Disesases
The resulting varibles from “bee_colonies” and “stressors”, has common variables. The variables from the dataset “stressors” is “year”, “months” and “state”, while “year”, “months”, and “state is from dataset bee_colonies. I chose the variable”state”. The resulting dataset has one row, and 8 columns.
bee_colonies_smaller2 %>% inner_join(stressors_small)
## Joining with `by = join_by(year, months, state)`
## # A tibble: 0 × 6
## # ℹ 6 variables: year <dbl>, months <chr>, state <chr>, colony_size <dbl>,
## # colony_lost <dbl>, stressor <chr>
stressors_small %>% inner_join(bee_colonies_smaller2)
## Joining with `by = join_by(year, months, state)`
## # A tibble: 0 × 6
## # ℹ 6 variables: year <dbl>, months <chr>, state <chr>, stressor <chr>,
## # colony_size <dbl>, colony_lost <dbl>
The resulting dataset has one row and one column. This data does not have the best related variables to perform the function.
bee_colonies_smaller2 %>% left_join(stressors_small)
## Joining with `by = join_by(year, months, state)`
## # A tibble: 10 × 6
## year months state colony_size colony_lost stressor
## <dbl> <chr> <chr> <dbl> <dbl> <chr>
## 1 2017 January-March Utah 16000 2700 <NA>
## 2 2017 April-June Vermont 6000 170 <NA>
## 3 2015 October-December Texas 125000 25000 <NA>
## 4 2017 October-December Hawaii 15000 130 <NA>
## 5 2016 January-March Florida 245000 45000 <NA>
## 6 2019 October-December Wyoming 27000 3300 <NA>
## 7 2021 January-March Kansas 5000 1400 <NA>
## 8 2020 July-September California 640000 69000 <NA>
## 9 2018 July-September Florida 197000 30000 <NA>
## 10 2018 January-March Texas 205000 22000 <NA>
The resulting dataset has 10 rows and 8 columns. This data does not have the best related variables to perform the function.
bee_colonies_smaller2 %>% right_join(stressors_small)
## Joining with `by = join_by(year, months, state)`
## # A tibble: 10 × 6
## year months state colony_size colony_lost stressor
## <dbl> <chr> <chr> <dbl> <dbl> <chr>
## 1 2017 January-March South Carolina NA NA Disesases
## 2 2017 January-March West Virginia NA NA Disesases
## 3 2016 October-December West Virginia NA NA Varroa mites
## 4 2015 April-June Virginia NA NA Pesticides
## 5 2018 October-December Illinois NA NA Varroa mites
## 6 2017 July-September New York NA NA Pesticides
## 7 2016 July-September Nebraska NA NA Unknown
## 8 2016 January-March Arkansas NA NA Other pests/pa…
## 9 2017 October-December Wyoming NA NA Other
## 10 2019 July-September Other States NA NA Disesases
bee_colonies_smaller2 %>% full_join(stressors_small)
## Joining with `by = join_by(year, months, state)`
## # A tibble: 20 × 6
## year months state colony_size colony_lost stressor
## <dbl> <chr> <chr> <dbl> <dbl> <chr>
## 1 2017 January-March Utah 16000 2700 <NA>
## 2 2017 April-June Vermont 6000 170 <NA>
## 3 2015 October-December Texas 125000 25000 <NA>
## 4 2017 October-December Hawaii 15000 130 <NA>
## 5 2016 January-March Florida 245000 45000 <NA>
## 6 2019 October-December Wyoming 27000 3300 <NA>
## 7 2021 January-March Kansas 5000 1400 <NA>
## 8 2020 July-September California 640000 69000 <NA>
## 9 2018 July-September Florida 197000 30000 <NA>
## 10 2018 January-March Texas 205000 22000 <NA>
## 11 2017 January-March South Carolina NA NA Disesases
## 12 2017 January-March West Virginia NA NA Disesases
## 13 2016 October-December West Virginia NA NA Varroa mites
## 14 2015 April-June Virginia NA NA Pesticides
## 15 2018 October-December Illinois NA NA Varroa mites
## 16 2017 July-September New York NA NA Pesticides
## 17 2016 July-September Nebraska NA NA Unknown
## 18 2016 January-March Arkansas NA NA Other pests/pa…
## 19 2017 October-December Wyoming NA NA Other
## 20 2019 July-September Other States NA NA Disesases
stressors_small %>% semi_join(bee_colonies_smaller2)
## Joining with `by = join_by(year, months, state)`
## # A tibble: 0 × 4
## # ℹ 4 variables: year <dbl>, months <chr>, state <chr>, stressor <chr>
bee_colonies_smaller2 %>% anti_join(stressors_small)
## Joining with `by = join_by(year, months, state)`
## # A tibble: 10 × 5
## year months state colony_size colony_lost
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 2017 January-March Utah 16000 2700
## 2 2017 April-June Vermont 6000 170
## 3 2015 October-December Texas 125000 25000
## 4 2017 October-December Hawaii 15000 130
## 5 2016 January-March Florida 245000 45000
## 6 2019 October-December Wyoming 27000 3300
## 7 2021 January-March Kansas 5000 1400
## 8 2020 July-September California 640000 69000
## 9 2018 July-September Florida 197000 30000
## 10 2018 January-March Texas 205000 22000
The primary keys in my data bee_colonies, is a combination of the variables year, months, state and colony_max, when using count() function to count n = 1 for each row and observation.