Import two related datasets from the TidyTuesday project.
groundhogs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-30/groundhogs.csv')
## Rows: 75 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): slug, shortname, name, city, region, country, source, current_pred...
## dbl (4): id, latitude, longitude, predictions_count
## lgl (2): is_groundhog, active
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
predictions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-30/predictions.csv')
## Rows: 1462 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): details
## dbl (2): id, year
## lgl (1): shadow
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
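The messages above suggest two ways to quiet the column specification output: declare the column types, or set `show_col_types = FALSE`. A minimal sketch of both follows. It assumes readr 2.0 or later (for literal data wrapped in `I()`), and it uses a tiny inline CSV, a hypothetical stand-in for the real predictions file, so it runs without network access.

```r
library(readr)

# Hypothetical two-row stand-in for predictions.csv
csv_text <- I("id,year,shadow\n1,1942,TRUE\n35,2016,FALSE\n")

# Option 1: keep readr's type guessing but silence the message
quiet <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: declare the types explicitly, which also silences the message
typed <- read_csv(
  csv_text,
  col_types = cols(
    id = col_integer(),
    year = col_integer(),
    shadow = col_logical()
  )
)
```

Declaring types is the stricter choice: a value that does not fit the declared type is reported as a parsing problem rather than silently changing the column type.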
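Describe the two datasets:
Data 1: groundhogs (75 rows, 17 columns, including id, shortname, and region)
Data 2: predictions (1,462 rows, 4 columns: id, year, shadow, and details)
Both datasets contain an id column, which is the key used to join them.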
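library(dplyr)  # provides %>%, select(), and sample_n()
# Keep three columns and draw 10 random rows from each dataset
groundhogs_small <- groundhogs %>% select(shortname, id, region) %>% sample_n(10)
predictions_small <- predictions %>% select(id, year, shadow) %>% sample_n(10)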
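Because `sample_n()` draws rows at random, the two small tables (and every join built on them) change on each run. One way to make the samples repeatable is to call `set.seed()` immediately before sampling. The sketch below uses a hypothetical toy table and an arbitrary seed value of 123; neither comes from the original analysis.

```r
library(dplyr)

# Hypothetical toy table standing in for `groundhogs`
toy <- tibble(id = 1:20, shortname = paste0("hog_", 1:20))

set.seed(123)  # arbitrary seed, chosen only for illustration
first_draw <- toy %>% sample_n(5)

set.seed(123)  # resetting the same seed reproduces the same draw
second_draw <- toy %>% sample_n(5)

identical(first_draw, second_draw)
```

Note that `sample_n()` is superseded in current dplyr; `slice_sample(n = 5)` is the newer equivalent.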
groundhogs_small
## # A tibble: 10 × 3
## shortname id region
## <chr> <dbl> <chr>
## 1 Chuck 47 Virginia
## 2 Hank 54 Pennsylvania
## 3 Lucy 42 Pennsylvania
## 4 Beau 7 Georgia
## 5 Snowy 50 Washington
## 6 Stonewall 28 New Jersey
## 7 Slew 30 Washington
## 8 Dave 10 New York
## 9 Mike 35 Ontario
## 10 Willie 3 Ontario
predictions_small
## # A tibble: 10 × 3
## id year shadow
## <dbl> <dbl> <lgl>
## 1 62 2021 FALSE
## 2 4 1981 TRUE
## 3 35 2016 FALSE
## 4 52 2018 TRUE
## 5 1 1942 TRUE
## 6 14 2019 FALSE
## 7 1 1966 TRUE
## 8 2 2000 TRUE
## 9 9 1996 FALSE
## 10 6 2001 FALSE
Describe the resulting data:
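How is it different from the original two datasets?
* 1 row and 5 columns, compared with 10 rows and 3 columns in each sampled dataset. Only id 35 (Mike) appears in both samples, so the inner join keeps just that row.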
groundhogs_small %>% inner_join(predictions_small)
## Joining with `by = join_by(id)`
## # A tibble: 1 × 5
## shortname id region year shadow
## <chr> <dbl> <chr> <dbl> <lgl>
## 1 Mike 35 Ontario 2016 FALSE
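To see how the other dplyr join verbs would treat the same two samples, the sketch below rebuilds them from the values printed above and counts the rows each join returns. The names `g`, `p`, and `join_rows` are hypothetical stand-ins introduced here so the block runs on its own; the joins are keyed explicitly on `id`.

```r
library(dplyr)

# Values copied from the printed groundhogs_small sample
g <- tibble(
  shortname = c("Chuck", "Hank", "Lucy", "Beau", "Snowy",
                "Stonewall", "Slew", "Dave", "Mike", "Willie"),
  id = c(47, 54, 42, 7, 50, 28, 30, 10, 35, 3),
  region = c("Virginia", "Pennsylvania", "Pennsylvania", "Georgia",
             "Washington", "New Jersey", "Washington", "New York",
             "Ontario", "Ontario")
)

# Values copied from the printed predictions_small sample
p <- tibble(
  id = c(62, 4, 35, 52, 1, 14, 1, 2, 9, 6),
  year = c(2021, 1981, 2016, 2018, 1942, 2019, 1966, 2000, 1996, 2001),
  shadow = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Number of rows returned by each join type
join_rows <- c(
  inner = nrow(inner_join(g, p, by = "id")),  # ids present in both tables
  left  = nrow(left_join(g, p, by = "id")),   # every row of g
  right = nrow(right_join(g, p, by = "id")),  # every row of p
  full  = nrow(full_join(g, p, by = "id")),   # every row of either table
  semi  = nrow(semi_join(g, p, by = "id")),   # rows of g that have a match
  anti  = nrow(anti_join(g, p, by = "id"))    # rows of g that have no match
)
join_rows
```

With these samples the counts are 1, 10, 10, 19, 1, and 9. The mutating joins (`inner`, `left`, `right`, `full`) return five columns, with `NA` filling the unmatched side, while the filtering joins (`semi`, `anti`) keep only the three columns of `g`.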