Import two related datasets from the TidyTuesday project (2024-01-30 release).
groundhogs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-30/groundhogs.csv')
## Rows: 75 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): slug, shortname, name, city, region, country, source, current_pred...
## dbl (4): id, latitude, longitude, predictions_count
## lgl (2): is_groundhog, active
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
predictions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-30/predictions.csv')
## Rows: 1462 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): details
## dbl (2): id, year
## lgl (1): shadow
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
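The messages above suggest declaring the column types. The sketch below is a minimal example that assumes readr 2.x. To keep it runnable offline, it reads a tiny hypothetical CSV written to a temporary file rather than the real URL. The toy rows are invented and only mimic the columns of `predictions.csv`. Passing `col_types` fixes the parsed types and also suppresses the column-specification message.

```r
library(readr)

# Hypothetical toy rows shaped like predictions.csv (id, year, shadow, details)
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,year,shadow,details",
  "1,1889,,No record.",
  "9,2009,FALSE,Predicted an early spring.",
  "9,2010,TRUE,Saw its shadow."
), toy_path)

# Declaring the types up front documents them and silences the spec message;
# the empty shadow field in the first row is read as NA
predictions_toy <- read_csv(
  toy_path,
  col_types = cols(
    id      = col_double(),
    year    = col_double(),
    shadow  = col_logical(),
    details = col_character()
  )
)
predictions_toy
```

You could pass the same `col_types` argument to the real `predictions.csv` URL.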
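Describe the two datasets:
Data 1: `groundhogs` has 75 rows and 17 columns, one row per prognosticator. It includes identifiers (`id`, `slug`, `shortname`, `name`), location (`city`, `region`, `country`, `latitude`, `longitude`), and summary fields such as `predictions_count`, `is_groundhog`, and `active`.
Data 2: `predictions` has 1462 rows and 4 columns (`id`, `year`, `shadow`, `details`), one row per recorded prediction. `shadow` is TRUE or FALSE, or NA where no result is available. The two tables share the key column `id`.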
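The tables are linked by `id`. `groundhogs` has 75 rows and `predictions` has 1462, so a given `id` must recur across many prediction rows. Before joining, it helps to check whether the key is unique in each table. The sketch below does this on hypothetical toy tibbles, not the real data. The helper `is_unique_key()` is an illustrative name, not a dplyr function.

```r
library(dplyr)

# Hypothetical toy tables mirroring the shape of groundhogs and predictions
groundhogs_toy  <- tibble(id = c(1, 2, 3), shortname = c("Phil", "Gertie", "Sam"))
predictions_toy <- tibble(id = c(1, 1, 2, 4), year = c(2020, 2021, 2020, 2021))

# Hypothetical helper: a column is a unique key when no value occurs twice
is_unique_key <- function(df, key) !anyDuplicated(df[[key]])

is_unique_key(groundhogs_toy, "id")   # TRUE: one row per groundhog
is_unique_key(predictions_toy, "id")  # FALSE: several years per groundhog

# Number of prediction rows per id
predictions_toy %>% count(id)
```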
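library(dplyr)   # provides %>%, select(), sample_n() and the *_join() verbs
set.seed(1234)   # make the random samples reproducible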
groundhogs_small <- groundhogs %>% select(id, shortname, city) %>% sample_n(10)
predictions_small <- predictions %>% select(id, shadow, year) %>% sample_n(10)
groundhogs_small
## # A tibble: 10 × 3
## id shortname city
## <dbl> <chr> <chr>
## 1 28 Stonewall Wantage
## 2 22 Billy Balzac
## 3 9 Gertie Hanna City
## 4 5 Charlie Athens
## 5 38 Bill Stephens City
## 6 16 Sam Shubenacadie
## 7 4 Jimmy Sun Prairie
## 8 14 Merv Stonewall
## 9 56 Phil Washington DC
## 10 62 Gordy Milwaukee
predictions_small
## # A tibble: 10 × 3
## id shadow year
## <dbl> <lgl> <dbl>
## 1 1 NA 1889
## 2 9 FALSE 2009
## 3 43 TRUE 2015
## 4 23 FALSE 2017
## 5 41 FALSE 2020
## 6 26 TRUE 2010
## 7 7 FALSE 2011
## 8 11 TRUE 2022
## 9 21 TRUE 2006
## 10 10 NA 1994
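`sample_n()` still works, but it has been superseded by `slice_sample()` since dplyr 1.0.0. The sketch below uses the newer call on a hypothetical toy tibble. Switching functions is not guaranteed to draw the same rows for a given seed, so the samples printed above might change if you adopt it.

```r
library(dplyr)

# Hypothetical toy table standing in for groundhogs
toy <- tibble(id = 1:20, shortname = paste0("hog_", 1:20))

set.seed(1234)
# slice_sample(n = ) is the current replacement for sample_n(); it samples without replacement by default
toy_small <- toy %>% select(id, shortname) %>% slice_sample(n = 10)
toy_small
```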
Describe the resulting data:
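How is it different from the original two datasets? The inner join keeps only rows whose `id` appears in both tables. It therefore has 1 row (id 9, Gertie) instead of 10, and it combines the columns of both datasets into 5 columns (`id` is shared).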
groundhogs_small %>% inner_join(predictions_small, by = c("id"))
## # A tibble: 1 × 5
## id shortname city shadow year
## <dbl> <chr> <chr> <lgl> <dbl>
## 1 9 Gertie Hanna City FALSE 2009
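In the samples, only id 9 is present in both tables, so the inner join returns a single row. In the full data, one groundhog `id` can have many prediction rows. An inner join then repeats that groundhog's columns once for each matching prediction. The sketch below shows this on hypothetical toy tibbles.

```r
library(dplyr)

# Hypothetical toy data: id 1 has two predictions, id 3 has none, id 4 has no groundhog
groundhogs_toy  <- tibble(id = c(1, 2, 3), shortname = c("Phil", "Gertie", "Sam"))
predictions_toy <- tibble(id = c(1, 1, 2, 4), year = c(2020, 2021, 2020, 2021))

# Keeps only ids present in both tables; id 1 appears twice (once per year)
inner <- groundhogs_toy %>% inner_join(predictions_toy, by = "id")
inner
```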
Describe the resulting data:
How is it different from the original two datasets? The left join keeps all 10 rows of `groundhogs_small`. Only id 9 also appears in `predictions_small`, so `shadow` and `year` are NA for the other nine rows.
groundhogs_small %>% left_join(predictions_small, by = c("id"))
## # A tibble: 10 × 5
## id shortname city shadow year
## <dbl> <chr> <chr> <lgl> <dbl>
## 1 28 Stonewall Wantage NA NA
## 2 22 Billy Balzac NA NA
## 3 9 Gertie Hanna City FALSE 2009
## 4 5 Charlie Athens NA NA
## 5 38 Bill Stephens City NA NA
## 6 16 Sam Shubenacadie NA NA
## 7 4 Jimmy Sun Prairie NA NA
## 8 14 Merv Stonewall NA NA
## 9 56 Phil Washington DC NA NA
## 10 62 Gordy Milwaukee NA NA
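Every row of `groundhogs_small` survives the left join. The rows whose `id` is absent from `predictions_small` are filled with NA. The sketch below uses hypothetical toy tibbles and assumes `id` is unique in the right-hand table. It predicts the number of NA-filled rows with `%in%` before joining, then checks that count against the joined result.

```r
library(dplyr)

# Hypothetical toy tables with unique ids on both sides
x <- tibble(id = c(28, 22, 9, 5), shortname = c("Stonewall", "Billy", "Gertie", "Charlie"))
y <- tibble(id = c(1, 9, 43), year = c(1889, 2009, 2015))

left <- x %>% left_join(y, by = "id")

# Rows of x with no partner in y are kept but filled with NA
n_unmatched <- sum(!x$id %in% y$id)
n_unmatched              # 3
nrow(left) == nrow(x)    # TRUE: a left join never drops rows of x
sum(is.na(left$year))    # 3, matching n_unmatched
```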
Describe the resulting data:
How is it different from the original two datasets? The right join keeps all 10 rows of `predictions_small`. Only id 9 matches a groundhog, so `shortname` and `city` are NA for the other nine rows.
groundhogs_small %>% right_join(predictions_small, by = c("id"))
## # A tibble: 10 × 5
## id shortname city shadow year
## <dbl> <chr> <chr> <lgl> <dbl>
## 1 9 Gertie Hanna City FALSE 2009
## 2 1 <NA> <NA> NA 1889
## 3 43 <NA> <NA> TRUE 2015
## 4 23 <NA> <NA> FALSE 2017
## 5 41 <NA> <NA> FALSE 2020
## 6 26 <NA> <NA> TRUE 2010
## 7 7 <NA> <NA> FALSE 2011
## 8 11 <NA> <NA> TRUE 2022
## 9 21 <NA> <NA> TRUE 2006
## 10 10 <NA> <NA> NA 1994
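A right join is a left join with the tables swapped. `right_join(x, y)` contains the same rows as `left_join(y, x)`, but dplyr orders the columns differently, and here the rows as well. The sketch below checks this on hypothetical toy tibbles after aligning the column and row order.

```r
library(dplyr)

# Hypothetical toy tables
x <- tibble(id = c(28, 9, 5), shortname = c("Stonewall", "Gertie", "Charlie"))
y <- tibble(id = c(1, 9, 43), year = c(1889, 2009, 2015))

via_right <- x %>% right_join(y, by = "id")
via_left  <- y %>% left_join(x, by = "id")

# Align column order and row order before comparing
via_left_aligned <- via_left %>% select(all_of(names(via_right))) %>% arrange(id)
via_right_sorted <- via_right %>% arrange(id)

all.equal(as.data.frame(via_right_sorted), as.data.frame(via_left_aligned))
```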
Describe the resulting data:
How is it different from the original two datasets? The full join keeps every row from both tables. The 10 + 10 rows minus the 1 shared match give 19 rows, with NA wherever one side has no match.
groundhogs_small %>% full_join(predictions_small, by = c("id"))
## # A tibble: 19 × 5
## id shortname city shadow year
## <dbl> <chr> <chr> <lgl> <dbl>
## 1 28 Stonewall Wantage NA NA
## 2 22 Billy Balzac NA NA
## 3 9 Gertie Hanna City FALSE 2009
## 4 5 Charlie Athens NA NA
## 5 38 Bill Stephens City NA NA
## 6 16 Sam Shubenacadie NA NA
## 7 4 Jimmy Sun Prairie NA NA
## 8 14 Merv Stonewall NA NA
## 9 56 Phil Washington DC NA NA
## 10 62 Gordy Milwaukee NA NA
## 11 1 <NA> <NA> NA 1889
## 12 43 <NA> <NA> TRUE 2015
## 13 23 <NA> <NA> FALSE 2017
## 14 41 <NA> <NA> FALSE 2020
## 15 26 <NA> <NA> TRUE 2010
## 16 7 <NA> <NA> FALSE 2011
## 17 11 <NA> <NA> TRUE 2022
## 18 21 <NA> <NA> TRUE 2006
## 19 10 <NA> <NA> NA 1994
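When `id` is unique on both sides, the full join's row count follows from simple counting. It equals the rows in x plus the rows in y minus the matched pairs, here 10 + 10 - 1 = 19. The sketch below checks the same arithmetic on hypothetical toy tibbles.

```r
library(dplyr)

# Hypothetical toy tables, ids unique within each table
x <- tibble(id = c(28, 22, 9, 5), shortname = c("Stonewall", "Billy", "Gertie", "Charlie"))
y <- tibble(id = c(1, 9, 43), year = c(1889, 2009, 2015))

full <- x %>% full_join(y, by = "id")

# This formula only holds when the key is unique in both tables
n_matched  <- sum(x$id %in% y$id)
n_expected <- nrow(x) + nrow(y) - n_matched   # 4 + 3 - 1 = 6
nrow(full) == n_expected                      # TRUE
```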
Describe the resulting data:
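How is it different from the original two datasets? The semi join keeps the rows of `groundhogs_small` that have a match in `predictions_small`, without adding any columns from it. The result has 3 columns and 1 row, versus 5 columns and 19 rows from the full join.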
groundhogs_small %>% semi_join(predictions_small, by = c("id"))
## # A tibble: 1 × 3
## id shortname city
## <dbl> <chr> <chr>
## 1 9 Gertie Hanna City
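A semi join filters rather than joins. For a single key column it behaves like `filter()` with `%in%`. Unlike an inner join, it never duplicates a row of x when the key repeats in y. The sketch below checks both points on hypothetical toy tibbles.

```r
library(dplyr)

x <- tibble(id = c(28, 22, 9, 5), shortname = c("Stonewall", "Billy", "Gertie", "Charlie"))
# Hypothetical: id 9 appears twice in y, as a groundhog with two prediction years would
y <- tibble(id = c(9, 9, 43), year = c(2009, 2010, 2015))

semi     <- x %>% semi_join(y, by = "id")
filtered <- x %>% filter(id %in% y$id)

nrow(semi)   # 1, not 2: Gertie is kept once
all.equal(as.data.frame(semi), as.data.frame(filtered))
```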
Describe the resulting data:
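How is it different from the original two datasets? The anti join keeps the rows of `groundhogs_small` that have no match in `predictions_small`. That gives 9 rows, the complement of the 1-row semi join, again with only the 3 original columns.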
groundhogs_small %>% anti_join(predictions_small, by = c("id"))
## # A tibble: 9 × 3
## id shortname city
## <dbl> <chr> <chr>
## 1 28 Stonewall Wantage
## 2 22 Billy Balzac
## 3 5 Charlie Athens
## 4 38 Bill Stephens City
## 5 16 Sam Shubenacadie
## 6 4 Jimmy Sun Prairie
## 7 14 Merv Stonewall
## 8 56 Phil Washington DC
## 9 62 Gordy Milwaukee
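Together, the semi join and the anti join split `groundhogs_small` into matched (1 row) and unmatched (9 rows) groundhogs. The sketch below confirms on hypothetical toy tibbles that the two results partition x, with every row landing in exactly one of them.

```r
library(dplyr)

# Hypothetical toy tables
x <- tibble(id = c(28, 22, 9, 5), shortname = c("Stonewall", "Billy", "Gertie", "Charlie"))
y <- tibble(id = c(1, 9, 43), year = c(1889, 2009, 2015))

matched   <- x %>% semi_join(y, by = "id")
unmatched <- x %>% anti_join(y, by = "id")   # same rows as filter(x, !id %in% y$id)

nrow(matched) + nrow(unmatched) == nrow(x)   # TRUE
intersect(matched$id, unmatched$id)          # numeric(0): no row in both
```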