Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from the TidyTuesday project.

groundhogs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-30/groundhogs.csv')

## Rows: 75 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): slug, shortname, name, city, region, country, source, current_pred...
## dbl  (4): id, latitude, longitude, predictions_count
## lgl  (2): is_groundhog, active
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

predictions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-30/predictions.csv')

## Rows: 1462 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): details
## dbl (2): id, year
## lgl (1): shadow
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
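
The message above names two ways to control the column-type report. The following is a minimal sketch of both, assuming readr 2.0 or later, where literal CSV text can be passed with I(). The three inline rows are copied from the predictions sample printed in this document so that the sketch runs offline; csv_text, quiet, and typed are illustrative names.

```r
library(readr)

# Stand-in for predictions.csv (assumption: readr >= 2.0 treats I("...")
# as literal data rather than a file path).
csv_text <- I("id,year,shadow\n9,2009,FALSE\n43,2015,TRUE\n1,1889,NA\n")

# Option 1: keep type guessing but silence the message.
quiet <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: declare the expected types up front, which also silences it.
typed <- read_csv(
  csv_text,
  col_types = cols(
    id     = col_double(),
    year   = col_double(),
    shadow = col_logical()
  )
)
```

Declaring the types makes the import independent of what readr happens to guess from the first rows, which matters for a join key such as id.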

2. Make data small

Describe the two datasets:

Data 1: groundhogs

Columns: id, shortname, city (3 of the original 17)
Rows: 10 (sampled from 75)

Data 2: predictions

Columns: id, shadow, year (3 of the original 4)
Rows: 10 (sampled from 1462)

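library(dplyr)  # provides %>%, select(), and sample_n()

set.seed(1234)  # fix the seed so the same 10 rows are drawn on every run
groundhogs_small <- groundhogs %>% select(id, shortname, city) %>% sample_n(10)
predictions_small <- predictions %>% select(id, shadow, year) %>% sample_n(10)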

groundhogs_small

## # A tibble: 10 × 3
##       id shortname city         
##    <dbl> <chr>     <chr>        
##  1    28 Stonewall Wantage      
##  2    22 Billy     Balzac       
##  3     9 Gertie    Hanna City   
##  4     5 Charlie   Athens       
##  5    38 Bill      Stephens City
##  6    16 Sam       Shubenacadie 
##  7     4 Jimmy     Sun Prairie  
##  8    14 Merv      Stonewall    
##  9    56 Phil      Washington DC
## 10    62 Gordy     Milwaukee

predictions_small

## # A tibble: 10 × 3
##       id shadow  year
##    <dbl> <lgl>  <dbl>
##  1     1 NA      1889
##  2     9 FALSE   2009
##  3    43 TRUE    2015
##  4    23 FALSE   2017
##  5    41 FALSE   2020
##  6    26 TRUE    2010
##  7     7 FALSE   2011
##  8    11 TRUE    2022
##  9    21 TRUE    2006
## 10    10 NA      1994
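
Before joining, it can help to check how many id values the two samples share, since that overlap decides how many rows each join returns. This is a minimal base R sketch; the two vectors are re-typed from the printed output above rather than re-sampled, and their names are illustrative.

```r
# id columns copied from the two printed samples.
groundhog_ids  <- c(28, 22, 9, 5, 38, 16, 4, 14, 56, 62)
prediction_ids <- c(1, 9, 43, 23, 41, 26, 7, 11, 21, 10)

# ids present in both samples: the only rows an inner join can keep.
shared_ids <- intersect(groundhog_ids, prediction_ids)
shared_ids

# anyDuplicated() returns 0 when a vector has no repeated values,
# i.e. when id identifies each row of the sample uniquely.
anyDuplicated(groundhog_ids)
anyDuplicated(prediction_ids)
```

With independent random samples of 10 rows each, a small overlap like this is to be expected.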

3. inner_join

Describe the resulting data:

Columns: id, shortname, city, shadow, year
Rows: 1

How is it different from the original two datasets? It has only 1 row instead of 10, because id 9 is the only id present in both samples, and it contains every column from both datasets.

groundhogs_small %>% inner_join(predictions_small, by = c("id"))

## # A tibble: 1 × 5
##      id shortname city       shadow  year
##   <dbl> <chr>     <chr>      <lgl>  <dbl>
## 1     9 Gertie    Hanna City FALSE   2009

4. left_join

Describe the resulting data:

Columns: id, shortname, city, shadow, year
Rows: 10

How is it different from the original two datasets? All 10 rows of groundhogs_small are kept and the shadow and year columns are added; they are NA in every row except id 9, the one match found by inner_join.

groundhogs_small %>% left_join(predictions_small, by = c("id"))

## # A tibble: 10 × 5
##       id shortname city          shadow  year
##    <dbl> <chr>     <chr>         <lgl>  <dbl>
##  1    28 Stonewall Wantage       NA        NA
##  2    22 Billy     Balzac        NA        NA
##  3     9 Gertie    Hanna City    FALSE   2009
##  4     5 Charlie   Athens        NA        NA
##  5    38 Bill      Stephens City NA        NA
##  6    16 Sam       Shubenacadie  NA        NA
##  7     4 Jimmy     Sun Prairie   NA        NA
##  8    14 Merv      Stonewall     NA        NA
##  9    56 Phil      Washington DC NA        NA
## 10    62 Gordy     Milwaukee     NA        NA
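
An NA in shadow after a join can be ambiguous: predictions_small itself contains NA in shadow (ids 1 and 10), so NA may mean either "no matching row" or "value missing in the source". One way to keep the two apart, sketched here, is to add a marker column before joining. The tibbles are re-typed from the printed samples, and has_prediction and flagged are hypothetical names introduced for illustration.

```r
library(dplyr)

# Re-typed from the printed samples above (not re-sampled).
groundhogs_small <- tibble(
  id        = c(28, 22, 9, 5, 38, 16, 4, 14, 56, 62),
  shortname = c("Stonewall", "Billy", "Gertie", "Charlie", "Bill",
                "Sam", "Jimmy", "Merv", "Phil", "Gordy")
)
predictions_small <- tibble(
  id     = c(1, 9, 43, 23, 41, 26, 7, 11, 21, 10),
  shadow = c(NA, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, NA),
  year   = c(1889, 2009, 2015, 2017, 2020, 2010, 2011, 2022, 2006, 1994)
)

# has_prediction is TRUE on every right-hand row, so after the join it is
# NA only where no match was found; coalesce() turns that NA into FALSE.
flagged <- groundhogs_small %>%
  left_join(
    predictions_small %>% mutate(has_prediction = TRUE),
    by = "id"
  ) %>%
  mutate(has_prediction = coalesce(has_prediction, FALSE))

# Number of sampled groundhogs with a sampled prediction.
sum(flagged$has_prediction)
```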

5. right_join

Describe the resulting data:

Columns: id, shortname, city, shadow, year
Rows: 10

How is it different from the original two datasets? All 10 rows of predictions_small are kept and the shortname and city columns are added; they are NA in every row except id 9, the one match found by inner_join.

groundhogs_small %>% right_join(predictions_small, by = c("id"))

## # A tibble: 10 × 5
##       id shortname city       shadow  year
##    <dbl> <chr>     <chr>      <lgl>  <dbl>
##  1     9 Gertie    Hanna City FALSE   2009
##  2     1 <NA>      <NA>       NA      1889
##  3    43 <NA>      <NA>       TRUE    2015
##  4    23 <NA>      <NA>       FALSE   2017
##  5    41 <NA>      <NA>       FALSE   2020
##  6    26 <NA>      <NA>       TRUE    2010
##  7     7 <NA>      <NA>       FALSE   2011
##  8    11 <NA>      <NA>       TRUE    2022
##  9    21 <NA>      <NA>       TRUE    2006
## 10    10 <NA>      <NA>       NA      1994
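
A right join can be read as a left join with the two tables swapped. The sketch below checks that on the samples printed above (re-typed, not re-sampled); only the row and column order are expected to differ, so both results are put in the same order before comparing. via_right, via_left, and same_content are illustrative names.

```r
library(dplyr)

groundhogs_small <- tibble(
  id        = c(28, 22, 9, 5, 38, 16, 4, 14, 56, 62),
  shortname = c("Stonewall", "Billy", "Gertie", "Charlie", "Bill",
                "Sam", "Jimmy", "Merv", "Phil", "Gordy"),
  city      = c("Wantage", "Balzac", "Hanna City", "Athens", "Stephens City",
                "Shubenacadie", "Sun Prairie", "Stonewall", "Washington DC",
                "Milwaukee")
)
predictions_small <- tibble(
  id     = c(1, 9, 43, 23, 41, 26, 7, 11, 21, 10),
  shadow = c(NA, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, NA),
  year   = c(1889, 2009, 2015, 2017, 2020, 2010, 2011, 2022, 2006, 1994)
)

via_right <- groundhogs_small %>% right_join(predictions_small, by = "id")
via_left  <- predictions_small %>% left_join(groundhogs_small, by = "id")

# Align column order and row order, then compare as plain data frames.
same_content <- isTRUE(all.equal(
  as.data.frame(via_right %>% arrange(id)),
  as.data.frame(via_left %>% select(all_of(names(via_right))) %>% arrange(id))
))
same_content
```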

6. full_join

Describe the resulting data:

Columns: id, shortname, city, shadow, year
Rows: 19

How is it different from the original two datasets? It combines the left_join and right_join results: 19 rows (10 + 10, minus the 1 shared id that would otherwise be counted twice), compared with 10 rows in each of those results.

groundhogs_small %>% full_join(predictions_small, by = c("id"))

## # A tibble: 19 × 5
##       id shortname city          shadow  year
##    <dbl> <chr>     <chr>         <lgl>  <dbl>
##  1    28 Stonewall Wantage       NA        NA
##  2    22 Billy     Balzac        NA        NA
##  3     9 Gertie    Hanna City    FALSE   2009
##  4     5 Charlie   Athens        NA        NA
##  5    38 Bill      Stephens City NA        NA
##  6    16 Sam       Shubenacadie  NA        NA
##  7     4 Jimmy     Sun Prairie   NA        NA
##  8    14 Merv      Stonewall     NA        NA
##  9    56 Phil      Washington DC NA        NA
## 10    62 Gordy     Milwaukee     NA        NA
## 11     1 <NA>      <NA>          NA      1889
## 12    43 <NA>      <NA>          TRUE    2015
## 13    23 <NA>      <NA>          FALSE   2017
## 14    41 <NA>      <NA>          FALSE   2020
## 15    26 <NA>      <NA>          TRUE    2010
## 16     7 <NA>      <NA>          FALSE   2011
## 17    11 <NA>      <NA>          TRUE    2022
## 18    21 <NA>      <NA>          TRUE    2006
## 19    10 <NA>      <NA>          NA      1994
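
The 19 rows follow from simple counting: rows on the left, plus rows on the right, minus the ids they share. That shortcut only holds when id is unique in both tables, which is the case for these two samples. A small sketch using only the id columns copied from the printed output (the tibble and variable names are illustrative):

```r
library(dplyr)

groundhog_ids  <- tibble(id = c(28, 22, 9, 5, 38, 16, 4, 14, 56, 62))
prediction_ids <- tibble(id = c(1, 9, 43, 23, 41, 26, 7, 11, 21, 10))

# Matched ids are what inner_join keeps.
n_shared <- nrow(inner_join(groundhog_ids, prediction_ids, by = "id"))

# Expected full_join size when id is unique on both sides.
n_expected <- nrow(groundhog_ids) + nrow(prediction_ids) - n_shared

# Actual full_join size, for comparison.
n_full <- nrow(full_join(groundhog_ids, prediction_ids, by = "id"))
n_full == n_expected
```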

7. semi_join

Describe the resulting data:

Columns: id, shortname, city
Rows: 1

How is it different from the original two datasets? It keeps only the 3 columns of groundhogs_small (the mutating joins above return 5) and only the 1 row whose id also appears in predictions_small (full_join returned 19).

groundhogs_small %>% semi_join(predictions_small, by = c("id"))

## # A tibble: 1 × 3
##      id shortname city      
##   <dbl> <chr>     <chr>     
## 1     9 Gertie    Hanna City
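
Because semi_join only filters the left table and never adds columns, it behaves here like a filter on id. A minimal sketch of that equivalence, with the data re-typed from the printed samples; via_semi and via_filter are illustrative names.

```r
library(dplyr)

groundhogs_small <- tibble(
  id        = c(28, 22, 9, 5, 38, 16, 4, 14, 56, 62),
  shortname = c("Stonewall", "Billy", "Gertie", "Charlie", "Bill",
                "Sam", "Jimmy", "Merv", "Phil", "Gordy"),
  city      = c("Wantage", "Balzac", "Hanna City", "Athens", "Stephens City",
                "Shubenacadie", "Sun Prairie", "Stonewall", "Washington DC",
                "Milwaukee")
)
prediction_ids <- c(1, 9, 43, 23, 41, 26, 7, 11, 21, 10)

# Filtering join: keep left rows that have a match on the right.
via_semi <- groundhogs_small %>%
  semi_join(tibble(id = prediction_ids), by = "id")

# The same selection written as an explicit filter on the key.
via_filter <- groundhogs_small %>%
  filter(id %in% prediction_ids)
```

The join form tends to scale better once the key spans more than one column, where an %in% test is no longer enough.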

8. anti_join

Describe the resulting data:

Columns: id, shortname, city
Rows: 9

How is it different from the original two datasets? It returns the 9 rows of groundhogs_small whose id does not appear in predictions_small, the complement of the 1 row kept by semi_join.

groundhogs_small %>% anti_join(predictions_small, by = c("id"))

## # A tibble: 9 × 3
##      id shortname city         
##   <dbl> <chr>     <chr>        
## 1    28 Stonewall Wantage      
## 2    22 Billy     Balzac       
## 3     5 Charlie   Athens       
## 4    38 Bill      Stephens City
## 5    16 Sam       Shubenacadie 
## 6     4 Jimmy     Sun Prairie  
## 7    14 Merv      Stonewall    
## 8    56 Phil      Washington DC
## 9    62 Gordy     Milwaukee
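
Taken together, semi_join and anti_join split the left table into two non-overlapping parts, and swapping the tables asks the opposite question. The sketch below illustrates both points on the printed samples (re-typed, not re-sampled); kept, dropped, and orphan_predictions are illustrative names.

```r
library(dplyr)

groundhogs_small <- tibble(
  id        = c(28, 22, 9, 5, 38, 16, 4, 14, 56, 62),
  shortname = c("Stonewall", "Billy", "Gertie", "Charlie", "Bill",
                "Sam", "Jimmy", "Merv", "Phil", "Gordy")
)
predictions_small <- tibble(
  id     = c(1, 9, 43, 23, 41, 26, 7, 11, 21, 10),
  shadow = c(NA, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, NA),
  year   = c(1889, 2009, 2015, 2017, 2020, 2010, 2011, 2022, 2006, 1994)
)

kept    <- groundhogs_small %>% semi_join(predictions_small, by = "id")
dropped <- groundhogs_small %>% anti_join(predictions_small, by = "id")

# Every left-hand row lands in exactly one of the two results.
nrow(kept) + nrow(dropped) == nrow(groundhogs_small)

# Reversed direction: sampled predictions whose groundhog is not in the sample.
orphan_predictions <- predictions_small %>%
  anti_join(groundhogs_small, by = "id")
```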

Nick Sobalo

2022-10-21
