Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from the TidyTuesday project.

groundhogs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-30/groundhogs.csv')

## Rows: 75 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): slug, shortname, name, city, region, country, source, current_pred...
## dbl  (4): id, latitude, longitude, predictions_count
## lgl  (2): is_groundhog, active
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

predictions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-30/predictions.csv')

## Rows: 1462 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): details
## dbl (2): id, year
## lgl (1): shadow
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

Data 1: groundhogs

Columns: shortname, id, region
Rows: 10

Data 2: predictions

Columns: id, year, shadow
Rows: 10

library(dplyr)  # provides %>%, select(), sample_n(), and the join functions

groundhogs_small <- groundhogs %>% select(shortname, id, region) %>% sample_n(10)
predictions_small <- predictions %>% select(id, year, shadow) %>% sample_n(10)

groundhogs_small

## # A tibble: 10 × 3
##    shortname    id region      
##    <chr>     <dbl> <chr>       
##  1 Chuck        47 Virginia    
##  2 Hank         54 Pennsylvania
##  3 Lucy         42 Pennsylvania
##  4 Beau          7 Georgia     
##  5 Snowy        50 Washington  
##  6 Stonewall    28 New Jersey  
##  7 Slew         30 Washington  
##  8 Dave         10 New York    
##  9 Mike         35 Ontario     
## 10 Willie        3 Ontario

predictions_small

## # A tibble: 10 × 3
##       id  year shadow
##    <dbl> <dbl> <lgl> 
##  1    62  2021 FALSE 
##  2     4  1981 TRUE  
##  3    35  2016 FALSE 
##  4    52  2018 TRUE  
##  5     1  1942 TRUE  
##  6    14  2019 FALSE 
##  7     1  1966 TRUE  
##  8     2  2000 TRUE  
##  9     9  1996 FALSE 
## 10     6  2001 FALSE
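
Because sample_n() draws rows at random, the two small tables above change every time the document is knit, and so do the join results that follow. A minimal sketch of one way to make the draw repeatable, assuming a made-up toy table (`toy`) and an arbitrary seed value; it uses slice_sample(), which the dplyr documentation lists as the replacement for the superseded sample_n():

```r
library(dplyr)

# Hypothetical toy table standing in for groundhogs (values invented for illustration)
toy <- tibble(id = 1:20, shortname = paste0("hog", 1:20))

set.seed(123)  # assumed seed; any fixed integer works
first_draw <- toy %>% slice_sample(n = 5)

set.seed(123)  # resetting the same seed reproduces the same draw
second_draw <- toy %>% slice_sample(n = 5)

# Compare the two draws
identical(first_draw, second_draw)
```

Calling set.seed() once before the two sample_n() calls in this report would have the same effect on the tables shown here.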

3. inner_join

Describe the resulting data:

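Columns: shortname, id, region, year, shadow
Rows: 1

How is it different from the original two datasets? It has 1 row, compared to 10 in each original dataset, because id 35 is the only id that appears in both samples. It has 5 columns: the 3 from groundhogs_small plus year and shadow from predictions_small.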

groundhogs_small %>% inner_join(predictions_small)

## Joining with `by = join_by(id)`

## # A tibble: 1 × 5
##   shortname    id region   year shadow
##   <chr>     <dbl> <chr>   <dbl> <lgl> 
## 1 Mike         35 Ontario  2016 FALSE
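
The single-row result follows from how few ids the two samples share. A small sketch of how that overlap could be checked before joining, and of naming the key explicitly with `by` so the "Joining with" message does not appear. The tables `hogs` and `preds` are hypothetical toy data, not the TidyTuesday files:

```r
library(dplyr)

# Hypothetical toy tables (values invented for illustration)
hogs  <- tibble(id = c(1, 2, 3), shortname = c("A", "B", "C"))
preds <- tibble(id = c(3, 3, 4), year = c(2000, 2001, 2002))

# ids present in both tables; only these can survive an inner join
shared_ids <- intersect(hogs$id, preds$id)

# Naming the key with `by` makes the join column explicit
joined <- hogs %>% inner_join(preds, by = "id")

shared_ids
joined
```

In this toy case id 3 matches two prediction rows, so the inner join returns one row per match rather than one row per shared id.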

4. left_join

Describe the resulting data:

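Columns: shortname, id, region, year, shadow
Rows: 10

How is it different from the original two datasets? It keeps all 10 rows of groundhogs_small and adds year and shadow from predictions_small. Only Mike (id 35) has a match, so the other 9 rows have NA in year and shadow.

groundhogs_small %>% left_join(predictions_small)

## Joining with `by = join_by(id)`

## # A tibble: 10 × 5
##    shortname    id region        year shadow
##    <chr>     <dbl> <chr>        <dbl> <lgl> 
##  1 Chuck        47 Virginia        NA NA    
##  2 Hank         54 Pennsylvania    NA NA    
##  3 Lucy         42 Pennsylvania    NA NA    
##  4 Beau          7 Georgia         NA NA    
##  5 Snowy        50 Washington      NA NA    
##  6 Stonewall    28 New Jersey      NA NA    
##  7 Slew         30 Washington      NA NA    
##  8 Dave         10 New York        NA NA    
##  9 Mike         35 Ontario       2016 FALSE 
## 10 Willie        3 Ontario         NA NA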

5. right_join

Describe the resulting data:

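Columns: shortname, id, region, year, shadow
Rows: 10

How is it different from the original two datasets? It keeps all 10 rows of predictions_small and adds shortname and region from groundhogs_small. Only id 35 has a match, so the other 9 rows have NA in shortname and region.

groundhogs_small %>% right_join(predictions_small)

## Joining with `by = join_by(id)`

## # A tibble: 10 × 5
##    shortname    id region   year shadow
##    <chr>     <dbl> <chr>   <dbl> <lgl> 
##  1 Mike         35 Ontario  2016 FALSE 
##  2 <NA>         62 <NA>     2021 FALSE 
##  3 <NA>          4 <NA>     1981 TRUE  
##  4 <NA>         52 <NA>     2018 TRUE  
##  5 <NA>          1 <NA>     1942 TRUE  
##  6 <NA>         14 <NA>     2019 FALSE 
##  7 <NA>          1 <NA>     1966 TRUE  
##  8 <NA>          2 <NA>     2000 TRUE  
##  9 <NA>          9 <NA>     1996 FALSE 
## 10 <NA>          6 <NA>     2001 FALSE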

6. full_join

Describe the resulting data:

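Columns: shortname, id, region, year, shadow
Rows: 19

How is it different from the original two datasets? It keeps every row from both datasets: the 10 rows of groundhogs_small plus the 9 rows of predictions_small that have no match. Only id 35 appears in both, so every other row has NA in the columns that come from the other dataset.

groundhogs_small %>% full_join(predictions_small)

## Joining with `by = join_by(id)`

## # A tibble: 19 × 5
##    shortname    id region        year shadow
##    <chr>     <dbl> <chr>        <dbl> <lgl> 
##  1 Chuck        47 Virginia        NA NA    
##  2 Hank         54 Pennsylvania    NA NA    
##  3 Lucy         42 Pennsylvania    NA NA    
##  4 Beau          7 Georgia         NA NA    
##  5 Snowy        50 Washington      NA NA    
##  6 Stonewall    28 New Jersey      NA NA    
##  7 Slew         30 Washington      NA NA    
##  8 Dave         10 New York        NA NA    
##  9 Mike         35 Ontario       2016 FALSE 
## 10 Willie        3 Ontario         NA NA    
## 11 <NA>         62 <NA>          2021 FALSE 
## 12 <NA>          4 <NA>          1981 TRUE  
## 13 <NA>         52 <NA>          2018 TRUE  
## 14 <NA>          1 <NA>          1942 TRUE  
## 15 <NA>         14 <NA>          2019 FALSE 
## 16 <NA>          1 <NA>          1966 TRUE  
## 17 <NA>          2 <NA>          2000 TRUE  
## 18 <NA>          9 <NA>          1996 FALSE 
## 19 <NA>          6 <NA>          2001 FALSE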

7. semi_join

Describe the resulting data:

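Columns: shortname, id, region
Rows: 1

How is it different from the original two datasets? It keeps only the rows of groundhogs_small whose id also appears in predictions_small, which is just Mike (id 35). No columns from predictions_small are added, so it has the same 3 columns as groundhogs_small.

groundhogs_small %>% semi_join(predictions_small)

## Joining with `by = join_by(id)`

## # A tibble: 1 × 3
##   shortname    id region 
##   <chr>     <dbl> <chr>  
## 1 Mike         35 Ontario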
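
semi_join and anti_join only filter the rows of the first table, so they can be read as a filter on the key column. A minimal sketch of that reading, using hypothetical toy tables `x` and `y` (values invented for illustration) and assuming a single key column `id`:

```r
library(dplyr)

# Hypothetical toy tables (values invented for illustration)
x <- tibble(id = c(1, 2, 3), name = c("A", "B", "C"))
y <- tibble(id = c(3, 3, 4), year = c(2000, 2001, 2002))

# Filtering joins: keep or drop rows of x depending on whether id occurs in y
semi <- x %>% semi_join(y, by = "id")
anti <- x %>% anti_join(y, by = "id")

# The same row selections written as plain filters on the key
semi_filter <- x %>% filter(id %in% y$id)
anti_filter <- x %>% filter(!id %in% y$id)

# Unlike inner_join, semi_join never duplicates rows of x
nrow(inner_join(x, y, by = "id"))  # 2: id 3 matches two rows of y
nrow(semi)                         # 1: id 3 is kept once
```

The filter form is only a convenient mental model for a single key; with several key columns the join functions are the simpler choice.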

8. anti_join

Describe the resulting data:

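Columns: shortname, id, region
Rows: 9

How is it different from the original two datasets? It keeps only the rows of groundhogs_small whose id does not appear in predictions_small, so Mike (id 35) is dropped and 9 rows remain. No columns from predictions_small are added.

groundhogs_small %>% anti_join(predictions_small)

## Joining with `by = join_by(id)`

## # A tibble: 9 × 3
##   shortname    id region      
##   <chr>     <dbl> <chr>       
## 1 Chuck        47 Virginia    
## 2 Hank         54 Pennsylvania
## 3 Lucy         42 Pennsylvania
## 4 Beau          7 Georgia     
## 5 Snowy        50 Washington  
## 6 Stonewall    28 New Jersey  
## 7 Slew         30 Washington  
## 8 Dave         10 New York    
## 9 Willie        3 Ontario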
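
The six joins in sections 3 to 8 differ only in which rows and columns they keep. A compact sketch that runs all six on the same pair of hypothetical toy tables (`x` and `y`, values invented for illustration) and collects the dimensions of each result for side-by-side comparison:

```r
library(dplyr)

# Hypothetical toy tables: ids 1-3 in x, ids 3-4 in y, so only id 3 is shared
x <- tibble(id = c(1, 2, 3), name = c("A", "B", "C"))
y <- tibble(id = c(3, 4), year = c(2001, 2002))

# The six dplyr join functions, keyed by a short label
joins <- list(
  inner = inner_join, left = left_join, right = right_join,
  full = full_join, semi = semi_join, anti = anti_join
)

# Apply each join to the same inputs and record rows and columns
dims <- sapply(joins, function(join_fn) dim(join_fn(x, y, by = "id")))
rownames(dims) <- c("rows", "cols")
dims
```

With these inputs the row counts are 1, 3, 2, 4, 1, and 2 for inner, left, right, full, semi, and anti. The first four results have 3 columns, while semi and anti keep only the 2 columns of x.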


Christa Imbriano

2022-10-05
