Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from TidyTuesday Project.

ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-25/ratings.csv', show_col_types = FALSE)
details <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2022/2022-01-25/details.csv', show_col_types = FALSE)

2. Make data small

Describe the two datasets:

Data1: ratings

Columns: id, rank, name
Rows: 5 rows

Data2: details

Columns: id, minplaytime, yearpublished
Rows: 5 rows

set.seed(1234)
ratings_small <- ratings %>% select(id, rank, name) %>% sample_n(5)
details_small <- details %>% select(id, minplaytime, yearpublished) %>% sample_n(5)

ratings_small

## # A tibble: 5 × 3
##       id  rank name                        
##    <dbl> <dbl> <chr>                       
## 1   7483  6221 Schatz der Drachen          
## 2 234292  8733 13 Ghosts                   
## 3 304985  3541 Dark Ages: Holy Roman Empire
## 4  89403  8189 Big Five                    
## 5 208663 10188 Godforsaken Scavengers

details_small

## # A tibble: 5 × 3
##       id minplaytime yearpublished
##    <dbl>       <dbl>         <dbl>
## 1 213893          40          2017
## 2 256151          20          2018
## 3 143416          25          2015
## 4 180852          30          2016
## 5  98975         180          2012

3. inner_join

Describe the resulting data:

Columns: id, rank, name, minplaytime, yearpublished
Rows: 0 rows

How is it different from the original two datasets? There are no rows anymore. This tibble has all the columns from both datasets. It is by random chance that there are no matching id numbers so there can be no inner join.

inner_join(ratings_small, details_small, by = "id")

## # A tibble: 0 × 5
## # ℹ 5 variables: id <dbl>, rank <dbl>, name <chr>, minplaytime <dbl>,
## #   yearpublished <dbl>

4. left_join

Describe the resulting data:

Columns: id, rank, name, minplaytime, yearpublished
Rows: 5 rows

How is it different from the original two datasets? It has columns from both tibble’s columns combined, but it only uses ids from the ratings’s ids. Rank and name are preserved from the ratings dataset, but the minplaytime and yearpublished are NA because there were no rows with the ratings_small ids in the details_small table.

left_join(ratings_small, details_small, by = "id")

## # A tibble: 5 × 5
##       id  rank name                         minplaytime yearpublished
##    <dbl> <dbl> <chr>                              <dbl>         <dbl>
## 1   7483  6221 Schatz der Drachen                    NA            NA
## 2 234292  8733 13 Ghosts                             NA            NA
## 3 304985  3541 Dark Ages: Holy Roman Empire          NA            NA
## 4  89403  8189 Big Five                              NA            NA
## 5 208663 10188 Godforsaken Scavengers                NA            NA

5. right_join

Describe the resulting data:

Columns: id, rank, name, minplaytime, yearpublished
Rows: 5 rows

How is it different from the original two datasets? It has columns from both tibble’s columns combined, but it only uses ids from the details’s ids. minplaytime and yearpublished are preserved from the details dataset, but the rank and name are NA because there were no rows with the details_small ids in the ratings_small table.

right_join(ratings_small, details_small, by = "id")

## # A tibble: 5 × 5
##       id  rank name  minplaytime yearpublished
##    <dbl> <dbl> <chr>       <dbl>         <dbl>
## 1 213893    NA <NA>           40          2017
## 2 256151    NA <NA>           20          2018
## 3 143416    NA <NA>           25          2015
## 4 180852    NA <NA>           30          2016
## 5  98975    NA <NA>          180          2012

6. full_join

Describe the resulting data:

Columns: id, rank, name, minplaytime, yearpublished
Rows: 10 rows

How is it different from the original two datasets? It has columns from both tibble’s columns combined. It also has the entire tables stacked on top of each other with the items for the oppsing table’s items as NA. For example, rows in the details table appear with a rank as NA in this table.

full_join(ratings_small, details_small, by = "id")

## # A tibble: 10 × 5
##        id  rank name                         minplaytime yearpublished
##     <dbl> <dbl> <chr>                              <dbl>         <dbl>
##  1   7483  6221 Schatz der Drachen                    NA            NA
##  2 234292  8733 13 Ghosts                             NA            NA
##  3 304985  3541 Dark Ages: Holy Roman Empire          NA            NA
##  4  89403  8189 Big Five                              NA            NA
##  5 208663 10188 Godforsaken Scavengers                NA            NA
##  6 213893    NA <NA>                                  40          2017
##  7 256151    NA <NA>                                  20          2018
##  8 143416    NA <NA>                                  25          2015
##  9 180852    NA <NA>                                  30          2016
## 10  98975    NA <NA>                                 180          2012

7. semi_join

Describe the resulting data:

Columns: id, rank, name
Rows: 0 rows

How is it different from the original two datasets? It only has columns from the ratings dataset but it still does not have any rows because none of the ids match up.

semi_join(ratings_small, details_small, by = "id")

## # A tibble: 0 × 3
## # ℹ 3 variables: id <dbl>, rank <dbl>, name <chr>

8. anti_join

Describe the resulting data:

Columns: id, rank, name
Rows: 5 rows

How is it different from the original two datasets? It has only the columns from the ratings dataset, but it has all rows that do not share an id with a row from the details table.

anti_join(ratings_small, details_small, by = "id")

## # A tibble: 5 × 3
##       id  rank name                        
##    <dbl> <dbl> <chr>                       
## 1   7483  6221 Schatz der Drachen          
## 2 234292  8733 13 Ghosts                   
## 3 304985  3541 Dark Ages: Holy Roman Empire
## 4  89403  8189 Big Five                    
## 5 208663 10188 Godforsaken Scavengers

Week 9: Apply it to your data 8

Simon Champney

2026-03-10

1. Import your data

2. Make data small

3. inner_join

4. left_join

5. right_join

6. full_join

7. semi_join

8. anti_join