Week 9: Apply it to your data 8

1. Import your data

Import two related datasets, movies and shows, from the TidyTuesday project (2025-07-29 release).

movies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-29/movies.csv')

## Rows: 36121 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): source, report, title, available_globally, runtime
## dbl  (2): hours_viewed, views
## date (1): release_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

shows <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-29/shows.csv')

## Rows: 27803 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): source, report, title, available_globally, runtime
## dbl  (2): hours_viewed, views
## date (1): release_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
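The message above can be silenced with `show_col_types = FALSE`, or by declaring the column types yourself. Below is a minimal sketch of the second option. It assumes readr 2.x, which accepts literal data wrapped in `I()`. The two-row CSV and its values are hypothetical and stand in for the real TidyTuesday files. The column names and types come from the specification printed above.

```r
library(readr)

# Hypothetical two-row CSV passed as literal data via I();
# the real files are read from the TidyTuesday URLs instead
csv_text <- "source,report,title,available_globally,runtime,hours_viewed,views,release_date
x,2024Jan-Jun,Title A,Yes,1:30,100,50,2024-01-05
x,2024Jul-Dec,Title B,No,2:00,200,80,2024-07-10"

# Declaring every column type up front means readr does not guess,
# so the column-specification message is not printed
col_spec <- cols(
  source = col_character(),
  report = col_character(),
  title = col_character(),
  available_globally = col_character(),
  runtime = col_character(),
  hours_viewed = col_double(),
  views = col_double(),
  release_date = col_date()
)

toy <- read_csv(I(csv_text), col_types = col_spec)
toy
```

With the real data, the same `col_spec` could be passed as `col_types` in both `read_csv()` calls, because the two files share one schema.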

2. Make data small

Describe the two datasets. Both keep the shared columns report and title, which serve as the join keys in the steps below.

Data 1: shows

Columns: report, title, and hours viewed
Rows: 10

Data 2: movies

Columns: report, title, and views
Rows: 10

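library(dplyr)

set.seed(1234)  # make the random sample reproducible
# keep the join keys (report, title) plus one measure, then draw 10 rows without replacement
shows_small <- shows %>% select(report, title, hours_viewed) %>% sample_n(10)
movies_small <- movies %>% select(report, title, views) %>% sample_n(10)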

shows_small

## # A tibble: 10 × 3
##    report      title                                                hours_viewed
##    <chr>       <chr>                                                       <dbl>
##  1 2025Jan-Jun Pete Davidson: Alive From New York                         100000
##  2 2024Jul-Dec The Boss Baby: Back in Business: Season 4                19800000
##  3 2025Jan-Jun Kawatani Yuichi -Nihontouitsu Gaiden-: Season 1 // …       200000
##  4 2024Jul-Dec Sex and the City: Season 5                               14800000
##  5 2023Jul-Dec Story Time Book: Read-Along: Season 1                     3300000
##  6 2024Jul-Dec Super Monsters: Season 3                                  3300000
##  7 2025Jan-Jun Suits (2011): Season 9                                   26800000
##  8 2024Jan-Jun The Golden Hour: Season 1 // Het gouden uur: Seizoe…     13700000
##  9 2024Jul-Dec Star Trek: Enterprise: Season 1                           7200000
## 10 2025Jan-Jun Go Dog Go: Season 1                                       9300000

movies_small

## # A tibble: 10 × 3
##    report      title                                                       views
##    <chr>       <chr>                                                       <dbl>
##  1 2023Jul-Dec The K.E.OP/S System // El sistema K.E.O.P/S                1   e5
##  2 2024Jan-Jun The Emoji Movie                                            1.68e7
##  3 2024Jul-Dec The Wedding // Kasal                                       1   e5
##  4 2024Jan-Jun Tricky Old Dogs // Les vieux fourneaux                     1   e6
##  5 2025Jan-Jun Babylon (2022)                                             7   e5
##  6 2025Jan-Jun Through Thick and Thin // V Dobrém I Zlém                  6   e5
##  7 2024Jan-Jun The Equalizer 3                                            5.35e7
##  8 2023Jul-Dec No Game, No Life the Movie: Zero // ノーゲーム・ノーライフ ゼロ…… 1   e5
##  9 2024Jul-Dec The Cursed: Dead Man's Prey // 방법: 재차의                1   e5
## 10 2023Jul-Dec Ghost Stories (2017)                                       1   e5
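`sample_n()` still works, but dplyr has superseded it in favour of `slice_sample()`. The sketch below runs the same select-then-sample step with `slice_sample()` on a hypothetical 50-row tibble (`shows_toy`, invented for illustration). It is not guaranteed to draw the same rows that `sample_n()` drew with the same seed, so the outputs above would likely change if you switched.

```r
library(dplyr)

# Hypothetical stand-in for `shows` with 50 rows
shows_toy <- tibble(
  report = rep(c("2024Jan-Jun", "2024Jul-Dec"), each = 25),
  title = paste("Show", 1:50),
  hours_viewed = seq(100, 5000, by = 100),
  views = seq(10, 500, by = 10)
)

set.seed(1234)
# Keep the join keys plus one measure, then sample 10 rows without replacement
shows_toy_small <- shows_toy %>%
  select(report, title, hours_viewed) %>%
  slice_sample(n = 10)

shows_toy_small
```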

3. inner_join

Describe the resulting data:

Columns: report, title, hours viewed, and views
Rows: 0

How is it different from the original two datasets?

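It has 0 rows, compared with 10 rows in each original dataset, because none of the 10 sampled shows shares both report and title with any of the 10 sampled movies. It keeps the columns from both datasets: report, title, hours_viewed, and views.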

shows_small %>% inner_join(movies_small, by = c("report", "title"))

## # A tibble: 0 × 4
## # ℹ 4 variables: report <chr>, title <chr>, hours_viewed <dbl>, views <dbl>
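The sampled tables share no keys, so the empty result does not show what a match looks like. Here is a minimal sketch on two hypothetical toy tibbles (`shows_toy` and `movies_toy`, invented for illustration) that share exactly one (report, title) pair:

```r
library(dplyr)

# Hypothetical toy tables; only ("2024Jul-Dec", "Title B") appears in both
shows_toy <- tibble(
  report = c("2024Jan-Jun", "2024Jul-Dec"),
  title = c("Title A", "Title B"),
  hours_viewed = c(100, 200)
)
movies_toy <- tibble(
  report = c("2024Jul-Dec", "2025Jan-Jun"),
  title = c("Title B", "Title C"),
  views = c(10, 20)
)

# inner_join keeps only key pairs present in both tables
inner_res <- shows_toy %>% inner_join(movies_toy, by = c("report", "title"))
inner_res
```

The single matching pair comes back as one row carrying both hours_viewed and views.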

4. left_join

Describe the resulting data:

Columns: report, title, hours viewed, and views
Rows: 10

How is it different from the original two datasets?

Keeps all 10 rows of shows_small and adds the views column from movies_small. Because no show matched a movie, views is NA in every row.

shows_small %>% left_join(movies_small, by = c("report", "title"))

## # A tibble: 10 × 4
##    report      title                                          hours_viewed views
##    <chr>       <chr>                                                 <dbl> <dbl>
##  1 2025Jan-Jun Pete Davidson: Alive From New York                   100000    NA
##  2 2024Jul-Dec The Boss Baby: Back in Business: Season 4          19800000    NA
##  3 2025Jan-Jun Kawatani Yuichi -Nihontouitsu Gaiden-: Season…       200000    NA
##  4 2024Jul-Dec Sex and the City: Season 5                         14800000    NA
##  5 2023Jul-Dec Story Time Book: Read-Along: Season 1               3300000    NA
##  6 2024Jul-Dec Super Monsters: Season 3                            3300000    NA
##  7 2025Jan-Jun Suits (2011): Season 9                             26800000    NA
##  8 2024Jan-Jun The Golden Hour: Season 1 // Het gouden uur: …     13700000    NA
##  9 2024Jul-Dec Star Trek: Enterprise: Season 1                     7200000    NA
## 10 2025Jan-Jun Go Dog Go: Season 1                                 9300000    NA
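To see left_join fill in a value as well as NA, here is a minimal sketch on the same kind of hypothetical toy tibbles, where only "Title B" matches:

```r
library(dplyr)

# Hypothetical toy tables; only ("2024Jul-Dec", "Title B") appears in both
shows_toy <- tibble(
  report = c("2024Jan-Jun", "2024Jul-Dec"),
  title = c("Title A", "Title B"),
  hours_viewed = c(100, 200)
)
movies_toy <- tibble(
  report = c("2024Jul-Dec", "2025Jan-Jun"),
  title = c("Title B", "Title C"),
  views = c(10, 20)
)

# Every row of shows_toy is kept; views is filled where a match exists, NA otherwise
left_res <- shows_toy %>% left_join(movies_toy, by = c("report", "title"))
left_res
```

"Title A" gets views NA and "Title B" gets views 10. "Title C" is dropped because it exists only in movies_toy.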

5. right_join

Describe the resulting data:

Columns: report, title, hours viewed, and views
Rows: 10

How is it different from the original two datasets?

Keeps all 10 rows of movies_small and adds the hours_viewed column from shows_small. Because no movie matched a show, hours_viewed is NA in every row.

shows_small %>% right_join(movies_small, by = c("report", "title"))

## # A tibble: 10 × 4
##    report      title                                         hours_viewed  views
##    <chr>       <chr>                                                <dbl>  <dbl>
##  1 2023Jul-Dec The K.E.OP/S System // El sistema K.E.O.P/S             NA 1   e5
##  2 2024Jan-Jun The Emoji Movie                                         NA 1.68e7
##  3 2024Jul-Dec The Wedding // Kasal                                    NA 1   e5
##  4 2024Jan-Jun Tricky Old Dogs // Les vieux fourneaux                  NA 1   e6
##  5 2025Jan-Jun Babylon (2022)                                          NA 7   e5
##  6 2025Jan-Jun Through Thick and Thin // V Dobrém I Zlém               NA 6   e5
##  7 2024Jan-Jun The Equalizer 3                                         NA 5.35e7
##  8 2023Jul-Dec No Game, No Life the Movie: Zero // ノーゲーム・ノー…           NA 1   e5
##  9 2024Jul-Dec The Cursed: Dead Man's Prey // 방법: 재차의             NA 1   e5
## 10 2023Jul-Dec Ghost Stories (2017)                                    NA 1   e5
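A minimal sketch of right_join on hypothetical toy tibbles where only "Title B" matches. Every row of the right-hand table is kept:

```r
library(dplyr)

# Hypothetical toy tables; only ("2024Jul-Dec", "Title B") appears in both
shows_toy <- tibble(
  report = c("2024Jan-Jun", "2024Jul-Dec"),
  title = c("Title A", "Title B"),
  hours_viewed = c(100, 200)
)
movies_toy <- tibble(
  report = c("2024Jul-Dec", "2025Jan-Jun"),
  title = c("Title B", "Title C"),
  views = c(10, 20)
)

# Every row of movies_toy is kept; hours_viewed is NA where no show matches
right_res <- shows_toy %>% right_join(movies_toy, by = c("report", "title"))
right_res
```

`x %>% right_join(y)` returns the same rows as `y %>% left_join(x)`. Only the column order (and possibly the row order) differs, so the choice between them is mostly about which table you want to read first.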

6. full_join

Describe the resulting data:

Columns: report, title, hours viewed, and views
Rows: 20

How is it different from the original two datasets?

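Includes all rows from both datasets. Matching rows would be combined into a single row. Since none match here, the result has 10 + 10 = 20 rows, with views NA for the shows and hours_viewed NA for the movies.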

shows_small %>% full_join(movies_small, by = c("report", "title"))

## # A tibble: 20 × 4
##    report      title                                        hours_viewed   views
##    <chr>       <chr>                                               <dbl>   <dbl>
##  1 2025Jan-Jun Pete Davidson: Alive From New York                 100000 NA     
##  2 2024Jul-Dec The Boss Baby: Back in Business: Season 4        19800000 NA     
##  3 2025Jan-Jun Kawatani Yuichi -Nihontouitsu Gaiden-: Seas…       200000 NA     
##  4 2024Jul-Dec Sex and the City: Season 5                       14800000 NA     
##  5 2023Jul-Dec Story Time Book: Read-Along: Season 1             3300000 NA     
##  6 2024Jul-Dec Super Monsters: Season 3                          3300000 NA     
##  7 2025Jan-Jun Suits (2011): Season 9                           26800000 NA     
##  8 2024Jan-Jun The Golden Hour: Season 1 // Het gouden uur…     13700000 NA     
##  9 2024Jul-Dec Star Trek: Enterprise: Season 1                   7200000 NA     
## 10 2025Jan-Jun Go Dog Go: Season 1                               9300000 NA     
## 11 2023Jul-Dec The K.E.OP/S System // El sistema K.E.O.P/S            NA  1   e5
## 12 2024Jan-Jun The Emoji Movie                                        NA  1.68e7
## 13 2024Jul-Dec The Wedding // Kasal                                   NA  1   e5
## 14 2024Jan-Jun Tricky Old Dogs // Les vieux fourneaux                 NA  1   e6
## 15 2025Jan-Jun Babylon (2022)                                         NA  7   e5
## 16 2025Jan-Jun Through Thick and Thin // V Dobrém I Zlém              NA  6   e5
## 17 2024Jan-Jun The Equalizer 3                                        NA  5.35e7
## 18 2023Jul-Dec No Game, No Life the Movie: Zero // ノーゲーム・ノ…           NA  1   e5
## 19 2024Jul-Dec The Cursed: Dead Man's Prey // 방법: 재차의            NA  1   e5
## 20 2023Jul-Dec Ghost Stories (2017)                                   NA  1   e5
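When some keys do match, full_join returns fewer rows than the sum of the inputs. With unique keys in each table, the count is rows(x) + rows(y) - matches. Here is a minimal sketch on hypothetical toy tibbles with one shared pair, giving 2 + 2 - 1 = 3 rows:

```r
library(dplyr)

# Hypothetical toy tables; only ("2024Jul-Dec", "Title B") appears in both
shows_toy <- tibble(
  report = c("2024Jan-Jun", "2024Jul-Dec"),
  title = c("Title A", "Title B"),
  hours_viewed = c(100, 200)
)
movies_toy <- tibble(
  report = c("2024Jul-Dec", "2025Jan-Jun"),
  title = c("Title B", "Title C"),
  views = c(10, 20)
)

# All keys from both tables; the shared key collapses into one combined row
full_res <- shows_toy %>% full_join(movies_toy, by = c("report", "title"))
full_res
```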

7. semi_join

Describe the resulting data:

Columns: report, title, and hours viewed
Rows: 0

How is it different from the original two datasets?

Keeps only the rows of shows_small that have a match in movies_small, without adding any columns from movies_small. Because nothing matches, the result has 0 rows.

shows_small %>% semi_join(movies_small, by = c("report", "title"))

## # A tibble: 0 × 3
## # ℹ 3 variables: report <chr>, title <chr>, hours_viewed <dbl>
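semi_join is a filtering join: it keeps rows of the left table and never adds columns from the right one. It also never duplicates a left row when the right table has several matches, which is what separates it from inner_join. Here is a minimal sketch on hypothetical toy tibbles where only "Title B" matches:

```r
library(dplyr)

# Hypothetical toy tables; only ("2024Jul-Dec", "Title B") appears in both
shows_toy <- tibble(
  report = c("2024Jan-Jun", "2024Jul-Dec"),
  title = c("Title A", "Title B"),
  hours_viewed = c(100, 200)
)
movies_toy <- tibble(
  report = c("2024Jul-Dec", "2025Jan-Jun"),
  title = c("Title B", "Title C"),
  views = c(10, 20)
)

# Rows of shows_toy whose key appears in movies_toy; columns of shows_toy only
semi_res <- shows_toy %>% semi_join(movies_toy, by = c("report", "title"))
semi_res
```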

8. anti_join

Describe the resulting data:

Columns: report, title, and hours viewed
Rows: 10

How is it different from the original two datasets?

Keeps only the rows of shows_small that have no match in movies_small, again without adding columns. Because nothing matches, all 10 rows are returned.

shows_small %>% anti_join(movies_small, by = c("report", "title"))

## # A tibble: 10 × 3
##    report      title                                                hours_viewed
##    <chr>       <chr>                                                       <dbl>
##  1 2025Jan-Jun Pete Davidson: Alive From New York                         100000
##  2 2024Jul-Dec The Boss Baby: Back in Business: Season 4                19800000
##  3 2025Jan-Jun Kawatani Yuichi -Nihontouitsu Gaiden-: Season 1 // …       200000
##  4 2024Jul-Dec Sex and the City: Season 5                               14800000
##  5 2023Jul-Dec Story Time Book: Read-Along: Season 1                     3300000
##  6 2024Jul-Dec Super Monsters: Season 3                                  3300000
##  7 2025Jan-Jun Suits (2011): Season 9                                   26800000
##  8 2024Jan-Jun The Golden Hour: Season 1 // Het gouden uur: Seizoe…     13700000
##  9 2024Jul-Dec Star Trek: Enterprise: Season 1                           7200000
## 10 2025Jan-Jun Go Dog Go: Season 1                                       9300000
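anti_join is the complement of semi_join. Together they split the left table, so their row counts add up to the row count of the left table (0 + 10 = 10 in the results above). Here is a minimal sketch checking this on hypothetical toy tibbles:

```r
library(dplyr)

# Hypothetical toy tables; only ("2024Jul-Dec", "Title B") appears in both
shows_toy <- tibble(
  report = c("2024Jan-Jun", "2024Jul-Dec"),
  title = c("Title A", "Title B"),
  hours_viewed = c(100, 200)
)
movies_toy <- tibble(
  report = c("2024Jul-Dec", "2025Jan-Jun"),
  title = c("Title B", "Title C"),
  views = c(10, 20)
)

# Rows of shows_toy with no matching key in movies_toy
anti_res <- shows_toy %>% anti_join(movies_toy, by = c("report", "title"))
semi_res <- shows_toy %>% semi_join(movies_toy, by = c("report", "title"))

# Matched plus unmatched rows account for every row of shows_toy
nrow(semi_res) + nrow(anti_res) == nrow(shows_toy)
```

anti_join is a common way to check which keys failed to match before running a mutating join.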


Brady Kelly

2026-04-02