Import two related datasets from the TidyTuesday project.
movies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-29/movies.csv')
## Rows: 36121 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): source, report, title, available_globally, runtime
## dbl (2): hours_viewed, views
## date (1): release_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
shows <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-29/shows.csv')
## Rows: 27803 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): source, report, title, available_globally, runtime
## dbl (2): hours_viewed, views
## date (1): release_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
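Describe the two datasets:
Data 1: shows (27,803 rows, 8 columns)
Data 2: movies (36,121 rows, 8 columns)
Both have the same columns: source, report, title, available_globally, runtime, hours_viewed, views, and release_date.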
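library(dplyr)  # provides %>%, select(), sample_n(), and the join functions
# Fix the random seed so both 10-row samples are reproducible
set.seed(1234)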
shows_small <- shows %>% select(report, title, hours_viewed) %>% sample_n(10)
movies_small <- movies %>% select(report, title, views) %>% sample_n(10)
shows_small
## # A tibble: 10 × 3
## report title hours_viewed
## <chr> <chr> <dbl>
## 1 2025Jan-Jun Pete Davidson: Alive From New York 100000
## 2 2024Jul-Dec The Boss Baby: Back in Business: Season 4 19800000
## 3 2025Jan-Jun Kawatani Yuichi -Nihontouitsu Gaiden-: Season 1 // … 200000
## 4 2024Jul-Dec Sex and the City: Season 5 14800000
## 5 2023Jul-Dec Story Time Book: Read-Along: Season 1 3300000
## 6 2024Jul-Dec Super Monsters: Season 3 3300000
## 7 2025Jan-Jun Suits (2011): Season 9 26800000
## 8 2024Jan-Jun The Golden Hour: Season 1 // Het gouden uur: Seizoe… 13700000
## 9 2024Jul-Dec Star Trek: Enterprise: Season 1 7200000
## 10 2025Jan-Jun Go Dog Go: Season 1 9300000
movies_small
## # A tibble: 10 × 3
## report title views
## <chr> <chr> <dbl>
## 1 2023Jul-Dec The K.E.OP/S System // El sistema K.E.O.P/S 1 e5
## 2 2024Jan-Jun The Emoji Movie 1.68e7
## 3 2024Jul-Dec The Wedding // Kasal 1 e5
## 4 2024Jan-Jun Tricky Old Dogs // Les vieux fourneaux 1 e6
## 5 2025Jan-Jun Babylon (2022) 7 e5
## 6 2025Jan-Jun Through Thick and Thin // V Dobrém I Zlém 6 e5
## 7 2024Jan-Jun The Equalizer 3 5.35e7
## 8 2023Jul-Dec No Game, No Life the Movie: Zero // ノーゲーム・ノーライフ ゼロ…… 1 e5
## 9 2024Jul-Dec The Cursed: Dead Man's Prey // 방법: 재차의 1 e5
## 10 2023Jul-Dec Ghost Stories (2017) 1 e5
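The sampling step above uses sample_n(), which still works but which the dplyr documentation marks as superseded by slice_sample(). The following is a minimal sketch of the newer call on a hypothetical toy tibble (`toy` and its values are invented for illustration, not taken from the TidyTuesday files). It also shows why set.seed() matters: the same seed gives the same sample.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for `shows`
toy <- tibble(title = letters[1:8], hours_viewed = (1:8) * 100)

# Draw 3 random rows after fixing the seed
set.seed(1234)
first_draw <- toy %>% slice_sample(n = 3)

# Resetting the same seed reproduces the same draw
set.seed(1234)
second_draw <- toy %>% slice_sample(n = 3)

identical(first_draw, second_draw)
```

This sketch does not claim that slice_sample() picks the same rows as sample_n() for a given seed, so swapping it into the code above could change the sampled titles.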
Describe the resulting data:
How is it different from the original two datasets?
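The result has 0 rows and 4 columns (report, title, hours_viewed, views), while each sample has 10 rows and 3 columns. inner_join() keeps only rows whose report and title appear in both tables, and the two samples have no such pair in common.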
shows_small %>% inner_join(movies_small, by = c("report", "title"))
## # A tibble: 0 × 4
## # ℹ 4 variables: report <chr>, title <chr>, hours_viewed <dbl>, views <dbl>
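Because the two samples share no keys, the output above cannot show what a successful match looks like. As a small sketch, the hypothetical tibbles `x` and `y` below (invented values, same column layout as shows_small and movies_small) share exactly one (report, title) pair, so the inner join returns that one row with columns from both sides.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the same column layout as the samples
x <- tibble(report = c("2024Jan-Jun", "2024Jul-Dec"),
            title = c("A", "B"),
            hours_viewed = c(100, 200))
y <- tibble(report = c("2024Jan-Jun", "2025Jan-Jun"),
            title = c("A", "C"),
            views = c(10, 30))

# Only the pair ("2024Jan-Jun", "A") exists in both tables
inner_toy <- inner_join(x, y, by = c("report", "title"))
inner_toy
```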
Describe the resulting data:
How is it different from the original two datasets?
Keeps all 10 rows from shows_small and adds the views column from movies_small. No row has a match here, so views is NA throughout.
shows_small %>% left_join(movies_small, by = c("report", "title"))
## # A tibble: 10 × 4
## report title hours_viewed views
## <chr> <chr> <dbl> <dbl>
## 1 2025Jan-Jun Pete Davidson: Alive From New York 100000 NA
## 2 2024Jul-Dec The Boss Baby: Back in Business: Season 4 19800000 NA
## 3 2025Jan-Jun Kawatani Yuichi -Nihontouitsu Gaiden-: Season… 200000 NA
## 4 2024Jul-Dec Sex and the City: Season 5 14800000 NA
## 5 2023Jul-Dec Story Time Book: Read-Along: Season 1 3300000 NA
## 6 2024Jul-Dec Super Monsters: Season 3 3300000 NA
## 7 2025Jan-Jun Suits (2011): Season 9 26800000 NA
## 8 2024Jan-Jun The Golden Hour: Season 1 // Het gouden uur: … 13700000 NA
## 9 2024Jul-Dec Star Trek: Enterprise: Season 1 7200000 NA
## 10 2025Jan-Jun Go Dog Go: Season 1 9300000 NA
Describe the resulting data:
How is it different from the original two datasets?
Keeps all 10 rows from movies_small and adds the hours_viewed column from shows_small. No row has a match here, so hours_viewed is NA throughout.
shows_small %>% right_join(movies_small, by = c("report", "title"))
## # A tibble: 10 × 4
## report title hours_viewed views
## <chr> <chr> <dbl> <dbl>
## 1 2023Jul-Dec The K.E.OP/S System // El sistema K.E.O.P/S NA 1 e5
## 2 2024Jan-Jun The Emoji Movie NA 1.68e7
## 3 2024Jul-Dec The Wedding // Kasal NA 1 e5
## 4 2024Jan-Jun Tricky Old Dogs // Les vieux fourneaux NA 1 e6
## 5 2025Jan-Jun Babylon (2022) NA 7 e5
## 6 2025Jan-Jun Through Thick and Thin // V Dobrém I Zlém NA 6 e5
## 7 2024Jan-Jun The Equalizer 3 NA 5.35e7
## 8 2023Jul-Dec No Game, No Life the Movie: Zero // ノーゲーム・ノー… NA 1 e5
## 9 2024Jul-Dec The Cursed: Dead Man's Prey // 방법: 재차의 NA 1 e5
## 10 2023Jul-Dec Ghost Stories (2017) NA 1 e5
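The left and right joins above mirror each other: one keeps every row of the first table, the other every row of the second. A sketch on hypothetical toy tibbles (`x`, `y`, and their values are invented for illustration) shows a matched row alongside an unmatched one, and checks that right_join(x, y) holds the same rows as left_join(y, x) once columns and rows are put in the same order.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with one shared key
x <- tibble(report = c("2024Jan-Jun", "2024Jul-Dec"),
            title = c("A", "B"),
            hours_viewed = c(100, 200))
y <- tibble(report = c("2024Jan-Jun", "2025Jan-Jun"),
            title = c("A", "C"),
            views = c(10, 30))
keys <- c("report", "title")

left_toy <- left_join(x, y, by = keys)    # every row of x; views is NA for "B"
right_toy <- right_join(x, y, by = keys)  # every row of y; hours_viewed is NA for "C"

# Swap the arguments of left_join() and align the column order
swapped <- left_join(y, x, by = keys) %>% select(all_of(names(right_toy)))

# Compare after sorting, since row order may differ between the two calls
same_rows <- isTRUE(all.equal(
  as.data.frame(arrange(right_toy, report, title)),
  as.data.frame(arrange(swapped, report, title))
))
same_rows
```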
Describe the resulting data:
How is it different from the original two datasets?
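Includes all rows from both datasets: 20 rows here, because the 10 + 10 sampled rows share no keys. Matching rows would be combined into one row; non-matching rows get NA in the columns that come from the other table.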
shows_small %>% full_join(movies_small, by = c("report", "title"))
## # A tibble: 20 × 4
## report title hours_viewed views
## <chr> <chr> <dbl> <dbl>
## 1 2025Jan-Jun Pete Davidson: Alive From New York 100000 NA
## 2 2024Jul-Dec The Boss Baby: Back in Business: Season 4 19800000 NA
## 3 2025Jan-Jun Kawatani Yuichi -Nihontouitsu Gaiden-: Seas… 200000 NA
## 4 2024Jul-Dec Sex and the City: Season 5 14800000 NA
## 5 2023Jul-Dec Story Time Book: Read-Along: Season 1 3300000 NA
## 6 2024Jul-Dec Super Monsters: Season 3 3300000 NA
## 7 2025Jan-Jun Suits (2011): Season 9 26800000 NA
## 8 2024Jan-Jun The Golden Hour: Season 1 // Het gouden uur… 13700000 NA
## 9 2024Jul-Dec Star Trek: Enterprise: Season 1 7200000 NA
## 10 2025Jan-Jun Go Dog Go: Season 1 9300000 NA
## 11 2023Jul-Dec The K.E.OP/S System // El sistema K.E.O.P/S NA 1 e5
## 12 2024Jan-Jun The Emoji Movie NA 1.68e7
## 13 2024Jul-Dec The Wedding // Kasal NA 1 e5
## 14 2024Jan-Jun Tricky Old Dogs // Les vieux fourneaux NA 1 e6
## 15 2025Jan-Jun Babylon (2022) NA 7 e5
## 16 2025Jan-Jun Through Thick and Thin // V Dobrém I Zlém NA 6 e5
## 17 2024Jan-Jun The Equalizer 3 NA 5.35e7
## 18 2023Jul-Dec No Game, No Life the Movie: Zero // ノーゲーム・ノ… NA 1 e5
## 19 2024Jul-Dec The Cursed: Dead Man's Prey // 방법: 재차의 NA 1 e5
## 20 2023Jul-Dec Ghost Stories (2017) NA 1 e5
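The 20 rows above follow from simple arithmetic: when each key occurs at most once per table, a full join has as many rows as both tables combined, minus the matched pairs (10 + 10 - 0 = 20 here). The sketch below checks that count on hypothetical toy tibbles with one shared key; `x`, `y`, and their values are assumptions made for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; each (report, title) pair is unique within its table
x <- tibble(report = c("2024Jan-Jun", "2024Jul-Dec"),
            title = c("A", "B"),
            hours_viewed = c(100, 200))
y <- tibble(report = c("2024Jan-Jun", "2025Jan-Jun"),
            title = c("A", "C"),
            views = c(10, 30))
keys <- c("report", "title")

full_toy <- full_join(x, y, by = keys)
n_matched <- nrow(inner_join(x, y, by = keys))

# 2 + 2 - 1 = 3 rows: one combined row plus one unmatched row from each side
nrow(full_toy) == nrow(x) + nrow(y) - n_matched
```

The identity relies on unique keys; with duplicated keys a single row can match several rows on the other side, and the count grows accordingly.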
Describe the resulting data:
How is it different from the original two datasets?
Keeps only the rows from shows_small that have a match in movies_small, and only the columns of shows_small. There are no matches, so the result has 0 rows.
shows_small %>% semi_join(movies_small, by = c("report", "title"))
## # A tibble: 0 × 3
## # ℹ 3 variables: report <chr>, title <chr>, hours_viewed <dbl>
Describe the resulting data:
How is it different from the original two datasets?
Keeps only the rows from shows_small that have no match in movies_small. Nothing matches, so all 10 rows are returned.
shows_small %>% anti_join(movies_small, by = c("report", "title"))
## # A tibble: 10 × 3
## report title hours_viewed
## <chr> <chr> <dbl>
## 1 2025Jan-Jun Pete Davidson: Alive From New York 100000
## 2 2024Jul-Dec The Boss Baby: Back in Business: Season 4 19800000
## 3 2025Jan-Jun Kawatani Yuichi -Nihontouitsu Gaiden-: Season 1 // … 200000
## 4 2024Jul-Dec Sex and the City: Season 5 14800000
## 5 2023Jul-Dec Story Time Book: Read-Along: Season 1 3300000
## 6 2024Jul-Dec Super Monsters: Season 3 3300000
## 7 2025Jan-Jun Suits (2011): Season 9 26800000
## 8 2024Jan-Jun The Golden Hour: Season 1 // Het gouden uur: Seizoe… 13700000
## 9 2024Jul-Dec Star Trek: Enterprise: Season 1 7200000
## 10 2025Jan-Jun Go Dog Go: Season 1 9300000
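The semi join and anti join above are complements: together they split the first table into rows with a match and rows without one (0 + 10 = 10 here), and neither adds columns from the second table. A brief sketch on hypothetical toy tibbles (`x`, `y`, and their values are invented for illustration) makes the split visible when one key does match.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with one shared key
x <- tibble(report = c("2024Jan-Jun", "2024Jul-Dec"),
            title = c("A", "B"),
            hours_viewed = c(100, 200))
y <- tibble(report = c("2024Jan-Jun", "2025Jan-Jun"),
            title = c("A", "C"),
            views = c(10, 30))
keys <- c("report", "title")

semi_toy <- semi_join(x, y, by = keys)  # rows of x with a match in y ("A")
anti_toy <- anti_join(x, y, by = keys)  # rows of x without a match in y ("B")

# Every row of x lands in exactly one of the two results
nrow(semi_toy) + nrow(anti_toy) == nrow(x)

# Filtering joins keep the columns of x only; `views` is never added
names(semi_toy)
```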