Import two related datasets from the TidyTuesday project.
# read_csv() and %>% come from the tidyverse
library(tidyverse)
pixar_films = read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-11/pixar_films.csv')
## Rows: 27 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): film, film_rating
## dbl (2): number, run_time
## date (1): release_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
public_response = read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-11/public_response.csv')
## Rows: 24 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): film, cinema_score
## dbl (3): rotten_tomatoes, metacritic, critics_choice
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
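The column-specification message printed above can be silenced in the two ways the message itself suggests: pass `show_col_types = FALSE`, or declare the column types up front. A minimal sketch, assuming readr 2.0 or later (where literal CSV text is wrapped in `I()`). The one-row inline CSV and the names `csv_text`, `quiet`, and `typed` are stand-ins for illustration, not the real file:

```r
library(readr)

# Stand-in for the remote file: one row with three of the pixar_films columns.
csv_text <- I("film,film_rating,release_date\nFinding Nemo,G,2003-05-30\n")

# Option 1: keep type guessing, but suppress the message.
quiet <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: declare the types explicitly, which also suppresses the message.
typed <- read_csv(
  csv_text,
  col_types = cols(
    film = col_character(),
    film_rating = col_character(),
    release_date = col_date()
  )
)

typed
```

Declaring the types is the more defensive choice, since a change in the source file then surfaces as a parsing problem instead of a silently different column type.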
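Describe the two datasets:
Data 1: pixar_films has 27 rows and 5 columns: film and film_rating (character), number and run_time (numeric), and release_date (date).
Data 2: public_response has 24 rows and 5 columns: film and cinema_score (character), and rotten_tomatoes, metacritic, and critics_choice (numeric).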
set.seed(1234)
pixar_films_small <- pixar_films %>% select(film, film_rating, release_date) %>% sample_n(10)
public_response_small <- public_response %>% select(film, cinema_score, metacritic) %>% sample_n(10)
pixar_films_small
## # A tibble: 10 × 3
##    film              film_rating release_date
##    <chr>             <chr>       <date>
##  1 The Good Dinosaur PG          2015-11-25
##  2 Lightyear         N/A         2022-06-17
##  3 Onward            PG          2020-03-06
##  4 Finding Nemo      G           2003-05-30
##  5 Cars 2            G           2011-06-24
##  6 Inside Out        PG          2015-06-19
##  7 WALL-E            G           2008-06-27
##  8 Luca              N/A         2021-06-18
##  9 The Incredibles   PG          2004-11-05
## 10 <NA>              Not Rated   2023-06-16
public_response_small
## # A tibble: 10 × 3
##    film                cinema_score metacritic
##    <chr>               <chr>             <dbl>
##  1 Monsters, Inc.      A+                   79
##  2 A Bug's Life        A                    77
##  3 Cars                A                    73
##  4 The Incredibles     A+                   90
##  5 Inside Out          A                    94
##  6 Monsters University A                    65
##  7 Coco                A+                   81
##  8 Luca                <NA>                 NA
##  9 Finding Dory        A                    77
## 10 Finding Nemo        A+                   90
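Before joining, it can help to check how much the key column overlaps, since that overlap determines the row counts of every join that follows. A small base-R sketch, with the film titles copied from the two printed samples; the names `left_films`, `right_films`, `shared`, `only_left`, and `only_right` are made up for illustration:

```r
# Film titles copied from the two printed samples (NA is the untitled row).
left_films <- c("The Good Dinosaur", "Lightyear", "Onward", "Finding Nemo",
                "Cars 2", "Inside Out", "WALL-E", "Luca", "The Incredibles", NA)
right_films <- c("Monsters, Inc.", "A Bug's Life", "Cars", "The Incredibles",
                 "Inside Out", "Monsters University", "Coco", "Luca",
                 "Finding Dory", "Finding Nemo")

shared     <- base::intersect(left_films, right_films)  # titles in both samples
only_left  <- base::setdiff(left_films, right_films)    # titles only in pixar_films_small
only_right <- base::setdiff(right_films, left_films)    # titles only in public_response_small

shared
c(shared = length(shared), only_left = length(only_left), only_right = length(only_right))
```

For these samples the counts are 4 shared, 6 only on the left (including the `NA` title), and 6 only on the right.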
Describe the resulting data:
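How is it different from the original two datasets?
All columns from both datasets are shown, but there are only 4 rows instead of 10: just the films that appear in both samples.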
pixar_films_small %>% inner_join(public_response_small, by = c("film"))
## # A tibble: 4 × 5
##   film            film_rating release_date cinema_score metacritic
##   <chr>           <chr>       <date>       <chr>             <dbl>
## 1 Finding Nemo    G           2003-05-30   A+                   90
## 2 Inside Out      PG          2015-06-19   A                    94
## 3 Luca            N/A         2021-06-18   <NA>                 NA
## 4 The Incredibles PG          2004-11-05   A+                   90
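The inner join keeps exactly the films whose title appears in both tables, in the order of the left table. A minimal sketch that rebuilds cut-down versions of the two samples (one value column each, values copied from the printed output) and checks this; `left`, `right`, and `joined` are hypothetical names:

```r
library(dplyr)

# Cut-down copies of the two samples, values taken from the printed output.
left <- tibble(
  film = c("The Good Dinosaur", "Lightyear", "Onward", "Finding Nemo", "Cars 2",
           "Inside Out", "WALL-E", "Luca", "The Incredibles", NA),
  film_rating = c("PG", "N/A", "PG", "G", "G", "PG", "G", "N/A", "PG", "Not Rated")
)
right <- tibble(
  film = c("Monsters, Inc.", "A Bug's Life", "Cars", "The Incredibles", "Inside Out",
           "Monsters University", "Coco", "Luca", "Finding Dory", "Finding Nemo"),
  metacritic = c(79, 77, 73, 90, 94, 65, 81, NA, 77, 90)
)

joined <- inner_join(left, right, by = "film")

# One row per shared title, because titles are unique within each sample.
nrow(joined) == length(base::intersect(left$film, right$film))
joined
```

Luca is kept even though its `metacritic` value is `NA`: the join matches on the key column only, so missing values in other columns do not affect which rows match.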
Describe the resulting data:
How is it different from the original two datasets?
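All columns from both datasets are shown, and all 10 rows of pixar_films_small are kept. cinema_score and metacritic are NA for the films with no match in public_response_small (and for Luca, which matches but has missing scores in the source data).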
pixar_films_small %>% left_join(public_response_small, by = c("film"))
## # A tibble: 10 × 5
##    film              film_rating release_date cinema_score metacritic
##    <chr>             <chr>       <date>       <chr>             <dbl>
##  1 The Good Dinosaur PG          2015-11-25   <NA>                 NA
##  2 Lightyear         N/A         2022-06-17   <NA>                 NA
##  3 Onward            PG          2020-03-06   <NA>                 NA
##  4 Finding Nemo      G           2003-05-30   A+                   90
##  5 Cars 2            G           2011-06-24   <NA>                 NA
##  6 Inside Out        PG          2015-06-19   A                    94
##  7 WALL-E            G           2008-06-27   <NA>                 NA
##  8 Luca              N/A         2021-06-18   <NA>                 NA
##  9 The Incredibles   PG          2004-11-05   A+                   90
## 10 <NA>              Not Rated   2023-06-16   <NA>                 NA
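In the left join output, an `NA` score can mean two different things: the film had no match, or it matched a row whose score was already missing (Luca). One way to tell them apart is to add a flag column to the right table before joining. This is a sketch on cut-down copies of the samples; `left`, `right`, `joined`, and the flag `in_response` are hypothetical names:

```r
library(dplyr)

# Cut-down copies of the two samples, values taken from the printed output.
left <- tibble(
  film = c("The Good Dinosaur", "Lightyear", "Onward", "Finding Nemo", "Cars 2",
           "Inside Out", "WALL-E", "Luca", "The Incredibles", NA),
  film_rating = c("PG", "N/A", "PG", "G", "G", "PG", "G", "N/A", "PG", "Not Rated")
)
right <- tibble(
  film = c("Monsters, Inc.", "A Bug's Life", "Cars", "The Incredibles", "Inside Out",
           "Monsters University", "Coco", "Luca", "Finding Dory", "Finding Nemo"),
  metacritic = c(79, 77, 73, 90, 94, 65, 81, NA, 77, 90)
)

# Every row of `right` carries the flag, so after the join the flag is NA
# only for films that found no match.
joined <- left %>%
  left_join(right %>% mutate(in_response = TRUE), by = "film")

n_missing_score <- sum(is.na(joined$metacritic))   # unmatched films plus Luca
n_unmatched     <- sum(is.na(joined$in_response))  # unmatched films only

c(missing_score = n_missing_score, unmatched = n_unmatched)
```

Here 7 rows have a missing `metacritic` value, but only 6 of them are genuinely unmatched.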
Describe the resulting data:
How is it different from the original two datasets?
All columns from both datasets are shown, and all 10 rows of public_response_small are kept. film_rating and release_date are filled in only for the first 4 rows (the films that match); they are NA for the other 6.
pixar_films_small %>% right_join(public_response_small, by = c("film"))
## # A tibble: 10 × 5
##    film                film_rating release_date cinema_score metacritic
##    <chr>               <chr>       <date>       <chr>             <dbl>
##  1 Finding Nemo        G           2003-05-30   A+                   90
##  2 Inside Out          PG          2015-06-19   A                    94
##  3 Luca                N/A         2021-06-18   <NA>                 NA
##  4 The Incredibles     PG          2004-11-05   A+                   90
##  5 Monsters, Inc.      <NA>        NA           A+                   79
##  6 A Bug's Life        <NA>        NA           A                    77
##  7 Cars                <NA>        NA           A                    73
##  8 Monsters University <NA>        NA           A                    65
##  9 Coco                <NA>        NA           A+                   81
## 10 Finding Dory        <NA>        NA           A                    77
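A right join carries the same information as a left join with the two tables swapped; only the column order and row order differ. A sketch on cut-down copies of the samples that checks this after putting both results into a common order (`left`, `right`, `a`, `b`, `a_sorted`, and `b_sorted` are hypothetical names):

```r
library(dplyr)

# Cut-down copies of the two samples, values taken from the printed output.
left <- tibble(
  film = c("The Good Dinosaur", "Lightyear", "Onward", "Finding Nemo", "Cars 2",
           "Inside Out", "WALL-E", "Luca", "The Incredibles", NA),
  film_rating = c("PG", "N/A", "PG", "G", "G", "PG", "G", "N/A", "PG", "Not Rated")
)
right <- tibble(
  film = c("Monsters, Inc.", "A Bug's Life", "Cars", "The Incredibles", "Inside Out",
           "Monsters University", "Coco", "Luca", "Finding Dory", "Finding Nemo"),
  metacritic = c(79, 77, 73, 90, 94, 65, 81, NA, 77, 90)
)

a <- left %>% right_join(right, by = "film")   # columns: film, film_rating, metacritic
b <- right %>% left_join(left, by = "film")    # columns: film, metacritic, film_rating

# Put columns and rows into the same order before comparing.
a_sorted <- a %>% select(film, film_rating, metacritic) %>% arrange(film)
b_sorted <- b %>% select(film, film_rating, metacritic) %>% arrange(film)

isTRUE(all.equal(as.data.frame(a_sorted), as.data.frame(b_sorted)))
```

Which form to use is mostly a readability choice: the table named first in the call supplies the leading columns of the result.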
Describe the resulting data:
How is it different from the original two datasets?
All columns from both datasets are shown, and there are 16 rows instead of 10: the 4 matching films plus the 6 unmatched films from each dataset.
pixar_films_small %>% full_join(public_response_small, by = "film")
## # A tibble: 16 × 5
##    film                film_rating release_date cinema_score metacritic
##    <chr>               <chr>       <date>       <chr>             <dbl>
##  1 The Good Dinosaur   PG          2015-11-25   <NA>                 NA
##  2 Lightyear           N/A         2022-06-17   <NA>                 NA
##  3 Onward              PG          2020-03-06   <NA>                 NA
##  4 Finding Nemo        G           2003-05-30   A+                   90
##  5 Cars 2              G           2011-06-24   <NA>                 NA
##  6 Inside Out          PG          2015-06-19   A                    94
##  7 WALL-E              G           2008-06-27   <NA>                 NA
##  8 Luca                N/A         2021-06-18   <NA>                 NA
##  9 The Incredibles     PG          2004-11-05   A+                   90
## 10 <NA>                Not Rated   2023-06-16   <NA>                 NA
## 11 Monsters, Inc.      <NA>        NA           A+                   79
## 12 A Bug's Life        <NA>        NA           A                    77
## 13 Cars                <NA>        NA           A                    73
## 14 Monsters University <NA>        NA           A                    65
## 15 Coco                <NA>        NA           A+                   81
## 16 Finding Dory        <NA>        NA           A                    77
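The 16 rows of the full join follow from simple counting: rows on the left, plus rows on the right, minus the matches that would otherwise be counted twice (10 + 10 - 4 = 16). This holds here because each title appears at most once per sample. A sketch on cut-down copies of the samples (`left`, `right`, `n_matched`, `expected`, and `actual` are hypothetical names):

```r
library(dplyr)

# Cut-down copies of the two samples, values taken from the printed output.
left <- tibble(
  film = c("The Good Dinosaur", "Lightyear", "Onward", "Finding Nemo", "Cars 2",
           "Inside Out", "WALL-E", "Luca", "The Incredibles", NA),
  film_rating = c("PG", "N/A", "PG", "G", "G", "PG", "G", "N/A", "PG", "Not Rated")
)
right <- tibble(
  film = c("Monsters, Inc.", "A Bug's Life", "Cars", "The Incredibles", "Inside Out",
           "Monsters University", "Coco", "Luca", "Finding Dory", "Finding Nemo"),
  metacritic = c(79, 77, 73, 90, 94, 65, 81, NA, 77, 90)
)

n_matched <- nrow(inner_join(left, right, by = "film"))

# Valid only when the key is unique within each table.
expected <- nrow(left) + nrow(right) - n_matched
actual   <- nrow(full_join(left, right, by = "film"))

c(expected = expected, actual = actual)
```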
Describe the resulting data:
How is it different from the original two datasets?
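Only the 3 columns of pixar_films_small are shown (none from public_response_small), and there are 4 rows instead of 10: the films that have a match in public_response_small.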
pixar_films_small %>%
  semi_join(public_response_small, by = c("film"))
## # A tibble: 4 × 3
##   film            film_rating release_date
##   <chr>           <chr>       <date>
## 1 Finding Nemo    G           2003-05-30
## 2 Inside Out      PG          2015-06-19
## 3 Luca            N/A         2021-06-18
## 4 The Incredibles PG          2004-11-05
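A semi join acts as a filter on the left table: it keeps rows whose key appears in the right table and adds no columns. For a single key column like this one, it gives the same rows as filtering with `%in%`. A sketch on cut-down copies of the samples (`left`, `right`, `via_semi`, and `via_filter` are hypothetical names):

```r
library(dplyr)

# Cut-down copies of the two samples, values taken from the printed output.
left <- tibble(
  film = c("The Good Dinosaur", "Lightyear", "Onward", "Finding Nemo", "Cars 2",
           "Inside Out", "WALL-E", "Luca", "The Incredibles", NA),
  film_rating = c("PG", "N/A", "PG", "G", "G", "PG", "G", "N/A", "PG", "Not Rated")
)
right <- tibble(
  film = c("Monsters, Inc.", "A Bug's Life", "Cars", "The Incredibles", "Inside Out",
           "Monsters University", "Coco", "Luca", "Finding Dory", "Finding Nemo"),
  metacritic = c(79, 77, 73, 90, 94, 65, 81, NA, 77, 90)
)

via_semi   <- semi_join(left, right, by = "film")
via_filter <- filter(left, film %in% right$film)

identical(via_semi$film, via_filter$film)
names(via_semi)   # only the columns of `left`
```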
Describe the resulting data:
How is it different from the original two datasets?
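Only the 3 columns of pixar_films_small are shown (none from public_response_small), and there are 6 rows instead of 10: the films that have no match in public_response_small.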
pixar_films_small %>%
  anti_join(public_response_small, by = c("film"))
## # A tibble: 6 × 3
##   film              film_rating release_date
##   <chr>             <chr>       <date>
## 1 The Good Dinosaur PG          2015-11-25
## 2 Lightyear         N/A         2022-06-17
## 3 Onward            PG          2020-03-06
## 4 Cars 2            G           2011-06-24
## 5 WALL-E            G           2008-06-27
## 6 <NA>              Not Rated   2023-06-16
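The anti join is the complement of the semi join: together they split the left table into matched and unmatched rows (4 + 6 = 10 here). The row with the `NA` title lands in the unmatched part because no title in the right sample is `NA`. A sketch on cut-down copies of the samples (`left`, `right`, `kept`, `dropped`, and `via_filter` are hypothetical names):

```r
library(dplyr)

# Cut-down copies of the two samples, values taken from the printed output.
left <- tibble(
  film = c("The Good Dinosaur", "Lightyear", "Onward", "Finding Nemo", "Cars 2",
           "Inside Out", "WALL-E", "Luca", "The Incredibles", NA),
  film_rating = c("PG", "N/A", "PG", "G", "G", "PG", "G", "N/A", "PG", "Not Rated")
)
right <- tibble(
  film = c("Monsters, Inc.", "A Bug's Life", "Cars", "The Incredibles", "Inside Out",
           "Monsters University", "Coco", "Luca", "Finding Dory", "Finding Nemo"),
  metacritic = c(79, 77, 73, 90, 94, 65, 81, NA, 77, 90)
)

kept    <- semi_join(left, right, by = "film")   # films with a match
dropped <- anti_join(left, right, by = "film")   # films without a match

# The two pieces account for every row of `left`.
nrow(kept) + nrow(dropped) == nrow(left)

# Same rows as negating the %in% filter on the single key column.
via_filter <- filter(left, !(film %in% right$film))
identical(dropped$film, via_filter$film)
```

An anti join is a quick way to list the keys that will produce `NA` values in a left join, which makes it useful for diagnosing mismatched titles.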