knitr::opts_chunk$set(echo = TRUE)
This assignment uses a dataset of movie ratings to build a global baseline estimate, which predicts a critic's rating of a movie from three quantities: the mean of all ratings, the critic's average rating, and the movie's average rating. The goal is to predict how the critic Param would rate the movie Pitch Perfect 2.
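Written out, the estimate computed in this report is the global mean plus two offsets. The symbols are notation introduced here for illustration and do not appear in the code: $\mu$ is the mean of all ratings, $\bar{r}_m$ is the average rating of movie $m$, and $\bar{r}_c$ is the average rating given by critic $c$.

$$
\hat{r}_{c,m} = \mu + (\bar{r}_m - \mu) + (\bar{r}_c - \mu) = \bar{r}_m + \bar{r}_c - \mu
$$

The two terms in parentheses correspond to the relative_movie_rating and relative_critic_rating columns created later in the report.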
The following code block loads the packages required for this task.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
The following code block imports the dataset, converts the column names to snake_case with clean_names(), and removes rows that have no critic name.
movie_raw <- read_csv("https://raw.githubusercontent.com/mraynolds/data_607/refs/heads/main/MovieRatings.csv", skip_empty_rows = TRUE)
## Rows: 212 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Critic
## dbl (6): CaptainAmerica, Deadpool, Frozen, JungleBook, PitchPerfect2, StarWa...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Standardize column names, then drop rows where the critic name is missing
movie_raw <- movie_raw |> clean_names() |> filter(!is.na(critic))
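The following code makes the dataset tidy by pivoting it longer, so that each row holds one critic, one movie, and one rating. It retains the NA values for movies that a critic did not rate, although this is not necessary: every mean below is computed with na.rm = TRUE, so the NA values could also have been eliminated.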
movie_ratings <- movie_raw |>
pivot_longer(cols = !critic,
names_to = "movie",
values_to = "rating")
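As a sketch of the alternative mentioned above, pivot_longer() accepts a values_drop_na argument that drops unrated critic/movie pairs during the pivot. The toy_raw tibble and its film_x and film_y columns are made-up illustration data, not part of the assignment's dataset.

```r
library(tidyr)
library(tibble)

# Toy data (hypothetical): critic B has not rated film_x
toy_raw <- tibble(
  critic = c("A", "B"),
  film_x = c(4, NA),
  film_y = c(3, 5)
)

# Default behaviour: NA ratings are kept as rows
toy_with_na <- toy_raw |>
  pivot_longer(cols = !critic, names_to = "movie", values_to = "rating")

# values_drop_na = TRUE removes rows whose rating is NA
toy_no_na <- toy_raw |>
  pivot_longer(cols = !critic, names_to = "movie", values_to = "rating",
               values_drop_na = TRUE)

nrow(toy_with_na)  # 4 rows: 2 critics x 2 movies
nrow(toy_no_na)    # 3 rows: the single unrated pair is gone
```

Either form gives the same averages later on, as long as the means are computed with na.rm = TRUE.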
The code below computes the mean of all individual ratings, across every critic and every movie. This value is used as the global baseline for all movie ratings.
mean_movie <- movie_ratings |>
summarize(mean_movie = mean(rating, na.rm = TRUE)) |>
pull(mean_movie)
mean_movie
## [1] 3.934426
The following code creates a dataframe with the average rating for each movie based on all of the ratings for that specific movie.
movie_average <- movie_ratings |>
group_by(movie) |>
summarize(avg_rating = mean(rating, na.rm = TRUE))
movie_average
## # A tibble: 6 × 2
## movie avg_rating
## <chr> <dbl>
## 1 captain_america 4.27
## 2 deadpool 4.44
## 3 frozen 3.73
## 4 jungle_book 3.9
## 5 pitch_perfect2 2.71
## 6 star_wars_force 4.15
The movie_average dataframe is then modified to include a column that shows each movie's average rating relative to the global mean rating. It is computed by subtracting the global mean from each movie's average rating.
movie_average <- movie_average |>
mutate(
relative_movie_rating = (avg_rating - mean_movie)
)
The following code creates a dataframe with the average rating for each critic based on their ratings.
critic_average <- movie_ratings |>
group_by(critic) |>
summarize(avg_rating = mean(rating, na.rm = TRUE))
The critic_average dataframe is then modified to include a column that shows each critic's average rating relative to the global mean rating. It is computed by subtracting the global mean from each critic's average rating.
critic_average <- critic_average |>
mutate(
relative_critic_rating = (avg_rating - mean_movie)
)
The following code estimates how Param would rate the movie Pitch Perfect 2 by adding the movie's relative rating and Param's relative rating to the global mean.
critic <- "Param"
movie <- "pitch_perfect2"
pitch_perfect_2_relative_rating <- movie_average$relative_movie_rating[movie_average$movie == movie]
critic_relative_rating <- critic_average$relative_critic_rating[critic_average$critic == critic]
Param_rating_pitch_perfect2 <- mean_movie + pitch_perfect_2_relative_rating + critic_relative_rating
paste(critic, "would rate the movie",movie,"approximately", round(Param_rating_pitch_perfect2, 2))
## [1] "Param would rate the movie pitch_perfect2 approximately 2.28"
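The same calculation can be wrapped in a small function so that it works for any critic and movie. This is only a sketch: baseline_estimate() and toy_ratings are hypothetical names introduced here, and the function assumes a long-format data frame with critic, movie, and rating columns like movie_ratings.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of the original analysis).
# `ratings` must have the columns critic, movie, and rating.
baseline_estimate <- function(ratings, critic_name, movie_name) {
  # Mean of every rating in the dataset
  global_mean <- mean(ratings$rating, na.rm = TRUE)

  # Average rating received by the chosen movie
  movie_mean <- ratings |>
    filter(movie == movie_name) |>
    summarize(m = mean(rating, na.rm = TRUE)) |>
    pull(m)

  # Average rating given by the chosen critic
  critic_mean <- ratings |>
    filter(critic == critic_name) |>
    summarize(m = mean(rating, na.rm = TRUE)) |>
    pull(m)

  # Global mean plus the movie offset and the critic offset
  global_mean + (movie_mean - global_mean) + (critic_mean - global_mean)
}

# Toy data (hypothetical): critic B has not rated film_y
toy_ratings <- tibble(
  critic = c("A", "A", "B", "B"),
  movie  = c("film_x", "film_y", "film_x", "film_y"),
  rating = c(4, 2, 5, NA)
)

toy_estimate <- baseline_estimate(toy_ratings, "B", "film_y")
```

With the data from this report, calling baseline_estimate(movie_ratings, "Param", "pitch_perfect2") should reproduce the estimate printed above.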