For this exercise, I'm using global baseline estimates to help predict movie ratings.
We start with the viewer ratings: a long list of the ratings (from 1 to 5) that viewers gave to the movies. We convert it to a wide format with pivot_wider, giving one row per viewer and one column per movie, so it is easier to read.
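```r
library(readr)  # read_csv()
library(dplyr)  # %>% pipe, bind_rows()
library(tidyr)  # pivot_wider()
# Long format: one row per viewer/movie rating
viewer_ratings <- read_csv("viewer_ratings.csv")
# Wide format: one row per viewer, one column per movie; unrated movies become NA
ratings <- viewer_ratings %>%
  pivot_wider(
    names_from = "movie_name",
    values_from = "viewer_rating"
  )
knitr::kable(ratings, "simple")
```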
| viewer_name | Inside Out 2 | Deadpool & Wolverine | Wicked | Dune Part 2 | Quiet Place: Day 1 | Kingdom of the Planet of Apes |
|---|---|---|---|---|---|---|
| Nathasha | 3.5 | 3.0 | 4.5 | NA | 4.0 | 3.5 |
| Joseph | NA | 4.5 | NA | 3.5 | 3.0 | 3.5 |
| Rosy | 3.5 | 3.5 | 3.5 | 4.0 | 3.0 | 4.0 |
| Andrew | 3.0 | 4.5 | NA | 4.5 | 3.5 | 4.5 |
| Julia | 3.5 | 3.0 | 3.5 | NA | 4.5 | 3.0 |
| Kazi | 4.5 | 3.0 | 3.0 | 3.0 | 3.5 | 3.5 |
| Badri | 3.2 | 4.0 | NA | 3.8 | 3.2 | 3.9 |
| Rafiya | 4.0 | NA | 4.2 | NA | 3.2 | 3.6 |
| Thomas | NA | 3.8 | NA | 3.5 | 4.0 | 3.6 |
| Benjamin | 3.7 | 3.7 | 3.5 | 3.8 | 3.7 | 3.9 |
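Next we add each viewer's average rating as a new column (viewer_avg) and each movie's average rating as a new row (movie_avg), ignoring missing ratings. The viewer_avg column already exists when the column means are taken, so the bottom-right cell is the average of the viewer averages.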
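```r
# Viewer averages (row means) and movie averages (column means), ignoring unrated (NA) cells
ratings$viewer_avg <- round(rowMeans(ratings[, -1], na.rm = TRUE), 1)
movie_avg <- round(colMeans(ratings[, -1], na.rm = TRUE), 1)
# Append movie_avg as a numeric row; rbind() with c("movie_avg", ...) would turn every column into text
movie_row <- tibble::as_tibble(as.list(movie_avg))
movie_row$viewer_name <- "movie_avg"
ratings <- dplyr::bind_rows(ratings, movie_row)
knitr::kable(ratings, "simple")
```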
| viewer_name | Inside Out 2 | Deadpool & Wolverine | Wicked | Dune Part 2 | Quiet Place: Day 1 | Kingdom of the Planet of Apes | viewer_avg |
|---|---|---|---|---|---|---|---|
| Nathasha | 3.5 | 3 | 4.5 | NA | 4 | 3.5 | 3.7 |
| Joseph | NA | 4.5 | NA | 3.5 | 3 | 3.5 | 3.6 |
| Rosy | 3.5 | 3.5 | 3.5 | 4 | 3 | 4 | 3.6 |
| Andrew | 3 | 4.5 | NA | 4.5 | 3.5 | 4.5 | 4 |
| Julia | 3.5 | 3 | 3.5 | NA | 4.5 | 3 | 3.5 |
| Kazi | 4.5 | 3 | 3 | 3 | 3.5 | 3.5 | 3.4 |
| Badri | 3.2 | 4 | NA | 3.8 | 3.2 | 3.9 | 3.6 |
| Rafiya | 4 | NA | 4.2 | NA | 3.2 | 3.6 | 3.8 |
| Thomas | NA | 3.8 | NA | 3.5 | 4 | 3.6 | 3.7 |
| Benjamin | 3.7 | 3.7 | 3.5 | 3.8 | 3.7 | 3.9 | 3.7 |
| movie_avg | 3.6 | 3.7 | 3.7 | 3.7 | 3.6 | 3.7 | 3.7 |
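The bottom-right 3.7 is the mean of the viewer averages. A global baseline is usually defined with the mean over all individual ratings, which is a slightly different quantity. The base R sketch below computes both. It uses base R rather than the tidyverse code above, and the ratings are typed in from the table so it runs without viewer_ratings.csv. The two values differ slightly (about 3.656 versus 3.664), but both round to 3.7, so the choice does not change any estimate in this exercise.

```r
# Ratings typed in from the table above; NA = movie not rated by that viewer
ratings <- matrix(
  c(3.5, 3.0, 4.5,  NA, 4.0, 3.5,   # Nathasha
     NA, 4.5,  NA, 3.5, 3.0, 3.5,   # Joseph
    3.5, 3.5, 3.5, 4.0, 3.0, 4.0,   # Rosy
    3.0, 4.5,  NA, 4.5, 3.5, 4.5,   # Andrew
    3.5, 3.0, 3.5,  NA, 4.5, 3.0,   # Julia
    4.5, 3.0, 3.0, 3.0, 3.5, 3.5,   # Kazi
    3.2, 4.0,  NA, 3.8, 3.2, 3.9,   # Badri
    4.0,  NA, 4.2,  NA, 3.2, 3.6,   # Rafiya
     NA, 3.8,  NA, 3.5, 4.0, 3.6,   # Thomas
    3.7, 3.7, 3.5, 3.8, 3.7, 3.9),  # Benjamin
  nrow = 10, byrow = TRUE,
  dimnames = list(
    c("Nathasha", "Joseph", "Rosy", "Andrew", "Julia",
      "Kazi", "Badri", "Rafiya", "Thomas", "Benjamin"),
    c("Inside Out 2", "Deadpool & Wolverine", "Wicked", "Dune Part 2",
      "Quiet Place: Day 1", "Kingdom of the Planet of Apes")))

# Number of observed ratings
n_rated <- sum(!is.na(ratings))
# Global mean over every observed rating (the usual baseline mean)
mmr_all <- mean(ratings, na.rm = TRUE)
# Mean of the per-viewer averages; the table's bottom-right cell does the same
# with averages that were already rounded to 1 decimal
mmr_viewer <- mean(rowMeans(ratings, na.rm = TRUE))

cat(sprintf("%d ratings: overall mean %.3f, mean of viewer averages %.3f\n",
            n_rated, mmr_all, mmr_viewer))
```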
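The formula the spreadsheet uses to predict someone's rating for a specific movie is the global baseline estimate:

Global Baseline Estimate = MMR + (movie's average rating - MMR) + (viewer's average rating - MMR)

Here MMR, the mean movie rating, is the average rating across the whole table (3.7, the bottom-right cell above). The two bracketed terms measure how far the movie and the viewer sit above or below that mean, so the estimate simplifies to movie average + viewer average - MMR.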
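As a check, plugging the rounded averages from the table into the formula reproduces two of the filled-in cells in the next table. In the notation introduced here for illustration, μ is the MMR (3.7), r̄_m is the movie's average, and r̄_u is the viewer's average.

$$
\begin{aligned}
\hat{r}_{u,m} &= \mu + (\bar{r}_m - \mu) + (\bar{r}_u - \mu) = \bar{r}_m + \bar{r}_u - \mu \\
\hat{r}_{\text{Joseph},\ \text{Inside Out 2}} &= 3.7 + (3.6 - 3.7) + (3.6 - 3.7) = 3.5 \\
\hat{r}_{\text{Rafiya},\ \text{Deadpool \& Wolverine}} &= 3.7 + (3.7 - 3.7) + (3.8 - 3.7) = 3.8
\end{aligned}
$$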
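Now we can predict how viewers might rate the movies they have not seen, replacing each NA with its global baseline estimate. The viewer_avg column and movie_avg row still hold the original averages; they were not recomputed after the NA values were filled in.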
| viewer_name | Inside Out 2 | Deadpool & Wolverine | Wicked | Dune Part 2 | Quiet Place: Day 1 | Kingdom of the Planet of Apes | viewer_avg |
|---|---|---|---|---|---|---|---|
| Nathasha | 3.5 | 3 | 4.5 | 3.7 | 4 | 3.5 | 3.7 |
| Joseph | 3.5 | 4.5 | 3.6 | 3.5 | 3 | 3.5 | 3.6 |
| Rosy | 3.5 | 3.5 | 3.5 | 4 | 3 | 4 | 3.6 |
| Andrew | 3 | 4.5 | 4 | 4.5 | 3.5 | 4.5 | 4 |
| Julia | 3.5 | 3 | 3.5 | 3.5 | 4.5 | 3 | 3.5 |
| Kazi | 4.5 | 3 | 3 | 3 | 3.5 | 3.5 | 3.4 |
| Badri | 3.2 | 4 | 3.6 | 3.8 | 3.2 | 3.9 | 3.6 |
| Rafiya | 4 | 3.8 | 4.2 | 3.8 | 3.2 | 3.6 | 3.8 |
| Thomas | 3.6 | 3.8 | 3.7 | 3.5 | 4 | 3.6 | 3.7 |
| Benjamin | 3.7 | 3.7 | 3.5 | 3.8 | 3.7 | 3.9 | 3.7 |
| movie_avg | 3.6 | 3.7 | 3.7 | 3.7 | 3.6 | 3.7 | 3.7 |
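The code that produced the filled table is not shown. One way to produce it is the base R sketch below. It makes three assumptions:

* The ratings are held in a numeric matrix, typed in from the first table rather than read from viewer_ratings.csv.
* The averages are rounded to one decimal as in the tables.
* MMR is the overall mean of all observed ratings rounded to one decimal, which gives the same 3.7 as the table.

The names baseline, unrated and filled are illustrative only.

```r
# Ratings typed in from the first table; NA = movie not rated by that viewer
ratings <- matrix(
  c(3.5, 3.0, 4.5,  NA, 4.0, 3.5,   # Nathasha
     NA, 4.5,  NA, 3.5, 3.0, 3.5,   # Joseph
    3.5, 3.5, 3.5, 4.0, 3.0, 4.0,   # Rosy
    3.0, 4.5,  NA, 4.5, 3.5, 4.5,   # Andrew
    3.5, 3.0, 3.5,  NA, 4.5, 3.0,   # Julia
    4.5, 3.0, 3.0, 3.0, 3.5, 3.5,   # Kazi
    3.2, 4.0,  NA, 3.8, 3.2, 3.9,   # Badri
    4.0,  NA, 4.2,  NA, 3.2, 3.6,   # Rafiya
     NA, 3.8,  NA, 3.5, 4.0, 3.6,   # Thomas
    3.7, 3.7, 3.5, 3.8, 3.7, 3.9),  # Benjamin
  nrow = 10, byrow = TRUE,
  dimnames = list(
    c("Nathasha", "Joseph", "Rosy", "Andrew", "Julia",
      "Kazi", "Badri", "Rafiya", "Thomas", "Benjamin"),
    c("Inside Out 2", "Deadpool & Wolverine", "Wicked", "Dune Part 2",
      "Quiet Place: Day 1", "Kingdom of the Planet of Apes")))

# Viewer and movie averages, ignoring NA and rounded to 1 decimal as in the tables
viewer_avg <- round(rowMeans(ratings, na.rm = TRUE), 1)
movie_avg  <- round(colMeans(ratings, na.rm = TRUE), 1)
# Assumed MMR: mean of all observed ratings, rounded to 1 decimal (3.656 -> 3.7)
mmr <- round(mean(ratings, na.rm = TRUE), 1)

# Global baseline for every viewer/movie cell:
# MMR + (movie avg - MMR) + (viewer avg - MMR)
baseline <- mmr + outer(viewer_avg - mmr, movie_avg - mmr, "+")

# Keep observed ratings; fill only the unrated (NA) cells with their estimate
filled <- ratings
unrated <- is.na(ratings)
filled[unrated] <- baseline[unrated]
print(round(filled, 1))
```

outer() builds the full 10 x 6 grid of viewer-plus-movie deviations in one call. The sketch therefore computes an estimate for every cell and then copies only the unrated ones into filled.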