library(tidyverse)
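# dplyr is already attached by library(tidyverse); knitr provides kable() for tables
library(knitr)
# Survey export: one row per respondent, one column per movie
ratings_df <- read.csv("https://raw.githubusercontent.com/farhodibr/CUNY-SPS-MSDS/refs/heads/main/DATA607/LAB11/movie_reviews%20-%20Form%20Responses%201.csv")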
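This dataset contains the movie ratings collected through the survey. Each row is one respondent, each movie column holds that respondent's rating, and NA marks a movie the respondent did not rate.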
kable(ratings_df)
| Timestamp | Name | Gladiator2 | Wolfs | The_Substance | Bad_Boys4 | The_Beekeeper | Rebel_Ridge |
|---|---|---|---|---|---|---|---|
| 5:34:20 PM | Aiuna | 1 | NA | 5 | 2 | 2 | 1 |
| 6:12:59 PM | Ilya | 3 | 3 | 3 | 3 | NA | 3 |
| 6:52:31 PM | Vadim | 1 | NA | 3 | 1 | 3 | 3 |
| 10:16:18 PM | Vladimir Storchevoy | 5 | 5 | 5 | 5 | 4 | 4 |
| 1:44:22 AM | Timur | 5 | 5 | 4 | 5 | 3 | 3 |
| 1:56:52 AM | Gosha | 4 | 5 | 3 | 5 | 4 | 4 |
| 10:30:44 AM | Bob | 3 | 5 | 4 | NA | 3 | NA |
| 9:40:22 PM | Kirill | 4 | 4 | NA | 5 | 3 | 3 |
| 9:41:38 PM | James | 3 | 4 | 4 | 5 | 3 | 3 |
# Reshape to long format: one row per (user, movie) pair; unrated pairs stay as NA
ratings_df_long <- ratings_df |>
  select(-Timestamp) |>
  pivot_longer(
    cols = -Name,
    names_to = "movie",
    values_to = "rating"
  )
kable(head(ratings_df_long, 10))
| Name | movie | rating |
|---|---|---|
| Aiuna | Gladiator2 | 1 |
| Aiuna | Wolfs | NA |
| Aiuna | The_Substance | 5 |
| Aiuna | Bad_Boys4 | 2 |
| Aiuna | The_Beekeeper | 2 |
| Aiuna | Rebel_Ridge | 1 |
| Ilya | Gladiator2 | 3 |
| Ilya | Wolfs | 3 |
| Ilya | The_Substance | 3 |
| Ilya | Bad_Boys4 | 3 |
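pivot_longer() keeps cells whose value is NA by default (its values_drop_na argument defaults to FALSE), so every unrated user-movie pair survives as a row with rating = NA. The prediction step later relies on this when it filters with is.na(rating). The base-R sketch below reproduces the same reshape on the first two respondents and three of the movies from the survey table, only to show the row layout; unlike pivot_longer(), it orders rows movie by movie rather than respondent by respondent.

```r
# Subset of the survey table: first two respondents, first three movies
wide <- data.frame(
  Name = c("Aiuna", "Ilya"),
  Gladiator2 = c(1, 3),
  Wolfs = c(NA, 3),
  The_Substance = c(5, 3)
)

movies <- setdiff(names(wide), "Name")

# Stack the movie columns; NA ratings are kept as their own rows
long <- data.frame(
  Name = rep(wide$Name, times = length(movies)),
  movie = rep(movies, each = nrow(wide)),
  rating = unlist(wide[movies], use.names = FALSE)
)
long
```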
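The global baseline estimate recommender predicts user \(u\)'s rating of movie \(m\) with the following formula:
\[ \hat{r}_{um} = \mu + b_u + b_m \]
where:

- \(\mu\) is the global average rating, taken over all rated user-movie pairs;
- \(b_u\) is the user bias (how user \(u\) tends to rate compared to the global average);
- \(b_m\) is the movie bias (how movie \(m\) is rated compared to the global average, after each rater's user bias is removed).

The user bias is calculated as:
\[ b_u = \frac{1}{N_u} \sum_{m} (r_{um} - \mu) \]
where \(r_{um}\) is user \(u\)'s rating of movie \(m\), the sum runs over the movies user \(u\) rated, and \(N_u\) is the number of ratings by that user.

The movie bias is calculated as:
\[ b_m = \frac{1}{N_m} \sum_{u} (r_{um} - \mu - b_u) \]
where the sum runs over the users who rated movie \(m\), and \(N_m\) is the number of ratings for that movie.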
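The three formulas can also be computed without the tidyverse. Below is a minimal base-R sketch, assuming the ratings sit in a users-by-movies matrix with NA for unrated cells (the values are copied from the survey table above); baseline_estimate is a hypothetical helper name, not part of any package. rowMeans() gives \(b_u\), colMeans() of the residuals that already have \(b_u\) removed gives \(b_m\), and outer() adds the two bias vectors into a full matrix of estimates.

```r
# Ratings copied from the survey table: rows = users, columns = movies, NA = not rated
ratings_mat <- matrix(
  c(1, NA,  5,  2,  2,  1,
    3,  3,  3,  3, NA,  3,
    1, NA,  3,  1,  3,  3,
    5,  5,  5,  5,  4,  4,
    5,  5,  4,  5,  3,  3,
    4,  5,  3,  5,  4,  4,
    3,  5,  4, NA,  3, NA,
    4,  4, NA,  5,  3,  3,
    3,  4,  4,  5,  3,  3),
  nrow = 9, byrow = TRUE,
  dimnames = list(
    c("Aiuna", "Ilya", "Vadim", "Vladimir Storchevoy", "Timur",
      "Gosha", "Bob", "Kirill", "James"),
    c("Gladiator2", "Wolfs", "The_Substance", "Bad_Boys4",
      "The_Beekeeper", "Rebel_Ridge")
  )
)

# Hypothetical helper (not from any package): r_hat[u, m] = mu + b_u[u] + b_m[m]
baseline_estimate <- function(ratings) {
  mu  <- mean(ratings, na.rm = TRUE)            # global average over rated cells only
  b_u <- rowMeans(ratings - mu, na.rm = TRUE)   # user bias: mean residual per row
  # Movie bias: b_u (one value per row) recycles down each column,
  # so each rating has its own rater's bias subtracted
  b_m <- colMeans(ratings - mu - b_u, na.rm = TRUE)
  r_hat <- mu + outer(b_u, b_m, "+")            # estimate for every user-movie pair
  list(mu = mu, b_u = b_u, b_m = b_m, r_hat = r_hat)
}

est <- baseline_estimate(ratings_mat)

# Fill only the unrated cells, rounded to whole numbers like the survey ratings
filled <- ratings_mat
unrated <- is.na(ratings_mat)
filled[unrated] <- round(est$r_hat[unrated])
filled
```

On this data the sketch gives the same \(\mu\), bias values, and rounded predictions as the dplyr pipeline in this lab.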
This code cell calculates \(\mu\), the global average rating, ignoring missing (NA) ratings:
mu <- mean(ratings_df_long$rating, na.rm = TRUE)  # NA (unrated) cells are excluded
kable(tibble(mu = mu))
| mu |
|---|
| 3.5625 |
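As a check on this value: 9 respondents and 6 movies give 54 cells, 6 of which are NA, leaving 48 ratings that sum to 171, so

\[ \mu = \frac{171}{48} = 3.5625 \]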
This code cell creates the user_bias data frame, which contains the bias \(b_u\) for each user:
# b_u: mean residual (rating - mu) over the movies each user rated
user_bias <- ratings_df_long |>
  group_by(Name) |>
  summarise(bias_user = mean(rating - mu, na.rm = TRUE))
kable(user_bias)
| Name | bias_user |
|---|---|
| Aiuna | -1.3625000 |
| Bob | 0.1875000 |
| Gosha | 0.6041667 |
| Ilya | -0.5625000 |
| James | 0.1041667 |
| Kirill | 0.2375000 |
| Timur | 0.6041667 |
| Vadim | -1.3625000 |
| Vladimir Storchevoy | 1.1041667 |
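As a worked example, Aiuna rated five movies (1, 5, 2, 2 and 1; Wolfs is NA). Because the average of \(r_{um} - \mu\) equals the user's mean rating minus \(\mu\),

\[ b_{\text{Aiuna}} = \frac{1 + 5 + 2 + 2 + 1}{5} - 3.5625 = 2.2 - 3.5625 = -1.3625 \]

which matches the table.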
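This code cell creates the movie_bias data frame, which contains the bias \(b_m\) for each movie. Each user's bias is joined onto their ratings first, so it can be subtracted along with \(\mu\):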
# b_m: mean residual after removing both mu and the rater's user bias
movie_bias <- ratings_df_long |>
  left_join(user_bias, by = "Name") |>
  group_by(movie) |>
  summarise(bias_movie = mean(rating - mu - bias_user, na.rm = TRUE))
kable(movie_bias)
| movie | bias_movie |
|---|---|
| Bad_Boys4 | 0.3916667 |
| Gladiator2 | -0.2907407 |
| Rebel_Ridge | -0.4833333 |
| The_Beekeeper | -0.4520833 |
| The_Substance | 0.3979167 |
| Wolfs | 0.5404762 |
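As a worked example, Wolfs was rated by seven users. Subtracting \(\mu\) and each rater's \(b_u\) from their rating gives the residuals

\[
\begin{aligned}
\text{Ilya:}\quad & 3 - 3.5625 - (-0.5625) = 0 \\
\text{Vladimir Storchevoy:}\quad & 5 - 3.5625 - 1.1041667 = 0.3333333 \\
\text{Timur:}\quad & 5 - 3.5625 - 0.6041667 = 0.8333333 \\
\text{Gosha:}\quad & 5 - 3.5625 - 0.6041667 = 0.8333333 \\
\text{Bob:}\quad & 5 - 3.5625 - 0.1875 = 1.25 \\
\text{Kirill:}\quad & 4 - 3.5625 - 0.2375 = 0.2 \\
\text{James:}\quad & 4 - 3.5625 - 0.1041667 = 0.3333333
\end{aligned}
\]

and averaging them over \(N_{\text{Wolfs}} = 7\) ratings gives

\[ b_{\text{Wolfs}} = \frac{3.7833333}{7} = 0.5404762 \]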
This code cell creates the predicted_ratings data frame, which predicts ratings for the unrated (NA) user-movie pairs using the global baseline estimate formula, rounded to the nearest integer to match the survey's whole-number ratings:
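# Keep only unrated (NA) pairs, attach both biases, and apply r_hat = mu + b_u + b_m
predicted_ratings <- ratings_df_long |>
  filter(is.na(rating)) |>
  left_join(user_bias, by = "Name") |>
  left_join(movie_bias, by = "movie") |>
  mutate(predicted = round(mu + bias_user + bias_movie))  # nearest whole rating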
kable(predicted_ratings)
| Name | movie | rating | bias_user | bias_movie | predicted |
|---|---|---|---|---|---|
| Aiuna | Wolfs | NA | -1.3625 | 0.5404762 | 3 |
| Ilya | The_Beekeeper | NA | -0.5625 | -0.4520833 | 3 |
| Vadim | Wolfs | NA | -1.3625 | 0.5404762 | 3 |
| Bob | Bad_Boys4 | NA | 0.1875 | 0.3916667 | 4 |
| Bob | Rebel_Ridge | NA | 0.1875 | -0.4833333 | 3 |
| Kirill | The_Substance | NA | 0.2375 | 0.3979167 | 4 |
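Taking the first row as a worked example, Aiuna's predicted rating for Wolfs is

\[ \hat{r}_{\text{Aiuna},\,\text{Wolfs}} = 3.5625 + (-1.3625) + 0.5404762 = 2.7404762 \approx 3 \]

Nothing in the formula keeps \(\hat{r}_{um}\) inside the rating scale: applied to a cell that was rated, the same arithmetic gives Vladimir Storchevoy \(3.5625 + 1.1041667 + 0.5404762 = 5.2071429\) for Wolfs, above the highest rating in the survey. All six predictions here happen to fall in range; clamping them, for example with pmin(pmax(x, 1), 5) assuming a 1 to 5 scale, would be an addition that is not part of the approach used in this lab.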