I will be using the synthetic dataset shared by Desiree Thomas since it’s better then my old dataset. I’ll have to double check to make sure that any row that have an empty rating is removed and calculate the global mean.
Step 2
For each movie I’d will calculate the movie bias and for each user I will calculate the user bias. Once I have those I can use the prediction formula
rui = μ − bu − bi
rui = The predicted rating.
μ = The Global Mean.
bu = The specific User’s Bias.
bi = The specific Movie’s Bias.
Code Base
Load the dataset
First I will load the necessary library and the data from the CSV and convert empty box into N/A
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
We join the calculated bias back into our dataframe and normalize the prediction to make sure they’re within 1-5 star range
full_results <- df %>%left_join(user_biases, by ="account_id") %>%left_join(movie_biases, by ="movie_title") %>%mutate(predicted_rating = mu + bu + bi)# Normalizationfull_results$predicted_rating <-pmin(pmax(full_results$predicted_rating, 1), 5)
Recommendations
Finding which movie has the highest predicted_rating for each user
# A tibble: 6 × 2
movie_title bi
<chr> <dbl>
1 Barbie 0.266
2 Spider-Man: Across the Spider-Verse 0.164
3 Inside Out 2 -0.0400
4 Dune: Part Two -0.0855
5 Oppenheimer -0.104
6 Deadpool & Wolverine -0.128
cat("\nRecommended Movies for Users who have not seen everything:\n")
Recommended Movies for Users who have not seen everything:
print(recommendations)
# A tibble: 68 × 3
profile_name movie_title predicted_rating
<chr> <chr> <dbl>
1 Carol Johnson Inside Out 2 2.29
2 Edward Perez Oppenheimer 2.70
3 Amanda Wright Barbie 4.93
4 Mark Taylor Barbie 4.07
5 Donna Harris Barbie 3.07
6 Richard Williams Spider-Man: Across the Spider-Verse 3.16
7 Susan Adams Dune: Part Two 2.66
8 Michelle Jones Barbie 2.77
9 Andrew Adams Barbie 2.93
10 Ashley Allen Barbie 2.60
# ℹ 58 more rows