Assignment 3a

Approach

Step 1

I will be using the synthetic dataset shared by Desiree Thomas since it’s better then my old dataset. I’ll have to double check to make sure that any row that have an empty rating is removed and calculate the global mean.

Step 2

For each movie I’d will calculate the movie bias and for each user I will calculate the user bias. Once I have those I can use the prediction formula

rui = μ − bu − bi

  • rui = The predicted rating.
  • μ = The Global Mean.
  • bu = The specific User’s Bias.
  • bi = The specific Movie’s Bias.

Code Base

Load the dataset

First I will load the necessary library and the data from the CSV and convert empty box into N/A

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
df <- read.csv("excel_version_survey_results.csv", stringsAsFactors = FALSE)
df$rating <- as.numeric(df$rating)

Calculate Global Mean

Calculate the average of every available rating in the entire dataset. Ignore every N/A value

mu <- mean(df$rating, na.rm = TRUE)

Calculate User Bias

Calculating the average rating given by users and subtracting that from the global mean

user_biases <- df %>%
  group_by(account_id) %>%
  summarise(user_mean = mean(rating, na.rm = TRUE), .groups = 'drop') %>%
  mutate(bu = user_mean - mu)

Caculate Movie Bias

Calculating the average rating for each movie and subtracting that from the global mean

movie_biases <- df %>%
  group_by(movie_title) %>%
  summarise(movie_mean = mean(rating, na.rm = TRUE), .groups = 'drop') %>%
  mutate(bi = movie_mean - mu)

Apply the Global Baseline Estimate formula

We join the calculated bias back into our dataframe and normalize the prediction to make sure they’re within 1-5 star range

full_results <- df %>%
  left_join(user_biases, by = "account_id") %>%
  left_join(movie_biases, by = "movie_title") %>%
  mutate(predicted_rating = mu + bu + bi)

# Normalization
full_results$predicted_rating <- pmin(pmax(full_results$predicted_rating, 1), 5)

Recommendations

Finding which movie has the highest predicted_rating for each user

recommendations <- full_results %>%
  filter(is.na(rating)) %>%
  group_by(account_id, profile_name) %>%
  arrange(desc(predicted_rating)) %>%
  slice(1) %>% 
  ungroup() %>%
  select(profile_name, movie_title, predicted_rating)

cat("Recommendation System Results\n")
Recommendation System Results
cat("Global Mean Rating across all users/movies:", round(mu, 2), "\n\n")
Global Mean Rating across all users/movies: 3.04 
cat("Calculated Movie Biases (bi):\n")
Calculated Movie Biases (bi):
print(movie_biases %>% select(movie_title, bi) %>% arrange(desc(bi)))
# A tibble: 6 × 2
  movie_title                              bi
  <chr>                                 <dbl>
1 Barbie                               0.266 
2 Spider-Man: Across the Spider-Verse  0.164 
3 Inside Out 2                        -0.0400
4 Dune: Part Two                      -0.0855
5 Oppenheimer                         -0.104 
6 Deadpool & Wolverine                -0.128 
cat("\nRecommended Movies for Users who have not seen everything:\n")

Recommended Movies for Users who have not seen everything:
print(recommendations)
# A tibble: 68 × 3
   profile_name     movie_title                         predicted_rating
   <chr>            <chr>                                          <dbl>
 1 Carol Johnson    Inside Out 2                                    2.29
 2 Edward Perez     Oppenheimer                                     2.70
 3 Amanda Wright    Barbie                                          4.93
 4 Mark Taylor      Barbie                                          4.07
 5 Donna Harris     Barbie                                          3.07
 6 Richard Williams Spider-Man: Across the Spider-Verse             3.16
 7 Susan Adams      Dune: Part Two                                  2.66
 8 Michelle Jones   Barbie                                          2.77
 9 Andrew Adams     Barbie                                          2.93
10 Ashley Allen     Barbie                                          2.60
# ℹ 58 more rows