Week 11 Assignment - extra credit

Movie Ratings Data

The MovieRatings sheet contains movie ratings given by various critics. Each row represents a critic (named in the first column, Critic), and each remaining column represents a movie. Ratings are on a scale of 1 to 5. Some values are missing, meaning the critic did not rate that movie. These missing values (NA in R) must be excluded from the averages the recommender system relies on, which is why every mean below uses na.rm = TRUE.
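A minimal sketch of why na.rm = TRUE matters, using a small hypothetical data frame. The critic names, movie names and ratings here are invented for illustration and are not taken from the MovieRatings sheet:

```r
# Hypothetical toy ratings: 3 critics x 3 movies (values invented for illustration)
toy <- data.frame(
  Critic = c("A", "B", "C"),
  Movie1 = c(5, 4, NA),
  Movie2 = c(3, NA, NA),
  Movie3 = c(4, 2, 5)
)

ratings <- toy[, -1]                  # drop the Critic name column

# Without na.rm, a single NA makes the whole mean NA
mean(unlist(ratings))                 # NA
# With na.rm = TRUE, only the observed ratings are averaged
mean(unlist(ratings), na.rm = TRUE)   # 23 / 6 = 3.833333

# Number of missing ratings per movie and per critic
colSums(is.na(ratings))               # Movie1 1, Movie2 2, Movie3 0
rowSums(is.na(ratings))               # 0 1 2 (critics A, B, C)
```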

Calculating the Average Overall Rating

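library(readxl)  # provides read_excel()

file_path <- "C:\\Documents\\R Projects\\MovieRatings.xlsx"
movie_ratings_df <- read_excel(file_path, sheet = "MovieRatings")
# Drop the Critic column, flatten all ratings into one vector, average the non-missing ones
overall_avg_rating <- mean(unlist(movie_ratings_df[,-1]), na.rm = TRUE)
overall_avg_rating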

## [1] 3.934426

Calculating the Movie Bias

# Movie bias = the movie's average rating minus the overall average rating
movie_avg_ratings <- colMeans(movie_ratings_df[,-1], na.rm = TRUE)
movie_biases <- movie_avg_ratings - overall_avg_rating
movie_biases

## CaptainAmerica       Deadpool         Frozen     JungleBook  PitchPerfect2 
##     0.33830104     0.51001821    -0.20715350    -0.03442623    -1.22014052 
##  StarWarsForce 
##     0.21941992

Calculating the Critic Bias

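# Critic bias = the critic's average rating minus the overall average rating
# (one value per row, in the same order as the critics in the sheet)
critic_avg_ratings <- rowMeans(movie_ratings_df[,-1], na.rm = TRUE)
critic_biases <- critic_avg_ratings - overall_avg_rating
critic_biases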

##  [1]  0.06557377 -0.43442623  1.06557377  0.73224044 -0.68442623 -0.43442623
##  [7] -0.60109290  0.06557377 -0.43442623 -0.26775956  0.86557377  0.06557377
## [13]  0.73224044  0.06557377 -0.33442623  1.06557377

Predicting Ratings for the Recommender

A Global Baseline Estimate recommender predicts a user's rating for an item by starting from the overall average rating and adding the user's bias and the item's bias. It does not use similarity between users or items, so it serves as a simple baseline rather than a fully personalized model. Once these predictions are made, the system can recommend to each critic the unrated movies with the highest predicted ratings, so the quality of the recommendations depends directly on the quality of the predictions.
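Written as a formula (the symbols are introduced here for illustration), with μ the overall average rating, \bar{r}_u critic u's average rating and \bar{r}_i movie i's average rating, the prediction computed in the loop below is:

```latex
\hat{r}_{ui} = \mu + b_u + b_i, \qquad b_u = \bar{r}_u - \mu, \qquad b_i = \bar{r}_i - \mu
\quad\Longrightarrow\quad \hat{r}_{ui} = \bar{r}_u + \bar{r}_i - \mu

% Worked example: critic Burton, movie CaptainAmerica
\hat{r}_{\text{Burton},\,\text{CaptainAmerica}} = 3.934426 + 0.06557377 + 0.33830104 \approx 4.338301
```

The worked value matches the first entry of the prediction table.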

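# Empty matrix: one row per critic, one column per movie
predicted_ratings <- matrix(NA, nrow = nrow(movie_ratings_df), ncol = ncol(movie_ratings_df)-1)

# Global baseline: overall average + critic bias + movie bias
# (data frame column j holds movie j-1, because column 1 is Critic)
for (i in 1:nrow(movie_ratings_df)) {
  for (j in 2:ncol(movie_ratings_df)) {
    predicted_ratings[i, j-1] <- overall_avg_rating + critic_biases[i] + movie_biases[j-1]
  }
}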

# Label the columns with movie names and add the critic names as the first column
predicted_ratings_df <- as.data.frame(predicted_ratings)
colnames(predicted_ratings_df) <- colnames(movie_ratings_df)[-1]
predicted_ratings_df <- cbind(Critic = movie_ratings_df$Critic, predicted_ratings_df)

print(predicted_ratings_df)

##       Critic CaptainAmerica Deadpool   Frozen JungleBook PitchPerfect2
## 1     Burton       4.338301 4.510018 3.792846   3.965574      2.779859
## 2    Charley       3.838301 4.010018 3.292846   3.465574      2.279859
## 3        Dan       5.338301 5.510018 4.792846   4.965574      3.779859
## 4  Dieudonne       5.004968 5.176685 4.459513   4.632240      3.446526
## 5       Matt       3.588301 3.760018 3.042846   3.215574      2.029859
## 6   Mauricio       3.838301 4.010018 3.292846   3.465574      2.279859
## 7        Max       3.671634 3.843352 3.126180   3.298907      2.113193
## 8     Nathan       4.338301 4.510018 3.792846   3.965574      2.779859
## 9      Param       3.838301 4.010018 3.292846   3.465574      2.279859
## 10    Parshu       4.004968 4.176685 3.459513   3.632240      2.446526
## 11 Prashanth       5.138301 5.310018 4.592846   4.765574      3.579859
## 12    Shipra       4.338301 4.510018 3.792846   3.965574      2.779859
## 13  Sreejaya       5.004968 5.176685 4.459513   4.632240      3.446526
## 14     Steve       4.338301 4.510018 3.792846   3.965574      2.779859
## 15     Vuthy       3.938301 4.110018 3.392846   3.565574      2.379859
## 16   Xingjia       5.338301 5.510018 4.792846   4.965574      3.779859
##    StarWarsForce
## 1       4.219420
## 2       3.719420
## 3       5.219420
## 4       4.886087
## 5       3.469420
## 6       3.719420
## 7       3.552753
## 8       4.219420
## 9       3.719420
## 10      3.886087
## 11      5.019420
## 12      4.219420
## 13      4.886087
## 14      4.219420
## 15      3.819420
## 16      5.219420
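The double loop can also be written in vectorized form with base R's outer(), and the recommendation step described earlier (pick the highest predicted rating among movies a critic has not rated) can be sketched the same way. This is a minimal, self-contained sketch on a small hypothetical ratings matrix (critics A, B, C and movies Movie1 to Movie3 are invented), not the assignment's data; for the real sheet the same lines would apply to as.matrix(movie_ratings_df[,-1]).

```r
# Hypothetical toy ratings matrix: rows = critics, columns = movies, NA = not rated
ratings <- matrix(
  c(5,  3,  NA,
    4,  NA, 2,
    NA, NA, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("Movie1", "Movie2", "Movie3"))
)

mu <- mean(ratings, na.rm = TRUE)                    # overall average rating
critic_bias <- rowMeans(ratings, na.rm = TRUE) - mu  # one bias per critic
movie_bias  <- colMeans(ratings, na.rm = TRUE) - mu  # one bias per movie

# Vectorized global baseline: outer() adds every critic bias to every movie bias,
# replacing the double for-loop; names become the row and column labels
predicted <- mu + outer(critic_bias, movie_bias, "+")

# Only movies a critic has not rated are candidates for recommendation
candidates <- predicted
candidates[!is.na(ratings)] <- NA

# For each critic, pick the unrated movie with the highest predicted rating
top_pick <- apply(candidates, 1, function(x) {
  if (all(is.na(x))) NA_character_ else names(x)[which.max(x)]
})
top_pick
```

With the toy data this picks Movie3 for A, Movie2 for B and Movie1 for C. C's prediction for Movie1 is 5.7, above the 1 to 5 scale, just as Dan's and Xingjia's Deadpool predictions (5.510018) are in the table above; wrapping the predictions in pmin(pmax(predicted, 1), 5) is one way to keep them on the rating scale.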