Introduction

Recommender systems have become increasingly popular in recent years and are used by various online platforms to suggest products, movies, or songs to users. Personalized algorithms like “content-based” and “item-item collaborative filtering” have been developed to provide individualized recommendations to users based on their preferences and past behaviors. However, sometimes non-personalized recommenders are also useful or necessary, particularly when the user does not have a history of interactions with the platform. One of the most commonly used non-personalized recommender system algorithms is the “Global Baseline Estimate.”

Here we use survey data on movie ratings and write R code that makes movie recommendations with the Global Baseline Estimate algorithm. The Global Baseline Estimate predicts a user’s rating of a movie as the overall mean rating plus two deviations: how far the movie’s average sits from the overall mean, and how far the user’s average sits from it. We will start by importing the survey data into R, then compute the pieces of the estimate, and finally predict ratings for the movies each user has not yet rated.
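
To make the estimate concrete, here is a minimal sketch on a hypothetical 3 x 3 ratings matrix (not the survey data). It assumes the overall mean is taken over all observed ratings, which is one common convention and may differ from how the mean is computed later in this analysis. The names `toy`, `user_dev`, `movie_dev`, and `estimate` are illustrative only.

```r
# Hypothetical ratings: rows are users, columns are movies, NA = not rated
toy <- matrix(c( 5,  3, NA,
                 4, NA,  2,
                NA,  4,  3),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("u1", "u2", "u3"), c("m1", "m2", "m3")))

# Assumption: overall mean over every observed rating
overall_mean <- mean(toy, na.rm = TRUE)

# Deviation of each user's and each movie's average from the overall mean
user_dev  <- rowMeans(toy, na.rm = TRUE) - overall_mean
movie_dev <- colMeans(toy, na.rm = TRUE) - overall_mean

# Baseline estimate for every user/movie pair
estimate <- overall_mean + outer(user_dev, movie_dev, "+")

# Estimate for a pair with no observed rating
estimate["u1", "m3"]
```

In this toy case, user u1 rates half a point above the overall mean and movie m3 sits one point below it, so the estimate for that pair lands half a point below the overall mean.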

# Load the required packages
library(tidyverse)
library(readxl)

# Load the movie ratings data
movie_ratings <- read_excel("MovieRatings.xlsx")
str(movie_ratings)
## tibble [16 × 7] (S3: tbl_df/tbl/data.frame)
##  $ Critic        : chr [1:16] "Burton" "Charley" "Dan" "Dieudonne" ...
##  $ CaptainAmerica: num [1:16] NA 4 NA 5 4 4 4 NA 4 4 ...
##  $ Deadpool      : num [1:16] NA 5 5 4 NA NA 4 NA 4 3 ...
##  $ Frozen        : num [1:16] NA 4 NA NA 2 3 4 NA 1 5 ...
##  $ JungleBook    : num [1:16] 4 3 NA NA NA 3 2 NA NA 5 ...
##  $ PitchPerfect2 : num [1:16] NA 2 NA NA 2 4 2 NA NA 2 ...
##  $ StarWarsForce : num [1:16] 4 3 5 5 5 NA 4 4 5 3 ...
library(dplyr)

# select only the rating columns
rating_cols <- c("CaptainAmerica", "Deadpool", "Frozen", "JungleBook", "PitchPerfect2", "StarWarsForce")
ratings <- select(movie_ratings, all_of(rating_cols))

# calculate the user average for the ratings
user_avg <- rowMeans(ratings, na.rm = TRUE)

# add the user_avg as a new column in movie_ratings
movie_ratings$user_avg <- user_avg

# print the updated movie_ratings data frame
print(movie_ratings)
## # A tibble: 16 × 8
##    Critic    CaptainAmerica Deadpool Frozen JungleBook PitchPe…¹ StarW…² user_…³
##    <chr>              <dbl>    <dbl>  <dbl>      <dbl>     <dbl>   <dbl>   <dbl>
##  1 Burton                NA       NA     NA          4        NA       4    4   
##  2 Charley                4        5      4          3         2       3    3.5 
##  3 Dan                   NA        5     NA         NA        NA       5    5   
##  4 Dieudonne              5        4     NA         NA        NA       5    4.67
##  5 Matt                   4       NA      2         NA         2       5    3.25
##  6 Mauricio               4       NA      3          3         4      NA    3.5 
##  7 Max                    4        4      4          2         2       4    3.33
##  8 Nathan                NA       NA     NA         NA        NA       4    4   
##  9 Param                  4        4      1         NA        NA       5    3.5 
## 10 Parshu                 4        3      5          5         2       3    3.67
## 11 Prashanth              5        5      5          5        NA       4    4.8 
## 12 Shipra                NA       NA      4          5        NA       3    4   
## 13 Sreejaya               5        5      5          4         4       5    4.67
## 14 Steve                  4       NA     NA         NA        NA       4    4   
## 15 Vuthy                  4        5      3          3         3      NA    3.6 
## 16 Xingjia               NA       NA      5          5        NA      NA    5   
## # … with abbreviated variable names ¹​PitchPerfect2, ²​StarWarsForce, ³​user_avg
# column means of every numeric column; user_avg is numeric too, so it appears in the result
movie_avg <- colMeans(movie_ratings[sapply(movie_ratings, is.numeric)], na.rm = TRUE)
movie_avg
## CaptainAmerica       Deadpool         Frozen     JungleBook  PitchPerfect2 
##       4.272727       4.444444       3.727273       3.900000       2.714286 
##  StarWarsForce       user_avg 
##       4.153846       4.030208
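# overall mean used as the baseline: the mean of the seven values above (six movies plus user_avg)
mean_movie <- mean(movie_avg)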
mean_movie
## [1] 3.891826
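
Because `user_avg` was added to `movie_ratings` before `colMeans()` ran, the 3.891826 above averages seven values: the six movie means plus the mean of the user averages. If the intent was to average only the six movies, the result is slightly lower. The sketch below shows both figures using the printed means, hard-coded so that it runs on its own. The variable names are illustrative.

```r
# Movie means copied from the colMeans() output above
movie_means <- c(CaptainAmerica = 4.272727, Deadpool = 4.444444, Frozen = 3.727273,
                 JungleBook = 3.900000, PitchPerfect2 = 2.714286, StarWarsForce = 4.153846)

# Mean of the user_avg column, also copied from the output above
user_avg_mean <- 4.030208

# What the analysis computes: seven values, user_avg included
mean_with_user_avg <- mean(c(movie_means, user_avg_mean))

# Alternative (an assumption about intent): the six rating columns only
mean_movies_only <- mean(movie_means)

round(c(with_user_avg = mean_with_user_avg, movies_only = mean_movies_only), 4)
```

The two figures differ by roughly 0.02. The rest of this analysis keeps the 3.891826 value.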
# how far each movie's average sits from the overall mean
movie_avg_minus_mean_movie <- movie_avg - mean_movie
movie_avg_minus_mean_movie
## CaptainAmerica       Deadpool         Frozen     JungleBook  PitchPerfect2 
##    0.380900895    0.552618066   -0.164553651    0.008173622   -1.177540664 
##  StarWarsForce       user_avg 
##    0.262019776    0.138381955
# how far each user's average sits from the overall mean
user_avg_minus_mean_movie <- movie_ratings$user_avg - mean_movie
user_avg_minus_mean_movie
##  [1]  0.1081736 -0.3918264  1.1081736  0.7748403 -0.6418264 -0.3918264
##  [7] -0.5584930  0.1081736 -0.3918264 -0.2251597  0.9081736  0.1081736
## [13]  0.7748403  0.1081736 -0.2918264  1.1081736
# rating_cols and user_avg were already defined above; they are repeated here for the steps that follow
rating_cols <- c("CaptainAmerica", "Deadpool", "Frozen", "JungleBook", "PitchPerfect2", "StarWarsForce")
user_avg <- movie_ratings$user_avg

# print the movie averages again
print(movie_avg)
## CaptainAmerica       Deadpool         Frozen     JungleBook  PitchPerfect2 
##       4.272727       4.444444       3.727273       3.900000       2.714286 
##  StarWarsForce       user_avg 
##       4.153846       4.030208
# movie_ratings already contains user_avg, so binding it again produces a second user_avg column
df_new <- cbind(movie_ratings, user_avg, user_avg_minus_mean_movie)
df_new
##       Critic CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2
## 1     Burton             NA       NA     NA          4            NA
## 2    Charley              4        5      4          3             2
## 3        Dan             NA        5     NA         NA            NA
## 4  Dieudonne              5        4     NA         NA            NA
## 5       Matt              4       NA      2         NA             2
## 6   Mauricio              4       NA      3          3             4
## 7        Max              4        4      4          2             2
## 8     Nathan             NA       NA     NA         NA            NA
## 9      Param              4        4      1         NA            NA
## 10    Parshu              4        3      5          5             2
## 11 Prashanth              5        5      5          5            NA
## 12    Shipra             NA       NA      4          5            NA
## 13  Sreejaya              5        5      5          4             4
## 14     Steve              4       NA     NA         NA            NA
## 15     Vuthy              4        5      3          3             3
## 16   Xingjia             NA       NA      5          5            NA
##    StarWarsForce user_avg user_avg user_avg_minus_mean_movie
## 1              4 4.000000 4.000000                 0.1081736
## 2              3 3.500000 3.500000                -0.3918264
## 3              5 5.000000 5.000000                 1.1081736
## 4              5 4.666667 4.666667                 0.7748403
## 5              5 3.250000 3.250000                -0.6418264
## 6             NA 3.500000 3.500000                -0.3918264
## 7              4 3.333333 3.333333                -0.5584930
## 8              4 4.000000 4.000000                 0.1081736
## 9              5 3.500000 3.500000                -0.3918264
## 10             3 3.666667 3.666667                -0.2251597
## 11             4 4.800000 4.800000                 0.9081736
## 12             3 4.000000 4.000000                 0.1081736
## 13             5 4.666667 4.666667                 0.7748403
## 14             4 4.000000 4.000000                 0.1081736
## 15            NA 3.600000 3.600000                -0.2918264
## 16            NA 5.000000 5.000000                 1.1081736
# movie averages next to their deviation from the overall mean
df1 <- data.frame(movie_avg, movie_avg_minus_mean_movie)
df1
##                movie_avg movie_avg_minus_mean_movie
## CaptainAmerica  4.272727                0.380900895
## Deadpool        4.444444                0.552618066
## Frozen          3.727273               -0.164553651
## JungleBook      3.900000                0.008173622
## PitchPerfect2   2.714286               -1.177540664
## StarWarsForce   4.153846                0.262019776
## user_avg        4.030208                0.138381955

How would Param rate Pitch Perfect 2?

Using the general formula for the global baseline estimate, we can estimate how Param would rate Pitch Perfect 2 as follows:

Global Baseline Estimate for Pitch Perfect 2 = Overall mean rating + Pitch Perfect 2’s rating relative to the overall mean + Param’s rating relative to the overall mean

Here the overall mean is mean_movie (3.892), Pitch Perfect 2’s deviation is -1.178, and Param’s deviation is 3.5 - 3.892 = -0.392.

Param_rate_pitch <- 3.892 + (-1.178) + (-0.392)
Param_rate_pitch
## [1] 2.322

Therefore, the global baseline estimate for Param’s rating of Pitch Perfect 2 is about 2.32, which matches the value the prediction loop produces for this pair (2.322459).

This estimate suggests that Param would rate Pitch Perfect 2 lower than the movie’s average rating of 2.714, because Param’s own average (3.5) is below the overall mean.
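
As a cross-check, the three-term form (overall mean + movie deviation + user deviation) can be wrapped in a small helper. This is only a sketch: `baseline_estimate` and its argument names are hypothetical, and the inputs are copied from the output printed earlier.

```r
# Three-term baseline: overall mean + movie deviation + user deviation
baseline_estimate <- function(overall_mean, movie_mean, user_mean) {
  overall_mean + (movie_mean - overall_mean) + (user_mean - overall_mean)
}

# Values copied from the output above
overall    <- 3.891826  # mean_movie as computed in this analysis
pitch_mean <- 2.714286  # PitchPerfect2 column mean
param_mean <- 3.5       # Param's average rating

param_pitch <- baseline_estimate(overall, pitch_mean, param_mean)
param_pitch
```

The helper returns about 2.32 for Param and Pitch Perfect 2, in line with the Param / PitchPerfect2 row of the predicted ratings table (2.322459).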

Predict the rating for each movie

We are going to compute the predicted rating for each movie that each user has not rated, using the following formula:

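predicted_rating = mean_movie + user_avg_minus_mean_movie + (mean rating from the other users who have rated this movie - mean_movie)

Since mean_movie is added and then subtracted, this reduces to the user’s deviation plus the movie’s average rating.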

# create an empty data frame to store the predicted ratings
predicted_ratings <- data.frame(matrix(ncol = 3, nrow = 0))

# add column names to the predicted_ratings data frame
colnames(predicted_ratings) <- c("User", "Movie", "Predicted_Rating")

# loop through each user
for (user in seq_len(nrow(df_new))) {
  
  # loop through each movie that the user has not rated
  for (movie in rating_cols[is.na(df_new[user, rating_cols])]) {
    
    # get the ratings of other users who have rated this movie
    other_ratings <- df_new[!is.na(df_new[, movie]) & df_new$Critic != df_new[user, "Critic"], movie]
    
    # calculate the predicted rating using the formula
    predicted_rating <- mean_movie + df_new[user, "user_avg_minus_mean_movie"] + (mean(other_ratings) - mean_movie)
    
    # add the user, movie, and predicted rating to the predicted_ratings data frame
    predicted_ratings <- rbind(predicted_ratings, data.frame(User = df_new[user, "Critic"], Movie = movie, Predicted_Rating = predicted_rating))
  }
}

# print the predicted_ratings data frame
predicted_ratings
##         User          Movie Predicted_Rating
## 1     Burton CaptainAmerica         4.380901
## 2     Burton       Deadpool         4.552618
## 3     Burton         Frozen         3.835446
## 4     Burton  PitchPerfect2         2.822459
## 5        Dan CaptainAmerica         5.380901
## 6        Dan         Frozen         4.835446
## 7        Dan     JungleBook         5.008174
## 8        Dan  PitchPerfect2         3.822459
## 9  Dieudonne         Frozen         4.502113
## 10 Dieudonne     JungleBook         4.674840
## 11 Dieudonne  PitchPerfect2         3.489126
## 12      Matt       Deadpool         3.802618
## 13      Matt     JungleBook         3.258174
## 14  Mauricio       Deadpool         4.052618
## 15  Mauricio  StarWarsForce         3.762020
## 16    Nathan CaptainAmerica         4.380901
## 17    Nathan       Deadpool         4.552618
## 18    Nathan         Frozen         3.835446
## 19    Nathan     JungleBook         4.008174
## 20    Nathan  PitchPerfect2         2.822459
## 21     Param     JungleBook         3.508174
## 22     Param  PitchPerfect2         2.322459
## 23 Prashanth  PitchPerfect2         3.622459
## 24    Shipra CaptainAmerica         4.380901
## 25    Shipra       Deadpool         4.552618
## 26    Shipra  PitchPerfect2         2.822459
## 27     Steve       Deadpool         4.552618
## 28     Steve         Frozen         3.835446
## 29     Steve     JungleBook         4.008174
## 30     Steve  PitchPerfect2         2.822459
## 31     Vuthy  StarWarsForce         3.862020
## 32   Xingjia CaptainAmerica         5.380901
## 33   Xingjia       Deadpool         5.552618
## 34   Xingjia  PitchPerfect2         3.822459
## 35   Xingjia  StarWarsForce         5.262020

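The highest predicted rating is 5.552618, for “Deadpool” and user “Xingjia”, followed by “CaptainAmerica” for both “Dan” and “Xingjia” at 5.380901. These are predictions rather than observed ratings, and they exceed 5, the highest rating that appears in the survey data, because the baseline estimate is not capped to the rating scale.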
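
The same predictions can be produced without an explicit loop. The sketch below is one possible vectorized version, not the method used above. It re-enters the ratings from the printed table so that it runs on its own, and it mirrors the way `mean_movie` was computed earlier (with the user average included). The clamp to a 1 to 5 range is an assumption about the survey scale, and the names `r`, `pred`, `pred_capped`, and `top_pick` are illustrative.

```r
# Ratings transcribed from the printed table above (NA = not rated)
critics <- c("Burton", "Charley", "Dan", "Dieudonne", "Matt", "Mauricio", "Max", "Nathan",
             "Param", "Parshu", "Prashanth", "Shipra", "Sreejaya", "Steve", "Vuthy", "Xingjia")
movies <- c("CaptainAmerica", "Deadpool", "Frozen", "JungleBook", "PitchPerfect2", "StarWarsForce")
r <- matrix(c(
  NA, NA, NA,  4, NA,  4,
   4,  5,  4,  3,  2,  3,
  NA,  5, NA, NA, NA,  5,
   5,  4, NA, NA, NA,  5,
   4, NA,  2, NA,  2,  5,
   4, NA,  3,  3,  4, NA,
   4,  4,  4,  2,  2,  4,
  NA, NA, NA, NA, NA,  4,
   4,  4,  1, NA, NA,  5,
   4,  3,  5,  5,  2,  3,
   5,  5,  5,  5, NA,  4,
  NA, NA,  4,  5, NA,  3,
   5,  5,  5,  4,  4,  5,
   4, NA, NA, NA, NA,  4,
   4,  5,  3,  3,  3, NA,
  NA, NA,  5,  5, NA, NA
), ncol = 6, byrow = TRUE, dimnames = list(critics, movies))

user_avg  <- rowMeans(r, na.rm = TRUE)
movie_avg <- colMeans(r, na.rm = TRUE)

# Mirrors mean_movie from the analysis, which also averages in the user_avg column
mean_movie <- mean(c(movie_avg, mean(user_avg)))

# Baseline for every critic/movie pair: user deviation + movie average
pred <- outer(user_avg - mean_movie, movie_avg, "+")

# Keep only the pairs without an observed rating
pred[!is.na(r)] <- NA

# Assumption: ratings live on a 1 to 5 scale, so clamp estimates into that range
pred_capped <- pmin(pmax(pred, 1), 5)

# Highest-scoring unrated movie per critic (NA if the critic rated everything)
top_pick <- apply(pred, 1, function(x) {
  if (all(is.na(x))) NA_character_ else names(which.max(x))
})
top_pick
```

Sorting each critic’s unrated movies by predicted rating in this way turns the table of predictions into a per-user recommendation. Critics who rated all six movies get no pick.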

Conclusion

In this analysis, we used the Global Baseline Estimate algorithm to generate movie recommendations from survey data on movie ratings. We started by importing the data into R and computing the user and movie means, then used these values to estimate how one user would rate a specific movie. Finally, we predicted ratings for the movies each user had not rated; the highest predictions for a given user can serve as that user’s recommendations.