Recommender systems have become increasingly popular in recent years and are used by various online platforms to suggest products, movies, or songs to users. Personalized algorithms like “content-based” and “item-item collaborative filtering” have been developed to provide individualized recommendations to users based on their preferences and past behaviors. However, sometimes non-personalized recommenders are also useful or necessary, particularly when the user does not have a history of interactions with the platform. One of the most commonly used non-personalized recommender system algorithms is the “Global Baseline Estimate.”
In this post, we use survey data on movie ratings and write R code that makes movie recommendations using the Global Baseline Estimate algorithm. The algorithm predicts a user's rating of a movie as the sum of three terms: the overall mean rating, the movie's deviation from that mean, and the user's deviation from that mean. We start by importing the survey data into R, then compute these quantities, and finally generate predicted ratings for the movies each user has not yet rated.
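As a minimal sketch of that definition, the estimate for a single user and movie can be written as a small helper. The function name `baseline_estimate` and the toy numbers below are illustrative assumptions and do not come from the survey data.

```r
# Hypothetical helper: global baseline estimate for one user-movie pair.
# global_mean: mean rating across the data set
# movie_mean:  mean rating of the movie
# user_mean:   mean rating given by the user
baseline_estimate <- function(global_mean, movie_mean, user_mean) {
  movie_dev <- movie_mean - global_mean  # is the movie rated above or below average?
  user_dev  <- user_mean - global_mean   # does the user rate above or below average?
  global_mean + movie_dev + user_dev
}

# Toy numbers (assumed): a slightly weak movie and a slightly harsh user
est <- baseline_estimate(global_mean = 3.5, movie_mean = 3.0, user_mean = 3.2)
print(est)
```

Here the movie sits 0.5 below the overall mean and the user 0.3 below it, so the estimate lands 0.8 below the overall mean.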
# Load the required packages
library(tidyverse)
library(readxl)
# Load the movie ratings data
movie_ratings <- read_excel("MovieRatings.xlsx")
str(movie_ratings)
## tibble [16 × 7] (S3: tbl_df/tbl/data.frame)
## $ Critic : chr [1:16] "Burton" "Charley" "Dan" "Dieudonne" ...
## $ CaptainAmerica: num [1:16] NA 4 NA 5 4 4 4 NA 4 4 ...
## $ Deadpool : num [1:16] NA 5 5 4 NA NA 4 NA 4 3 ...
## $ Frozen : num [1:16] NA 4 NA NA 2 3 4 NA 1 5 ...
## $ JungleBook : num [1:16] 4 3 NA NA NA 3 2 NA NA 5 ...
## $ PitchPerfect2 : num [1:16] NA 2 NA NA 2 4 2 NA NA 2 ...
## $ StarWarsForce : num [1:16] 4 3 5 5 5 NA 4 4 5 3 ...
library(dplyr)
# select only the rating columns
rating_cols <- c("CaptainAmerica", "Deadpool", "Frozen", "JungleBook", "PitchPerfect2", "StarWarsForce")
ratings <- select(movie_ratings, all_of(rating_cols))
# calculate the user average for the ratings
user_avg <- rowMeans(ratings, na.rm = TRUE)
# add the user_avg as a new column in movie_ratings
movie_ratings$user_avg <- user_avg
# print the updated movie_ratings data frame
print(movie_ratings)
## # A tibble: 16 × 8
## Critic CaptainAmerica Deadpool Frozen JungleBook PitchPe…¹ StarW…² user_…³
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Burton NA NA NA 4 NA 4 4
## 2 Charley 4 5 4 3 2 3 3.5
## 3 Dan NA 5 NA NA NA 5 5
## 4 Dieudonne 5 4 NA NA NA 5 4.67
## 5 Matt 4 NA 2 NA 2 5 3.25
## 6 Mauricio 4 NA 3 3 4 NA 3.5
## 7 Max 4 4 4 2 2 4 3.33
## 8 Nathan NA NA NA NA NA 4 4
## 9 Param 4 4 1 NA NA 5 3.5
## 10 Parshu 4 3 5 5 2 3 3.67
## 11 Prashanth 5 5 5 5 NA 4 4.8
## 12 Shipra NA NA 4 5 NA 3 4
## 13 Sreejaya 5 5 5 4 4 5 4.67
## 14 Steve 4 NA NA NA NA 4 4
## 15 Vuthy 4 5 3 3 3 NA 3.6
## 16 Xingjia NA NA 5 5 NA NA 5
## # … with abbreviated variable names ¹PitchPerfect2, ²StarWarsForce, ³user_avg
# column means of every numeric column (user_avg is numeric, so it is included too)
movie_avg <- colMeans(movie_ratings[sapply(movie_ratings, is.numeric)], na.rm = TRUE)
movie_avg
## CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2
## 4.272727 4.444444 3.727273 3.900000 2.714286
## StarWarsForce user_avg
## 4.153846 4.030208
mean_movie <- mean(movie_avg)
mean_movie
## [1] 3.891826
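Because `colMeans()` is applied to every numeric column, `movie_avg` also contains the `user_avg` column, so `mean_movie` is the mean of seven column means rather than of the individual ratings. For comparison only, here is a minimal sketch on a toy matrix with assumed values that restricts the computation to rating columns and takes the global mean over all observed ratings. It is an alternative, not what the code above does.

```r
# Toy ratings (assumed values); NA marks a movie the critic has not rated
toy <- matrix(c( 4, NA,  2,
                 5,  3, NA,
                NA,  4,  1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("M1", "M2", "M3")))

movie_means <- colMeans(toy, na.rm = TRUE)  # per-movie average
user_means  <- rowMeans(toy, na.rm = TRUE)  # per-critic average
global_mean <- mean(toy, na.rm = TRUE)      # mean of all observed ratings

print(movie_means)
print(user_means)
print(global_mean)
```

With this choice the global mean weights every observed rating equally, whereas a mean of column means weights every column equally regardless of how many critics rated it.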
movie_avg_minus_mean_movie <- movie_avg - mean_movie
movie_avg_minus_mean_movie
## CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2
## 0.380900895 0.552618066 -0.164553651 0.008173622 -1.177540664
## StarWarsForce user_avg
## 0.262019776 0.138381955
user_avg_minus_mean_movie <- movie_ratings$user_avg - mean_movie
user_avg_minus_mean_movie
## [1] 0.1081736 -0.3918264 1.1081736 0.7748403 -0.6418264 -0.3918264
## [7] -0.5584930 0.1081736 -0.3918264 -0.2251597 0.9081736 0.1081736
## [13] 0.7748403 0.1081736 -0.2918264 1.1081736
library(dplyr)
# names of the rating columns
rating_cols <- c("CaptainAmerica", "Deadpool", "Frozen", "JungleBook", "PitchPerfect2", "StarWarsForce")
user_avg <- movie_ratings$user_avg
# print the movie averages
print(movie_avg)
## CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2
## 4.272727 4.444444 3.727273 3.900000 2.714286
## StarWarsForce user_avg
## 4.153846 4.030208
df_new <- cbind(movie_ratings, user_avg, user_avg_minus_mean_movie)
df_new
## Critic CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2
## 1 Burton NA NA NA 4 NA
## 2 Charley 4 5 4 3 2
## 3 Dan NA 5 NA NA NA
## 4 Dieudonne 5 4 NA NA NA
## 5 Matt 4 NA 2 NA 2
## 6 Mauricio 4 NA 3 3 4
## 7 Max 4 4 4 2 2
## 8 Nathan NA NA NA NA NA
## 9 Param 4 4 1 NA NA
## 10 Parshu 4 3 5 5 2
## 11 Prashanth 5 5 5 5 NA
## 12 Shipra NA NA 4 5 NA
## 13 Sreejaya 5 5 5 4 4
## 14 Steve 4 NA NA NA NA
## 15 Vuthy 4 5 3 3 3
## 16 Xingjia NA NA 5 5 NA
## StarWarsForce user_avg user_avg user_avg_minus_mean_movie
## 1 4 4.000000 4.000000 0.1081736
## 2 3 3.500000 3.500000 -0.3918264
## 3 5 5.000000 5.000000 1.1081736
## 4 5 4.666667 4.666667 0.7748403
## 5 5 3.250000 3.250000 -0.6418264
## 6 NA 3.500000 3.500000 -0.3918264
## 7 4 3.333333 3.333333 -0.5584930
## 8 4 4.000000 4.000000 0.1081736
## 9 5 3.500000 3.500000 -0.3918264
## 10 3 3.666667 3.666667 -0.2251597
## 11 4 4.800000 4.800000 0.9081736
## 12 3 4.000000 4.000000 0.1081736
## 13 5 4.666667 4.666667 0.7748403
## 14 4 4.000000 4.000000 0.1081736
## 15 NA 3.600000 3.600000 -0.2918264
## 16 NA 5.000000 5.000000 1.1081736
df1 <- data.frame(movie_avg, movie_avg_minus_mean_movie)
df1
## movie_avg movie_avg_minus_mean_movie
## CaptainAmerica 4.272727 0.380900895
## Deadpool 4.444444 0.552618066
## Frozen 3.727273 -0.164553651
## JungleBook 3.900000 0.008173622
## PitchPerfect2 2.714286 -1.177540664
## StarWarsForce 4.153846 0.262019776
## user_avg 4.030208 0.138381955
Using the general formula for the global baseline estimate, we can estimate how Param would rate Pitch Perfect 2 as follows:
Global Baseline Estimate for Pitch Perfect 2 = Mean rating across all movies + Pitch Perfect 2 rating relative to average + Param’s rating relative to average
# overall mean + movie deviation + user deviation (Param's average rating is 3.5)
Param_rate_pitch <- 3.892 + (-1.178) + (3.5 - 3.892)
Param_rate_pitch
## [1] 2.322
Therefore, the global baseline estimate for Param’s rating of Pitch Perfect 2 is about 2.32.
This estimate suggests that Param would rate Pitch Perfect 2 lower than the movie’s average rating of 2.71, because Param’s own average (3.5) is below the overall mean (3.89).
We now compute the predicted rating for each movie that each user has not rated, using the following formula:
predicted_rating = mean_movie + user_avg_minus_mean_movie + (mean rating of the other users who have rated this movie - mean_movie)
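In this formula the two `mean_movie` terms cancel, so each prediction reduces to the user's deviation plus the movie's mean rating. As a small vectorized sketch on toy data (the matrix and every name in it are assumptions for illustration), `outer()` can build all user-movie estimates at once, after which only the unrated cells are kept.

```r
# Toy ratings (assumed values); NA marks a movie the critic has not rated
toy <- matrix(c( 4, NA,  2,
                 5,  3, NA,
                NA,  4,  1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("M1", "M2", "M3")))

global_mean <- mean(toy, na.rm = TRUE)
user_dev    <- rowMeans(toy, na.rm = TRUE) - global_mean  # one value per critic
movie_dev   <- colMeans(toy, na.rm = TRUE) - global_mean  # one value per movie

# global mean + user deviation + movie deviation for every cell
estimates <- global_mean + outer(user_dev, movie_dev, "+")

# keep estimates only where no rating exists
estimates[!is.na(toy)] <- NA
print(round(estimates, 2))
```

Each remaining cell is the estimate for a critic-movie pair that has no observed rating, which is the same quantity the nested loop computes one pair at a time.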
# create an empty data frame to store the predicted ratings
predicted_ratings <- data.frame(matrix(ncol = 3, nrow = 0))
# add column names to the predicted_ratings data frame
colnames(predicted_ratings) <- c("User", "Movie", "Predicted_Rating")
# loop through each user
for (user in 1:nrow(df_new)) {
  # loop through each movie that the user has not rated
  for (movie in rating_cols[is.na(df_new[user, rating_cols])]) {
    # get the ratings of other users who have rated this movie
    other_ratings <- df_new[!is.na(df_new[, movie]) & df_new$Critic != df_new[user, "Critic"], movie]
    # calculate the predicted rating using the formula
    predicted_rating <- mean_movie + df_new[user, "user_avg_minus_mean_movie"] + (mean(other_ratings) - mean_movie)
    # add the user, movie, and predicted rating to the predicted_ratings data frame
    predicted_ratings <- rbind(predicted_ratings, data.frame(User = df_new[user, "Critic"], Movie = movie, Predicted_Rating = predicted_rating))
  }
}
# print the predicted_ratings data frame
predicted_ratings
## User Movie Predicted_Rating
## 1 Burton CaptainAmerica 4.380901
## 2 Burton Deadpool 4.552618
## 3 Burton Frozen 3.835446
## 4 Burton PitchPerfect2 2.822459
## 5 Dan CaptainAmerica 5.380901
## 6 Dan Frozen 4.835446
## 7 Dan JungleBook 5.008174
## 8 Dan PitchPerfect2 3.822459
## 9 Dieudonne Frozen 4.502113
## 10 Dieudonne JungleBook 4.674840
## 11 Dieudonne PitchPerfect2 3.489126
## 12 Matt Deadpool 3.802618
## 13 Matt JungleBook 3.258174
## 14 Mauricio Deadpool 4.052618
## 15 Mauricio StarWarsForce 3.762020
## 16 Nathan CaptainAmerica 4.380901
## 17 Nathan Deadpool 4.552618
## 18 Nathan Frozen 3.835446
## 19 Nathan JungleBook 4.008174
## 20 Nathan PitchPerfect2 2.822459
## 21 Param JungleBook 3.508174
## 22 Param PitchPerfect2 2.322459
## 23 Prashanth PitchPerfect2 3.622459
## 24 Shipra CaptainAmerica 4.380901
## 25 Shipra Deadpool 4.552618
## 26 Shipra PitchPerfect2 2.822459
## 27 Steve Deadpool 4.552618
## 28 Steve Frozen 3.835446
## 29 Steve JungleBook 4.008174
## 30 Steve PitchPerfect2 2.822459
## 31 Vuthy StarWarsForce 3.862020
## 32 Xingjia CaptainAmerica 5.380901
## 33 Xingjia Deadpool 5.552618
## 34 Xingjia PitchPerfect2 3.822459
## 35 Xingjia StarWarsForce 5.262020
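Among the predictions, “Deadpool” for user “Xingjia” has the highest predicted rating, 5.552618, followed by “CaptainAmerica” for “Dan” and “Xingjia” at 5.380901. Both values exceed 5, the highest rating observed in the survey, because the estimate is not capped.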
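To turn such predictions into recommendations, one option is to keep the highest-predicted unrated movie for each user. The sketch below assumes a data frame shaped like `predicted_ratings` (columns `User`, `Movie`, `Predicted_Rating`) and reuses four rows from the output above. The names `preds` and `top_pick` are illustrative.

```r
# A few rows copied from the predicted ratings printed above
preds <- data.frame(
  User = c("Param", "Param", "Matt", "Matt"),
  Movie = c("JungleBook", "PitchPerfect2", "Deadpool", "JungleBook"),
  Predicted_Rating = c(3.508174, 2.322459, 3.802618, 3.258174),
  stringsAsFactors = FALSE
)

# sort by user, then by descending predicted rating
preds <- preds[order(preds$User, -preds$Predicted_Rating), ]

# keep the first (highest-rated) row for each user
top_pick <- preds[!duplicated(preds$User), ]
print(top_pick)
```

For these rows the sketch keeps Deadpool for Matt and JungleBook for Param. Keeping the first few rows per user instead of only the first would give a longer recommendation list.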
In this analysis, we used the Global Baseline Estimate algorithm to generate movie recommendations based on survey data collected on movie ratings. We started by importing the data into R and computing the user and movie means, and then used these values to compute the Global Baseline Estimate for each movie. We also predicted the ratings for movies that each user had not rated and generated a list of recommended movies based on these predictions.