Set Working Directory
if (rstudioapi::isAvailable()) {
  current_dir <- dirname(rstudioapi::getActiveDocumentContext()$path)
  setwd(current_dir)
  cat("Working directory set to:", getwd(), "\n")
} else {
  cat("rstudioapi not available. Please set the working directory manually using setwd().\n")
}
## rstudioapi not available. Please set the working directory manually using setwd().
Introduction
This document predicts Param’s rating for Pitch Perfect 2
using the Global Baseline Estimate algorithm, based on the MovieRatings
dataset. The algorithm uses the formula:
\[
\hat{r}_{ui} = \mu + b_i + b_u
\]
Where:
- \(\mu\): Mean movie rating across all users and movies.
- \(b_i\): Pitch Perfect 2’s bias (movie average rating minus \(\mu\)).
- \(b_u\): Param’s bias (user average rating minus \(\mu\)).
The dataset (MovieRatings.xlsx) contains ratings from critics for six
movies, with some missing values. The expected result, based on provided
values (\(\mu = 3.93\), \(b_i = -1.22\), \(b_u = -0.43\)), is a predicted
rating of approximately 2.28.
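As a quick check of the arithmetic, substituting the provided (rounded) values into the formula gives the expected result:

```latex
\[
\hat{r}_{ui} = 3.93 + (-1.22) + (-0.43) = 2.28
\]
```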
Setup
Load the required R packages (readxl, dplyr, and tidyr). Install any that are not already installed.
## Warning: package 'readxl' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.3
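One way to find out which packages still need installing is sketched below. The helper name `missing_packages` is hypothetical and not part of the original analysis, and the install call is left commented out so that nothing is installed without an explicit decision.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("readxl", "dplyr", "tidyr"))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

`requireNamespace()` only checks availability and does not attach a package, so the `library()` calls are still needed afterwards.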
Load and Clean Data
Read the MovieRatings.xlsx file and remove duplicate rows, as the
dataset contains repeated entries.
library(readxl)
library(dplyr)
library(tidyr)  # provides pivot_longer(), used in the reshape step
# Read Excel file from GitHub raw link
url <- "https://raw.githubusercontent.com/tanzil64/Global-Baseline-Estimation/main/MovieRatings.xlsx"
# Download the file to a temporary location
temp_file <- tempfile(fileext = ".xlsx")
download.file(url, destfile = temp_file, mode = "wb")
# Read and clean data
ratings <- read_excel(temp_file, sheet = "MovieRatings")
ratings <- distinct(ratings) # Remove duplicates
head(ratings)
## # A tibble: 6 × 7
## Critic CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2 StarWarsForce
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Burton NA NA NA 4 NA 4
## 2 Charley 4 5 4 3 2 3
## 3 Dan NA 5 NA NA NA 5
## 4 Dieudon… 5 4 NA NA NA 5
## 5 Matt 4 NA 2 NA 2 5
## 6 Mauricio 4 NA 3 3 4 NA
#ratings <- read_excel("MovieRatings.xlsx", sheet = "MovieRatings")
#ratings <- distinct(ratings) # Remove duplicates
#head(ratings)
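Since the cleaning step relies on `distinct()` to drop repeated rows, it can help to see what it does on a small case. The sketch below uses a made-up toy data frame (not the MovieRatings data) and counts how many rows are removed.

```r
library(dplyr)

# Toy data (hypothetical): the first two rows are exact duplicates.
toy <- data.frame(
  Critic = c("A", "A", "B"),
  MovieX = c(4, 4, NA)
)

deduped <- distinct(toy)
n_removed <- nrow(toy) - nrow(deduped)
cat("Duplicate rows removed:", n_removed, "\n")
```

`distinct()` compares entire rows, so two critics with the same ratings but different names are both kept.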
Convert the data to long format (Critic, Movie, Rating) to facilitate
computations, dropping missing ratings.
ratings_long <- ratings %>%
  pivot_longer(
    cols = c("CaptainAmerica", "Deadpool", "Frozen", "JungleBook", "PitchPerfect2", "StarWarsForce"),
    names_to = "Movie",
    values_to = "Rating",
    values_drop_na = TRUE
  )
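To illustrate the reshape, here is a minimal sketch on a made-up two-critic table (the column names `MovieX` and `MovieY` are hypothetical). Each non-missing cell of the wide table becomes one row of the long table.

```r
library(tidyr)

# Toy wide table (hypothetical): critic B has not rated MovieX.
toy <- data.frame(
  Critic = c("A", "B"),
  MovieX = c(4, NA),
  MovieY = c(3, 5)
)

toy_long <- pivot_longer(
  toy,
  cols = c("MovieX", "MovieY"),
  names_to = "Movie",
  values_to = "Rating",
  values_drop_na = TRUE  # the missing B/MovieX cell produces no row
)
print(toy_long)
```

Four cells go in and three rows come out, because the one missing rating is dropped rather than carried along as `NA`.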
Compute Global Mean (\(\mu\))
Calculate the mean rating across all movies and users.
mu <- mean(ratings_long$Rating, na.rm = TRUE)
cat("Global Mean Rating (mu):", mu, "\n")
## Global Mean Rating (mu): 3.934426
The computed mean is approximately 3.934426, close to the provided
3.93.
Compute User Biases (\(b_u\))
Calculate each user’s bias as that user’s average rating minus the global
mean. No regularization is applied, because the provided value for Param’s
bias (-0.43) suggests that none was used.
user_bias <- ratings_long %>%
  group_by(Critic) %>%
  summarise(
    user_avg = mean(Rating, na.rm = TRUE),
    n_ratings = n(),
    .groups = "drop"
  ) %>%
  mutate(b_u = user_avg - mu)
# Extract Param's bias
param_bias <- user_bias %>%
  filter(Critic == "Param") %>%
  pull(b_u)
cat("Param's Bias (b_u):", param_bias, "\n")
## Param's Bias (b_u): -0.4344262
Param’s bias is approximately -0.434426, matching the provided
value.
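For contrast with the unregularized bias used here, the sketch below shows one common shrinkage form, in which the summed deviations are divided by the rating count plus a damping constant. This is not what the document computes; the function name `shrunk_bias` and the parameter `lambda` are assumptions introduced only for illustration.

```r
# Hypothetical helper: bias with optional shrinkage toward zero.
# With lambda = 0 this reduces to mean(ratings) - mu, as used in this document.
shrunk_bias <- function(ratings, mu, lambda = 0) {
  sum(ratings - mu) / (lambda + length(ratings))
}

plain  <- shrunk_bias(c(3, 4), mu = 4, lambda = 0)
shrunk <- shrunk_bias(c(3, 4), mu = 4, lambda = 2)
cat("Unregularized:", plain, " Regularized:", shrunk, "\n")
```

A larger `lambda` pulls the bias of users with few ratings closer to zero, which is why an unshrunk value such as -0.43 points to `lambda = 0`.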
Compute Movie Biases (\(b_i\))
Calculate each movie’s bias as its average rating minus the global
mean.
movie_bias <- ratings_long %>%
  group_by(Movie) %>%
  summarise(
    movie_avg = mean(Rating, na.rm = TRUE),
    n_ratings = n(),
    .groups = "drop"
  ) %>%
  mutate(b_i = movie_avg - mu)
# Extract Pitch Perfect 2's bias
pp2_bias <- movie_bias %>%
  filter(Movie == "PitchPerfect2") %>%
  pull(b_i)
cat("Pitch Perfect 2's Bias (b_i):", pp2_bias, "\n")
## Pitch Perfect 2's Bias (b_i): -1.220141
Pitch Perfect 2’s bias is approximately -1.220141, matching
the provided value (-1.22).
With \(\mu\), \(b_u\), and \(b_i\) computed, the prediction for Param
follows directly from the formula.
Predict Param’s Rating for Pitch Perfect 2
Compute the predicted rating using the Global Baseline Estimate,
clipping to the 1–5 scale.
predicted_rating <- mu + param_bias + pp2_bias
predicted_rating <- pmin(pmax(predicted_rating, 1), 5) # Clip to 1-5 scale
cat("Predicted Rating for Param on Pitch Perfect 2:", predicted_rating, "\n")
## Predicted Rating for Param on Pitch Perfect 2: 2.279859
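The three steps above (global mean, user bias, movie bias) can be collected into one small function. The following is a base R sketch under stated assumptions: the helper name `baseline_predict` is hypothetical, the input is a long-format data frame with `Critic`, `Movie`, and `Rating` columns and no missing ratings, and the toy data is made up rather than taken from MovieRatings.xlsx.

```r
# Hypothetical helper: Global Baseline Estimate for one user/item pair,
# clipped to the rating scale [lo, hi].
baseline_predict <- function(ratings, user, item, lo = 1, hi = 5) {
  mu  <- mean(ratings$Rating)                                # global mean
  b_u <- mean(ratings$Rating[ratings$Critic == user]) - mu   # user bias
  b_i <- mean(ratings$Rating[ratings$Movie == item]) - mu    # item bias
  min(max(mu + b_u + b_i, lo), hi)
}

# Toy long-format data (hypothetical): critic C has not rated movie Y.
toy <- data.frame(
  Critic = c("A", "A", "B", "B", "C"),
  Movie  = c("X", "Y", "X", "Y", "X"),
  Rating = c(5, 3, 4, 2, 3)
)

toy_prediction <- baseline_predict(toy, "C", "Y")
cat("Toy prediction for C on Y:", toy_prediction, "\n")
```

Here the global mean is 3.4, C's bias is -0.4, and Y's bias is -0.9, so the estimate is 2.1, which already lies inside the 1–5 scale and is left unchanged by the clipping.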
Validation
Compare the predicted rating to the provided expected value
(2.2798594847775178).
expected_rating <- 2.2798594847775178
cat("Provided Expected Rating:", expected_rating, "\n")
## Provided Expected Rating: 2.279859
cat("Difference from Expected:", abs(predicted_rating - expected_rating), "\n")
## Difference from Expected: 0
The predicted rating (~2.279859) matches the expected value,
confirming the computation.
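The difference printed above is exactly 0, but floating-point results do not always agree to the last bit, so a tolerance-based comparison is a more robust check in general. The sketch below uses base R's `all.equal()`; the helper name `matches_expected` and the default tolerance are assumptions chosen for illustration.

```r
# Hypothetical helper: compare two numbers up to a small tolerance.
matches_expected <- function(predicted, expected, tol = 1e-8) {
  isTRUE(all.equal(predicted, expected, tolerance = tol))
}

x <- 0.1 + 0.2
print(x == 0.3)                  # FALSE: exact comparison fails
print(matches_expected(x, 0.3))  # TRUE: equal within tolerance
```

In the validation chunk, the same helper could be called as `matches_expected(predicted_rating, expected_rating)`.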
Conclusion
Using the Global Baseline Estimate, the predicted rating for Param on
Pitch Perfect 2 is approximately 2.28, matching the expected result. The
method captures user and item biases using only three averages, so it
remains usable even with sparse data. It provides a solid foundation for
building more advanced recommender systems.