library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
The Global Baseline Estimate (GBE) is a simple yet powerful non-personalized recommendation algorithm. Rather than modeling each user's tastes in detail, GBE predicts every user–item rating \(\hat r_{ui}\) as:

\[ \hat r_{ui} \;=\; \mu \;+\; b_u \;+\; b_i \]

where

- \(\mu\) is the global mean of all observed ratings,
- \(b_u\) is the user bias (how much user \(u\) tends to rate above or below \(\mu\)),
- \(b_i\) is the item bias (how much movie \(i\) tends to be rated above or below \(\mu\)).

Once these components are computed, we recommend to each user the unrated movie with the highest \(\hat r_{ui}\). This approach serves as a strong baseline and can be extended with regularization or further personalization; a regularized variant is sketched at the end of this section.
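As a concrete check on the formula, here is the arithmetic for a single pair, using rounded constants that are derived later in this document (μ ≈ 3.934; Xingjia's user bias ≈ 1.07; Deadpool's item bias ≈ 0.51):

# Illustrative arithmetic only -- these rounded constants come from the
# computations below, not from refitting anything here
mu_hat <- 3.934      # global mean of all observed ratings
b_xingjia <- 1.07    # Xingjia's user bias
b_deadpool <- 0.51   # Deadpool's item bias
mu_hat + b_xingjia + b_deadpool
## [1] 5.514

This matches the 5.51 shown for the Xingjia/Deadpool pair in the recommendation table below.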
# Load the wide ratings matrix (one row per critic, one column per movie)
ratings_df <- read.csv("https://raw.githubusercontent.com/JaydeeJan/Week-11-Assignment-Extra-Credit/refs/heads/main/MovieRatings1.csv",
                       header = TRUE, stringsAsFactors = FALSE)

# Reshape to long format (one row per observed critic-movie rating), dropping NAs
ratings_long <- ratings_df %>%
  pivot_longer(
    cols      = -Critic,
    names_to  = "Movie",
    values_to = "Rating"
  ) %>%
  filter(!is.na(Rating))
cat("Total ratings:", nrow(ratings_long), "\n")
## Total ratings: 61
cat("Unique critics:", n_distinct(ratings_long$Critic),
" | Unique movies:", n_distinct(ratings_long$Movie), "\n")
## Unique critics: 16 | Unique movies: 6
# Global mean of all observed ratings
mu <- mean(ratings_long$Rating)
cat("Global average rating (μ):", round(mu, 3), "\n")
## Global average rating (μ): 3.934
# User bias: each critic's average deviation from the global mean
b_u <- ratings_long %>%
  group_by(Critic) %>%
  summarise(b_u = mean(Rating - mu), .groups = "drop")

# Item bias: each movie's average deviation from the global mean
b_i <- ratings_long %>%
  group_by(Movie) %>%
  summarise(b_i = mean(Rating - mu), .groups = "drop")
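Since these are plain (unregularized) mean deviations, any single bias can be verified by hand: a critic's \(b_u\) is just that critic's mean rating minus \(\mu\). For instance:

# Spot-check: Xingjia's bias should equal mean(Xingjia's ratings) - mu (~1.07)
ratings_long %>%
  filter(Critic == "Xingjia") %>%
  summarise(b_u_check = mean(Rating) - mu)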
users  <- unique(ratings_long$Critic)
movies <- unique(ratings_long$Movie)

# Build every critic-movie pair, attach the two bias terms, and form the
# GBE prediction; coalesce() guards against pairs with no bias estimate
# (none occur in this dataset)
pred_grid <- expand.grid(Critic = users, Movie = movies,
                         stringsAsFactors = FALSE) %>%
  left_join(b_u, by = "Critic") %>%
  left_join(b_i, by = "Movie") %>%
  mutate(
    b_u  = coalesce(b_u, 0),
    b_i  = coalesce(b_i, 0),
    pred = mu + b_u + b_i
  )
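An optional sanity check, not part of the original pipeline, is to measure how tightly the baseline fits the 61 observed ratings, e.g. via in-sample RMSE:

# Join predictions back onto the observed ratings and compute RMSE;
# a lower value means mu + b_u + b_i reproduces the data more closely
pred_grid %>%
  inner_join(ratings_long, by = c("Critic", "Movie")) %>%
  summarise(rmse = sqrt(mean((Rating - pred)^2)))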
# Drop pairs the critic has already rated, then keep each critic's
# single highest-predicted unseen movie
recommendations <- pred_grid %>%
  anti_join(ratings_long, by = c("Critic", "Movie")) %>%
  group_by(Critic) %>%
  slice_max(pred, n = 1, with_ties = FALSE) %>%
  ungroup()

print(recommendations)
## # A tibble: 12 × 5
##    Critic    Movie              b_u     b_i  pred
##    <chr>     <chr>            <dbl>   <dbl> <dbl>
##  1 Burton    Deadpool        0.0656  0.510   4.51
##  2 Dan       CaptainAmerica  1.07    0.338   5.34
##  3 Dieudonne JungleBook      0.732  -0.0344  4.63
##  4 Matt      Deadpool       -0.684   0.510   3.76
##  5 Mauricio  Deadpool       -0.434   0.510   4.01
##  6 Nathan    Deadpool        0.0656  0.510   4.51
##  7 Param     JungleBook     -0.434  -0.0344  3.47
##  8 Prashanth PitchPerfect2   0.866  -1.22    3.58
##  9 Shipra    Deadpool        0.0656  0.510   4.51
## 10 Steve     Deadpool        0.0656  0.510   4.51
## 11 Vuthy     StarWarsForce  -0.334   0.219   3.82
## 12 Xingjia   Deadpool        1.07    0.510   5.51
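If more than one suggestion per critic were wanted, the same pipeline generalizes by raising `n` in `slice_max()`; a minimal sketch (`top3` is a hypothetical name, not used elsewhere in this analysis):

# Top 3 unseen movies per critic; critics with fewer than 3 unseen
# movies simply return however many remain
top3 <- pred_grid %>%
  anti_join(ratings_long, by = c("Critic", "Movie")) %>%
  group_by(Critic) %>%
  slice_max(pred, n = 3, with_ties = FALSE) %>%
  ungroup()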
Our analysis of the Global Baseline Estimate for 12 critics highlights three main findings:

1. "Deadpool" dominates the recommendations, appearing as the top unrated pick for 7 of the 12 critics thanks to its strong positive item bias (+0.51). Even a critic who typically rates below average (Matt, \(b_u = -0.684\)) still receives it as his top unrated pick.
2. Personal bias shifts the predicted scores up or down. Critics with large positive biases (Dan, Xingjia) receive the highest predictions (5.34 and 5.51), while those with negative biases (Param, Vuthy) see predictions below the global average of 3.93 and are steered toward movies such as JungleBook or StarWarsForce.
3. Filtering to unrated movies shapes the results. Critics who have already rated Deadpool are recommended their next-best unseen option instead (e.g., JungleBook for Dieudonne, CaptainAmerica for Dan), so suggestions always respect each critic's viewing history.
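As noted at the outset, GBE is often extended with regularization. A minimal sketch, assuming the standard damped-mean formulation (`lambda` is a hypothetical shrinkage parameter, not part of the analysis above):

# Regularized (damped-mean) biases: each bias is shrunk toward 0 in
# proportion to how few ratings support it; lambda is a tuning parameter
lambda <- 5

b_u_reg <- ratings_long %>%
  group_by(Critic) %>%
  summarise(b_u = sum(Rating - mu) / (lambda + n()), .groups = "drop")

b_i_reg <- ratings_long %>%
  group_by(Movie) %>%
  summarise(b_i = sum(Rating - mu) / (lambda + n()), .groups = "drop")

Shrinking the biases this way mainly protects critics or movies with very few ratings from producing extreme predictions; with only 61 ratings across 16 critics and 6 movies, that protection could matter here.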