The Global Baseline Estimate (GBE) is a non-personalized prediction algorithm that starts from a global average rating across all users and items. Bias is then accounted for by calculating how far each user's and each item's average rating deviates from that global average.
Recommendations are then determined as follows: the predicted rating by user u of item i = the global average rating + the user bias + the item bias.
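As a minimal sketch, this formula can be expressed as a small helper function; the function name and the numbers below are hypothetical illustrations, not values from the survey data.
predict_gbe <- function(global_avg, user_bias, item_bias) {
  # GBE prediction: the global average adjusted by the user and item deviations
  global_avg + user_bias + item_bias
}
predict_gbe(4.0, 0.5, -0.7)  # hypothetical inputs: returns 3.8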
Below, the GBE is applied to sample movie ratings to make movie recommendations. First, the Excel survey data is imported.
library(tidyverse)
library(readxl)
ratings <- read_excel("MovieRatings.xlsx")
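The rest of the walkthrough assumes the workbook holds one Critic column followed by six movie columns (referenced as ratings[2:7] below), with NA wherever a critic skipped a movie; a quick structural check can confirm this.
glimpse(ratings)  # one row per critic, one column per movie (output omitted)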
Next, the average ratings for each Critic row (user_biases) and movie column (item_biases) are calculated.
# Average rating per critic across the six movie columns
user_biases <- data.frame(
  Critic = ratings$Critic,
  user_avg = round(rowMeans(ratings[2:7], na.rm = TRUE), digits = 2)
)
# Global average for users: the mean of the per-critic averages
user_global_avg <- round(mean(user_biases$user_avg), digits = 2)
The global average for users is 4.03, and each critic's bias is their average rating minus this value.
user_biases <- user_biases |>
  mutate(user_bias = user_avg - user_global_avg)
user_biases
## Critic user_avg user_bias
## 1 Burton 4.00 -0.03
## 2 Charley 3.50 -0.53
## 3 Dan 5.00 0.97
## 4 Dieudonne 4.67 0.64
## 5 Matt 3.25 -0.78
## 6 Mauricio 3.50 -0.53
## 7 Max 3.33 -0.70
## 8 Nathan 4.00 -0.03
## 9 Param 3.50 -0.53
## 10 Parshu 3.67 -0.36
## 11 Prashanth 4.80 0.77
## 12 Shipra 4.00 -0.03
## 13 Sreejaya 4.67 0.64
## 14 Steve 4.00 -0.03
## 15 Vuthy 3.60 -0.43
## 16 Xingjia 5.00 0.97
The process for calculating the item bias is similar, except the ratings table is first pivoted longer to tidy the data; the resulting global average for items is 3.87.
ratings_tidy <- ratings |>
  pivot_longer(!Critic, names_to = "movies", values_to = "rating")
# Average rating per movie, then the mean of those averages
item_biases <- ratings_tidy |>
  group_by(movies) |>
  summarize(item_avg = round(mean(rating, na.rm = TRUE), digits = 2))
item_global_avg <- round(mean(item_biases$item_avg), digits = 2)
item_biases <- item_biases |>
  mutate(item_bias = item_avg - item_global_avg)
item_biases
## # A tibble: 6 × 3
## movies item_avg item_bias
## <chr> <dbl> <dbl>
## 1 CaptainAmerica 4.27 0.400
## 2 Deadpool 4.44 0.57
## 3 Frozen 3.73 -0.140
## 4 JungleBook 3.9 0.0300
## 5 PitchPerfect2 2.71 -1.16
## 6 StarWarsForce 4.15 0.280
These biases are joined onto the unrated user-movie pairs in the tidy dataset, so each prediction reduces to a simple column calculation.
gbe_ratings <- ratings_tidy |>
  # Keep only the pairs a critic has not rated, then attach both biases
  filter(is.na(rating)) |>
  left_join(user_biases, by = "Critic") |>
  left_join(item_biases, by = "movies") |>
  mutate(predicted_rating = user_global_avg + user_bias + item_bias)
gbe_ratings
## # A tibble: 35 × 8
## Critic movies rating user_avg user_bias item_avg item_bias predicted_rating
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Burton Capta… NA 4 -0.0300 4.27 0.400 4.4
## 2 Burton Deadp… NA 4 -0.0300 4.44 0.57 4.57
## 3 Burton Frozen NA 4 -0.0300 3.73 -0.140 3.86
## 4 Burton Pitch… NA 4 -0.0300 2.71 -1.16 2.84
## 5 Dan Capta… NA 5 0.97 4.27 0.400 5.4
## 6 Dan Frozen NA 5 0.97 3.73 -0.140 4.86
## 7 Dan Jungl… NA 5 0.97 3.9 0.0300 5.03
## 8 Dan Pitch… NA 5 0.97 2.71 -1.16 3.84
## 9 Dieudon… Frozen NA 4.67 0.64 3.73 -0.140 4.53
## 10 Dieudon… Jungl… NA 4.67 0.64 3.9 0.0300 4.7
## # ℹ 25 more rows
The predicted_rating column of the new gbe_ratings dataframe should give reasonable predictions for each user-item pair. As a check, Burton's prediction for CaptainAmerica is 4.03 + (-0.03) + 0.40 = 4.40, matching the first row above. Each Critic could then be given a list of the movies they have not rated, filtered to any prediction over 4.
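A sketch of that last step, assuming the cutoff of 4 suggested above (recommendations is a name introduced here for illustration):
# Keep only predictions above 4 and list them per critic
recommendations <- gbe_ratings |>
  filter(predicted_rating > 4) |>
  select(Critic, movies, predicted_rating) |>
  arrange(Critic, desc(predicted_rating))
recommendations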