Implementation Steps

Connecting/Loading Data

Connect to the PostgreSQL database with DBI and RPostgres, import the raw ratings table into an R data frame, and convert SQL NULL values into R NA values so they can be handled in the calculations.

Bias Calculation and Applying the GBE Formula

With dplyr, use the group_by function and calculate:
- Mean of all ratings
- Group: title –> movie bias
- Group: reviewer_name –> user bias

Apply the Global Baseline Estimate (GBE) formula to each user + movie pair.
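The grouped calculation planned above can be sketched as follows. This is a minimal illustration, not the code used later in this report: the ratings are rebuilt inline so the block runs on its own, the reviewer column is called `name` as in the CSV, and `ratings`, `movie_bias`, `user_bias`, and `gbe_all` are hypothetical names. It assumes dplyr 1.1.0 or later for `cross_join()`.

```r
library(dplyr)

# Ratings copied from the movie_ratings.csv table printed in this report
ratings <- data.frame(
  title = c("The Substance", "Nosferatu", "28 Years Later",
            "Sinners", "The Substance",
            "28 Years Later", "The Conjuring: Last Rites", "Nosferatu",
            "The Substance", "Frankenstein",
            "The Substance", "Nosferatu", "Frankenstein",
            "28 Years Later", "Sinners", "The Conjuring: Last Rites"),
  name = c(rep("Liz", 3), rep("Jed", 2), rep("Brenda", 3),
           rep("Jamie", 2), rep("Justice", 6)),
  rating = c(5, 5, 5, 5, 3, 2, 1, 4, 4, 5, rep(5, 6))
)

# Mean of all ratings
global_mu <- mean(ratings$rating, na.rm = TRUE)

# Movie bias: each title's mean rating minus the global mean
movie_bias <- ratings %>%
  group_by(title) %>%
  summarize(b_i = mean(rating, na.rm = TRUE) - global_mu, .groups = "drop")

# User bias: each reviewer's mean rating minus the global mean
user_bias <- ratings %>%
  group_by(name) %>%
  summarize(b_u = mean(rating, na.rm = TRUE) - global_mu, .groups = "drop")

# GBE for every user + movie pair (rated or not)
gbe_all <- cross_join(user_bias, movie_bias) %>%
  mutate(prediction = global_mu + b_i + b_u)

# Look up one pair
jamie_pred <- gbe_all %>%
  filter(name == "Jamie", title == "28 Years Later") %>%
  pull(prediction)
```

Because the formula only adds two offsets to the global mean, predictions for generous reviewers on highly rated movies can go above 5; this sketch leaves them unclamped.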
Comparison and Data Visualization

Compare raw average ratings with GBE predictions and examine the distribution of user bias. Find the predicted score of 28 Years Later for the reviewer Jamie.
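One possible way to look at the user bias distribution is a simple bar chart. The sketch below is an assumption about how that plot could be drawn, not the author's figure: only the `name` and `rating` columns are rebuilt inline, and `user_bias` and `p` are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Reviewer names and ratings copied from the table printed in this report
ratings <- data.frame(
  name = c(rep("Liz", 3), rep("Jed", 2), rep("Brenda", 3),
           rep("Jamie", 2), rep("Justice", 6)),
  rating = c(5, 5, 5, 5, 3, 2, 1, 4, 4, 5, rep(5, 6))
)

global_mu <- mean(ratings$rating, na.rm = TRUE)

# User bias: each reviewer's mean rating minus the global mean
user_bias <- ratings %>%
  group_by(name) %>%
  summarize(b_u = mean(rating, na.rm = TRUE) - global_mu, .groups = "drop")

# Bar chart of user bias, ordered from harshest to most generous reviewer
p <- ggplot(user_bias, aes(x = reorder(name, b_u), y = b_u)) +
  geom_col() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Reviewer",
       y = "User bias (mean rating - global mean)",
       title = "User bias distribution")
p
```

Bars above the dashed line are reviewers who rate higher than the group average; bars below it are reviewers who rate lower.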

Challenges: making the database reproducible, finding the right functions to build the formula, and working out what Jamie would rate 28 Years Later, since she has no rating for it (NA).

Code Base

# Load essential packages to use for data manipulation


library(DBI)
library(RPostgres)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Previously, anyone opening the file was prompted to create the database and enter a password. To make the analysis easier to reproduce this time, I exported the SQL query result to a CSV file.

df <- read.csv("movie_ratings.csv")

# Confirm that the table loaded correctly

print(df)

##                        title    name rating
## 1              The Substance     Liz      5
## 2                  Nosferatu     Liz      5
## 3             28 Years Later     Liz      5
## 4                    Sinners     Jed      5
## 5              The Substance     Jed      3
## 6             28 Years Later  Brenda      2
## 7  The Conjuring: Last Rites  Brenda      1
## 8                  Nosferatu  Brenda      4
## 9              The Substance   Jamie      4
## 10              Frankenstein   Jamie      5
## 11             The Substance Justice      5
## 12                 Nosferatu Justice      5
## 13              Frankenstein Justice      5
## 14            28 Years Later Justice      5
## 15                   Sinners Justice      5
## 16 The Conjuring: Last Rites Justice      5
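If movie_ratings.csv is not at hand, the same 16 rows can be rebuilt directly from the printed table above. This is a minimal sketch for reproducibility; `df_inline` is a hypothetical name chosen so that it does not overwrite `df`, and the ratings are stored as numeric rather than integer.

```r
# Rebuild the ratings table inline (no CSV or database needed).
# Values are copied row by row from the printed table.
df_inline <- data.frame(
  title = c("The Substance", "Nosferatu", "28 Years Later",
            "Sinners", "The Substance",
            "28 Years Later", "The Conjuring: Last Rites", "Nosferatu",
            "The Substance", "Frankenstein",
            "The Substance", "Nosferatu", "Frankenstein",
            "28 Years Later", "Sinners", "The Conjuring: Last Rites"),
  name = c(rep("Liz", 3), rep("Jed", 2), rep("Brenda", 3),
           rep("Jamie", 2), rep("Justice", 6)),
  rating = c(5, 5, 5, 5, 3, 2, 1, 4, 4, 5, rep(5, 6)),
  stringsAsFactors = FALSE
)

# Sanity checks: number of ratings and number of distinct reviewers
nrow(df_inline)
length(unique(df_inline$name))
```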

Calculate Global Average - all 16 ratings

global_mu <- mean(df$rating)
print(paste("Global_Average (Mu):", round(global_mu, 4)))

## [1] "Global_Average (Mu): 4.3125"

Calculate Movie Bias for “28 Years Later”: Average rating of “28 Years Later” - Global Average

movie_avg <- df %>%
  filter(title == "28 Years Later") %>%  # keep only this movie
  summarize(mean_rating = mean(rating, na.rm = TRUE)) %>%  # calculate its mean rating
  pull(mean_rating)  # extract the column as a vector


b_i <- movie_avg - global_mu

Calculate User Bias for “Jamie” (Average rating of Jamie - Global Average)

user_avg <- df %>%
  filter(name == "Jamie") %>%  # keep only Jamie's ratings
  summarize(mean_rating = mean(rating, na.rm = TRUE)) %>%  # calculate her mean rating
  pull(mean_rating)  # extract the column as a vector

b_u <- user_avg - global_mu

Calculate the Prediction = Global Average + Movie Bias + User Bias

prediction <- global_mu + b_i + b_u
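Written out with the values from this data set, the calculation looks like this. The symbols $\hat{r}_{ui}$ (predicted rating) and $\bar{r}$ (a mean rating) are notation assumed here for illustration; $\mu$, $b_i$, and $b_u$ correspond to `global_mu`, `b_i`, and `b_u` in the code.

```latex
\begin{align*}
b_i &= \bar{r}_{\text{28 Years Later}} - \mu = 4.0 - 4.3125 = -0.3125 \\
b_u &= \bar{r}_{\text{Jamie}} - \mu = 4.5 - 4.3125 = 0.1875 \\
\hat{r}_{ui} &= \mu + b_i + b_u = 4.3125 - 0.3125 + 0.1875 = 4.1875
\end{align*}
```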

Print the Results

print(paste("Global Average:", round(global_mu, 4)))  # round to 4 decimal places

## [1] "Global Average: 4.3125"

print(paste("Movie Bias (28 Years Later):", round(b_i, 4)))

## [1] "Movie Bias (28 Years Later): -0.3125"

print(paste("User Bias (Jamie):", round(b_u, 4)))

## [1] "User Bias (Jamie): 0.1875"

print(paste("FINAL JAMIE'S PREDICTION:", round(prediction, 4)))

## [1] "FINAL JAMIE'S PREDICTION: 4.1875"
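The same steps could be wrapped in a small function so that any reviewer and movie pair can be scored. This is a sketch under stated assumptions, not part of the original analysis: `predict_gbe` is a hypothetical helper, it uses base R only, and it returns NA when the movie or reviewer has no ratings at all.

```r
# Hypothetical helper: GBE prediction for one reviewer + movie pair.
# Expects a data frame with columns title, name, and rating.
predict_gbe <- function(ratings, movie, reviewer) {
  mu <- mean(ratings$rating, na.rm = TRUE)
  movie_ratings <- ratings$rating[ratings$title == movie]
  user_ratings <- ratings$rating[ratings$name == reviewer]

  # No ratings for this movie or reviewer: no bias can be estimated
  if (length(movie_ratings) == 0 || length(user_ratings) == 0) {
    return(NA_real_)
  }

  b_i <- mean(movie_ratings, na.rm = TRUE) - mu  # movie bias
  b_u <- mean(user_ratings, na.rm = TRUE) - mu   # user bias
  mu + b_i + b_u
}

# Ratings copied from the table printed earlier in this report
ratings <- data.frame(
  title = c("The Substance", "Nosferatu", "28 Years Later",
            "Sinners", "The Substance",
            "28 Years Later", "The Conjuring: Last Rites", "Nosferatu",
            "The Substance", "Frankenstein",
            "The Substance", "Nosferatu", "Frankenstein",
            "28 Years Later", "Sinners", "The Conjuring: Last Rites"),
  name = c(rep("Liz", 3), rep("Jed", 2), rep("Brenda", 3),
           rep("Jamie", 2), rep("Justice", 6)),
  rating = c(5, 5, 5, 5, 3, 2, 1, 4, 4, 5, rep(5, 6))
)

jamie_score <- predict_gbe(ratings, "28 Years Later", "Jamie")
round(jamie_score, 4)  # 4.1875, matching the result above
```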

Conclusion

The global average was 4.31, taken across all 16 ratings. Either the reviewers in this group are generous with their ratings, or the selected horror/thriller movies were very well received. Because the baseline is so high, any movie rated below 4.31 is below average for this data set, even though 4.0 would normally be a good score.

Item bias: 28 Years Later has an average rating of 4.0 and a negative bias of -0.31. This movie is rated below the average for this specific data set.

User bias: Jamie’s average rating is 4.5, which gives her a positive bias of +0.19. This shows that Jamie is a slightly more generous reviewer than the average in this group.

The Global Baseline Estimate formula predicts that Jamie would rate 28 Years Later a 4.19. This predicted score suggests that Jamie would likely enjoy 28 Years Later, and the system would recommend this movie to her. The formula gives a personalized prediction because it corrects for the movie and user biases found in the data set's ratings instead of relying only on the raw movie average.

Assignment_3A Approach

Mei Qi Ng

2026-02-16

Approach