Week 11 - Recommender System

Recommender Systems - Global Baseline Estimate

In this lab, we use the Global Baseline Estimate (GBE) algorithm to predict how a viewer would rate a movie they have not yet rated, based on a) how that specific viewer has rated other movies and b) how all movies in the dataset have been rated by all viewers. The formula to calculate the GBE predicted value is:

GBE = “Mean movie rating overall” + (“Specific movie rating relative to average”) + (“User rating relative to average”)
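As a minimal sketch of the formula, the R function below adds the three terms. The function name gbe_estimate and the example averages are hypothetical, chosen only for illustration, and are not taken from the survey data.

```r
# Hypothetical helper: combine the three GBE terms for one viewer/movie pair
gbe_estimate <- function(overall_mean, movie_mean, viewer_mean) {
  movie_adj  <- movie_mean  - overall_mean  # movie rating relative to average
  viewer_adj <- viewer_mean - overall_mean  # viewer rating relative to average
  overall_mean + movie_adj + viewer_adj
}

# Illustrative values only: overall mean 3.5, movie average 4.0, viewer average 3.0
example_gbe <- gbe_estimate(overall_mean = 3.5, movie_mean = 4.0, viewer_mean = 3.0)
example_gbe
```

Here the movie is rated half a point above average and the viewer rates half a point below average, so the two adjustments cancel and the estimate equals the overall mean of 3.5.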

First, we load the movie survey data from the database:

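# Load the packages used below (DBI/RMySQL for the database, dplyr for wrangling)
library(DBI)
library(RMySQL)
library(dplyr)

# Read data from db
# login_credentials.R is expected to define db_user, db_password, db_name, db_host and db_port
source("login_credentials.R")

mydb <- dbConnect(MySQL(), user = db_user, password = db_password,
                  dbname = db_name, host = db_host, port = db_port)

query <- "SELECT r.response_id, p.FirstName, m.title, r.rating FROM survey_movie_ratings AS r LEFT JOIN survey_movies AS m ON m.movie_id = r.movie_id LEFT JOIN survey_participants AS p ON p.participant_id = r.participant_id"
rs <- dbSendQuery(mydb, query)
# fetch all rows (n = -1) and rename FirstName to viewer
df <- fetch(rs, n = -1) |>
  rename(viewer = FirstName)
# rs has not been cleared at this point, so disconnecting emits the warning shown below
dbDisconnect(mydb)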

## Warning: Closing open result sets

## [1] TRUE

head(df, 10)

##    response_id viewer                      title rating
## 1            1  Nadia                         Up      5
## 2            2  Nadia                      Moana     NA
## 3            3  Nadia                 Inside Out      5
## 4            4  Nadia Nightmare Before Christmas      4
## 5            5  Nadia                Beetlejuice      3
## 6            6  Nadia                 Home Alone      2
## 7            7   Luna                         Up      5
## 8            8   Luna                      Moana      5
## 9            9   Luna                 Inside Out      5
## 10          10   Luna Nightmare Before Christmas     NA

Calculating the GBE

To calculate the GBE, we first calculate the mean rating across all movies. Next, we calculate each movie's rating relative to that average: we group the data by the title column, then use mutate to create a new field called movie_avg_adj, which equals the movie's average rating minus the mean rating for all movies. This tells us how much higher or lower a movie is rated compared to movies overall. We use the same approach for the user rating relative to average, grouping by the viewer field and creating a new column called viewer_avg_adj, which equals the viewer's average rating minus the mean rating for all movies. This tells us how different a viewer's typical ratings are from the average rating. Missing ratings are excluded from every mean (na.rm), and only the rows with a missing rating are kept, since those are the ratings we want to predict.

Finally, we round the predicted values to match the original integer ratings, which range from 1 to 5.

# calculate mean rating for all movies
df <- df |>
  mutate(
    mean_movie_rating = mean(rating, na.rm = TRUE)
  )

movie_calculated_rating <- df |>
  group_by(title) |>
  # calculate movie avg - mean movie rating
  mutate(
    movie_avg_adj = mean(rating, na.rm = TRUE) - mean_movie_rating
  ) |>
  ungroup() |>
  group_by(viewer) |>
  # calculate viewer avg - mean movie rating
  mutate(
    viewer_avg_adj = mean(rating, na.rm = TRUE) - mean_movie_rating
  ) |>
  ungroup() |>
  # keep only the missing ratings and predict each with the global baseline estimate
  filter(is.na(rating)) |>
  mutate(gbe_rating = mean_movie_rating + movie_avg_adj + viewer_avg_adj)

glimpse(movie_calculated_rating)

## Rows: 9
## Columns: 8
## $ response_id       <int> 2, 10, 11, 16, 17, 20, 25, 26, 27
## $ viewer            <chr> "Nadia", "Luna", "Luna", "Natalia", "Natalia", "Ian"…
## $ title             <chr> "Moana", "Nightmare Before Christmas", "Beetlejuice"…
## $ rating            <int> NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ mean_movie_rating <dbl> 3.777778, 3.777778, 3.777778, 3.777778, 3.777778, 3.…
## $ movie_avg_adj     <dbl> 0.5555556, 0.4722222, -0.2777778, 0.4722222, -0.2777…
## $ viewer_avg_adj    <dbl> 0.02222222, 0.97222222, 0.97222222, 0.47222222, 0.47…
## $ gbe_rating        <dbl> 4.355556, 5.222222, 4.472222, 4.722222, 3.972222, 3.…
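As a quick sanity check, the first two rows of the output above can be recomputed by hand. The values below are copied from the glimpse() output, so they carry only the printed precision, and the vector name gbe_check is only for illustration.

```r
# Values copied from the printed output for row 1 (Nadia, Moana)
# and row 2 (Luna, Nightmare Before Christmas)
mean_movie_rating <- 3.777778
movie_avg_adj     <- c(0.5555556, 0.4722222)
viewer_avg_adj    <- c(0.02222222, 0.97222222)

# Sum the three terms and compare against the printed gbe_rating values
gbe_check <- mean_movie_rating + movie_avg_adj + viewer_avg_adj
round(gbe_check, 6)
```

Up to printing precision, the results agree with the gbe_rating values of 4.355556 and 5.222222 reported above.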

# round ratings to the nearest integer
movie_gbe_rating <- movie_calculated_rating |>
  select(viewer, title, gbe_rating) |>
  mutate(rounded_gbe = round(gbe_rating))

movie_gbe_rating

## # A tibble: 9 × 4
##   viewer  title                      gbe_rating rounded_gbe
##   <chr>   <chr>                           <dbl>       <dbl>
## 1 Nadia   Moana                            4.36           4
## 2 Luna    Nightmare Before Christmas       5.22           5
## 3 Luna    Beetlejuice                      4.47           4
## 4 Natalia Nightmare Before Christmas       4.72           5
## 5 Natalia Beetlejuice                      3.97           4
## 6 Ian     Moana                            3.76           4
## 7 Kris    Up                               3.49           3
## 8 Kris    Moana                            3.22           3
## 9 Kris    Inside Out                       3.29           3
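Note that a baseline estimate is not bounded by the rating scale: Luna's estimate for Nightmare Before Christmas is 5.22, which lands on 5 only because of rounding. One possible way to guarantee that predictions stay within 1-5 is to clamp them after rounding. The sketch below is an assumption and not part of the analysis above; the helper name clamp_rating is hypothetical.

```r
# Hypothetical helper: round an estimate, then clamp it to the 1-5 rating scale
clamp_rating <- function(x, lower = 1, upper = 5) {
  pmin(pmax(round(x), lower), upper)
}

# 5.22 is taken from the table above; 5.6 and 0.4 are made-up out-of-range values
clamped <- clamp_rating(c(5.22, 5.6, 0.4))
clamped
```

With these inputs, 5.22 stays at 5, while the made-up values 5.6 and 0.4 are pulled back to 5 and 1 respectively.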

Week 11 - Recommender System - GBE

Marco Castro

2024-11-12
