Instructions

Using the information you collected on movie ratings, implement a Global Baseline Estimate recommendation system in R.

Most recommender systems use personalized algorithms like “content management” and “item-item collaborative filtering.” Sometimes non-personalized recommenders are also useful or necessary. One of the best non-personalized recommender system algorithms is the “Global Baseline Estimate.”

The job here is to use the survey data collected and write the R code that makes a movie recommendation using the Global Baseline Estimate algorithm. Please see the attached spreadsheet for implementation details.

Introduction

I will build a recommender system from the given movie ratings data using the Global Baseline Estimate algorithm. For each critic, I will show which of the six movies in the data they have not yet seen and would most likely enjoy.

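The Global Baseline Estimate (GBE) for a given critic and movie is: GBE = mean movie rating + (that movie's average rating - mean movie rating) + (that critic's average rating - mean movie rating). Here the mean movie rating is the mean of all non-missing ratings in the data.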
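As a quick check of the arithmetic, here is a minimal sketch that applies the formula to a single pair, Burton and deadpool, using the rounded values printed in the output of this report. The variable names critic_avg, movie_relative and critic_relative are illustrative only and are not used elsewhere in the analysis.

```r
# Values copied from the printed output in this report (Burton, deadpool)
mean_movie_rating <- 3.934426  # mean of all non-missing ratings
critic_avg        <- 4         # Burton's average rating
movie_relative    <- 0.510     # deadpool's average minus the overall mean (rounded as printed)

# Critic's average relative to the overall mean
critic_relative <- critic_avg - mean_movie_rating

# Global Baseline Estimate for this critic-movie pair
gbe <- mean_movie_rating + movie_relative + critic_relative
round(gbe, 2)
# 4.51
```

The result agrees with the gbe value shown for Burton and deadpool in the combined table.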

library(janitor)
## Warning: package 'janitor' was built under R version 4.4.2
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Importing dataset

movieratings <- read.csv("https://raw.githubusercontent.com/Chung-Brandon/607/refs/heads/main/movieratings.csv")

# Removing empty rows
movieratings <- movieratings[c(1:16),]

# Cleaning column names to snake_case (lowercase with underscores)
movieratings <- movieratings %>%
  clean_names()
# Viewing the movie rating matrix
head(movieratings)
##      critic captain_america deadpool frozen jungle_book pitch_perfect2
## 1    Burton              NA       NA     NA           4             NA
## 2   Charley               4        5      4           3              2
## 3       Dan              NA        5     NA          NA             NA
## 4 Dieudonne               5        4     NA          NA             NA
## 5      Matt               4       NA      2          NA              2
## 6  Mauricio               4       NA      3           3              4
##   star_wars_force
## 1               4
## 2               3
## 3               5
## 4               5
## 5               5
## 6              NA
# Tidying the data
movieratings.tidy <- movieratings %>%
  pivot_longer(
    cols = 2:7,
    names_to = "movie",
    values_to = "rating"
  )

head(movieratings.tidy)
## # A tibble: 6 × 3
##   critic movie           rating
##   <chr>  <chr>            <int>
## 1 Burton captain_america     NA
## 2 Burton deadpool            NA
## 3 Burton frozen              NA
## 4 Burton jungle_book          4
## 5 Burton pitch_perfect2      NA
## 6 Burton star_wars_force      4
# creating df for critic average ratings 

critic.avg <- movieratings.tidy %>%
  group_by(critic) %>%
  summarize(avg.rating = mean(rating, na.rm = TRUE))

head(critic.avg)
## # A tibble: 6 × 2
##   critic    avg.rating
##   <chr>          <dbl>
## 1 Burton          4   
## 2 Charley         3.5 
## 3 Dan             5   
## 4 Dieudonne       4.67
## 5 Matt            3.25
## 6 Mauricio        3.5
# Calculating the mean_movie rating
mean_movie_rating <- mean(movieratings.tidy$rating, na.rm = TRUE)

mean_movie_rating
## [1] 3.934426
# Add each critic's average rating relative to mean_movie_rating to critic.avg

critic.avg <- critic.avg %>%
  mutate(critic.mean_relativerating = avg.rating - mean_movie_rating)
# Calculate movie rating relative to mean_movie_rating

movie.avg <- movieratings.tidy %>%
  group_by(movie) %>%
  summarise(avg.rating = mean(rating, na.rm = TRUE))

movie.avg <- movie.avg %>%
  mutate(movie.mean_relativerating = avg.rating - mean_movie_rating)
# Calculating the global baseline estimate for each critic-movie pair

# Join the datasets together to get the values relevant to the GBE calculation
combined <- movieratings.tidy %>%
  left_join(critic.avg, by = "critic") %>%
  left_join(movie.avg, by = "movie")


# Drop the raw average columns by name (the joins suffix them .x and .y)
combined <- combined %>%
  select(-avg.rating.x, -avg.rating.y)

# Mutate in GBE column
combined <- combined %>%
  mutate(gbe = mean_movie_rating + critic.mean_relativerating + movie.mean_relativerating)

head(combined)
## # A tibble: 6 × 6
##   critic movie        rating critic.mean_relative…¹ movie.mean_relativer…²   gbe
##   <chr>  <chr>         <int>                  <dbl>                  <dbl> <dbl>
## 1 Burton captain_ame…     NA                 0.0656                 0.338   4.34
## 2 Burton deadpool         NA                 0.0656                 0.510   4.51
## 3 Burton frozen           NA                 0.0656                -0.207   3.79
## 4 Burton jungle_book       4                 0.0656                -0.0344  3.97
## 5 Burton pitch_perfe…     NA                 0.0656                -1.22    2.78
## 6 Burton star_wars_f…      4                 0.0656                 0.219   4.22
## # ℹ abbreviated names: ¹​critic.mean_relativerating, ²​movie.mean_relativerating
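The same calculation can be sketched on a wide ratings matrix in base R, which may help when checking the joined result by hand. This is only an illustrative sketch on made-up data: the critics, movies and ratings below are hypothetical and are not taken from the survey.

```r
# Hypothetical toy ratings: rows are critics, columns are movies, NA = not rated
ratings <- matrix(
  c( 5, NA,  3,
     4,  2, NA,
    NA,  4,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("critic_a", "critic_b", "critic_c"),
                  c("movie_x", "movie_y", "movie_z"))
)

# Mean of all non-missing ratings
global_mean <- mean(ratings, na.rm = TRUE)

# Each critic's and each movie's average relative to the global mean
critic_dev <- rowMeans(ratings, na.rm = TRUE) - global_mean
movie_dev  <- colMeans(ratings, na.rm = TRUE) - global_mean

# GBE for every critic-movie pair: global mean + critic offset + movie offset
gbe <- global_mean + outer(critic_dev, movie_dev, "+")

# Estimate for a movie critic_a has not rated
round(gbe["critic_a", "movie_y"], 2)
# 3.33
```

Here critic_a averages 4 and movie_y averages 3 against a global mean of 22/6, so the estimate is 4 + 3 - 22/6, or about 3.33.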
# Creating a recommendations list where rating = NA, sorted by GBE within each critic
# (arrange() ignores grouping unless .by_group = TRUE)

recommendations <- combined %>%
  filter(is.na(rating)) %>%
  group_by(critic) %>%
  arrange(desc(gbe), .by_group = TRUE) %>%
  select(critic, movie, gbe)

top.recommendation <- combined %>%
  filter(is.na(rating)) %>%
  group_by(critic) %>%
  slice_max(gbe, n = 1) %>%
  select(critic, movie, gbe)

head(top.recommendation)
## # A tibble: 6 × 3
## # Groups:   critic [6]
##   critic    movie             gbe
##   <chr>     <chr>           <dbl>
## 1 Burton    deadpool         4.51
## 2 Dan       captain_america  5.34
## 3 Dieudonne jungle_book      4.63
## 4 Matt      deadpool         3.76
## 5 Mauricio  deadpool         4.01
## 6 Nathan    deadpool         4.51
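One detail worth keeping in mind: slice_max() keeps tied rows by default (with_ties = TRUE), so a critic whose best unseen movies share the same GBE would appear more than once. The sketch below shows the difference on hypothetical toy data; whether any such ties occur in the survey data is not checked here.

```r
library(dplyr)

# Hypothetical predictions; critic_b has two unseen movies tied at 4.2
toy <- data.frame(
  critic = c("critic_a", "critic_a", "critic_b", "critic_b"),
  movie  = c("movie_x", "movie_y", "movie_x", "movie_z"),
  gbe    = c(4.5, 3.9, 4.2, 4.2)
)

# Default behaviour: tied rows are all kept
with_ties <- toy %>%
  group_by(critic) %>%
  slice_max(gbe, n = 1) %>%
  ungroup()

# Exactly one row per critic: ties are broken by row order
one_each <- toy %>%
  group_by(critic) %>%
  slice_max(gbe, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(with_ties)  # 3
nrow(one_each)   # 2
```

If exactly one recommendation per critic is wanted, with_ties = FALSE is one option, at the cost of picking arbitrarily among equally rated movies.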
top.recommendation %>%
  group_by(movie) %>%
  count() %>%
  arrange(desc(n))
## # A tibble: 5 × 2
## # Groups:   movie [5]
##   movie               n
##   <chr>           <int>
## 1 deadpool            7
## 2 jungle_book         2
## 3 captain_america     1
## 4 pitch_perfect2      1
## 5 star_wars_force     1

Conclusion

In conclusion, using the Global Baseline Estimate recommendation system, I found the top recommended movie for each critic who has not rated at least one of the 6 movies listed, and stored the results in the data set top.recommendation. Critics who have already rated all 6 movies receive no recommendation. Deadpool was the top recommendation for 7 out of 12 critics, and jungle_book came second as the top recommendation for 2 out of 12 critics.