Using the information you collected on movie ratings, implement a Global Baseline Estimate recommendation system in R.
Most recommender systems use personalized algorithms like “content-based filtering” and “item-item collaborative filtering.” Sometimes non-personalized recommenders are also useful or necessary. One of the best non-personalized recommender system algorithms is the “Global Baseline Estimate.”
The job here is to use the survey data collected and write the R code that makes a movie recommendation using the Global Baseline Estimate algorithm. Please see the attached spreadsheet for implementation details.
I will build a recommender system from the given movie ratings data using the Global Baseline Estimate algorithm. For each critic, I will show which movie they would most likely enjoy among the movies in the data (6 in total) that they have not yet rated.
The Global Baseline Estimate (GBE) for a given critic and movie is: GBE = mean movie rating + (movie’s average rating - mean movie rating) + (critic’s average rating - mean movie rating), where the mean movie rating is the average of all observed ratings in the data.
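As a minimal sketch of the formula, the base R function below computes a GBE for every critic-movie pair of a small ratings matrix. The ratings and the function name `global_baseline()` are hypothetical and are not taken from the survey data.

```r
# Hypothetical toy ratings: rows are critics, columns are movies, NA = not rated
ratings <- matrix(
  c( 4, NA,  5,
     3,  2, NA,
    NA,  4,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("m1", "m2", "m3"))
)

# Hypothetical helper: GBE = mu + (movie avg - mu) + (critic avg - mu)
global_baseline <- function(ratings) {
  mu <- mean(ratings, na.rm = TRUE)                   # mean of all observed ratings
  critic_dev <- rowMeans(ratings, na.rm = TRUE) - mu  # critic's average relative to mu
  movie_dev <- colMeans(ratings, na.rm = TRUE) - mu   # movie's average relative to mu
  # outer() adds every critic deviation to every movie deviation
  mu + outer(critic_dev, movie_dev, "+")
}

gbe <- global_baseline(ratings)
round(gbe["B", "m3"], 4)  # 3.3333
```

Because each deviation subtracts the mean once, the estimate simplifies to critic average + movie average - mean; for critic B and movie m3 that is 2.5 + 4.5 - 22/6, or about 3.333.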
library(janitor)
## Warning: package 'janitor' was built under R version 4.4.2
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Importing dataset
movieratings <- read.csv("https://raw.githubusercontent.com/Chung-Brandon/607/refs/heads/main/movieratings.csv")
# Removing empty rows (keeping the 16 critic rows at the top of the CSV)
movieratings <- movieratings[1:16, ]
# Cleaning column names to lowercase snake_case
movieratings <- movieratings %>%
  clean_names()
# Viewing the movie rating matrix
head(movieratings)
##      critic captain_america deadpool frozen jungle_book pitch_perfect2 star_wars_force
## 1    Burton              NA       NA     NA           4             NA               4
## 2   Charley               4        5      4           3              2               3
## 3       Dan              NA        5     NA          NA             NA               5
## 4 Dieudonne               5        4     NA          NA             NA               5
## 5      Matt               4       NA      2          NA              2               5
## 6  Mauricio               4       NA      3           3              4              NA
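Selecting rows 1 to 16 by position works for this file but would silently break if rows were added or reordered. A sketch of an alternative that drops rows by condition, assuming (not verified against the CSV) that the empty rows have a blank or missing critic name; the data frame `raw` is hypothetical.

```r
# Hypothetical raw import with two trailing empty rows
raw <- data.frame(
  critic = c("Burton", "Charley", "", NA),
  deadpool = c(NA, 5, NA, NA)
)

# Keep rows whose critic name is present and not just whitespace
has_critic <- !is.na(raw$critic) & nzchar(trimws(raw$critic))
cleaned <- raw[has_critic, , drop = FALSE]
cleaned$critic  # "Burton" "Charley"
```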
# Tidying the data
movieratings.tidy <- movieratings %>%
  pivot_longer(
    cols = 2:7,
    names_to = "movie",
    values_to = "rating"
  )
head(movieratings.tidy)
## # A tibble: 6 × 3
## critic movie rating
## <chr> <chr> <int>
## 1 Burton captain_america NA
## 2 Burton deadpool NA
## 3 Burton frozen NA
## 4 Burton jungle_book 4
## 5 Burton pitch_perfect2 NA
## 6 Burton star_wars_force 4
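To show what `pivot_longer()` does, here is a toy two-critic example on hypothetical data. Selecting columns with `cols = -critic` rather than by position (`2:7`) is an alternative that keeps working if the column order changes.

```r
library(tidyr)

# Hypothetical wide data: one row per critic, one column per movie
wide <- data.frame(critic = c("A", "B"), m1 = c(4, NA), m2 = c(NA, 2))

# Every column except critic becomes a (movie, rating) pair
long <- pivot_longer(wide, cols = -critic, names_to = "movie", values_to = "rating")
long
```

The NA ratings are kept on purpose: they mark the unrated movies that later become recommendation candidates, so `values_drop_na = TRUE` would be the wrong choice here.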
# Creating a data frame of each critic's average rating
critic.avg <- movieratings.tidy %>%
  group_by(critic) %>%
  summarize(avg.rating = mean(rating, na.rm = TRUE))
head(critic.avg)
## # A tibble: 6 × 2
## critic avg.rating
## <chr> <dbl>
## 1 Burton 4
## 2 Charley 3.5
## 3 Dan 5
## 4 Dieudonne 4.67
## 5 Matt 3.25
## 6 Mauricio 3.5
# Calculating the mean movie rating (the mean of all observed ratings)
mean_movie_rating <- mean(movieratings.tidy$rating, na.rm = TRUE)
mean_movie_rating
## [1] 3.934426
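Note that mean_movie_rating pools every observed rating, which in general differs from averaging the critics' averages. A toy base R comparison on hypothetical data:

```r
# Hypothetical tidy ratings: critic A rated one movie, critic B rated three
tidy <- data.frame(
  critic = c("A", "A", "B", "B", "B"),
  rating = c(5, NA, 1, 2, 3)
)

pooled <- mean(tidy$rating, na.rm = TRUE)                             # (5 + 1 + 2 + 3) / 4 = 2.75
critic_means <- tapply(tidy$rating, tidy$critic, mean, na.rm = TRUE)  # A = 5, B = 2
mean_of_means <- mean(critic_means)                                   # (5 + 2) / 2 = 3.5
```

The pooled mean gives critics who rated more movies more weight, which is what `mean(movieratings.tidy$rating, na.rm = TRUE)` computes.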
# Adding each critic's average rating relative to mean_movie_rating to critic.avg
critic.avg <- critic.avg %>%
  mutate(critic.mean_relativerating = avg.rating - mean_movie_rating)
# Calculating each movie's average rating and its value relative to mean_movie_rating
movie.avg <- movieratings.tidy %>%
  group_by(movie) %>%
  summarise(avg.rating = mean(rating, na.rm = TRUE))
movie.avg <- movie.avg %>%
  mutate(movie.mean_relativerating = avg.rating - mean_movie_rating)
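The critic and movie averages follow the same pattern, so one helper could produce both. A sketch, where `relative_avg()` is a hypothetical function and `toy` is hypothetical data; it uses dplyr's `{{ }}` embracing to pass the grouping column.

```r
library(dplyr)

# Hypothetical helper: average rating per group and its offset from the global mean mu
relative_avg <- function(tidy, group_col, mu) {
  tidy %>%
    group_by({{ group_col }}) %>%
    summarise(avg.rating = mean(rating, na.rm = TRUE), .groups = "drop") %>%
    mutate(relative = avg.rating - mu)
}

toy <- tibble(
  critic = c("A", "A", "B", "B"),
  movie = c("m1", "m2", "m1", "m2"),
  rating = c(4, NA, 2, 3)
)
mu <- mean(toy$rating, na.rm = TRUE)  # (4 + 2 + 3) / 3 = 3

relative_avg(toy, critic, mu)  # A: 4 - 3 = 1, B: 2.5 - 3 = -0.5
relative_avg(toy, movie, mu)   # m1: 3 - 3 = 0, m2: 3 - 3 = 0
```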
# Calculating the global baseline estimate for each critic-movie pair
# Joining the data sets to bring together the values needed for the GBE calculation
combined <- movieratings.tidy %>%
  left_join(critic.avg, by = "critic") %>%
  left_join(movie.avg, by = "movie")
# Dropping the two average-rating columns (suffixed .x and .y by the joins)
combined <- combined %>%
  select(-avg.rating.x, -avg.rating.y)
# Adding the GBE column
combined <- combined %>%
  mutate(gbe = mean_movie_rating + critic.mean_relativerating + movie.mean_relativerating)
head(combined)
## # A tibble: 6 × 6
## critic movie rating critic.mean_relative…¹ movie.mean_relativer…² gbe
## <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 Burton captain_ame… NA 0.0656 0.338 4.34
## 2 Burton deadpool NA 0.0656 0.510 4.51
## 3 Burton frozen NA 0.0656 -0.207 3.79
## 4 Burton jungle_book 4 0.0656 -0.0344 3.97
## 5 Burton pitch_perfe… NA 0.0656 -1.22 2.78
## 6 Burton star_wars_f… 4 0.0656 0.219 4.22
## # ℹ abbreviated names: ¹critic.mean_relativerating, ²movie.mean_relativerating
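As a hand check on the output, the Burton/deadpool GBE can be reproduced from the printed (rounded) values. The variable names here are illustrative and chosen so they do not overwrite the objects above.

```r
mu <- 3.934426       # mean_movie_rating printed earlier
burton_rel <- 0.0656 # Burton's average relative to the mean (rounded)
deadpool_rel <- 0.510 # deadpool's average relative to the mean (rounded)

round(mu + burton_rel + deadpool_rel, 2)  # 4.51, matching the gbe column
```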
# Creating a recommendations list of unrated movies (rating = NA), sorted by GBE within each critic
recommendations <- combined %>%
  filter(is.na(rating)) %>%
  group_by(critic) %>%
  arrange(desc(gbe), .by_group = TRUE) %>%
  select(critic, movie, gbe)
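In dplyr, `arrange()` ignores grouping unless `.by_group = TRUE` is set, so a per-critic ordering needs that argument (or an explicit sort on critic). A toy illustration with hypothetical scores:

```r
library(dplyr)

scores <- tibble(
  critic = c("A", "B", "A", "B"),
  movie = c("m1", "m1", "m2", "m2"),
  gbe = c(3.1, 4.8, 3.9, 2.2)
)

# Sorted by gbe across all critics: the grouping is ignored
global_order <- scores %>% group_by(critic) %>% arrange(desc(gbe))

# Sorted by critic first, then by gbe within each critic
within_order <- scores %>% group_by(critic) %>% arrange(desc(gbe), .by_group = TRUE)

global_order$critic  # "B" "A" "A" "B"
within_order$critic  # "A" "A" "B" "B"
```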
# Keeping only the highest-GBE unrated movie for each critic
top.recommendation <- combined %>%
  filter(is.na(rating)) %>%
  group_by(critic) %>%
  slice_max(gbe, n = 1) %>%
  select(critic, movie, gbe)
head(top.recommendation)
## # A tibble: 6 × 3
## # Groups: critic [6]
## critic movie gbe
## <chr> <chr> <dbl>
## 1 Burton deadpool 4.51
## 2 Dan captain_america 5.34
## 3 Dieudonne jungle_book 4.63
## 4 Matt deadpool 3.76
## 5 Mauricio deadpool 4.01
## 6 Nathan deadpool 4.51
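`slice_max()` keeps ties by default (`with_ties = TRUE`), so a critic whose two best unrated movies had exactly the same GBE would receive two rows. Setting `with_ties = FALSE` forces one row per critic. A toy example with hypothetical scores:

```r
library(dplyr)

scores <- tibble(
  critic = c("A", "A", "B"),
  movie = c("m1", "m2", "m1"),
  gbe = c(4.0, 4.0, 3.0)  # critic A has a tie
)

with_ties <- scores %>% group_by(critic) %>% slice_max(gbe, n = 1)
no_ties <- scores %>% group_by(critic) %>% slice_max(gbe, n = 1, with_ties = FALSE)

nrow(with_ties)  # 3
nrow(no_ties)    # 2
```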
# Counting how many critics received each movie as their top recommendation
top.recommendation %>%
  group_by(movie) %>%
  count() %>%
  arrange(desc(n))
## # A tibble: 5 × 2
## # Groups: movie [5]
## movie n
## <chr> <int>
## 1 deadpool 7
## 2 jungle_book 2
## 3 captain_america 1
## 4 pitch_perfect2 1
## 5 star_wars_force 1
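Because top.recommendation is still grouped by critic, the same tally can also be written by dropping the grouping and letting `count()` sort the result. A sketch on hypothetical data:

```r
library(dplyr)

top <- tibble(
  critic = c("A", "B", "C"),
  movie = c("deadpool", "deadpool", "frozen")
) %>%
  group_by(critic)

# ungroup() first so count() tallies movies across all critics,
# not separately within each critic group
movie_tally <- top %>% ungroup() %>% count(movie, sort = TRUE)
movie_tally  # deadpool 2, frozen 1
```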
In conclusion, using the Global Baseline Estimate recommendation system, I found the top recommended movie for each critic who had not rated all 6 movies and stored the results in the data frame top.recommendation. Deadpool was the top recommendation for 7 of the 12 critics, and jungle_book came in second as the top recommendation for 2 of the 12.