Introduction

In this assignment, I used a Global Baseline Estimate recommender to predict user ratings for movies. The data was imported from my GitHub repository. I calculated the global mean and the movie and user biases, and used them to estimate the ratings that were missing for some movies.

library(readr)
library(readxl)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(stringr)

Importing the data

The data was originally collected for an earlier assignment. For this assignment, I needed to combine it into a single table in MySQL containing the same information. I also made sure to use the right separators so that the file could be read efficiently in R. After that, I uploaded the data to GitHub so that the analysis is reproducible. The data was then loaded into R as shown below.

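# Read the long-format ratings file from GitHub
movie_ratings <- read.csv("https://raw.githubusercontent.com/arutam-antunish/DATA607/refs/heads/main/movie_ratings_long.csv")
View(movie_ratings)
# Missing values arrive as the string "NULL"; convert them to NA
movie_ratings[movie_ratings == "NULL"] <- NA
View(movie_ratings)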
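
As a possible alternative to replacing "NULL" after the import, the conversion can happen while the file is read. The sketch below is only an illustration: it uses a small inline sample instead of the real file, and it assumes the three columns person_id, movie_id and rating that appear in the later code.

```r
# Hypothetical three-row sample in the same long format (not the real data)
csv_text <- "person_id,movie_id,rating
1,1,5
1,2,NULL
2,1,3"

# na.strings turns the literal text "NULL" into NA while reading,
# so the rating column is parsed as numbers from the start
toy_ratings <- read.csv(text = csv_text, na.strings = "NULL")

str(toy_ratings)
```

With this approach the rating column should not need a separate conversion with as.numeric(), because it never contains text.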

Here I calculated the global mean, which is the average of all observed ratings across every movie and user.

movie_ratings$rating <- as.numeric(movie_ratings$rating)
mu <- mean(movie_ratings$rating, na.rm = TRUE)

After that, I got the movie bias by subtracting the global mean from the mean of each movie.

movie_bias <- movie_ratings %>%
  group_by(movie_id) %>%
  summarize(bi = mean(rating, na.rm = TRUE) - mu)

I repeated the same calculation for the user bias, this time grouping by person.

user_bias <- movie_ratings %>%
  group_by(person_id) %>%
  summarize(bu = mean(rating, na.rm = TRUE) - mu)
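
To check the two bias calculations by hand, the sketch below repeats them in base R with tapply() on a small made-up table. The toy data and the names toy, toy_mu, toy_bi and toy_bu are assumptions for illustration only; they are not part of the assignment data.

```r
# Hypothetical toy data: 3 users x 2 movies, one rating missing
toy <- data.frame(
  person_id = c(1, 1, 2, 2, 3, 3),
  movie_id  = c(1, 2, 1, 2, 1, 2),
  rating    = c(5, 4, 3, NA, 4, 2)
)

# Global mean over the five observed ratings
toy_mu <- mean(toy$rating, na.rm = TRUE)

# Bias = group mean minus global mean, ignoring missing ratings
toy_bi <- tapply(toy$rating, toy$movie_id, mean, na.rm = TRUE) - toy_mu
toy_bu <- tapply(toy$rating, toy$person_id, mean, na.rm = TRUE) - toy_mu

toy_mu
toy_bi
toy_bu
```

In this toy table the global mean is 18 / 5 = 3.6. Movie 1 averages 4, so its bias is 0.4, and movie 2 averages 3, so its bias is -0.6. User 1 averages 4.5 (bias 0.9), while users 2 and 3 both average 3 (bias -0.6).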

Then, I created another data frame that joins the movie and user biases to each rating. Finally, I calculated the Global Baseline Estimate as the sum of the global mean, the movie bias, and the user bias. This estimate serves as the predicted rating for the movies that a person has not rated.

ratings_gbe <- movie_ratings %>%
  left_join(movie_bias, by = "movie_id") %>%
  left_join(user_bias, by = "person_id") 

ratings_gbe <- ratings_gbe %>% mutate(gbe = mu + bi + bu)
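
The estimate is computed for every row, so the predictions of interest are the rows where the original rating is missing. The sketch below shows one way to pull them out, using base R and a made-up toy table rather than the real data. The names toy, mean_na and predictions are hypothetical.

```r
# Hypothetical toy data: 3 users x 2 movies, one rating missing
toy <- data.frame(
  person_id = c(1, 1, 2, 2, 3, 3),
  movie_id  = c(1, 2, 1, 2, 1, 2),
  rating    = c(5, 4, 3, NA, 4, 2)
)

# Helper: mean that ignores missing ratings
mean_na <- function(x) mean(x, na.rm = TRUE)

toy_mu <- mean_na(toy$rating)

# ave() returns the group mean aligned to every row, including NA rows
toy$bi <- ave(toy$rating, toy$movie_id, FUN = mean_na) - toy_mu
toy$bu <- ave(toy$rating, toy$person_id, FUN = mean_na) - toy_mu

# Global Baseline Estimate: global mean + movie bias + user bias
toy$gbe <- toy_mu + toy$bi + toy$bu

# Keep only the rows that had no rating; gbe is the prediction
predictions <- toy[is.na(toy$rating), c("person_id", "movie_id", "gbe")]
predictions
```

Here the only missing rating is user 2 on movie 2, and the estimate is 3.6 + (-0.6) + (-0.6) = 2.4. On the ratings_gbe data frame above, a comparable dplyr step would be ratings_gbe %>% filter(is.na(rating)).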

DATA607_Assignment3A

Arutam Antunish

2025-09-13
