Data607 - Global Baseline Estimates

General Overview

This extra-credit assignment uses the data from Assignment 2, which is available on my GitHub page.

This assignment will also be available on my GitHub repo.

The purpose of this assignment is to use Global Baseline Estimates to predict how critics would rate films they have not seen. As in my previous assignment, ratings are on a scale of 1 to 5, where 1 is the lowest and 5 is the highest. If a critic did not see a movie, that rating appears as NA. The data set was generated by a random algorithm: each of 20 randomly generated names “rates” 20 movies picked from the MovieLens data set, and the results are written to a ratings file. The output was then combined into movie_ratings.csv to simplify this assignment.

Movie Ratings File Extraction

We first read the movie_ratings.csv file to load the variables needed from Assignment 2, using the following code block:

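# Load the packages used throughout this document
library(dplyr)       # %>%, group_by(), summarise(), mutate()
library(tidyr)       # pivot_wider()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# Read the ratings file from the current working directory
movieRatings <- read.csv("movie_ratings.csv")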
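For reference, the code above expects a long-format file with one row per critic and movie. The sketch below is an assumption about that layout: the column names userName, movieName, and rating are taken from the later code blocks, while the three sample rows are made up for illustration. It shows that an empty rating field is read in as NA.

```r
# Hypothetical three-row sample in the assumed layout of movie_ratings.csv.
# The rows are illustrative only; they are not taken from the real file.
sampleCsv <- "userName,movieName,rating
Alice,The Matrix,3
Bob,The Matrix,
Bob,Inception,2"

sampleRatings <- read.csv(text = sampleCsv)

# The blank rating for Bob / The Matrix is parsed as NA
print(sampleRatings)
print(is.na(sampleRatings$rating))
```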

Initializing Global Mean

The global mean is the average of all non-missing ratings in the data set. Here globalMean comes out to 2.965.

globalMean <- mean(movieRatings$rating, na.rm = TRUE)
print(globalMean)

## [1] 2.965
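The na.rm = TRUE argument matters whenever a critic has not seen a movie. As a small illustration on made-up numbers (toyRatings is a hypothetical vector, not part of the data set), a single NA makes the plain mean NA, while na.rm = TRUE averages only the ratings that exist:

```r
# Made-up ratings for illustration; NA marks a movie the critic did not see
toyRatings <- c(4, NA, 2, 5)

mean(toyRatings)                # NA, because the missing value propagates
mean(toyRatings, na.rm = TRUE)  # 3.666667, the mean of 4, 2, and 5
```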

Critics vs Global Mean

The following table shows each critic's average rating (userAvg) and how far that average sits from the global mean (globalMeanUserAvg, calculated as userAvg minus globalMean). These deviations are used later in the assignment. Values are rounded to two decimal places to keep the table compact.

averageUserRating <- movieRatings %>%
  group_by(userName) %>%
  summarise(userAvg = mean(rating, na.rm = TRUE)) %>%
  mutate(globalMeanUserAvg = userAvg - globalMean)

averageUserRating %>%
  kable(col.names = c("Critic", "Average Rating", "User Average - Mean Movie"), digits = 2, align = 'lcc') %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover", "condensed"))

Critic	Average Rating	User Average - Mean Movie
Alice	2.95	-0.01
Bob	2.35	-0.61
Charlie	3.05	0.08
David	3.40	0.44
Eve	2.75	-0.21
Frank	2.55	-0.42
Grace	3.45	0.49
Hannah	2.75	-0.21
Ivy	3.05	0.08
Jack	2.60	-0.36
Karen	3.15	0.19
Leo	2.70	-0.26
Mona	2.85	-0.11
Nathan	2.35	-0.61
Olivia	3.65	0.69
Paul	3.40	0.44
Quincy	2.90	-0.06
Rachel	2.95	-0.01
Sam	3.20	0.24
Tina	3.25	0.29

Movies vs Global Mean

The following table shows each movie's average rating (movieAvg) and how far that average sits from the global mean (globalMeanMovieAvg, calculated as movieAvg minus globalMean). These deviations are used later in the assignment. Values are rounded to two decimal places to keep the table compact.

# Per-movie averages, stored under their own name so the critic table is not overwritten
averageMovieRating <- movieRatings %>%
  group_by(movieName) %>%
  summarise(movieAvg = mean(rating, na.rm = TRUE)) %>%
  mutate(globalMeanMovieAvg = movieAvg - globalMean)

averageMovieRating %>%
  kable(col.names = c("Movie", "Average Rating", "Movie Average - Mean Movie"), digits = 2, align = 'lcc') %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover", "condensed"))

Movie	Average Rating	Movie Average - Mean Movie
Back to the Future	3.00	0.04
Fight Club	3.05	0.08
Forrest Gump	3.15	0.19
Gladiator	3.15	0.19
Goodfellas	3.00	0.04
Inception	3.30	0.33
Interstellar	2.75	-0.21
Jurassic Park	2.75	-0.21
Pulp Fiction	3.00	0.04
Saving Private Ryan	3.10	0.14
Schindler’s List	3.20	0.24
Se7en	2.85	-0.11
The Dark Knight	2.80	-0.17
The Empire Strikes Back	3.30	0.33
The Godfather	2.85	-0.11
The Lord of the Rings: The Return of the King	2.70	-0.26
The Matrix	2.25	-0.71
The Shawshank Redemption	2.80	-0.17
The Silence of the Lambs	3.05	0.08
The Usual Suspects	3.25	0.29
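The two tables above supply the inputs for a global baseline estimate. A common formulation, assumed here because the formula is not spelled out in this write-up, adds the critic's deviation and the movie's deviation to the global mean. The function name baselineEstimate is hypothetical. The worked example uses Bob's average (2.35), The Matrix's average (2.25), and the global mean (2.965). Bob did rate The Matrix in this data set, so the example only illustrates the arithmetic.

```r
# Global baseline estimate: global mean + critic deviation + movie deviation.
# baselineEstimate is a hypothetical helper, not part of the original code.
baselineEstimate <- function(globalMean, userAvg, movieAvg) {
  globalMean + (userAvg - globalMean) + (movieAvg - globalMean)
}

# Worked example with values from the tables above:
# 2.965 + (2.35 - 2.965) + (2.25 - 2.965) = 2.965 - 0.615 - 0.715 = 1.635
estimate <- baselineEstimate(globalMean = 2.965, userAvg = 2.35, movieAvg = 2.25)
print(estimate)
```

Under this formulation, a critic who rates below average paired with a movie that is rated below average pulls the estimate well under the global mean.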

Generating the Tables

This code block prepares the tables for the global baseline estimate. Since there are more than 10 movie titles, I split the data into two tables to make it easier to read. Each table also gets a Movie Average row that holds the average rating for each movie column; in that row the Average Rating column holds the global mean and the last column is NA. Another row, Movie Avg - Mean Movie, holds each movie average minus globalMean. wideRatings is used to calculate the mean rating for each critic and its difference from globalMean. The userAvg and globalMeanUserAvg columns are renamed to Average Rating and User Average - Mean Movie respectively, which combines the previous tables in a condensed format. After splitting the data into two tables, I use the knitr and kableExtra packages to output them.

wideRatings <- movieRatings %>%
  select(userName, movieName, rating) %>%
  pivot_wider(names_from = movieName, values_from = rating)

globalMean <- mean(movieRatings$rating, na.rm = TRUE)

wideRatings <- wideRatings %>%
  rowwise() %>%
  mutate(userAvg = mean(c_across(where(is.numeric)), na.rm = TRUE),
         globalMeanUserAvg = userAvg - globalMean) %>%
  ungroup()  # drop the rowwise grouping so later steps work on whole columns

wideRatings <- wideRatings %>%
  rename(Critic = userName, 
         `Average Rating` = userAvg, 
         `User Average - Mean Movie` = globalMeanUserAvg)

moviecolumns <- colnames(wideRatings)[2:(ncol(wideRatings) - 2)]
firsttenMovies <- moviecolumns[1:10]
lasttenMovies  <- moviecolumns[11:20]

firstMovieset <- wideRatings %>%
  select(Critic, all_of(firsttenMovies), `Average Rating`, `User Average - Mean Movie`)

secondMovieset <- wideRatings %>%
  select(Critic, all_of(lasttenMovies), `Average Rating`, `User Average - Mean Movie`)

firstMovieSetAvg <- movieRatings %>%
  filter(movieName %in% firsttenMovies) %>%
  group_by(movieName) %>%
  summarise(movie_avg = mean(rating, na.rm = TRUE)) %>%
  pivot_wider(names_from = movieName, values_from = movie_avg) %>%
  mutate(Critic = "Movie Average", `Average Rating` = globalMean, `User Average - Mean Movie` = NA)

secondMovieSetAvg <- movieRatings %>%
  filter(movieName %in% lasttenMovies) %>%
  group_by(movieName) %>%
  summarise(movie_avg = mean(rating, na.rm = TRUE)) %>%
  pivot_wider(names_from = movieName, values_from = movie_avg) %>%
  mutate(Critic = "Movie Average", `Average Rating` = globalMean, `User Average - Mean Movie` = NA)

firstMoviesetGAvg <- firstMovieSetAvg %>%
  mutate(across(all_of(firsttenMovies), ~ . - globalMean)) %>%
  mutate(Critic = "Movie Avg - Mean Movie", `Average Rating` = NA, `User Average - Mean Movie` = NA)

secondMoviesetGAvg <- secondMovieSetAvg %>%
  mutate(across(all_of(lasttenMovies), ~ . - globalMean)) %>%
  mutate(Critic = "Movie Avg - Mean Movie", `Average Rating` = NA, `User Average - Mean Movie` = NA)

firstMovieset <- bind_rows(firstMovieset, firstMovieSetAvg, firstMoviesetGAvg)
secondMovieset <- bind_rows(secondMovieset, secondMovieSetAvg, secondMoviesetGAvg)
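The tables built above hold the observed ratings together with the critic and movie averages. The stated goal is to predict ratings for movies a critic did not see, so the following is a minimal sketch of that last step on made-up data. It assumes the common formulation of global mean plus critic deviation plus movie deviation. The toy data frame and the names toyGlobalMean, userAvg, movieAvg, estimate, and filled are hypothetical, and base R is used so the sketch runs without extra packages.

```r
# Toy long-format ratings; values are made up and NA marks an unseen movie
toy <- data.frame(
  userName  = c("Alice", "Alice", "Bob", "Bob", "Charlie", "Charlie"),
  movieName = c("Inception", "Se7en", "Inception", "Se7en", "Inception", "Se7en"),
  rating    = c(5, 3, 4, NA, 2, 1)
)

toyGlobalMean <- mean(toy$rating, na.rm = TRUE)
userAvg  <- tapply(toy$rating, toy$userName,  mean, na.rm = TRUE)
movieAvg <- tapply(toy$rating, toy$movieName, mean, na.rm = TRUE)

# Look up each row's critic and movie averages by name
rowUserAvg  <- as.numeric(userAvg[as.character(toy$userName)])
rowMovieAvg <- as.numeric(movieAvg[as.character(toy$movieName)])

# Baseline estimate: global mean + critic deviation + movie deviation
toy$estimate <- toyGlobalMean +
  (rowUserAvg - toyGlobalMean) +
  (rowMovieAvg - toyGlobalMean)

# Keep observed ratings and use the estimate only where the rating is missing
toy$filled <- ifelse(is.na(toy$rating), toy$estimate, toy$rating)
print(toy)
```

In this toy example the global mean is 3, Bob's average is 4, and Se7en's average is 2, so the missing Bob / Se7en rating is estimated as 3 + 1 - 1 = 3. An estimate can fall outside the 1 to 5 scale, so clamping it to that range may be a reasonable extra step.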

First 10 Movies

This table shows the ratings for the first 10 movies, along with the averages used in the Global Baseline Estimate:

firstMovieset %>%
  kable(digits = 2, align = 'lcccccccccccc') %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover", "condensed"))

Critic	The Shawshank Redemption	The Godfather	The Dark Knight	Pulp Fiction	Schindler’s List	Forrest Gump	Inception	Fight Club	The Matrix	Goodfellas	Average Rating	User Average - Mean Movie
Alice	3.00	2.00	5.00	5.00	4.00	5.00	1.00	2.00	3.00	5.00	2.95	-0.01
Bob	3.00	1.00	5.00	5.00	2.00	3.00	2.00	1.00	3.00	2.00	2.35	-0.61
Charlie	2.00	3.00	4.00	3.00	2.00	1.00	3.00	2.00	1.00	3.00	3.05	0.08
David	2.00	4.00	5.00	1.00	4.00	2.00	4.00	5.00	4.00	5.00	3.40	0.44
Eve	3.00	1.00	2.00	4.00	4.00	1.00	3.00	5.00	2.00	4.00	2.75	-0.21
Frank	5.00	3.00	1.00	1.00	1.00	2.00	1.00	1.00	1.00	5.00	2.55	-0.42
Grace	4.00	5.00	1.00	1.00	3.00	5.00	5.00	2.00	2.00	2.00	3.45	0.49
Hannah	1.00	4.00	3.00	3.00	3.00	3.00	5.00	5.00	4.00	1.00	2.75	-0.21
Ivy	2.00	2.00	1.00	4.00	1.00	4.00	2.00	4.00	5.00	4.00	3.05	0.08
Jack	3.00	5.00	5.00	1.00	3.00	4.00	3.00	2.00	1.00	2.00	2.60	-0.36
Karen	5.00	1.00	1.00	3.00	5.00	1.00	5.00	2.00	1.00	5.00	3.15	0.19
Leo	3.00	1.00	2.00	5.00	2.00	4.00	1.00	3.00	1.00	1.00	2.70	-0.26
Mona	3.00	2.00	4.00	3.00	3.00	1.00	4.00	1.00	5.00	5.00	2.85	-0.11
Nathan	1.00	3.00	4.00	2.00	2.00	3.00	2.00	1.00	3.00	1.00	2.35	-0.61
Olivia	4.00	4.00	3.00	5.00	5.00	4.00	4.00	5.00	1.00	4.00	3.65	0.69
Paul	1.00	5.00	1.00	5.00	5.00	3.00	5.00	5.00	2.00	3.00	3.40	0.44
Quincy	1.00	5.00	2.00	3.00	3.00	5.00	5.00	3.00	2.00	1.00	2.90	-0.06
Rachel	5.00	3.00	1.00	2.00	4.00	4.00	5.00	2.00	1.00	1.00	2.95	-0.01
Sam	3.00	1.00	2.00	2.00	4.00	4.00	5.00	5.00	2.00	2.00	3.20	0.24
Tina	2.00	2.00	4.00	2.00	4.00	4.00	1.00	5.00	1.00	4.00	3.25	0.29
Movie Average	2.80	2.85	2.80	3.00	3.20	3.15	3.30	3.05	2.25	3.00	2.96	NA
Movie Avg - Mean Movie	-0.17	-0.11	-0.17	0.04	0.24	0.19	0.33	0.08	-0.71	0.04	NA	NA

Last 10 Movies

This table shows the ratings for the last 10 movies, along with the averages used in the Global Baseline Estimate:

secondMovieset %>%
  kable(digits = 2, align = 'lcccccccccccc') %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover", "condensed"))

Critic	The Empire Strikes Back	Interstellar	Gladiator	The Lord of the Rings: The Return of the King	Back to the Future	Saving Private Ryan	The Silence of the Lambs	Se7en	The Usual Suspects	Jurassic Park	Average Rating	User Average - Mean Movie
Alice	2.00	5.00	3.00	3.00	2.00	3.00	1.00	2.00	1.00	2.00	2.95	-0.01
Bob	1.00	1.00	4.00	5.00	2.00	2.00	1.00	2.00	1.00	1.00	2.35	-0.61
Charlie	5.00	1.00	3.00	5.00	5.00	5.00	3.00	2.00	4.00	4.00	3.05	0.08
David	5.00	2.00	3.00	3.00	4.00	4.00	4.00	1.00	4.00	2.00	3.40	0.44
Eve	1.00	1.00	3.00	1.00	4.00	2.00	4.00	3.00	5.00	2.00	2.75	-0.21
Frank	1.00	2.00	5.00	3.00	5.00	1.00	5.00	4.00	1.00	3.00	2.55	-0.42
Grace	5.00	5.00	3.00	4.00	4.00	4.00	2.00	3.00	4.00	5.00	3.45	0.49
Hannah	5.00	1.00	2.00	2.00	1.00	1.00	2.00	3.00	5.00	1.00	2.75	-0.21
Ivy	2.00	3.00	3.00	5.00	2.00	4.00	4.00	3.00	3.00	3.00	3.05	0.08
Jack	2.00	2.00	1.00	1.00	1.00	5.00	2.00	4.00	2.00	3.00	2.60	-0.36
Karen	5.00	2.00	5.00	4.00	2.00	4.00	1.00	4.00	5.00	2.00	3.15	0.19
Leo	1.00	5.00	1.00	3.00	3.00	2.00	4.00	3.00	4.00	5.00	2.70	-0.26
Mona	1.00	4.00	4.00	1.00	2.00	2.00	3.00	4.00	4.00	1.00	2.85	-0.11
Nathan	5.00	1.00	2.00	2.00	3.00	4.00	1.00	2.00	1.00	4.00	2.35	-0.61
Olivia	5.00	5.00	2.00	3.00	1.00	3.00	5.00	3.00	3.00	4.00	3.65	0.69
Paul	4.00	2.00	4.00	3.00	4.00	5.00	3.00	4.00	3.00	1.00	3.40	0.44
Quincy	5.00	5.00	1.00	1.00	2.00	2.00	4.00	2.00	3.00	3.00	2.90	-0.06
Rachel	5.00	2.00	4.00	2.00	3.00	3.00	2.00	1.00	5.00	4.00	2.95	-0.01
Sam	4.00	2.00	5.00	1.00	5.00	3.00	5.00	3.00	2.00	4.00	3.20	0.24
Tina	2.00	4.00	5.00	2.00	5.00	3.00	5.00	4.00	5.00	1.00	3.25	0.29
Movie Average	3.30	2.75	3.15	2.70	3.00	3.10	3.05	2.85	3.25	2.75	2.96	NA
Movie Avg - Mean Movie	0.33	-0.21	0.19	-0.26	0.04	0.14	0.08	-0.11	0.29	-0.21	NA	NA