Introduction

This assignment implements a global baseline recommendation system for six movies. The global mean (global_mean), each critic's bias (user_bias), and each movie's bias (movie_bias) are calculated first. I then define pred_rating() to predict a missing rating and use a for loop to fill in the blanks.
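For reference, the prediction rule used in the code can be written as a formula. The symbols are my own notation, not part of the assignment: \mu stands for global_mean, b_u for user_bias, b_m for movie_bias, and \bar{r}_u and \bar{r}_m for the mean of the observed ratings of critic u and movie m.

```latex
\hat{r}_{u,m} = \mu + b_u + b_m,
\qquad b_u = \bar{r}_u - \mu,
\qquad b_m = \bar{r}_m - \mu
```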

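# Load the ratings: the first column is the critic, the remaining columns are movies (NA = not rated)
recom <- read.csv("C:/data/Global Baseline.csv")

# Mean of all observed ratings
global_mean <- round(mean(as.matrix(recom[,-1]), na.rm = TRUE),2)

# Each critic's mean rating relative to the global mean
user_bias <- round(rowMeans(recom[,-1], na.rm = TRUE) - global_mean, 2)
names(user_bias) <- recom[,1]

# Each movie's mean rating relative to the global mean
movie_bias <- round(colMeans(recom[,-1], na.rm = TRUE) - global_mean, 2)
names(movie_bias) <- colnames(recom[,-1])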

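# Baseline prediction: global mean plus the critic's bias and the movie's bias
pred_rating <- function(user, movie) {
  round(global_mean + user_bias[[user]] + movie_bias[[movie]], 2)
}

# Row/column positions of the missing ratings (columns are relative to recom[,-1])
na_cells <- which(is.na(recom[,-1]), arr.ind = TRUE)

# seq_len() skips the loop cleanly when there are no missing ratings
for (i in seq_len(nrow(na_cells))) {
  row <- na_cells[i, 'row']
  col <- na_cells[i, 'col']
  critic <- as.character(recom[row, 1])  # look up by name, even if the column is a factor
  movie <- colnames(recom[,-1])[col]
  recom[row, col + 1] <- pred_rating(critic, movie)  # +1 skips the critic column
}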
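As a cross-check on the loop, the same fill can be sketched in vectorized form. This is only an illustration under my own assumptions: the toy data frame and the names toy, ratings, g_mean, u_bias, m_bias, baseline, and filled are hypothetical and not part of the assignment data, and no intermediate rounding is applied.

```r
# Hypothetical toy ratings (not the assignment's CSV); NA marks an unrated movie
toy <- data.frame(
  critic = c("A", "B", "C"),
  m1 = c(4, 2, NA),
  m2 = c(5, NA, 3),
  stringsAsFactors = FALSE
)

# Work on a numeric matrix with critics as row names
ratings <- as.matrix(toy[, -1])
rownames(ratings) <- toy$critic

g_mean <- mean(ratings, na.rm = TRUE)               # mean of all observed ratings
u_bias <- rowMeans(ratings, na.rm = TRUE) - g_mean  # one bias per critic
m_bias <- colMeans(ratings, na.rm = TRUE) - g_mean  # one bias per movie

# outer() builds the full critic-by-movie grid of u_bias + m_bias
baseline <- g_mean + outer(u_bias, m_bias, "+")

# Keep observed ratings and replace only the missing cells
filled <- ratings
filled[is.na(ratings)] <- baseline[is.na(ratings)]
print(filled)
```

With this toy input, the two missing cells are both predicted as 2.5, which matches working the baseline sum by hand (3.5 - 1.5 + 0.5 for critic B on m2, and 3.5 - 0.5 - 0.5 for critic C on m1).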

Plot

I created a new data frame holding each movie's average rating, computed after the missing values were filled in, and drew a bar chart from it so the six movies are easy to compare.

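library(ggplot2)

# Average rating per movie, including the predicted values filled in above
avg_rating <- colMeans(recom[,-1])
avg_df <- data.frame(
  movie = names(avg_rating),
  avg_rating = avg_rating
)
# One bar per movie
ggplot(data = avg_df, aes(x = movie, y = avg_rating)) + geom_col(width = 0.4)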

Conclusion

According to the global baseline system, Pitch Perfect 2 has the lowest average rating, while Captain America and Deadpool have similar average ratings, the highest of the six. I think the global baseline method is well balanced, because each prediction adjusts the global mean for both the critic's bias (user_bias) and the movie's bias (movie_bias).