This assignment implements a global baseline recommender system for six movies. I first compute the global mean, each user's bias (user_bias), and each movie's bias (movie_bias). Then pred_rating() estimates a missing rating as the global mean plus the two biases, and a for loop fills in every blank cell.
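Written as a formula, the prediction described above looks like the following. The symbols are my own notation for illustration (they are not taken from the assignment): \mu is the global mean, \bar{r}_u is the mean of user u's observed ratings, and \bar{r}_m is the mean of movie m's observed ratings.

```latex
\hat{r}_{u,m} = \mu + b_u + b_m,
\qquad b_u = \bar{r}_u - \mu,
\qquad b_m = \bar{r}_m - \mu
```

As a made-up numeric example (these are not values from the data set): with \mu = 3.9, b_u = -0.4 and b_m = +0.3, the predicted rating would be 3.9 - 0.4 + 0.3 = 3.8.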
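# Load the ratings: the first column holds critic names, the remaining columns hold one movie each
recom <- read.csv("C:/data/Global Baseline.csv")
# Global mean over every observed (non-NA) rating
global_mean <- round(mean(as.matrix(recom[,-1]), na.rm = TRUE), 2)
# User bias: each critic's mean rating minus the global mean
user_bias <- round(rowMeans(recom[,-1], na.rm = TRUE) - global_mean, 2)
names(user_bias) <- recom[,1]
# Movie bias: each movie's mean rating minus the global mean
movie_bias <- round(colMeans(recom[,-1], na.rm = TRUE) - global_mean, 2)
names(movie_bias) <- colnames(recom[,-1])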
# Baseline prediction: global mean + user bias + movie bias, rounded to two decimals
pred_rating <- function(user, movie) {
  round(global_mean + user_bias[user] + movie_bias[movie], 2)
}
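# Locate every missing rating as a (row, col) pair within the ratings-only columns
na_cells <- which(is.na(recom[,-1]), arr.ind = TRUE)
# seq_len() makes the loop a no-op when there are no missing cells
for (i in seq_len(nrow(na_cells))) {
  row <- na_cells[i, "row"]
  col <- na_cells[i, "col"]
  # as.character() ensures the biases are looked up by name, not by factor code
  critic <- as.character(recom[row, 1])
  movie <- colnames(recom[,-1])[col]
  # col + 1 skips the critic-name column
  recom[row, col + 1] <- pred_rating(critic, movie)
}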
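The same fill can also be written without an explicit loop. The sketch below is only an illustration on a made-up 3-by-3 matrix (the names ratings, mu, b_user, b_movie, baseline and filled are hypothetical and not part of the assignment code). It uses outer() to build the full grid of user bias plus movie bias and then copies the baseline values into the NA cells only. It skips the intermediate rounding, so on real data the results could differ slightly in the second decimal from the loop version.

```r
# Toy ratings matrix with made-up numbers (NOT the assignment data)
ratings <- matrix(
  c(5,  4, NA,
    3, NA,  2,
    4,  5,  3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("M1", "M2", "M3"))
)

# Global mean and the two bias vectors, computed from observed ratings only
mu      <- mean(ratings, na.rm = TRUE)
b_user  <- rowMeans(ratings, na.rm = TRUE) - mu
b_movie <- colMeans(ratings, na.rm = TRUE) - mu

# outer() builds the user-by-movie grid of b_user + b_movie;
# adding mu gives the baseline prediction for every cell
baseline <- mu + outer(b_user, b_movie, "+")

# Keep observed ratings and fill only the missing cells
filled <- ratings
filled[is.na(ratings)] <- baseline[is.na(ratings)]
```

Printing filled shows the original ratings untouched and the two NA cells replaced by their baseline predictions.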
I then built a new data frame of each movie's average rating (which now includes the predicted values) and plotted it as a bar chart, so the six movies are easy to compare.
library(ggplot2)
# Average rating per movie after the missing cells have been filled
avg_rating <- colMeans(recom[,-1])
avg_df <- data.frame(
  movie = names(avg_rating),
  avg_rating = avg_rating
)
ggplot(data = avg_df, aes(x = movie, y = avg_rating)) + geom_col(width = 0.4)
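According to the global baseline system, Pitch Perfect 2 has the lowest average rating, while Captain America and Deadpool have similar average ratings that are also the highest. I think the global baseline method is quite balanced, because each prediction adjusts the global mean by both user_bias (how generous the critic tends to be) and movie_bias (how well liked the movie is overall).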