Intro

For this project we will build recommender systems using matrix factorization methods for collaborative filtering. We will also experiment with adding diversity to the recommendations.


Load libraries

library(dplyr)
library(ggplot2)
library(recommenderlab)
library(tidytext)
library(knitr)


Get data

A dataset of over 35,000 movie ratings from FilmTrust was acquired from https://www.librec.net/datasets.html and hosted on GitHub. We import the data into a data frame and convert it into a user-item matrix.

ratings_df <- read.delim('https://raw.githubusercontent.com/mehtablocker/cuny_612/master/data_files/filmtrust/ratings.txt', header=F, sep=" ")
names(ratings_df) <- c("user_id", "item_id", "rating")
ratings_df %>% head() %>% kable()
user_id  item_id  rating
      1        1     2.0
      1        2     4.0
      1        3     3.5
      1        4     3.0
      1        5     4.0
      1        6     3.5
ui_mat <- ratings_df %>% 
  arrange(user_id, item_id) %>% 
  cast_sparse(user_id, item_id, rating) %>% 
  as.matrix()
ui_mat[ui_mat==0] <- NA
ui_mat[1:5, 1:5] %>% kable()
      1    2    3    4    5
1   2.0  4.0  3.5  3.0  4.0
2    NA   NA   NA   NA   NA
3    NA   NA   NA   NA   NA
4   3.0   NA  2.5   NA   NA
5    NA   NA   NA   NA   NA
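
As a quick, optional sanity check, we can look at the size and sparsity of the resulting matrix (standard base R calls on the ui_mat object created above):

### Dimensions of the user-item matrix and fraction of cells with an observed rating
dim(ui_mat)
mean(!is.na(ui_mat))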


Split and normalize data

We first break the data into training and test sets, then normalize the training data by subtracting out a baseline composed of the overall mean, user biases, and item biases.

### Create test and train indexes
pct_test <- 0.2
n_test <- round( pct_test * sum(!is.na(ui_mat)), 0)
na_ind <- which(is.na(ui_mat))
test_ind <- sample((1:length(ui_mat))[-na_ind], n_test, replace = F)
train_ind <- (1:length(ui_mat))[-c(na_ind, test_ind)]
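
As a small optional check (using only the objects defined above), we can confirm that the train and test indexes are disjoint and together cover every observed rating:

### Optional check: the two index sets partition the observed ratings
length(intersect(train_ind, test_ind)) == 0
length(train_ind) + length(test_ind) == sum(!is.na(ui_mat))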

### Break matrix into test and train
train_mat <- ui_mat
train_mat[test_ind] <- NA
test_mat <- ui_mat
test_mat[train_ind] <- NA

### Calculate baseline and normalize matrix
global_mean <- mean(train_mat, na.rm = T)
global_mean
## [1] 3.000687
u_bias_vec <- unname(rowMeans(train_mat, na.rm=T)) - global_mean
u_bias_mat <- matrix(u_bias_vec, nrow=nrow(train_mat), ncol=ncol(train_mat))
i_bias_vec <- unname(colMeans(train_mat, na.rm=T)) - global_mean
i_bias_mat <- matrix(i_bias_vec, nrow=nrow(train_mat), ncol=ncol(train_mat), byrow=T)
baseline <- global_mean + u_bias_mat + i_bias_mat
train_mat_norm <- train_mat - baseline
train_mat_norm[is.na(train_mat_norm)] <- 0
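
To make the baseline explicit: for user u and item i it is simply the global mean plus that user's bias plus that item's bias. A purely illustrative spot check of one cell confirms the matrix arithmetic above:

### Illustrative spot check: baseline for user 1, item 1
global_mean + u_bias_vec[1] + i_bias_vec[1]
baseline[1, 1]   # should match the value above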


Build and evaluate models

We compare four different models: one built using Singular Value Decomposition, one using normalization but no matrix factorization, one using only the global mean, and one using Stochastic Gradient Descent (Funk SVD). The comparisons are made by calculating RMSE on the Test data.

### Perform SVD
svd_list_train <- svd(train_mat_norm)
k <- 10
u_comp <- svd_list_train$u[, 1:k]
d_comp <- svd_list_train$d[1:k]
v_comp <- svd_list_train$v[, 1:k]
train_mat_norm_comp <- u_comp %*% diag(d_comp) %*% t(v_comp)
dimnames(train_mat_norm_comp) <- dimnames(train_mat_norm)
train_mat_pred <- train_mat_norm_comp + baseline
train_mat_pred[train_mat_pred<1] <- 1
train_mat_pred[train_mat_pred>5] <- 5
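
As an informal aside (not part of the modelling pipeline), one way to gauge whether k = 10 is a reasonable truncation is to look at how much of the total squared singular value mass the first k components capture:

### Share of total squared singular value mass retained by the first k components
sum(svd_list_train$d[1:k]^2) / sum(svd_list_train$d^2)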

### Calculate RMSE on Test data
rmse <- function(predicted, observed, rmna=FALSE){
  sqrt(mean((predicted - observed)^2, na.rm=rmna))
}

rmse_test <- rmse(train_mat_pred, test_mat, rmna=T)
rmse_test
## [1] 0.8200985
### RMSE of simpler model (with normalization but without compression)
rmse_simpler <- rmse(train_mat_norm + baseline, test_mat, rmna=T)
rmse_simpler
## [1] 0.8329799
### RMSE of very simple model (prediction of global average every time)
prediction_mat_chance <- matrix(mean(train_mat, na.rm=T), nrow = nrow(train_mat), ncol = ncol(train_mat))
rmse_simplest <- rmse(prediction_mat_chance, test_mat, rmna=T)
rmse_simplest
## [1] 0.9168212
### Perform SGD (Funk SVD) and calculate RMSE on Test data
train_mat_norm <- train_mat - baseline   # funkSVD handles missing values, so we can keep the NAs this time
f_svd_list_train <- funkSVD(train_mat_norm, k = k)
train_mat_norm_comp <- f_svd_list_train$U %*% t(f_svd_list_train$V)
dimnames(train_mat_norm_comp) <- dimnames(train_mat_norm)
train_mat_pred_sgd <- train_mat_norm_comp + baseline
train_mat_pred_sgd[train_mat_pred_sgd<1] <- 1
train_mat_pred_sgd[train_mat_pred_sgd>5] <- 5
rmse_test_sgd <- rmse(train_mat_pred_sgd, test_mat, rmna=T)
rmse_test_sgd
## [1] 0.8233933
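
For ease of comparison, the four Test RMSEs computed above can be collected into a single table:

### Summary of Test RMSE for all four models
data.frame(model = c("SVD (k = 10)", "Normalization only", "Global mean", "Funk SVD (SGD, k = 10)"),
           rmse = c(rmse_test, rmse_simpler, rmse_simplest, rmse_test_sgd)) %>% 
  kable()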

The two matrix factorization models performed best, so we continue with those.


Add diversity

Recommendations would normally be based on the predicted ratings, which are modelled on previous ratings. One potential drawback of this is a “feedback loop” in which users are never encouraged to branch out beyond items similar to those they have already rated. We can break this feedback loop and add diversity to the recommendations by taking some extremely dissimilar items and “bumping them up.” Here we test this by changing a small percentage of the lowest predicted ratings to the global average rating and then re-calculating the RMSE on the Test data.

pct_div <- 0.01

### Change SVD predictions
div_inds <- order(train_mat_pred) %>% head(pct_div * length(train_mat_pred))
train_mat_pred_div <- train_mat_pred
train_mat_pred_div[div_inds] <- global_mean

### Change SGD predictions
div_inds <- order(train_mat_pred_sgd) %>% head(pct_div * length(train_mat_pred_sgd))
train_mat_pred_sgd_div <- train_mat_pred_sgd
train_mat_pred_sgd_div[div_inds] <- global_mean

### Calculate RMSE
rmse_test_div <- rmse(train_mat_pred_div, test_mat, rmna=T)
rmse_test_div
## [1] 0.8312192
rmse_test_sgd_div <- rmse(train_mat_pred_sgd_div, test_mat, rmna=T)
rmse_test_sgd_div
## [1] 0.8341205
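
Note that only bumped cells which happen to fall in the Test set can change the RMSE at all; everything else lands on unobserved user-item pairs. A rough check of that overlap (div_inds currently holds the SGD-based indices, since it was assigned last):

### Overlap between the bumped cells and the held-out Test ratings
length(div_inds)
sum(div_inds %in% test_ind)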

The RMSE is a little worse after adding diversity to the ratings, which is not surprising since we were purposely contradicting our optimized model for a few items.


Conclusion

Using the FilmTrust movie ratings data, we split and normalized our dataset and then built several different rating prediction models. After comparing their RMSEs on the Test data, we added diversity to the predicted ratings of the two matrix factorization models and re-calculated the RMSE. Unsurprisingly, the RMSE was a little worse after adding diversity.

One caveat worth noting is selection bias: our RMSE calculations were done on Test data for which users were never actually shown diversified recommendations. If we had the ability to do online evaluation, our out-of-sample data would better reflect how the recommendations are actually served, and we could more accurately measure the consequences of adding diversity to the recommendations.