library(tidyverse)
library(recommenderlab)
library(knitr)
MovieLense is a dataset that I haven't used before; it is a reasonable size and an interesting dataset to work with, especially for recommendation.
data("MovieLense")
ncol(MovieLense)
## [1] 1664
nrow(MovieLense)
## [1] 943
The dataset contains ratings from 943 users on 1664 movies.
getRatingMatrix(MovieLense)[1:10, 1:30]
## 10 x 30 sparse Matrix of class "dgCMatrix"
##
## 1 5 3 4 3 3 5 4 1 5 3 2 5 5 5 5 5 3 4 5 4 1 4 4 3 4 3 2 4 1 3
## 2 4 . . . . . . . . 2 . . 4 4 . . . . 3 . . . . . 4 . . . . .
## 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 4 . . . . . . . . . . 4 . . . . . . . . . . . . . . . . . . .
## 5 4 3 . . . . . . . . . . . . . . 4 . . . 3 . . 4 3 . . . 4 .
## 6 4 . . . . . 2 4 4 . . 4 2 5 3 . . . 4 . 3 3 4 . . . . 2 . .
## 7 . . . 5 . . 5 5 5 4 3 5 . . . . . . . . . 5 3 . 3 . 4 5 3 .
## 8 . . . . . . 3 . . . 3 . . . . . . . . . . 5 . . . . . . . .
## 9 . . . . . 5 4 . . . . . . . . . . . . . . . . . . . . . . .
## 10 4 . . 4 . . 4 . 4 . 4 5 3 . . 4 . . . . . 5 5 . . . . . . .
Two distributions are plotted: the row counts, which give the number of ratings provided by each user, and the column counts, which give the number of ratings received by each movie. Let's evaluate how they stack up in histograms. I will also plot the mean and the median on each histogram to highlight any skewness in the data.
hist(rowCounts(MovieLense), breaks = 40, main = "Distribution of Ratings per User")
abline(v=mean(rowCounts(MovieLense)),col="red")
abline(v=median(rowCounts(MovieLense)),col="blue")
summary(rowCounts(MovieLense))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.0 32.0 64.0 105.4 147.5 735.0
hist(colCounts(MovieLense), breaks = 40, main = "Distribution of Ratings per Movie")
abline(v=mean(colCounts(MovieLense)),col="red")
abline(v=median(colCounts(MovieLense)),col="blue")
# ggplot2 equivalent with a mean line, e.g.:
# ggplot(data.frame(n = colCounts(MovieLense)), aes(x = n)) + geom_histogram(bins = 40) + geom_vline(xintercept = mean(colCounts(MovieLense)), color = "red")
summary(colCounts(MovieLense))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 27.00 59.73 80.00 583.00
Inference: both distributions are right-skewed, with the mean to the right of the median.
Now I would like to filter the MovieLense dataset down to a smaller, more meaningful subset: users who have rated at least 50 movies, and movies that have been rated by at least 100 users. This gives a better set to work with.
rdata <- MovieLense[rowCounts(MovieLense) >= 50, colCounts(MovieLense) >= 100]
hist(rowCounts(rdata), breaks = 40, main = "Distribution of Rating Count per User")
rdata
## 565 x 336 rating matrix of class 'realRatingMatrix' with 55832 ratings.
The resulting ratings matrix contains 55,832 ratings from 565 users on 336 movies; the matrix is roughly 71% sparse. The data is now prepped.
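As a quick sanity check on the sparsity figure, the share of empty cells can be computed directly from the rating counts (a minimal sketch using recommenderlab accessors already used above):
# share of user-movie pairs with no rating: 1 - 55832 / (565 * 336)
1 - sum(rowCounts(rdata)) / (nrow(rdata) * ncol(rdata))  # ~0.706, i.e. ~71% of cells are empty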
For our analysis of ratings accuracy, we develop recommender models based on four algorithms: RANDOM, SVD, SVDF (Funk SVD), and ALS. To build and compare these models, we use the following evaluation scheme:
evaldata <- evaluationScheme(rdata, method = "split", train = 0.9, given = 15, goodRating = 3.5)
evaldata
## Evaluation scheme with 15 items given
## Method: 'split' with 1 run(s).
## Training set proportion: 0.900
## Good ratings: >=3.500000
## Data set: 565 x 336 rating matrix of class 'realRatingMatrix' with 55832 ratings.
algorithms <- list(
  "Random" = list(name = "RANDOM", param = NULL),
  "SVD" = list(name = "SVD", param = NULL),
  "SVDF" = list(name = "SVDF", param = NULL),
  "ALS" = list(name = "ALS", param = NULL)
)
As in previous assignments, we compare the accuracy of several recommender system algorithms against our offline data.
To evaluate the prediction models, we train each model on the training data and then use the test data to predict ratings under the given-15 protocol. We then measure the accuracy of predicted versus actual ratings using the error metrics RMSE, MSE, and MAE.
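For reference, these three metrics can be computed by hand from the prediction errors on rated items; calcPredictionAccuracy() does the equivalent internally. A minimal sketch:
# err: vector of (predicted - actual) differences on the observed ratings
rmse <- function(err) sqrt(mean(err^2))  # root mean squared error
mse  <- function(err) mean(err^2)        # mean squared error
mae  <- function(err) mean(abs(err))     # mean absolute error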
results <- evaluate(evaldata, algorithms, type = "ratings")
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.01sec/0.03sec]
## SVD run fold/sample [model time/prediction time]
## 1 [0.06sec/0sec]
## SVDF run fold/sample [model time/prediction time]
## 1 [90.84sec/12.77sec]
## ALS run fold/sample [model time/prediction time]
## 1 [0sec/8.31sec]
plot(results)
title(main = "Model Error: Predicted vs. Actual Ratings")
As expected, the SVD, SVDF, and ALS methods all achieve higher accuracy than randomly selected ratings.
Based on the algorithms implemented above, the recommender models could be used to generate top-N recommended movies for a given user, based on the highest predicted ratings among movies the user hasn't rated yet. As we saw in a prior project comparing UBCF (user-based collaborative filtering), IBCF (item-based collaborative filtering), and POPULAR algorithms, the recommended movies tend to be dominated by the most popular titles. This results from the strong correlation between ratings and popularity: highly rated movies tend to be the most popular movies, and hence are recommended more often regardless of the model algorithm. To counteract this effect, we will add diversity to the recommended movies by creating a hybrid model.
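The ratings-popularity correlation mentioned above is easy to check in our filtered data (a quick sketch; colCounts() and colMeans() operate on the non-missing ratings only):
# correlation between a movie's number of ratings (popularity) and its average rating
cor(colCounts(rdata), colMeans(rdata))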
Let’s consider increasing the diversity of recommendations by adding an element of chance to the model. We do this by creating a hybrid model that builds on the most accurate algorithm above (SVDF) while adding diversity through the RANDOM algorithm. Specifically, we create a hybrid recommender based on a 50/50 weighting of the SVDF and RANDOM algorithms.
Step 1: Train an SVDF recommender on the training set.
recom_svdf <- Recommender(getData(evaldata, "train"), "SVDF")
recom_svdf
## Recommender of type 'SVDF' for 'realRatingMatrix'
## learned using 508 users.
Step 2: Train a RANDOM recommender on the same training set.
recom_rand <- Recommender(getData(evaldata, "train"), "RANDOM")
recom_rand
## Recommender of type 'RANDOM' for 'realRatingMatrix'
## learned using 508 users.
Step 3: Combine the two recommenders into a hybrid with a 50/50 weighting.
recom_hyb <- HybridRecommender(
  recom_svdf,
  recom_rand,
  weights = c(0.5, 0.5)
)
recom_hyb
## Recommender of type 'HYBRID' for 'ratingMatrix'
## learned using NA users.
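Before turning back to ratings accuracy, note that the hybrid object can already serve top-N lists. A quick illustration (the choice of the first test user and n = 5 is arbitrary):
# top-5 movie titles for one test user; predict() returns a topNList by default
top5 <- predict(recom_hyb, getData(evaldata, "known")[1], n = 5)
as(top5, "list")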
For the hybrid model, we use the same evaluation scheme as before (90/10 training / test split with given-15 protocol) to predict ratings in the test set and then evaluate prediction accuracy.
pred_svdf <- predict(recom_svdf, getData(evaldata, "known"), type = "ratings")
pred_svdf
## 57 x 336 rating matrix of class 'realRatingMatrix' with 18297 ratings.
pred_rand <- predict(recom_rand, getData(evaldata, "known"), type = "ratings")
pred_rand
## 57 x 336 rating matrix of class 'realRatingMatrix' with 18297 ratings.
pred_hyb <- predict(recom_hyb, getData(evaldata, "known"), type = "ratings")
pred_hyb
## 57 x 336 rating matrix of class 'realRatingMatrix' with 18297 ratings.
error <- rbind(
  SVDF = calcPredictionAccuracy(pred_svdf, getData(evaldata, "unknown")),
  RANDOM = calcPredictionAccuracy(pred_rand, getData(evaldata, "unknown")),
  HYBRID = calcPredictionAccuracy(pred_hyb, getData(evaldata, "unknown"))
)
error %>% round(3) %>% kable(caption = "Error Metrics for Hybrid Model of SVDF & RANDOM")
| | RMSE | MSE | MAE |
|---|---|---|---|
| SVDF | 0.932 | 0.869 | 0.731 |
| RANDOM | 1.312 | 1.720 | 1.034 |
| HYBRID | 1.043 | 1.087 | 0.824 |
Judging by the error metrics, the prediction accuracy of the hybrid recommender falls between the SVDF and RANDOM models, though closer to SVDF. With the 50/50 blend of the SVDF and RANDOM algorithms, the RMSE of the hybrid model weakened by about 12% (from 0.932 to 1.043) compared to the SVDF model. However, at the cost of this modest deterioration in ratings accuracy, we would hope that movie viewers benefit from greater diversity in recommendations, particularly exposure to less popular movies.
Of the algorithms tested (RANDOM, SVD, SVDF, and ALS), SVDF had the highest prediction accuracy when measured in terms of error metrics such as RMSE, MSE, and MAE. However, it is important to keep in mind that we didn’t optimize calculation parameters for the various algorithms, and consequently, our conclusions rely on the default parameters.
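As a direction for future work, non-default parameters can be supplied through the param element when registering the algorithms. A sketch (these particular values are illustrative, not tuned):
algorithms_tuned <- list(
  "SVDF" = list(name = "SVDF", param = list(k = 20)),                       # latent factors for Funk SVD
  "ALS"  = list(name = "ALS",  param = list(n_factors = 20, lambda = 0.1))  # factor count and regularization
)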
It was interesting that ALS was only slightly less accurate than SVDF, but it was a much faster model (by orders of magnitude) to train given this dataset. For a business application in a production environment, it may be preferable to use a faster / computationally less expensive algorithm that has slightly worse accuracy, compared to the highest accuracy algorithm.
Properties of different recommendation algorithms can be blended using an ensemble approach. In this project, we created a hybrid recommender using a 50/50 blend of the SVDF and RANDOM algorithms, in order to add diversity to the model outputs (for instance, recommendations that include less popular movies).
Error metrics of the hybrid model fell between those of the SVDF and RANDOM models. Compared to the pure-play SVDF model, the RMSE of the hybrid model increased by about 12% when adding diversity from the RANDOM model. As a practical matter, the RMSE of roughly 0.93 from the SVDF model and 1.04 from the hybrid model are likely indistinguishable from a user perspective, in which case the benefit of greater diversity in recommendations outweighs the cost of slightly worse accuracy.