library("recommenderlab")
library("ggplot2")
I will begin by loading the MovieLense data set from the recommenderlab package into the environment. I will also subset the data, keeping users who have rated more than 50 movies and movies that have been rated more than 100 times.
data("MovieLense")
ratings_movies = MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 100]
The distribution of mean movie ratings is right-skewed, with a median of
quantile(colMeans(ratings_movies, na.rm = T), .5)
## 50%
## 3.684294
movie_mean_rating = colMeans(ratings_movies, na.rm = T)
qplot(movie_mean_rating) + ggtitle("Distribution of Mean Movie Ratings")
rm(movie_mean_rating)
The distribution of mean user ratings appears roughly normal, with a median of
quantile(rowMeans(ratings_movies, na.rm = T), .5)
## 50%
## 3.740624
user_mean_rating = rowMeans(ratings_movies, na.rm = T)
qplot(user_mean_rating) + ggtitle("Distribution of Mean User Ratings")
rm(user_mean_rating)
A heatmap of the first 100 users and movies reveals the sparsity of the ratings, as well as a few very active users.
image(ratings_movies[1:100, 1:100], main = "Heatmap of the First 100 Users & Movies")
To create training and test sets, an evaluation scheme is needed. The evaluationScheme() function takes the rating data and applies 5-fold cross-validation, so each fold uses roughly 80% of the users for training and 20% for testing; 15 items are given per test user, and a rating of 3 or higher counts as a good rating.
set.seed(143)
eval_sets = evaluationScheme(data = ratings_movies, method = "cross-validation",
                             train = .8, given = 15, goodRating = 3, k = 5)
rm(MovieLense,MovieLenseMeta,ratings_movies)
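Before fitting the first model, here is a quick check of what center normalization does: each user's mean rating is subtracted from that user's ratings, so a value of zero corresponds to the user's personal average. getRatings() and normalize() are recommenderlab functions; showing only the first five ratings is an arbitrary choice.
# Center normalization subtracts each user's mean rating from their ratings.
train_raw = getData(eval_sets, "train")
head(getRatings(train_raw), 5)                                # raw ratings
head(getRatings(normalize(train_raw, method = "center")), 5)  # centered ratings
rm(train_raw)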
The first model is an item-based collaborative filtering (IBCF) model. The data is normalized by centering, and cosine similarity is used as the distance measure. The model is evaluated against the unknown ratings, and the accuracy measures are stored in a data frame for comparison with the other models. Finally, the top-10 recommendations for the first test user are printed.
IBCF_cosine_model = Recommender(data = getData(eval_sets, "train"), method = "IBCF",
                                parameter = list(normalize = "center", method = "cosine"))
pred_IBCF_cos = predict(object = IBCF_cosine_model, newdata = getData(eval_sets, "known"),
                        n = 10, type = "ratings")
model_stats = data.frame(
  IBCF_cosine_model = calcPredictionAccuracy(x = pred_IBCF_cos, data = getData(eval_sets, "unknown")))
pred_IBCF_cos_top = predict(object = IBCF_cosine_model, newdata = getData(eval_sets, "known"),
                            n = 10, type = "topNList")
pred_IBCF_cos_top@itemLabels[pred_IBCF_cos_top@items[[1]]]
## [1] "Toy Story (1995)" "GoldenEye (1995)"
## [3] "Twelve Monkeys (1995)" "Babe (1995)"
## [5] "Dead Man Walking (1995)" "Seven (Se7en) (1995)"
## [7] "Usual Suspects, The (1995)" "Taxi Driver (1976)"
## [9] "Rumble in the Bronx (1995)" "Birdcage, The (1996)"
rm(IBCF_cosine_model,pred_IBCF_cos,pred_IBCF_cos_top)
IBCF model 2 uses the Pearson correlation as its similarity measure; otherwise it is identical to IBCF model 1.
IBCF_pearson_model = Recommender(data = getData(eval_sets, "train"), method = "IBCF",
                                 parameter = list(normalize = "center", method = "pearson"))
pred_IBCF_pear = predict(object = IBCF_pearson_model, newdata = getData(eval_sets, "known"),
                         n = 10, type = "ratings")
model_stats$IBCF_pearson_model = cbind(calcPredictionAccuracy(x = pred_IBCF_pear, data = getData(eval_sets, "unknown")))
pred_IBCF_pear_top = predict(object = IBCF_pearson_model, newdata = getData(eval_sets, "known"),
                             n = 10, type = "topNList")
pred_IBCF_pear_top@itemLabels[pred_IBCF_pear_top@items[[1]]]
## [1] "GoldenEye (1995)" "Get Shorty (1995)"
## [3] "Babe (1995)" "Dead Man Walking (1995)"
## [5] "Seven (Se7en) (1995)" "Usual Suspects, The (1995)"
## [7] "Taxi Driver (1976)" "Rumble in the Bronx (1995)"
## [9] "Apollo 13 (1995)" "Batman Forever (1995)"
rm(IBCF_pearson_model,pred_IBCF_pear,pred_IBCF_pear_top)
The user-based collaborative filtering (UBCF) model follows the same methodology as before: ratings are normalized by centering, and cosine similarity is used as the distance measure.
UBCF_cosine_model = Recommender(data = getData(eval_sets, "train"), method = "UBCF",
                                parameter = list(normalize = "center", method = "cosine"))
pred_UBCF_cos = predict(object = UBCF_cosine_model, newdata = getData(eval_sets, "known"),
                        n = 10, type = "ratings")
model_stats$UBCF_cosine_model = cbind(calcPredictionAccuracy(x = pred_UBCF_cos, data = getData(eval_sets, "unknown")))
pred_UBCF_cos_top = predict(object = UBCF_cosine_model, newdata = getData(eval_sets, "known"),
                            n = 10, type = "topNList")
pred_UBCF_cos_top@itemLabels[pred_UBCF_cos_top@items[[1]]]
## [1] "Silence of the Lambs, The (1991)" "Godfather, The (1972)"
## [3] "Fargo (1996)" "Shawshank Redemption, The (1994)"
## [5] "Wizard of Oz, The (1939)" "Twelve Monkeys (1995)"
## [7] "Usual Suspects, The (1995)" "Fugitive, The (1993)"
## [9] "Toy Story (1995)" "Star Wars (1977)"
rm(UBCF_cosine_model,pred_UBCF_cos,pred_UBCF_cos_top)
UBCF model 2 uses Z-score normalization and the Pearson correlation as its similarity measure.
UBCF_pearson_modelz = Recommender(data = getData(eval_sets, "train"), method = "UBCF",
                                  parameter = list(normalize = "Z-score", method = "pearson"))
pred_UBCF_pear = predict(object = UBCF_pearson_modelz, newdata = getData(eval_sets, "known"),
                         n = 10, type = "ratings")
model_stats$UBCF_pearson_modelz = cbind(calcPredictionAccuracy(x = pred_UBCF_pear, data = getData(eval_sets, "unknown")))
pred_UBCF_pear_top = predict(object = UBCF_pearson_modelz, newdata = getData(eval_sets, "known"),
                             n = 10, type = "topNList")
pred_UBCF_pear_top@itemLabels[pred_UBCF_pear_top@items[[1]]]
## [1] "Silence of the Lambs, The (1991)" "Titanic (1997)"
## [3] "Contact (1997)" "Fargo (1996)"
## [5] "Princess Bride, The (1987)" "Fugitive, The (1993)"
## [7] "Raiders of the Lost Ark (1981)" "Godfather, The (1972)"
## [9] "Shawshank Redemption, The (1994)" "Usual Suspects, The (1995)"
rm(UBCF_pearson_modelz,pred_UBCF_pear,pred_UBCF_pear_top)
After evaluating all four models, the user-based collaborative filtering models outperformed the IBCF models on RMSE. In my next build, I would like to explore techniques for evaluating many models at once with ROC curves and AUC, as well as hybrid collaborative filtering methods (a small sketch of the latter follows the comparison table below).
knitr::kable(model_stats,caption="Model Statistics")
|      | IBCF_cosine_model | IBCF_pearson_model | UBCF_cosine_model | UBCF_pearson_modelz |
|------|-------------------|--------------------|-------------------|---------------------|
| RMSE | 1.0874451         | 1.2054091          | 0.9639279         | 0.9866335           |
| MSE  | 1.1825369         | 1.4530111          | 0.9291571         | 0.9734457           |
| MAE  | 0.7935224         | 0.8880822          | 0.7572562         | 0.7812307           |
rm(model_stats)
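As a first look at the hybrid idea mentioned above, recommenderlab provides HybridRecommender(), which blends the predictions of several recommenders using weights. The sketch below combines a UBCF model with a popularity baseline; the choice of component models and the 0.7/0.3 weights are illustrative assumptions, not tuned values.
# Sketch of a weighted hybrid: UBCF blended with a popularity baseline.
hybrid_model = HybridRecommender(
  Recommender(getData(eval_sets, "train"), method = "UBCF"),
  Recommender(getData(eval_sets, "train"), method = "POPULAR"),
  weights = c(0.7, 0.3))  # illustrative weights, not tuned
pred_hybrid = predict(hybrid_model, newdata = getData(eval_sets, "known"), n = 10, type = "topNList")
pred_hybrid@itemLabels[pred_hybrid@items[[1]]]
rm(hybrid_model, pred_hybrid)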
I will start by defining the list of models I would like to evaluate. Seven models will be evaluated against the evaluation scheme, and their performance metrics will then be plotted and compared.
evaluation_models = list(
IBCF_cos = list(name = "IBCF", param = list(method = "cosine")),
IBCF_pea = list(name = "IBCF", param = list(method = "pearson")),
IBCF_cos_ctr = list(name = "IBCF", param = list(normalize = "center",method = "cosine")),
UBCF_cos = list(name = "UBCF", param = list(method = "cosine")),
UBCF_pea = list(name = "UBCF", param = list(method = "pearson")),
UBCF_pea_Z = list(name = "UBCF", param = list(normalize = "Z-score",method = "pearson")),
random = list(name = "RANDOM", param=NULL)
)
n_recommendations = c(1, 3, 5, 10, 15, 25, 40, 50, 60, 70, 80, 90, 100)
results = evaluate(x = eval_sets,
method = evaluation_models,
n = n_recommendations)
## IBCF run fold/sample [model time/prediction time]
## 1 [0.4sec/0.04sec]
## 2 [0.41sec/0.04sec]
## 3 [0.43sec/0.04sec]
## 4 [0.43sec/0.03sec]
## 5 [0.41sec/0.05sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.56sec/0.04sec]
## 2 [0.47sec/0.04sec]
## 3 [0.5sec/0.05sec]
## 4 [0.45sec/0.05sec]
## 5 [0.43sec/0.08sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.46sec/0.03sec]
## 2 [0.43sec/0.05sec]
## 3 [0.41sec/0.05sec]
## 4 [0.31sec/0.14sec]
## 5 [0.41sec/0.04sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.19sec]
## 2 [0sec/0.2sec]
## 3 [0sec/0.23sec]
## 4 [0sec/0.23sec]
## 5 [0sec/0.31sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.17sec]
## 2 [0sec/0.19sec]
## 3 [0sec/0.28sec]
## 4 [0sec/0.28sec]
## 5 [0sec/0.17sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.2sec]
## 2 [0.01sec/0.3sec]
## 3 [0sec/0.3sec]
## 4 [0.02sec/0.17sec]
## 5 [0.01sec/0.19sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.17sec]
## 2 [0sec/0.05sec]
## 3 [0sec/0.06sec]
## 4 [0.02sec/0.05sec]
## 5 [0sec/0.04sec]
After running the recommendation models, it is possible to extract the confusion matrix measures, which I store in eval_metrics. From the underlying counts I will derive additional measures: accuracy, sensitivity, and the F1 score. The F1 score summarizes a test's accuracy by combining precision and recall; an F1 score near zero indicates poor precision and/or recall.
\[Accuracy=\frac{TP+TN}{TP+FP+TN+FN}\]
eval_metrics = as.data.frame(getConfusionMatrix(results[["UBCF_cos"]])[[1]][,1:8])
eval_metrics$Accuracy = (eval_metrics$TP+eval_metrics$TN)/(eval_metrics$TP+eval_metrics$FP+eval_metrics$TN+eval_metrics$FN)
\[Sensitivity=\frac{TP}{TP+FN}\]
eval_metrics$Sensitivity = eval_metrics$TP/(eval_metrics$TP+eval_metrics$FN)
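For reference, the precision already reported by recommenderlab in the confusion matrix, and used in the F1 score below, is defined as
\[Precision=\frac{TP}{TP+FP}\]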
\[F_1\ Score=\frac{2 \cdot Precision \cdot Sensitivity}{Precision+Sensitivity}\]
eval_metrics$F1Score = (2*eval_metrics$precision*eval_metrics$Sensitivity)/(eval_metrics$precision+eval_metrics$Sensitivity)
knitr::kable(eval_metrics)
| n   | TP         | FP         | FN       | TN       | precision | recall    | TPR       | FPR       | Accuracy  | Sensitivity | F1Score   |
|-----|------------|------------|----------|----------|-----------|-----------|-----------|-----------|-----------|-------------|-----------|
| 1   | 0.6696429  | 0.3303571  | 71.94643 | 244.0536 | 0.6696429 | 0.0113003 | 0.0113003 | 0.0012857 | 0.7719975 | 0.0092217   | 0.0181928 |
| 3   | 1.9107143  | 1.0892857  | 70.70536 | 243.2946 | 0.6369048 | 0.0312770 | 0.0312770 | 0.0042346 | 0.7735185 | 0.0263126   | 0.0505373 |
| 5   | 2.8660714  | 2.1339286  | 69.75000 | 242.2500 | 0.5732143 | 0.0467346 | 0.0467346 | 0.0084264 | 0.7732368 | 0.0394688   | 0.0738525 |
| 10  | 5.4285714  | 4.5714286  | 67.18750 | 239.8125 | 0.5428571 | 0.0850203 | 0.0850203 | 0.0179160 | 0.7736311 | 0.0747572   | 0.1314168 |
| 15  | 7.4821429  | 7.5178571  | 65.13393 | 236.8661 | 0.4988095 | 0.1155771 | 0.1155771 | 0.0296455 | 0.7708146 | 0.1030370   | 0.1707938 |
| 25  | 11.1607143 | 13.8392857 | 61.45536 | 230.5446 | 0.4464286 | 0.1674012 | 0.1674012 | 0.0548011 | 0.7624775 | 0.1536948   | 0.2286655 |
| 40  | 15.7857143 | 24.2142857 | 56.83036 | 220.1696 | 0.3946429 | 0.2340940 | 0.2340940 | 0.0966304 | 0.7443387 | 0.2173860   | 0.2803457 |
| 50  | 18.7321429 | 31.2678571 | 53.88393 | 213.1161 | 0.3746429 | 0.2728021 | 0.2728021 | 0.1249642 | 0.7313824 | 0.2579614   | 0.3055414 |
| 60  | 21.5000000 | 38.5000000 | 51.11607 | 205.8839 | 0.3583333 | 0.3122786 | 0.3122786 | 0.1543660 | 0.7172995 | 0.2960777   | 0.3242443 |
| 70  | 24.2857143 | 45.7142857 | 48.33036 | 198.6696 | 0.3469388 | 0.3496372 | 0.3496372 | 0.1835282 | 0.7033292 | 0.3344399   | 0.3405747 |
| 80  | 26.7678571 | 53.2321429 | 45.84821 | 191.1518 | 0.3345982 | 0.3836178 | 0.3836178 | 0.2141774 | 0.6874437 | 0.3686217   | 0.3507869 |
| 90  | 28.9732143 | 61.0267857 | 43.64286 | 183.3571 | 0.3219246 | 0.4126958 | 0.4126958 | 0.2459086 | 0.6698119 | 0.3989918   | 0.3563389 |
| 100 | 31.0803571 | 68.9196429 | 41.53571 | 175.4643 | 0.3108036 | 0.4381393 | 0.4381393 | 0.2779551 | 0.6515604 | 0.4280093   | 0.3601097 |
plot(results, annotate=1, legend = "bottomright")
title("ROC curve")
plot(results, "prec/rec", annotate = 1, legend = "topright")
title("Precision Recall")
results_ratings = evaluate(x = eval_sets,
method = evaluation_models,
type="ratings")
## IBCF run fold/sample [model time/prediction time]
## 1 [0.4sec/0.02sec]
## 2 [0.4sec/0.01sec]
## 3 [0.39sec/0.02sec]
## 4 [0.39sec/0.02sec]
## 5 [0.39sec/0.02sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.43sec/0.02sec]
## 2 [0.42sec/0.01sec]
## 3 [0.43sec/0.01sec]
## 4 [0.42sec/0.01sec]
## 5 [0.53sec/0.02sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.38sec/0.02sec]
## 2 [0.41sec/0.01sec]
## 3 [0.41sec/0.01sec]
## 4 [0.42sec/0.01sec]
## 5 [0.41sec/0.02sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.01sec/0.14sec]
## 2 [0sec/0.26sec]
## 3 [0.02sec/0.14sec]
## 4 [0sec/0.15sec]
## 5 [0sec/0.15sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.27sec]
## 2 [0sec/0.15sec]
## 3 [0.02sec/0.16sec]
## 4 [0sec/0.16sec]
## 5 [0.02sec/0.15sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.28sec]
## 2 [0.02sec/0.26sec]
## 3 [0.02sec/0.14sec]
## 4 [0.02sec/0.14sec]
## 5 [0.01sec/0.15sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.02sec]
## 2 [0sec/0.13sec]
## 3 [0sec/0.01sec]
## 4 [0sec/0.01sec]
## 5 [0sec/0.02sec]
table = as.data.frame(t(sapply(avg(results_ratings), rbind)))
colnames(table) = c("RMSE", "MSE", "MAE")
knitr::kable(table)
|              | RMSE      | MSE       | MAE       |
|--------------|-----------|-----------|-----------|
| IBCF_cos     | 1.1021610 | 1.2149475 | 0.8137186 |
| IBCF_pea     | 1.2102673 | 1.4649911 | 0.8996828 |
| IBCF_cos_ctr | 1.1021610 | 1.2149475 | 0.8137186 |
| UBCF_cos     | 0.9707491 | 0.9423672 | 0.7650323 |
| UBCF_pea     | 0.9930580 | 0.9861776 | 0.7863773 |
| UBCF_pea_Z   | 0.9931664 | 0.9863920 | 0.7865486 |
| random       | 1.3345242 | 1.7810893 | 1.0461225 |
In the prior exercises, various item-based and user-based recommendation models were built. The new approach allows many models to be evaluated in a single, more streamlined workflow.
The best model based on AUC was the UBCF model with cosine similarity, which also yielded the lowest RMSE at roughly 0.97. To increase serendipity, I would mix in a random selection from one of the user's favorite genres. Analyzing performance with offline data is straightforward, but if the system were live, we could set aside a portion of the user base for A/B testing and compare the results. The system's resource requirements must also be evaluated before implementation.
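As a rough illustration of the serendipity idea, the sketch below finds a user's favorite genre among their highly rated titles and samples one unseen movie from that genre. It assumes the 0/1 genre indicator columns of MovieLenseMeta; the rating threshold of 4 and the single random draw are arbitrary choices.
# Serendipity sketch: sample an unseen title from the first user's favorite genre.
data("MovieLense")                                    # reload the metadata removed earlier
user_ratings = as(MovieLense[1, ], "data.frame")      # columns: user, item, rating
liked = user_ratings$item[user_ratings$rating >= 4]   # titles rated 4 or higher (arbitrary cutoff)
genre_cols = setdiff(colnames(MovieLenseMeta), c("title", "year", "url"))
fav_genre = names(which.max(colSums(MovieLenseMeta[MovieLenseMeta$title %in% liked, genre_cols])))
unseen = !(MovieLenseMeta$title %in% user_ratings$item)
sample(MovieLenseMeta$title[MovieLenseMeta[[fav_genre]] == 1 & unseen], 1)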