Project 4
1. As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.
2. Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.
3. Compare and report on any change in accuracy before and after you’ve made the change in #2.
4. As part of your textual conclusion, discuss one or more additional experiments that could be performed and/or metrics that could be evaluated only if online evaluation was possible. Also, briefly propose how you would design a reasonable online evaluation environment.
I will use the MovieLense dataset, which comes preloaded with the recommenderlab library.
Keep the users who have rated more than 50 movies and the movies that have been rated more than 100 times.
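A minimal sketch of the loading and filtering step, assuming the cutoffs described above (the 358 x 591 matrix reported below is what the original filtering produced; the exact cutoffs used there may have differed):
library(recommenderlab)
library(ggplot2)   # used later for qplot()
data(MovieLense)
# keep only active users and frequently rated movies (cutoffs assumed from the description above)
mov_ratings <- MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 100]
mov_ratings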
## 358 x 591 rating matrix of class 'realRatingMatrix' with 60084 ratings.
Splitting the data into training and test sets, with 80% for training and 20% for testing.
For the test users, the given parameter must be lower than the minimum number of movies any user has rated, so that every test user has enough ratings. The rating threshold is set to 3: ratings below 3 are considered bad and ratings of 3 or above are considered good.
percentage_training <- 0.8
items_to_keep <- 50
rating_threshold <- 3
n_eval <- 1
evaluation_set <- evaluationScheme(data = mov_ratings, method = "split", train = percentage_training, given = items_to_keep, goodRating = rating_threshold, k = n_eval)
evaluation_set
## Evaluation scheme with 50 items given
## Method: 'split' with 1 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 358 x 591 rating matrix of class 'realRatingMatrix' with 60084 ratings.
Constructing the training set and checking the train/test split proportions.
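The training matrix and the proportions shown below were presumably computed along these lines:
train_set <- getData(evaluation_set, "train")
train_set
nrow(train_set) / nrow(mov_ratings)                          # share of users in the training set
nrow(getData(evaluation_set, "known")) / nrow(mov_ratings)   # share of users held out for testing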
## 286 x 591 rating matrix of class 'realRatingMatrix' with 48155 ratings.
## [1] 0.7988827
## [1] 0.2011173
Evaluate the ratings using k-fold cross-validation in order to recommend movies to new users. We will use k = 10 folds.
evaluation_ratingSet <- evaluationScheme(data = mov_ratings, method = "cross-validation", k = 10, given = items_to_keep,goodRating = rating_threshold)
evaluation_ratingSet
## Evaluation scheme with 50 items given
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=3.000000
## Data set: 358 x 591 rating matrix of class 'realRatingMatrix' with 60084 ratings.
IBCF Model
model_to_evaluate <- "IBCF"
model_parameters <- NULL
eval_recommender <- Recommender(data=getData(evaluation_set,"train"),method = model_to_evaluate, parameter= model_parameters)
# Predict ratings for the test users' unknown items (n = 5 is ignored because type = "ratings")
eval_prediction <- predict(object = eval_recommender, newdata = getData(evaluation_set,"known"),n=5,type="ratings")
qplot(rowCounts(eval_prediction)) +
geom_histogram(binwidth = 30) +
ggtitle("Movies per user")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The number of predicted movies per user ranges roughly from 350 to 500, with a peak around 400.
We will measure the accuracy by computing RMSE, MSE, and MAE.
eval_accuracy <- calcPredictionAccuracy( x = eval_prediction, data = getData(evaluation_set, "unknown"),byUser = TRUE)
head(eval_accuracy)
## RMSE MSE MAE
## 1 1.906756 3.635719 1.5380384
## 16 1.211916 1.468739 0.8173816
## 49 1.615534 2.609951 1.3559561
## 56 1.343744 1.805648 1.0464031
## 70 1.206167 1.454839 0.9608684
## 85 1.246719 1.554307 1.0074523
qplot(eval_accuracy[,"RMSE"])+
geom_histogram(binwidth = 0.2)+
ggtitle("Distribution of the RMSE by user")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The per-user RMSE ranges roughly from 1.0 to 2.0, with a peak around 1.2.
Next we evaluate the top-N recommendations by comparing them against the items with a positive rating. As defined in the evaluation scheme, ratings below 3 are negative and ratings of 3 or above are positive.
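The confusion matrices below were presumably produced along these lines (the object name results is an assumption):
results <- evaluate(x = evaluation_set, method = model_to_evaluate, n = seq(10, 50, 10))
head(getConfusionMatrix(results)[[1]])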
## IBCF run fold/sample [model time/prediction time]
## 1 [1.194sec/0.04sec]
## TP FP FN TN precision recall TPR
## 10 2.694444 7.305556 94.51389 436.4861 0.2694444 0.02693348 0.02693348
## 20 4.930556 15.069444 92.27778 428.7222 0.2465278 0.05039840 0.05039840
## 30 6.875000 23.125000 90.33333 420.6667 0.2291667 0.07076234 0.07076234
## 40 8.569444 31.430556 88.63889 412.3611 0.2142361 0.08735163 0.08735163
## 50 10.527778 39.472222 86.68056 404.3194 0.2105556 0.10665070 0.10665070
## FPR
## 10 0.01629513
## 20 0.03379447
## 30 0.05198270
## 40 0.07072478
## 50 0.08878909
TP: recommended items that have been purchased.
FP: recommended items that haven’t been purchased.
FN: not recommended items that have been purchased.
TN: not recommended items that haven’t been purchased.
A perfect model would have only TP and TN.
ROC curve with true positive rate and false positive rate.
True positive rate: percentage of purchased items that have been recommended.
TP rate = TP/ (TP+FN).
False positive rate: percentage of not-purchased items that have been recommended.
FP rate = FP / (FP+TN)
Precision-recall curve with precision and recall.
Precision: percentage of recommended items that have been purchased; precision = TP / (TP + FP).
Recall: percentage of purchased items that have been recommended; recall = TP / (TP + FN), which equals the true positive rate. These rates can also be recomputed directly from the confusion matrix above, as sketched below.
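As a sanity check on these formulas, the rates can be recomputed from the confusion-matrix columns (using the results object assumed in the sketch above):
cm <- getConfusionMatrix(results)[[1]]
cm[, "TP"] / (cm[, "TP"] + cm[, "FN"])   # recall = true positive rate
cm[, "FP"] / (cm[, "FP"] + cm[, "TN"])   # false positive rate
cm[, "TP"] / (cm[, "TP"] + cm[, "FP"])   # precision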
models_to_evaluate <- list(
IBCF_cos = list(name = "IBCF", param = list(method = "cosine")),
IBCF_cor = list(name = "IBCF", param= list (method = "pearson")),
UBCF_cos = list(name = "UBCF", param = list(method = "cosine")),
UBCF_cor = list(name = "UBCF", param = list(method = "pearson")),
random=list(name="RANDOM", param =NULL)
)
n_recommendations <- c (1,5, seq(10,50,10))
list_results <- evaluate(x = evaluation_set, method = models_to_evaluate, n = n_recommendations)
## IBCF run fold/sample [model time/prediction time]
## 1 [1.183sec/0.069sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.982sec/0.035sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.004sec/0.173sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.004sec/0.177sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.001sec/0.026sec]
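The AUC comparison below refers to ROC and precision-recall curves; a minimal sketch of how they can be drawn with recommenderlab:
# ROC curves (TPR vs. FPR) for all five models
plot(list_results, annotate = 1, legend = "topleft")
title("ROC curve")
# precision-recall curves
plot(list_results, "prec/rec", annotate = 1, legend = "bottomright")
title("Precision-recall")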
The highest AUC (area under the ROC curve) is obtained by UBCF with Pearson correlation, so in our case UBCF_cor is the best-performing technique.
Next, we optimize the k value for IBCF (the number of most similar items kept when building the similarity matrix).
vector_k <- c(5,10,20,30,40)
models_to_evaluate <- lapply(vector_k, function(k){
list(name="IBCF",param = list(method = "cosine",k = k))
})
names(models_to_evaluate) <- paste0("IBCF_K_", vector_k)
n_recommendations <- c (1,5, seq(10,50,10))
list_results <- evaluate(x = evaluation_set, method = models_to_evaluate, n = n_recommendations)
## IBCF run fold/sample [model time/prediction time]
## 1 [1.064sec/0.018sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.691sec/0.024sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.874sec/0.034sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.867sec/0.034sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.899sec/0.036sec]
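Again, the AUC comparison below is based on the ROC curves; a sketch that also inspects the averaged confusion matrix for one candidate k:
plot(list_results, annotate = 1, legend = "topleft")
title("ROC curve, IBCF with varying k")
avg(list_results[["IBCF_K_40"]])   # averaged confusion matrix for k = 40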
The k with the largest AUC is 40, followed by 30; if we want to optimize for precision instead, k = 20 is a good choice.
Now we run the SVD, SVDF, and ALS algorithms for the recommender system.
# Create the recommender based on SVD, SVDF, and ALS using the training data
r.svd <- Recommender(getData(evaluation_set, "train"),"SVD")
r.svdf <- Recommender(getData(evaluation_set, "train"),"SVDF")
r.als <- Recommender(getData(evaluation_set, "train"),"ALS")
#Compute predicted ratings for the test data.
p.svd <- predict(r.svd, getData(evaluation_set,"known"),type="ratings")
p.svdf <- predict(r.svdf,getData(evaluation_set,"known"),type="ratings")
p.als <- predict(r.als, getData(evaluation_set,"known"),type="ratings")
error <- rbind(svd = calcPredictionAccuracy(p.svd,getData(evaluation_set, "unknown")),
svdf = calcPredictionAccuracy(p.svdf,getData(evaluation_set,"unknown")),
als = calcPredictionAccuracy(p.als, getData(evaluation_set,"unknown")))
error
## RMSE MSE MAE
## svd 1.0062905 1.0126206 0.8087823
## svdf 0.8863277 0.7855768 0.6946384
## als 0.8927944 0.7970818 0.7110758
From this error table we find that SVDF has the smallest RMSE, MSE, and MAE. Next, the three models are evaluated as top-N recommenders. Note the warnings in the output below: method and type are not parameters of these algorithms (the algorithm is already chosen via name, and type is an argument of evaluate() itself), so recommenderlab simply ignores them.
models_to_evaluate <- list(
svd = list(name = "svd", param = list(method = "SVD",type="topNList")),
svdf = list(name = "svdf", param= list (method = "SVDF",type="topNList")),
als = list(name = "als", param = list(method = "ALS",type="topNList"))
)
n_recommendations <- c (1,5, seq(10,50,10))
list_results <- evaluate(x = evaluation_set, method = models_to_evaluate, n = n_recommendations)
## svd run fold/sample [model time/prediction time]
## 1
## Warning: Unknown parameters: method, type
## Available parameter (with default values):
## k = 10
## maxiter = 100
## normalize = center
## verbose = FALSE
## [0.054sec/0.041sec]
## svdf run fold/sample [model time/prediction time]
## 1
## Warning: Unknown parameters: method, type
## Available parameter (with default values):
## k = 10
## gamma = 0.015
## lambda = 0.001
## min_epochs = 50
## max_epochs = 200
## min_improvement = 1e-06
## normalize = center
## verbose = FALSE
## [40.403sec/9.342sec]
## als run fold/sample [model time/prediction time]
## 1
## Warning: Unknown parameters: method, type
## Available parameter (with default values):
## normalize = NULL
## lambda = 0.1
## n_factors = 10
## n_iterations = 10
## min_item_nr = 1
## seed = NULL
## verbose = FALSE
## [0.002sec/11.586sec]
vector_k <- c(5,10,20,30,40)
models_to_evaluate <- lapply(vector_k, function(k){
list(name="SVD",param = list(method = "SVD",k = k,type="topNList"))
})
names(models_to_evaluate) <- paste0("SVD_K_", vector_k)
n_recommendations <- c (1,5, seq(10,50,10))
list_results <- evaluate(x = evaluation_set, method = models_to_evaluate, n = n_recommendations)
## SVD run fold/sample [model time/prediction time]
## 1
## Warning: Unknown parameters: method, type
## Available parameter (with default values):
## k = 10
## maxiter = 100
## normalize = center
## verbose = FALSE
## [0.033sec/0.057sec]
## SVD run fold/sample [model time/prediction time]
## 1
## Warning: Unknown parameters: method, type
## Available parameter (with default values):
## k = 10
## maxiter = 100
## normalize = center
## verbose = FALSE
## [0.032sec/0.04sec]
## SVD run fold/sample [model time/prediction time]
## 1
## Warning: Unknown parameters: method, type
## Available parameter (with default values):
## k = 10
## maxiter = 100
## normalize = center
## verbose = FALSE
## [0.043sec/0.054sec]
## SVD run fold/sample [model time/prediction time]
## 1
## Warning: Unknown parameters: method, type
## Available parameter (with default values):
## k = 10
## maxiter = 100
## normalize = center
## verbose = FALSE
## [0.058sec/0.052sec]
## SVD run fold/sample [model time/prediction time]
## 1
## Warning: Unknown parameters: method, type
## Available parameter (with default values):
## k = 10
## maxiter = 100
## normalize = center
## verbose = FALSE
## [0.075sec/0.05sec]
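As before, the conclusion below rests on the ROC and precision-recall plots for the candidate k values; a sketch:
plot(list_results, annotate = 1, legend = "topleft")
title("ROC curve, SVD with varying k")
plot(list_results, "prec/rec", annotate = 1, legend = "bottomright")
title("Precision-recall, SVD with varying k")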
We see that SVD has the highest AUC, and the k value of 10 gives higher precision.
Recommender systems can be evaluated offline or online. Deployed online, a recommender system can reduce the transaction costs of finding and selecting items. Collaborative filtering is still the best-known and most commonly implemented technique, but SVD would be very time-consuming when dealing with massive data online.