This section walks through different ways of implementing and configuring a recommender system, using the Jester dataset from the recommenderlab package.
library(recommenderlab)
library(ggplot2)
data(Jester5k)
The Jester5k dataset, included in the recommenderlab package, contains "5000 users from the anonymous ratings data from the Jester Online Joke Recommender System."
First, summarize the number of ratings per user:
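As a quick sanity check, the rating matrix can be printed and its dimensions inspected (a minimal sketch; the console output is omitted here):
Jester5k          # prints the realRatingMatrix summary
dim(Jester5k)     # should show 5000 rows (users)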
## number of ratings per user
summary(rowCounts(Jester5k))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 36.00 53.00 72.00 72.42 100.00 100.00
As we can see, each user in the dataset has rated at least 36 and at most 100 jokes.
Next, plot the distribution of ratings as a histogram:
hist(getRatings(Jester5k), main="Distribution of ratings")
As we can see, ratings range between -10.00 and 10.00.
Now we create the training (75%) and test (25%) sets.
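If a numeric check is preferred over reading the histogram, the rating range can be computed directly (a small sketch):
range(getRatings(Jester5k))   # should be approximately -10 to 10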
# Normalize the ratings
Jester5k <- normalize(Jester5k)
# 75% train / 25% test split; 25 ratings per test user are given to the
# recommender, and normalized ratings of at least 0.1 count as "good"
train_records <- evaluationScheme(data = Jester5k, method = "split", train = 0.75, given = 25, goodRating = 0.1)
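The evaluation scheme can be queried for its three parts with getData(); the variable names below are illustrative, not part of the original workflow (a sketch):
train_data   <- getData(train_records, "train")    # ratings used to fit the models
known_data   <- getData(train_records, "known")    # the 25 given ratings per test user
unknown_data <- getData(train_records, "unknown")  # held-out ratings used for scoring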
Two algorithms are used to implement the recommender system, both based on cosine similarity:
- User-based collaborative filtering (UBCF)
- Item-based collaborative filtering (IBCF)
algorithms <- list(
UBCF_cos = list(name = "UBCF", param = list(method = "cosine")),
IBCF_cos = list(name = "IBCF", param = list(method = "cosine"))
)
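Before running the full comparison, a single UBCF model could be fitted and used for prediction on its own; the object names here (ubcf_model, top5) are illustrative (a sketch, assuming the evaluation scheme built above):
ubcf_model <- Recommender(getData(train_records, "train"),
                          method = "UBCF",
                          param = list(method = "cosine"))
top5 <- predict(ubcf_model, getData(train_records, "known"), type = "topNList", n = 5)
as(top5, "list")[1:3]   # top-5 recommended jokes for the first three test users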
Both algorithms are evaluated on the test data, generating top-N recommendation lists for N from 1 to 20.
result_records <- evaluate(x = train_records, method = algorithms, n = 1:20)
## UBCF run fold/sample [model time/prediction time]
## 1 [0.01sec/28.44sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.87sec/1sec]
algorithm_performance <- lapply(result_records, avg)
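The averaged results can be inspected directly; assuming the averaged confusion matrices expose precision and recall columns, a quick look might be (sketch):
head(algorithm_performance$UBCF_cos[, c("precision", "recall")])
head(algorithm_performance$IBCF_cos[, c("precision", "recall")])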
Precision-recall (PR) charts are produced for both algorithms.
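One way to draw the PR charts is recommenderlab's plot method with the "prec/rec" option (a sketch; the legend placement is an assumption):
plot(result_records, "prec/rec", annotate = TRUE, legend = "bottomright")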
The charts show that the user-based collaborative filtering model performs better than the item-based collaborative filtering model.