Goal

Explore different ways of implementing and configuring a recommender system.

Data

The Jester5k dataset from the recommenderlab package.

Import required libraries and data

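library(recommenderlab)  # recommender algorithms, evaluation tools and the Jester5k data
library(ggplot2)
data(Jester5k)           # loads the Jester5k rating matrix into the workspace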

About the data

The Jester5k dataset, included in the recommenderlab package, contains a sample of “5000 users from the anonymous ratings data from the Jester Online Joke Recommender System.” It is stored as a 5000 x 100 rating matrix (users x jokes).

Summary of the number of ratings per user

## number of ratings per user
summary(rowCounts(Jester5k)) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   36.00   53.00   72.00   72.42  100.00  100.00

Each user has rated at least 36 and at most 100 jokes, with a median of 72.
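recommenderlab's rowCounts() counts how many ratings each user (row) has. For a plain dense matrix where NA marks a missing rating, the same count is a one-liner in base R. The matrix `toy` below is a hypothetical 3 x 4 example, not Jester data.

```r
# Hypothetical 3-user x 4-joke rating matrix; NA means "not rated"
toy <- matrix(c( 2.5,  NA, -7.1,  9.0,
                  NA,  NA,  3.2,   NA,
                -1.0, 4.4,  0.5, -3.3),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("u", 1:3), paste0("j", 1:4)))

# Number of ratings per user: count the non-missing cells in each row
ratings_per_user <- rowSums(!is.na(toy))
print(ratings_per_user)    # u1 = 3, u2 = 1, u3 = 4
summary(ratings_per_user)  # same kind of summary as the one above
```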

Histogram of ratings

hist(getRatings(Jester5k), main="Distribution of ratings")

The ratings range from -10 to 10.

Training and Testing Data

Next, the ratings are normalized and the users are split into a training set (75%) and a test set (25%).

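# Center each user's ratings (normalize() subtracts the user's mean rating by default)
Jester5k <- normalize(Jester5k)
# 75% of users for training; each test user reveals 25 ratings and the rest are
# withheld for evaluation; a withheld rating above 0.1 counts as a good rating
train_records <- evaluationScheme(data = Jester5k, method = "split", train = 0.75, given = 25, goodRating = 0.1)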
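normalize() centers each user's ratings by default, and the "split" scheme divides users into training and test users, revealing only `given` ratings of each test user to the recommender. A minimal base-R sketch of both steps follows, using a made-up 8 x 6 matrix. The names `toy`, `center_rows()` and `split_given()` are hypothetical and are not recommenderlab functions, and recommenderlab's own sampling details may differ. `given = 2` replaces 25 here only because the toy matrix has just 6 jokes.

```r
set.seed(42)

# Hypothetical 8-user x 6-joke matrix on the Jester scale (-10 to 10)
toy <- matrix(round(runif(48, -10, 10), 1), nrow = 8,
              dimnames = list(paste0("u", 1:8), paste0("j", 1:6)))
toy[cbind(1:6, 1:6)] <- NA   # users 1-6 each leave one joke unrated

# Row centering: subtract each user's mean over the jokes they rated
center_rows <- function(m) sweep(m, 1, rowMeans(m, na.rm = TRUE), "-")

# Hypothetical helper mimicking a "split" scheme: a fraction `train` of the
# users is used for training; each test user keeps `given` random known
# ratings and the remaining ratings are withheld for evaluation
split_given <- function(m, train = 0.75, given = 2, good_rating = 0.1) {
  train_ix <- sample(nrow(m), floor(train * nrow(m)))
  test     <- m[-train_ix, , drop = FALSE]
  known    <- test
  known[]  <- NA
  unknown  <- test
  for (u in seq_len(nrow(test))) {
    rated <- which(!is.na(test[u, ]))
    keep  <- rated[sample.int(length(rated), given)]
    known[u, keep]   <- test[u, keep]
    unknown[u, keep] <- NA
  }
  list(train   = m[train_ix, , drop = FALSE],
       known   = known,
       unknown = unknown,
       # withheld ratings above the threshold count as "good" (relevant) items
       good    = !is.na(unknown) & unknown > good_rating)
}

centered <- center_rows(toy)
parts    <- split_given(centered, train = 0.75, given = 2, good_rating = 0.1)
```

After centering, every user's mean rating is zero, so a positive value means "liked more than this user's average". That is why a small threshold such as 0.1 can serve as the good-rating cut-off.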

Algorithms Considered

Two algorithms, both using cosine similarity, are compared:

- User-based collaborative filtering (UBCF)
- Item-based collaborative filtering (IBCF)
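Cosine similarity between two rating vectors a and b is sum(a * b) / (|a| |b|), which ranges from -1 to 1. The base-R sketch below assumes, as a simplification, that only items rated by both sides enter the computation; recommenderlab's own treatment of missing ratings may differ. `cosine_sim()` and `cosine_matrix()` are illustrative names.

```r
# Cosine similarity between two rating vectors, using only the positions
# rated in both (a simplifying assumption for this sketch)
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  a <- a[both]; b <- b[both]
  den <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (den == 0) return(NA_real_)   # undefined for all-zero vectors
  sum(a * b) / den
}

# Pairwise similarity matrix over the rows of a rating matrix
cosine_matrix <- function(m) {
  n <- nrow(m)
  s <- matrix(NA_real_, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) s[i, j] <- cosine_sim(m[i, ], m[j, ])
  }
  s
}

u1 <- c(4, -2, NA, 6)
u2 <- c(2, -1, 5, 3)
cosine_sim(u1, u2)   # approximately 1: same direction on the co-rated jokes
```

Applying cosine_matrix() to a users x jokes matrix compares users, as UBCF does; applying it to the transpose t(m) compares jokes, as IBCF does.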

algorithms <- list(
  UBCF_cos = list(name = "UBCF", param = list(method = "cosine")), 
  IBCF_cos = list(name = "IBCF", param = list(method = "cosine"))
)
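To illustrate how the two configurations differ, here is a simplified sketch of how each could score the unrated jokes of one user. UBCF averages the ratings of the k most similar users; IBCF takes a similarity-weighted average of the user's own ratings of similar jokes. The helpers `ubcf_scores()`, `ibcf_scores()` and `top_n()`, the plain averaging and the toy matrix are all assumptions for illustration, not recommenderlab's exact implementation. The block defines its own cosine helper so it runs on its own.

```r
# Cosine similarity on co-rated positions; 0 when undefined (sketch choice)
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(0)
  a <- a[both]; b <- b[both]
  den <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (den == 0) 0 else sum(a * b) / den
}

# UBCF sketch: average the ratings of the k users most similar to user u
ubcf_scores <- function(m, u, k = 2) {
  others <- setdiff(seq_len(nrow(m)), u)
  sims   <- sapply(others, function(v) cosine_sim(m[u, ], m[v, ]))
  nn     <- others[order(sims, decreasing = TRUE)][seq_len(min(k, length(others)))]
  colMeans(m[nn, , drop = FALSE], na.rm = TRUE)
}

# IBCF sketch: for each joke i, weight user u's own ratings by the
# similarity between joke i and each joke u has rated
ibcf_scores <- function(m, u) {
  r     <- m[u, ]
  rated <- which(!is.na(r))
  if (length(rated) == 0) return(rep(NA_real_, ncol(m)))
  sapply(seq_len(ncol(m)), function(i) {
    w <- sapply(rated, function(j) cosine_sim(m[, i], m[, j]))
    if (sum(abs(w)) == 0) NA_real_ else sum(w * r[rated]) / sum(abs(w))
  })
}

# Top-N: highest-scoring jokes the user has not rated yet (column indices)
top_n <- function(scores, m, u, n = 2) {
  scores[!is.na(m[u, ])] <- NA
  head(order(scores, decreasing = TRUE, na.last = NA), n)
}

# Hypothetical 4-user x 5-joke matrix; user 1 has not rated j3 and j5
m <- rbind(u1 = c( 5,  4, NA, -3, NA),
           u2 = c( 4,  5,  3, -2, -4),
           u3 = c(-4, -5, -2,  4,  5),
           u4 = c( 5,  3,  4, NA, -5))
colnames(m) <- paste0("j", 1:5)

colnames(m)[top_n(ubcf_scores(m, 1, k = 2), m, 1, n = 1)]
colnames(m)[top_n(ibcf_scores(m, 1), m, 1, n = 1)]
```

Both sketches recommend j3 to user 1 here. The design difference is where similarity is computed: UBCF compares the target user against other users at prediction time, whereas IBCF compares items, so its similarities can be precomputed once when the model is built.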

Assessment & Checking Correctness

Both algorithms are evaluated on the test set using top-N recommendation lists of length N = 1 to 20.

result_records <- evaluate(x = train_records, method = algorithms, n = 1:20)
## UBCF run fold/sample [model time/prediction time]
##   1  [0.01sec/28.44sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.87sec/1sec]
algorithm_performance <- lapply(result_records, avg)
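For each test user, evaluate() compares the top-N list against the withheld ratings, where an item counts as relevant if its withheld rating exceeds goodRating. The sketch below shows, for one hypothetical user, how precision (hits / N) and recall (hits / number of relevant items) change with the list length N. `precision_recall_at_n()` and the item lists are made up for illustration and do not reproduce recommenderlab's full averaging over users.

```r
# Precision and recall of a ranked list at cut-offs N = 1..max_n.
# ranked:   recommended items in rank order
# relevant: items whose withheld rating exceeds the goodRating threshold
precision_recall_at_n <- function(ranked, relevant, max_n = length(ranked)) {
  n  <- seq_len(max_n)
  tp <- cumsum(ranked[n] %in% relevant)   # hits within the top N
  data.frame(N         = n,
             precision = tp / n,
             recall    = tp / length(relevant))
}

ranked   <- c("j7", "j2", "j9", "j4", "j1")
relevant <- c("j2", "j4", "j8")
pr <- precision_recall_at_n(ranked, relevant)
print(pr)
```

Plotting recall (x-axis) against precision (y-axis) for N = 1 to 20 gives one point per list length, which is what a precision-recall chart displays.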

Performance

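Precision-recall (PR) charts are provided for both algorithms.

According to these charts, the user-based collaborative filtering (UBCF) model performs better than the item-based collaborative filtering (IBCF) model on this dataset. The evaluation log above also shows a trade-off: UBCF needed about 28 seconds for prediction, compared with about 1 second for IBCF.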