The following demonstration is a film recommender system that helps users find new movies, based on the user-to-movie ratings in the MovieLens dataset. It covers two collaborative filtering techniques: user-user and item-item. The recommenderlab library provides the model training and prediction logic.
# Data loading
library(recommenderlab)
data(MovieLense)
hist(getRatings(MovieLense), main="Distribution of ratings", breaks=6)
# Data pre-processing
movies <- as(MovieLense, 'data.frame')
movies$user <- as.numeric(movies$user)
movies$item <- as.numeric(movies$item)
To work with the recommenderlab library, the movies data frame must be converted into a sparse matrix and then into a “realRatingMatrix”. Bear in mind that the prediction engine has limited processing capacity, and a large matrix may exceed what R can handle, as happened with the alternative datasets we tried.
sparse_ratings <- sparseMatrix(i = movies$user, j = movies$item, x = movies$rating,
dims = c(length(unique(movies$user)), length(unique(movies$item))),
dimnames = list(paste("u", 1:length(unique(movies$user)), sep = ""),
paste("m", 1:length(unique(movies$item)), sep = "")))
real_ratings <- new("realRatingMatrix", data = sparse_ratings)
real_ratings
## 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.
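The conversion above places each (user, item, rating) triplet at row user, column item. As a rough illustration, the base R sketch below does the same on a small dense matrix, with NA standing in for "not rated" (recommenderlab treats the absent entries of the sparse matrix as missing ratings). The toy data frame and the name toy_matrix are made up for this example.

```r
# Hypothetical long-format ratings: one row per (user, item, rating).
toy <- data.frame(user   = c(1, 1, 2, 3, 3),
                  item   = c(1, 3, 2, 1, 3),
                  rating = c(5, 3, 4, 2, 1))

# Dense stand-in for the sparse matrix: NA marks an unrated movie.
toy_matrix <- matrix(NA_real_,
                     nrow = max(toy$user), ncol = max(toy$item),
                     dimnames = list(paste0("u", 1:3), paste0("m", 1:3)))

# A two-column index matrix writes each rating to its (row, column) cell.
toy_matrix[cbind(toy$user, toy$item)] <- toy$rating

toy_matrix
```

Only 5 of the 9 cells are filled. The real data is far emptier, which is why a sparse representation is used there.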
The similarity() function in recommenderlab takes a “realRatingMatrix” and computes pairwise cosine similarities, which helps when exploring the data before model development. The function can be toggled between users and items through its "which" argument.
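To make the cosine measure concrete, here is a minimal base R sketch for two made-up users. It illustrates the formula only, not recommenderlab's internals; restricting the calculation to movies both users rated is an assumption of this sketch, and cosine_sim is a hypothetical helper name.

```r
# Toy ratings for two hypothetical users; NA marks an unrated movie.
u1 <- c(5, 3, NA, 4, NA)
u2 <- c(4, NA, 2, 5, 1)

# Cosine similarity over the movies both users rated
# (an assumption made for this sketch).
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) /
    (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

sim_12 <- cosine_sim(u1, u2)
sim_12  # 40/41, about 0.976
```

Values close to 1 mean the two users rate their shared movies in similar proportions.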
#similarity matrix
similarity_users <- similarity(real_ratings[1:25, ], method = "cosine", which = "users")
image(as.matrix(similarity_users), main = "User similarity")
We construct the user-user model and validate it with an evaluationScheme object from recommenderlab, using the “split” method. (A confusion matrix is an alternative means of validation.) The scheme holds out part of the data so that the root mean square error (RMSE) can be checked after prediction. A “Recommender” object is then created with the “UBCF” (user-based collaborative filtering) method, center normalization, cosine similarity, and 25 nearest neighbors.
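In the evaluation scheme that follows, train=0.8 assigns 80% of the users to training, and the negative value given=-5 is recommenderlab's "all-but-5" setting: for each test user, 5 randomly chosen ratings are withheld and the rest are given to the model. The base R sketch below imitates that hold-out for a single made-up user. The names user_ratings, known, and unknown are illustrative only, and this is not how the package implements the split internally.

```r
set.seed(1)

# Hypothetical user who has rated 12 movies on a 1-5 scale.
user_ratings <- setNames(sample(1:5, 12, replace = TRUE), paste0("m", 1:12))

# All-but-5: withhold 5 randomly chosen ratings, give the model the rest.
held_out <- sample(names(user_ratings), 5)
known    <- user_ratings[setdiff(names(user_ratings), held_out)]
unknown  <- user_ratings[held_out]

length(known)    # 7 ratings the model may use
length(unknown)  # 5 ratings reserved for scoring the predictions
```

The withheld ratings play the role of getData(e, "unknown") in the code below, and the remaining ones the role of getData(e, "known").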
#Evaluation object for RMSE checking.
set.seed(1)
e <- evaluationScheme(real_ratings, method="split", train=0.8, given=-5)
# Creation of the model - U(ser) B(ased) C(ollaborative) F(iltering)
Rec.model <- Recommender(real_ratings, method = "UBCF",
param=list(normalize = "center", method="Cosine", nn=25))
#Making predictions
prediction <- predict(Rec.model, real_ratings[1:25, ], type="ratings")
as(prediction, "matrix")[,1:5]
## m1 m2 m3 m4 m5
## u1 3.605166 3.605166 NA NA 3.562997
## u2 4.206522 4.206522 4.223339 NA 4.152807
## u3 3.107143 3.107143 3.061856 3.107143 3.093476
## u4 2.895522 2.895522 NA 2.881361 2.882393
## u5 2.615741 2.615741 2.383553 2.823319 2.615741
## u6 3.620690 3.620690 3.600638 3.752644 3.620690
## u7 2.663636 2.663636 2.719601 2.692688 NA
## u8 3.363636 3.363636 3.363636 3.363636 3.396348
## u9 3.765625 3.765625 3.654908 3.822140 3.765625
## u10 2.952381 2.952381 2.949733 2.952381 2.945525
## u11 3.575758 3.575758 3.505602 3.615067 3.555972
## u12 3.418998 3.440171 3.074245 NA 3.440171
## u13 3.455556 3.455556 3.419717 3.475778 3.455556
## u14 3.045113 3.045113 3.032400 3.078071 3.045113
## u15 3.565217 3.565217 3.565217 3.565217 3.565217
## u16 3.661331 3.688889 3.688889 3.691235 3.679981
## u17 3.693878 3.693878 3.681136 3.685483 3.707989
## u18 3.604167 3.604167 3.493174 3.655711 3.604167
## u19 3.934783 3.934783 3.818656 NA 3.914433
## u20 3.007143 3.005200 3.007143 3.096234 3.007143
## u21 3.905882 3.905882 3.937660 3.937755 3.905882
## u22 4.661972 4.661972 4.484950 4.764733 4.687777
## u23 3.949721 3.949721 3.877516 3.961044 3.967266
## u24 4.392157 4.390376 4.400138 4.485823 4.392157
## u25 3.505860 3.576923 3.467787 3.576923 3.576923
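The predictions above come from recommenderlab's UBCF implementation. As a rough illustration of the idea (center each user's ratings, then combine the neighbours' centered ratings weighted by similarity), here is a toy base R sketch. The rating matrix, the similarity weights, and the helper name predict_ubcf_toy are all made up, and the package's actual aggregation may differ in detail.

```r
# Hypothetical 3-user x 4-movie rating matrix; NA = not rated.
ratings <- matrix(c(5,  4, NA,  1,
                    4, NA,  2,  1,
                    1,  2,  5, NA),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("u", 1:3), paste0("m", 1:4)))

# Center normalization: subtract each user's mean rating from their row.
user_means <- rowMeans(ratings, na.rm = TRUE)
centered   <- ratings - user_means

# Predicted rating = target user's mean + similarity-weighted average of
# the neighbours' centered ratings for that movie.
predict_ubcf_toy <- function(user, item, neighbours, weights) {
  r  <- centered[neighbours, item]
  ok <- !is.na(r)
  if (!any(ok)) return(NA_real_)
  user_means[[user]] + sum(weights[ok] * r[ok]) / sum(weights[ok])
}

# Assumed similarities of u1 to u2 and u3 (illustrative values only).
pred <- predict_ubcf_toy("u1", "m3",
                         neighbours = c("u2", "u3"),
                         weights = c(0.9, 0.1))
pred  # 49/15, about 3.27
```

Because the result is anchored to the target user's own mean, a generous rater and a harsh rater receive predictions on their own scales.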
#Estimating RMSE
set.seed(1)
RMSE.model <- Recommender(getData(e, "train"), method = "UBCF",
param=list(normalize = "center", method="Cosine", nn=25))
prediction <- predict(RMSE.model, getData(e, "known"), type="ratings")
rmse_ubcf <- calcPredictionAccuracy(prediction, getData(e, "unknown"))[1]
rmse_ubcf
## RMSE
## 1.031304
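The RMSE reported above is the square root of the mean squared difference between the predicted and the held-out ratings. A minimal base R sketch with made-up numbers (the vectors and the helper name rmse are hypothetical):

```r
# Hypothetical predicted and held-out ratings for five user-movie pairs.
predicted <- c(3.5, 4.0, 2.0, 4.5, 3.0)
actual    <- c(4,   4,   1,   5,   3)

# RMSE: square root of the mean squared prediction error.
# na.rm = TRUE skips pairs for which no prediction could be made.
rmse <- function(pred, truth) sqrt(mean((pred - truth)^2, na.rm = TRUE))

toy_rmse <- rmse(predicted, actual)
toy_rmse  # sqrt(0.3), about 0.548
```

On a 1-5 rating scale, an RMSE near 1 means predictions are typically off by about one star.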
The item-item model follows the same steps. The only changes from the user-user approach are the “IBCF” method argument and the parameter k (the number of most similar items kept for each item), which takes the place of nn.
#Building model
model <- Recommender(real_ratings, method = "IBCF",
param=list(normalize = "center", method="Cosine", k=350))
#Making predictions
prediction <- predict(model, real_ratings[1:25, ], type="ratings")
as(prediction, "matrix")[,1:5]
## m1 m2 m3 m4 m5
## u1 3.606855 3.432844 NA NA 3.364729
## u2 4.274994 4.100377 4.453378 NA 4.346925
## u3 2.457169 NA 4.000000 2.707015 2.816183
## u4 2.625211 4.000000 NA 3.121320 3.043051
## u5 2.643749 2.661201 2.777874 2.767238 2.758360
## u6 3.473926 3.670153 3.547413 3.690713 3.955558
## u7 2.458963 5.000000 2.912939 3.170524 NA
## u8 4.267140 NA 3.554982 NA 3.692531
## u9 4.113253 3.914352 3.925738 4.172249 3.895860
## u10 2.695618 NA 2.561915 1.339196 2.675744
## u11 3.835681 3.000000 3.521053 3.880685 3.300969
## u12 3.445636 3.797407 3.653746 NA 3.276212
## u13 3.351456 3.619745 3.588417 3.458750 3.669085
## u14 3.066369 3.021650 3.087120 2.997720 3.102580
## u15 3.397783 NA 4.000000 NA 3.598078
## u16 4.122097 NA 3.460183 3.408894 3.913976
## u17 3.697942 4.000000 3.780144 3.800102 3.465968
## u18 3.571328 3.476977 3.862008 3.712382 3.000000
## u19 3.897629 3.951641 4.386356 NA 3.636076
## u20 2.845824 3.130143 3.314887 3.604163 3.037190
## u21 3.674012 4.476616 4.041865 4.152875 4.166474
## u22 4.737747 4.635225 4.616880 4.868661 4.853315
## u23 3.939223 4.316042 3.964131 4.196925 3.742027
## u24 4.733500 4.498465 4.486323 4.512190 4.611544
## u25 3.586579 NA 4.040942 3.454709 3.308393
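For intuition, item-based filtering compares movie columns rather than user rows, then predicts a user's rating for a movie from that user's own ratings of similar movies. The base R sketch below is a simplified illustration with made-up data: it skips the center normalization and the k cutoff used above, and item_cosine is a hypothetical helper, not a recommenderlab function.

```r
# Hypothetical 4-user x 3-movie rating matrix; NA = not rated.
ratings <- matrix(c(5,  4, NA,
                    4,  5,  2,
                    1,  2,  5,
                    2, NA,  4),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("u", 1:4), paste0("m", 1:3)))

# Cosine similarity between two movie columns over users who rated both.
item_cosine <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) /
    (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Item-item similarity matrix.
items <- colnames(ratings)
sim <- sapply(items, function(i)
  sapply(items, function(j) item_cosine(ratings[, i], ratings[, j])))

# Predict u1's rating for m3 from u1's ratings of the other movies,
# weighted by each movie's similarity to m3.
rated <- !is.na(ratings["u1", ])
w     <- sim["m3", rated]
pred  <- sum(w * ratings["u1", rated]) / sum(w)
pred
```

The prediction is a weighted average of the ratings u1 gave to m1 and m2, so it falls between those two ratings.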
#Estimating RMSE
set.seed(1)
model <- Recommender(getData(e, "train"), method = "IBCF",
param=list(normalize = "center", method="Cosine",k=350))
prediction <- predict(model, getData(e, "known"), type="ratings")
rmse_ibcf <- calcPredictionAccuracy(prediction, getData(e, "unknown"))[1]
rmse_ibcf
## RMSE
## 1.061466
Based on our RMSE values, the user-based model performs slightly better (RMSE 1.031 versus 1.061 for the item-based model). Let’s take an example user, “Bob”, who is user 610.
real_ratings[610,]
## 1 x 1664 rating matrix of class 'realRatingMatrix' with 295 ratings.
The top 5 items recommended for Bob by the user-based model are:
recommended.items.u610<- predict(Rec.model, real_ratings[610,], n=5)
as(recommended.items.u610, "list")
## $u610
## [1] "m336" "m1319" "m500" "m211" "m306"
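Conceptually, a top-N list like the one above is the set of unseen items with the highest predicted ratings. A minimal base R sketch, assuming a made-up vector of predictions (top_n is a hypothetical helper, not part of recommenderlab):

```r
# Hypothetical predicted ratings for one user's unseen movies;
# NA means no prediction could be made.
predicted <- c(m1 = 3.2, m2 = 4.7, m3 = NA, m4 = 4.1, m5 = 2.9, m6 = 4.4)

# Rank predictions from highest to lowest and keep the names of the top n.
top_n <- function(pred, n) {
  ranked <- sort(pred, decreasing = TRUE)  # sort() drops NA by default
  names(ranked)[seq_len(min(n, length(ranked)))]
}

top_3 <- top_n(predicted, 3)
top_3  # "m2" "m6" "m4"
```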
This demonstration provides only the outline of a basic collaborative filtering approach, but it gives a good initial map toward a better recommendation model. A hybrid approach could strengthen the method; at this stage, however, a more rigorous tuning of the item-item and user-user models would be the wiser next step. Many of the parameters available in the recommenderlab package have yet to be tested in this system.