This dataset has many missing values: some movies have no rating from particular users. The purpose of the recommender system is to predict those missing ratings.
We will build a toy dataset as follows and then replace each missing value with the mean rating of its column (that is, of the movie), truncated to a whole number.
moviesraw <- matrix(c(4, NA, 3, 5, NA, 5, 4, NA, 5, 4, 2, NA, 2, 4, NA, 3, 3, 4, 5, NA), nrow = 5, byrow = TRUE)
colnames(moviesraw) <- c("Batman Begins", "Alice in Wonderland", "Dumb and Dumber", "Equilibrium")
rownames(moviesraw) <- c("Adam", "Benjamin", "Charlie", "David", "Edward")
# Start from a copy so that the row and column names carry over.
movies <- moviesraw
for (i in 1:ncol(movies)) {
  # Fill each movie's missing ratings with its mean rating, truncated to a whole number.
  movies[is.na(moviesraw[, i]), i] <- trunc(mean(moviesraw[, i], na.rm = TRUE))
}
movies
##          Batman Begins Alice in Wonderland Dumb and Dumber Equilibrium
## Adam                 4                   4               3           5
## Benjamin             3                   5               4           4
## Charlie              5                   4               2           4
## David                2                   4               3           3
## Edward               3                   4               5           4
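The same imputation can also be written without an explicit loop. The following is only a sketch under the assumption that the toy ratings are as shown above; the names `ratings`, `col_fill`, and `imputed` are illustrative and do not appear in the original code.

```r
# Toy ratings with missing entries (same values as the matrix built above).
ratings <- matrix(c( 4, NA,  3,  5,
                    NA,  5,  4, NA,
                     5,  4,  2, NA,
                     2,  4, NA,  3,
                     3,  4,  5, NA),
                  nrow = 5, byrow = TRUE)

# Column means ignoring missing values, truncated to whole numbers.
col_fill <- trunc(colMeans(ratings, na.rm = TRUE))

# Row/column positions of the missing cells.
missing <- which(is.na(ratings), arr.ind = TRUE)

# Fill each missing cell with the value for its column.
imputed <- ratings
imputed[missing] <- col_fill[missing[, "col"]]
imputed
```

Indexing by the two-column `missing` matrix assigns all missing cells in one step, which avoids looping over columns.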
The following function computes the cosine similarity between two rating vectors; applying it to every pair of users or movies yields the similarity matrix, also known as the cosine matrix. It is called the cosine matrix because the dot product of the two vectors is divided by the product of their magnitudes, which equals the cosine of the angle between them.
similarity_matrix <- function(v1, v2) {
  # Dot product divided by the product of the two vector magnitudes.
  sum(v1 * v2, na.rm = TRUE) / (sqrt(sum(v1^2, na.rm = TRUE)) * sqrt(sum(v2^2, na.rm = TRUE)))
}
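As a quick check of the formula, the cosine similarity can be worked out by hand for two rows of the imputed matrix. This sketch assumes Adam's and Benjamin's imputed ratings as printed above; the variable names are illustrative.

```r
# Imputed ratings for two users, taken from the printed movies matrix.
adam     <- c(4, 4, 3, 5)
benjamin <- c(3, 5, 4, 4)

# Numerator: dot product of the two vectors (12 + 20 + 12 + 20 = 64).
dot <- sum(adam * benjamin)

# Denominator: product of the magnitudes (sqrt(66) * sqrt(66) = 66).
magnitudes <- sqrt(sum(adam^2)) * sqrt(sum(benjamin^2))

# Cosine similarity: 64 / 66, roughly 0.9697.
dot / magnitudes
```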
The user-user similarity matrix below supports user-based collaborative filtering, which finds other users whose past rating behavior is similar to that of the current user and then uses their ratings on other items to predict what the current user will like.
user_sim <- data.frame(matrix(NA, nrow = nrow(movies), ncol = nrow(movies)))
for (i in 1:nrow(user_sim)) {
  for (j in 1:ncol(user_sim)) {
    # Compare user i's row of ratings with user j's row.
    user_sim[i, j] <- similarity_matrix(movies[i, ], movies[j, ])
  }
}
colnames(user_sim) <- rownames(user_sim) <- rownames(movies)
user_sim
##               Adam  Benjamin   Charlie     David    Edward
## Adam     1.0000000 0.9696970 0.9771355 0.9584677 0.9545455
## Benjamin 0.9696970 1.0000000 0.9298548 0.9984038 0.9848485
## Charlie  0.9771355 0.9298548 1.0000000 0.9138943 0.8983343
## David    0.9584677 0.9984038 0.9138943 1.0000000 0.9784358
## Edward   0.9545455 0.9848485 0.8983343 0.9784358 1.0000000
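The pairwise loop can be replaced by matrix algebra. The sketch below assumes the imputed ratings printed earlier; `m`, `norms`, and `user_sim_vec` are illustrative names rather than objects from the original code.

```r
# Imputed ratings, one row per user (values from the printed movies matrix).
m <- matrix(c(4, 4, 3, 5,
              3, 5, 4, 4,
              5, 4, 2, 4,
              2, 4, 3, 3,
              3, 4, 5, 4),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("Adam", "Benjamin", "Charlie", "David", "Edward"), NULL))

# Magnitude of each user's rating vector.
norms <- sqrt(rowSums(m^2))

# tcrossprod(m) is m %*% t(m): every pairwise dot product between rows.
# Dividing by the outer product of the magnitudes gives the cosine similarities.
user_sim_vec <- tcrossprod(m) / outer(norms, norms)
round(user_sim_vec, 7)
```

For similarities between movies rather than users, the same idea applies to columns: `crossprod(m)` with magnitudes taken from `colSums(m^2)`.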
The movie-movie similarity matrix below supports movie-based (item-based) collaborative filtering, which finds other movies whose past ratings are similar to those of the current movie and then uses the ratings given to those movies to predict how the current movie will be rated.
movie_sim <- data.frame(matrix(NA, nrow = ncol(movies), ncol = ncol(movies)))
for (i in 1:nrow(movie_sim)) {
  for (j in 1:ncol(movie_sim)) {
    # Compare movie i's column of ratings with movie j's column.
    movie_sim[i, j] <- similarity_matrix(movies[, i], movies[, j])
  }
}
colnames(movie_sim) <- rownames(movie_sim) <- colnames(movies)
movie_sim
##                     Batman Begins Alice in Wonderland Dumb and Dumber
## Batman Begins           1.0000000           0.9481850       0.8730159
## Alice in Wonderland     0.9481850           1.0000000       0.9615397
## Dumb and Dumber         0.8730159           0.9615397       1.0000000
## Equilibrium             0.9739145           0.9832803       0.9460884
##                     Equilibrium
## Batman Begins         0.9739145
## Alice in Wonderland   0.9832803
## Dumb and Dumber       0.9460884
## Equilibrium           1.0000000
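To generate user-based predictions, the ratings have to be mean-centered so that each rating is viewed relative to that user's own average rating. This is done by subtracting each user's mean from each of that user's ratings. Note that the code below truncates each user's mean to a whole number and takes the absolute value of each difference, so it records how far a rating lies from the mean but not in which direction.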
user_means <- as.matrix(trunc(rowMeans(movies)))
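user_means_adj <- matrix(NA, nrow = nrow(movies), ncol = ncol(movies))
for (i in 1:ncol(user_means_adj)) {
  # Absolute distance of each rating in column i from the rater's truncated mean.
  user_means_adj[, i] <- abs(movies[, i] - user_means)
}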
colnames(user_means_adj) <- colnames(movies)
rownames(user_means_adj) <- rownames(movies)
user_means_adj
##          Batman Begins Alice in Wonderland Dumb and Dumber Equilibrium
## Adam                 0                   0               1           1
## Benjamin             1                   1               0           0
## Charlie              2                   1               1           1
## David                1                   1               0           0
## Edward               1                   0               1           0
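For comparison, conventional mean-centering keeps the sign of each deviation and uses the untruncated mean. The sketch below is not the approach used in the code above; it only illustrates the difference, with `m` and `centered` as illustrative names and the imputed ratings assumed as printed earlier.

```r
# Imputed ratings, one row per user (values from the printed movies matrix).
m <- matrix(c(4, 4, 3, 5,
              3, 5, 4, 4,
              5, 4, 2, 4,
              2, 4, 3, 3,
              3, 4, 5, 4),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("Adam", "Benjamin", "Charlie", "David", "Edward"), NULL))

# sweep() subtracts each row's mean from every entry in that row (MARGIN = 1).
centered <- sweep(m, 1, rowMeans(m))

# Charlie's mean is 3.75, so the signed deviations are 1.25 0.25 -1.75 0.25.
centered["Charlie", ]
```

With signed deviations, a rating below the user's mean pulls a prediction down instead of up, and every row of `centered` sums to zero.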
To generate movie-based predictions, the ratings have to be mean-centered so that each rating is viewed relative to that movie's own average rating. This is done by subtracting each movie's mean from each of that movie's ratings. As before, the code below truncates each movie's mean to a whole number and takes the absolute value of each difference, so the direction of the deviation is not kept.
movie_means <- as.matrix(trunc(colMeans(movies)))
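movie_means_adj <- matrix(NA, nrow = nrow(movies), ncol = ncol(movies))
for (i in 1:ncol(movie_means_adj)) {
  # Absolute distance of each rating in column i from that movie's truncated mean.
  movie_means_adj[, i] <- abs(movies[, i] - movie_means[i])
}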
colnames(movie_means_adj) <- colnames(movies)
rownames(movie_means_adj) <- rownames(movies)
movie_means_adj
##          Batman Begins Alice in Wonderland Dumb and Dumber Equilibrium
## Adam                 1                   0               0           1
## Benjamin             0                   1               1           0
## Charlie              2                   0               1           0
## David                1                   0               0           1
## Edward               0                   0               2           0
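The following function generates a user-based collaborative filtering prediction: the user's truncated mean rating plus the similarity-weighted average of all users' deviations for the movie. Let us see how Benjamin would rate the movie Equilibrium.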
genrec_user <- function(user, movie) {
  # User's mean plus the similarity-weighted average of all users' deviations for this movie.
  prediction <- user_means[user, ] +
    (as.matrix(user_sim[user, ]) %*% as.matrix(user_means_adj[, movie])) / sum(user_sim[user, ])
  return(as.numeric(prediction))
}
handcode_rec1 <- genrec_user("Benjamin","Equilibrium")
handcode_rec1
## [1] 4.389029
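Written out, the calculation inside `genrec_user` for Benjamin and Equilibrium looks as follows. The symbols are introduced here only for illustration: $\bar{r}_u$ is the truncated mean from `user_means`, $s_{u,v}$ is an entry of `user_sim`, and $d_{v,i}$ is an entry of `user_means_adj`. The sums run over all five users in the order Adam, Benjamin, Charlie, David, Edward, so Benjamin himself is included with similarity 1.

```latex
\begin{align*}
\hat{r}_{u,i} &= \bar{r}_u + \frac{\sum_{v} s_{u,v}\, d_{v,i}}{\sum_{v} s_{u,v}} \\[4pt]
\hat{r}_{\text{Benjamin},\,\text{Equilibrium}}
  &= 4 + \frac{0.9696970 \cdot 1 + 1 \cdot 0 + 0.9298548 \cdot 1 + 0.9984038 \cdot 0 + 0.9848485 \cdot 0}
              {0.9696970 + 1 + 0.9298548 + 0.9984038 + 0.9848485} \\[4pt]
  &= 4 + \frac{1.8995518}{4.8828041} \approx 4.389
\end{align*}
```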
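The following function generates a movie-based (item-based) collaborative filtering prediction: the movie's truncated mean rating plus the similarity-weighted average of the user's deviations across all movies. Let us see how Benjamin would rate the movie Equilibrium.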
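genrec_movie <- function(user, movie) {
  # Movie's mean plus the similarity-weighted average of this user's deviations across movies.
  prediction <- movie_means[movie, ] +
    (movie_means_adj[user, ] %*% as.matrix(movie_sim[, movie])) / sum(movie_sim[movie, ])
  return(as.numeric(prediction))
}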
handcode_rec2 <- genrec_movie("Benjamin", "Equilibrium")
handcode_rec2
## [1] 4.494294
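The calculation inside `genrec_movie` for the same pair can be written out in the same way. Again the symbols are illustrative only: $\bar{r}_i$ is the truncated mean from `movie_means`, $s_{i,j}$ is an entry of `movie_sim`, and $d_{u,j}$ is an entry of `movie_means_adj`. The sums run over all four movies in the order Batman Begins, Alice in Wonderland, Dumb and Dumber, Equilibrium.

```latex
\begin{align*}
\hat{r}_{u,i} &= \bar{r}_i + \frac{\sum_{j} s_{i,j}\, d_{u,j}}{\sum_{j} s_{i,j}} \\[4pt]
\hat{r}_{\text{Benjamin},\,\text{Equilibrium}}
  &= 4 + \frac{0.9739145 \cdot 0 + 0.9832803 \cdot 1 + 0.9460884 \cdot 1 + 1 \cdot 0}
              {0.9739145 + 0.9832803 + 0.9460884 + 1} \\[4pt]
  &= 4 + \frac{1.9293687}{3.9032832} \approx 4.494
\end{align*}
```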
The user-based model from the recommenderlab package is created using the original dataset, with all of the unknown values left in place.
library(recommenderlab)
reclab_ratings <- as(moviesraw, 'realRatingMatrix')
recc_model <- Recommender(data = reclab_ratings, method = 'UBCF', parameter = list(method = "Cosine"))
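The model is applied to the same dataset. With type = "ratings", predict() returns predicted ratings for the movies each user has not yet rated.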
recom <- predict(recc_model, reclab_ratings, type = "ratings")
The result of the user-based model is as follows.
recom <- as(recom, 'matrix')
builtin_rec1 <- recom['Benjamin','Equilibrium']
builtin_rec1
## [1] 4.732056
The movie-based model from the recommenderlab package is likewise created using the original dataset, with all of the unknown values left in place.
recc_model <- Recommender(data = reclab_ratings, method = 'IBCF', parameter = list(method = "Cosine"))
The model is applied to the same dataset, again requesting predicted ratings.
recom <- predict(recc_model, reclab_ratings, type = "ratings")
The result of the movie-based model is as follows.
recom <- as(recom, 'matrix')
builtin_rec2 <- recom['Benjamin','Equilibrium']
builtin_rec2
## [1] 4
display <- data.frame(types = c('User-User by Hand', 'Movie-Movie by Hand',
                                'User-User by Package', 'Movie-Movie by Package'),
                      ratings = c(handcode_rec1, handcode_rec2,
                                  builtin_rec1, builtin_rec2))
library(knitr)
kable(display)
| types | ratings |
|---|---|
| User-User by Hand | 4.389029 |
| Movie-Movie by Hand | 4.494294 |
| User-User by Package | 4.732056 |
| Movie-Movie by Package | 4.000000 |
All four predictions, hand-coded and packaged, are at least 4, ranging from 4.00 to about 4.73. This suggests that Benjamin might like the movie Equilibrium. The hand-coded and packaged values do not agree exactly, however, and because this toy dataset is small and arbitrary, the conclusions drawn from it should not be given much weight.