Recommender systems - project 2

Alvaro Bueno

For this project I decided to test a truncated version of the MovieLens 20M dataset, which comprises millions of ratings submitted by volunteers and compiled by the GroupLens research group. I discard the timestamp column from ratings.csv and use only the user ID, the movie ID and the rating itself for the calculations. My work is based on the professor’s example: I build the user-movie matrix and check its sparsity.

library(reshape2) # provides dcast()
# load the ratings and keep the first 50000 (columns: userId, movieId, rating)
movie_reviews <- read.csv('ml-20m/ratings.csv')
movie_trunc <- movie_reviews[1:50000,c(1,2,3)]
colnames(movie_trunc) <- c("user","movie","rating")
movie_sparse <- dcast(movie_trunc, user~movie, value.var="rating", fill=0, fun.aggregate=mean)
rownames(movie_sparse) <- movie_sparse$user
#create the matrix
movie_sparse <- movie_sparse[,-1]
movie_sparse <- as.matrix(movie_sparse)

#detect sparsity
movieInter <- movie_sparse
is.na(movieInter) <- movieInter == 0
sum(is.na(movieInter))/(nrow(movieInter)*ncol(movieInter))
## [1] 0.9791168
# Detect means
movieMeans <- apply(movieInter, 2, mean, na.rm=T)
userMeans <- apply(movieInter, 1, mean, na.rm=T)
universalMean <- mean(movieInter,na.rm=T)
universalMean
## [1] 3.50627
#creating bias matrices
userBias <- userMeans - universalMean
movieBias <- movieMeans - universalMean
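
To make the sparsity and bias calculations above concrete, here is a minimal sketch on a made-up 3x4 matrix. The toy ratings and the toy_ names are illustrative assumptions, not part of the MovieLens data; zeros stand for missing ratings, as in movie_sparse.

```r
# Toy user x movie matrix; 0 means "not rated" (hypothetical data)
toy <- matrix(c(5, 0, 3, 0,
                0, 4, 0, 0,
                1, 0, 0, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("u1", "u2", "u3"), c("m1", "m2", "m3", "m4")))

# Recode zeros as NA so they are excluded from the means
toy_na <- toy
is.na(toy_na) <- toy_na == 0

# Sparsity: share of cells without a rating
toy_sparsity <- sum(is.na(toy_na)) / length(toy_na)

# Global mean and per-user / per-movie deviations, observed ratings only
toy_global <- mean(toy_na, na.rm = TRUE)
toy_user_bias <- rowMeans(toy_na, na.rm = TRUE) - toy_global
toy_movie_bias <- colMeans(toy_na, na.rm = TRUE) - toy_global
```

Here 7 of the 12 cells are empty, the global mean is 3, and user u3 (who rated 1 and 2) ends up with a bias of -1.5.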

Based on the examples presented, I chose alternating least squares (ALS) to make the predictions, as its performance is good enough on the matrix built from the 50000 ratings.
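
For reference, the ridge-regularized problem that the ALS loop below alternates over can be written as follows. Here $R$ stands for movie_sparse ($m \times n$), $X$ is $m \times k$, $Y$ is $k \times n$, $k$ is n_factors and $\lambda$ is lambda; the symbols $R$ and $k$ are shorthand introduced here, not names from the code. Because movie_sparse is zero-filled, this formulation treats the unrated cells as observed ratings of 0.

```latex
\[
\min_{X,\,Y}\; \lVert R - XY \rVert_F^2
  + \lambda \left( \lVert X \rVert_F^2 + \lVert Y \rVert_F^2 \right)
\]

% Holding Y fixed, the minimizer over X is
\[
X = R\,Y^{\top} \left( Y Y^{\top} + \lambda I_k \right)^{-1}
\]

% Holding X fixed, the minimizer over Y is
\[
Y = \left( X^{\top} X + \lambda I_k \right)^{-1} X^{\top} R
\]
```

The two closed-form updates correspond to the two solve() calls inside the loop.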

RMSE <- function(predictionMatrix){ 
  sqrt(mean((predictionMatrix-movieInter)^2,na.rm=T))
} 
lambda <- 0.1
n_factors <- 100
m <- nrow(movie_sparse)
n <- ncol(movie_sparse)
n_iterations <- 20

X <- matrix(5*runif(m*n_factors),nrow=m,ncol=n_factors)
Y <- matrix(5*runif(n_factors*n),nrow=n_factors,ncol=n)
errors = c()
for (i in 1:n_iterations){
  X = t(solve(Y %*% t(Y) + lambda * diag(n_factors), Y%*%t(movie_sparse)))
  Y = solve(t(X) %*% X + lambda * diag(n_factors), t(X)%*%movie_sparse)
  errors[i] <- RMSE(X%*%Y)
}
ALS <- X%*%Y
RMSE(ALS)
## [1] 1.322068
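
The loop stores the RMSE of every iteration in errors but never inspects it. As a small, self-contained sketch of how the same two updates behave, the following runs them on a made-up, fully observed 6x5 matrix and keeps the per-iteration error. The toy data, the seed and the toy_ names are assumptions for illustration only.

```r
set.seed(42)  # arbitrary seed so the random start is reproducible

# Hypothetical fully observed ratings, rank 2 by construction
toy_R <- rbind(c(5, 4, 1, 1,   2),
               c(5, 4, 1, 1,   2),
               c(1, 2, 5, 4,   4),
               c(1, 2, 5, 4,   4),
               c(3, 3, 3, 2.5, 3),
               c(5, 4, 1, 1,   2))

toy_lambda <- 0.1
toy_k <- 2
toy_iter <- 20

toy_X <- matrix(runif(nrow(toy_R) * toy_k), nrow = nrow(toy_R))
toy_Y <- matrix(runif(toy_k * ncol(toy_R)), nrow = toy_k)
toy_errors <- numeric(toy_iter)

for (i in seq_len(toy_iter)) {
  # Ridge update of the user factors with the movie factors held fixed
  toy_X <- t(solve(toy_Y %*% t(toy_Y) + toy_lambda * diag(toy_k),
                   toy_Y %*% t(toy_R)))
  # Ridge update of the movie factors with the user factors held fixed
  toy_Y <- solve(t(toy_X) %*% toy_X + toy_lambda * diag(toy_k),
                 t(toy_X) %*% toy_R)
  # Reconstruction error over all cells of the toy matrix
  toy_errors[i] <- sqrt(mean((toy_X %*% toy_Y - toy_R)^2))
}

# Error after the first and the last iteration
round(toy_errors[c(1, toy_iter)], 3)
```

On the real data, plotting the stored values, for example with plot(errors, type = "b"), would give a quick check of whether 20 iterations are enough.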

The getRating function below returns a rating directly. To see the prediction for user ID 1 and movie ID 1 (Toy Story), we proceed as follows.

getRating <- function(user,movie,method){
  if(movie_sparse[user,movie]!=0){
    paste("Already rated: ",round(movie_sparse[user,movie],1))
  } else {
    predicted <- method[user,movie]
    predicted <- predicted + universalMean + userBias[user] + movieBias[movie]
    paste("Predicted rating: ",round(predicted,1))
  }
}
getRating('1','1',ALS)
## [1] "Predicted rating:  3.5"
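
As a compact illustration of the two branches of getRating, here is a sketch on made-up data. The toy matrix, the all-zero stand-in for a model's output and the toy_get_rating helper are hypothetical; the helper mirrors the logic above but receives everything as arguments instead of reading global variables.

```r
# Hypothetical ratings; 0 means "not rated"
toy_ratings <- matrix(c(5, 0,
                        3, 4),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("1", "2"), c("10", "20")))

# Stand-in for a model's output, here simply all zeros (assumption)
toy_pred <- matrix(0, nrow = 2, ncol = 2, dimnames = dimnames(toy_ratings))

toy_get_rating <- function(user, movie, ratings, pred) {
  # Means and biases are computed over observed ratings only
  observed <- ratings
  is.na(observed) <- observed == 0
  mu <- mean(observed, na.rm = TRUE)
  user_bias <- rowMeans(observed, na.rm = TRUE) - mu
  movie_bias <- colMeans(observed, na.rm = TRUE) - mu
  if (ratings[user, movie] != 0) {
    paste("Already rated: ", round(ratings[user, movie], 1))
  } else {
    predicted <- pred[user, movie] + mu + user_bias[user] + movie_bias[movie]
    paste("Predicted rating: ", round(predicted, 1))
  }
}

toy_get_rating("1", "10", toy_ratings, toy_pred)  # user 1 already rated movie 10
toy_get_rating("1", "20", toy_ratings, toy_pred)  # user 1 has not rated movie 20
```

Passing the IDs as character strings indexes by row and column name rather than by position, which matters because MovieLens movie IDs are not contiguous.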

Now, using recommenderlab, let’s predict the ratings using different techniques. It’s important that we evaluate using the techniques mentioned, such as SVD and ALS, which can be selected through the method argument of the Recommender function.

Please note that I’m now using the MovieLense data object that comes bundled with the recommenderlab package, for ease of use and to focus on the techniques themselves.

library(recommenderlab)
data(MovieLense)
e <- evaluationScheme(MovieLense, method="split", train=0.8,
                      k=2, given=-1)

rec <- Recommender(getData(e, "train"), method = "SVDF")
rec
## Recommender of type 'SVDF' for 'realRatingMatrix' 
## learned using 754 users.
pre <- predict(rec, getData(e, "known"), n=3) # predict using the recommender method created above
as(pre, 'list')[1:10] #display the first 10 predicted results
## $`2`
## [1] "Casablanca (1942)"                                     
## [2] "Close Shave, A (1995)"                                 
## [3] "Wallace & Gromit: The Best of Aardman Animation (1996)"
## 
## $`4`
## [1] "Wrong Trousers, The (1993)"                            
## [2] "Close Shave, A (1995)"                                 
## [3] "Wallace & Gromit: The Best of Aardman Animation (1996)"
## 
## $`8`
## [1] "Close Shave, A (1995)"                                 
## [2] "Wallace & Gromit: The Best of Aardman Animation (1996)"
## [3] "Wrong Trousers, The (1993)"                            
## 
## $`16`
## [1] "Star Wars (1977)"           "Return of the Jedi (1983)" 
## [3] "Princess Bride, The (1987)"
## 
## $`22`
## [1] "Close Shave, A (1995)"    "Titanic (1997)"          
## [3] "L.A. Confidential (1997)"
## 
## $`25`
## [1] "Top Gun (1986)"                            
## [2] "Star Trek III: The Search for Spock (1984)"
## [3] "Clear and Present Danger (1994)"           
## 
## $`31`
## [1] "Paradise Lost: The Child Murders at Robin Hood Hills (1996)"
## [2] "Secrets & Lies (1996)"                                      
## [3] "Dead Man Walking (1995)"                                    
## 
## $`46`
## [1] "Wrong Trousers, The (1993)" "Close Shave, A (1995)"     
## [3] "Casablanca (1942)"         
## 
## $`63`
## [1] "Pulp Fiction (1994)"        "Clockwork Orange, A (1971)"
## [3] "Godfather, The (1972)"     
## 
## $`68`
## [1] "Fargo (1996)"                                          
## [2] "Close Shave, A (1995)"                                 
## [3] "Wallace & Gromit: The Best of Aardman Animation (1996)"

The output above lists the top three recommendations for each of the first 10 users.

Using the recommenderlab functions is a bit more complicated because they seem to always require casting to the ratingMatrix formats (such as realRatingMatrix), and at the time of writing the response was coming back blank for many options.
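
As a rough sketch of the casting step mentioned above, the following coerces a small made-up matrix and an equivalent data frame of (user, item, rating) triples to realRatingMatrix. The toy data and the toy_ names are assumptions; missing ratings are coded as NA here rather than 0.

```r
library(recommenderlab)

# Hypothetical ratings; NA marks a missing rating
toy_m <- matrix(c(5, NA, 3,
                  NA, 4, NA,
                  1, NA, 2),
                nrow = 3, byrow = TRUE,
                dimnames = list(user = c("u1", "u2", "u3"),
                                item = c("m1", "m2", "m3")))

# Coerce the plain matrix to recommenderlab's rating class
toy_r <- as(toy_m, "realRatingMatrix")

dim(toy_r)       # users x items
nratings(toy_r)  # number of stored ratings

# The same data supplied as (user, item, rating) triples
toy_df <- data.frame(user = c("u1", "u1", "u2", "u3", "u3"),
                     item = c("m1", "m3", "m2", "m1", "m3"),
                     rating = c(5, 3, 4, 1, 2))
toy_r2 <- as(toy_df, "realRatingMatrix")
nratings(toy_r2)
```

The movie_trunc data frame has the same three-column layout, so it could presumably be cast the same way, although I have not verified that on the full 50000 rows.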