Slope One

if (!requireNamespace("SlopeOne", quietly = TRUE)) {
  # install_github() comes from devtools; current versions take the repo as "user/repo"
  devtools::install_github("tarashnot/SlopeOne")
}
library(SVDApproximation)
## Warning: replacing previous import 'data.table::melt' by 'reshape2::melt'
## when loading 'SVDApproximation'
## Warning: replacing previous import 'data.table::dcast' by 'reshape2::dcast'
## when loading 'SVDApproximation'
library(SlopeOne)
library(data.table)

data(ratings)
class(ratings)
## [1] "data.table" "data.frame"
dim(ratings)
## [1] 1000209       3
head(ratings)
##    user item rating
## 1:    1    1      5
## 2:    6    1      4
## 3:    8    1      4
## 4:    9    1      5
## 5:   10    1      5
## 6:   18    1      4
summary(ratings)
##       user           item          rating     
##  Min.   :   1   Min.   :   1   Min.   :1.000  
##  1st Qu.:1506   1st Qu.: 966   1st Qu.:3.000  
##  Median :3070   Median :1658   Median :4.000  
##  Mean   :3025   Mean   :1731   Mean   :3.582  
##  3rd Qu.:4476   3rd Qu.:2566   3rd Qu.:4.000  
##  Max.   :6040   Max.   :3706   Max.   :5.000
names(ratings) <- c("user_id", "item_id", "rating")
ratings <- data.table(ratings)
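# Keep a random 10% of the ratings to reduce run time and memory use.
# This draw happens before set.seed(), so the subset differs from run to run.
samp <- sample(nrow(ratings), 0.1 * nrow(ratings))
ratings <- ratings[samp, ]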

ratings[, user_id := as.character(user_id)]
ratings[, item_id := as.character(item_id)]

setkey(ratings, user_id, item_id)

set.seed(1)

in_train <- rep(TRUE, nrow(ratings))
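# Test set size: 5 times 20% of the number of distinct users.
# Rows are drawn uniformly at random, not per user.
n_test <- round(0.2 * length(unique(ratings$user_id)), 0) * 5
in_train[sample(seq_len(nrow(ratings)), size = n_test)] <- FALSE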

ratings_train <- ratings[(in_train)]
ratings_test <- ratings[(!in_train)]

ratings_train_norm <- normalize_ratings(ratings_train)

model <- build_slopeone(ratings_train_norm$ratings)

predictions <- predict_slopeone(model,
                                ratings_test[ , c(1, 2), with = FALSE],
                                ratings_train_norm$ratings)
unnormalized_predictions <- unnormalize_ratings(normalized = ratings_train_norm,
                                                ratings = predictions)

rmse_slopeone <- sqrt(mean((unnormalized_predictions$predicted_rating - ratings_test$rating) ^ 2))
rmse_slopeone
## [1] 1.310448

Summary and Findings:

  • Slope One was introduced by Daniel Lemire and Anna Maclachlan in 2005
  • It is an item-based collaborative filtering algorithm
  • The SlopeOne package works with data.table objects
  • Simple: it uses a restricted linear regression, f(x) = x + b, with the slope fixed at one instead of the general f(x) = ax + b
  • Fast, but it needs a lot of RAM on very large datasets
  • Accurate: often on par with more complicated and computationally expensive algorithms
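
To make the f(x) = x + b idea concrete, here is a minimal base-R sketch of the weighted Slope One prediction on a toy rating matrix. It is an illustration only, not the internals of the SlopeOne package; the matrix `r` and the helper `slope_one_predict()` are hypothetical names introduced for this example.

```r
# Toy ratings: rows are users, columns are items, NA means "not rated".
r <- matrix(c( 5, 3,  2,
               3, 4, NA,
              NA, 2,  5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("u1", "u2", "u3"), c("A", "B", "C")))

# Hypothetical helper: weighted Slope One prediction for one user/item pair.
slope_one_predict <- function(r, user, item) {
  # Items the user has already rated, excluding the target item
  rated <- setdiff(colnames(r)[!is.na(r[user, ])], item)
  num <- 0
  den <- 0
  for (j in rated) {
    both <- !is.na(r[, item]) & !is.na(r[, j])  # users who rated both items
    n <- sum(both)
    if (n == 0) next
    # Average deviation between the two items: the offset b in f(x) = x + b
    b <- mean(r[both, item] - r[both, j])
    # Weight each single-item prediction by the number of co-raters
    num <- num + n * (r[user, j] + b)
    den <- den + n
  }
  if (den == 0) NA_real_ else num / den
}

pred <- slope_one_predict(r, "u3", "A")
pred
```

Here item A is rated on average 0.5 higher than B (two co-raters) and 3 higher than C (one co-rater), so the prediction for u3 is (2 * (2 + 0.5) + 1 * (5 + 3)) / 3 = 13/3. Storing a deviation and a co-rater count for every item pair is also why memory use grows quickly with the number of items.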

Reference:

https://rpubs.com/tarashnot/recommender_comparison