Project 3 | Matrix Factorization methods

Goal: implement a matrix factorization method, such as singular value decomposition (SVD) or alternating least squares (ALS), in the context of a recommender system.

1. Load libraries

library(recommenderlab)
## Loading required package: Matrix
## Loading required package: arules
## Warning: package 'arules' was built under R version 3.6.2
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
## Loading required package: proxy
## Warning: package 'proxy' was built under R version 3.6.2
## 
## Attaching package: 'proxy'
## The following object is masked from 'package:Matrix':
## 
##     as.matrix
## The following objects are masked from 'package:stats':
## 
##     as.dist, dist
## The following object is masked from 'package:base':
## 
##     as.matrix
## Loading required package: registry
## Registered S3 methods overwritten by 'registry':
##   method               from 
##   print.registry_field proxy
##   print.registry_entry proxy
library(reshape2)
library(knitr)
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ dplyr::recode() masks arules::recode()
## ✖ tidyr::unpack() masks Matrix::unpack()

2. Load data and subset data

data(MovieLense)
MovieLense
## 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.

Keep only users who have rated more than 50 movies and movies that have been rated more than 100 times.

ratings_m <- MovieLense[rowCounts(MovieLense)>50, colCounts(MovieLense)>100]
ratings_m
## 560 x 332 rating matrix of class 'realRatingMatrix' with 55298 ratings.
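
The filtering step above counts non-missing ratings per user (row) and per movie (column) and keeps those over a threshold. A minimal base-R sketch of the same idea on a small, made-up matrix (the matrix `toy` and the thresholds are assumptions for illustration, not part of the MovieLense data):

```r
# Toy 4-user x 3-item rating matrix; NA marks a missing rating.
toy <- matrix(c( 5, NA,  3,
                 4,  2, NA,
                NA, NA,  1,
                 5,  4,  2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("u", 1:4), paste0("i", 1:3)))

# Number of ratings per user and per item (what rowCounts/colCounts report
# for a realRatingMatrix).
user_counts <- rowSums(!is.na(toy))
item_counts <- colSums(!is.na(toy))

# Keep users with more than 1 rating and items with more than 2 ratings.
toy_sub <- toy[user_counts > 1, item_counts > 2, drop = FALSE]
toy_sub
```

Note that the counts are taken on the full matrix before subsetting, so a kept user may end up with fewer ratings in the subset than the threshold suggests.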

3. Train and test datasets

# 1 = user goes to the training set, 0 = user goes to the test set
which_train <- sample(x = c(1, 0), size = nrow(ratings_m), replace = TRUE, prob = c(0.8, 0.2))
head(which_train)
## [1] 0 1 1 1 0 1

Build the train and test datasets from the split vector.

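# Compare against 1/0 explicitly: indexing with the numeric vector itself
# would repeat row 1 for every 1 and drop every 0.
train_m <- ratings_m[which_train == 1, ]
test_m <- ratings_m[which_train == 0, ]
train_m
## 433 x 332 rating matrix of class 'realRatingMatrix' with 42196 ratings.
test_m
## 127 x 332 rating matrix of class 'realRatingMatrix' with 13102 ratings.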
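
In R, indexing rows with a numeric 0/1 vector behaves differently from indexing with a logical vector, which matters for a split like this one. A minimal base-R sketch on a made-up matrix (the seed, the matrix `m`, and the variable names are assumptions for illustration):

```r
set.seed(123)                     # hypothetical seed, only for reproducibility
m <- matrix(1:20, nrow = 10)      # 10 "users" x 2 "items"

flag <- sample(c(1, 0), size = nrow(m), replace = TRUE, prob = c(0.8, 0.2))

# Numeric indexing: every 1 selects row 1 again, every 0 selects nothing.
wrong <- m[flag, , drop = FALSE]

# Logical indexing: keeps exactly the rows whose flag matches.
train <- m[flag == 1, , drop = FALSE]
test  <- m[flag == 0, , drop = FALSE]

nrow(wrong) == nrow(train)        # same number of rows ...
all(wrong[, 1] == m[1, 1])        # ... but 'wrong' is just row 1 repeated
nrow(train) + nrow(test) == nrow(m)
```

A quick sanity check for any split is that the rating counts of the two parts add up to the rating count of the matrix they were drawn from.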

4. Singular Value Decomposition

svd_model <- Recommender(data=train_m, method="SVD", parameter= list(k=20))
svd_model
## Recommender of type 'SVD' for 'realRatingMatrix' 
## learned using 433 users.
svd_predict <- predict(object = svd_model, newdata = test_m, n=6)
svd_predict
## Recommendations as 'topNList' with n = 6 for 127 users.
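
The parameter k = 20 above sets the number of latent factors kept by the model. To illustrate the underlying idea only, here is a base-R sketch of a rank-k truncated SVD on a small, fully observed random matrix. It is not recommenderlab's internal procedure (which also has to deal with missing ratings and normalization); the matrix `A`, the seed, and k = 2 are assumptions for illustration.

```r
set.seed(1)                          # hypothetical seed
A <- matrix(rnorm(30), nrow = 6)     # 6 x 5 dense matrix, no missing values

s <- svd(A)                          # A = U diag(d) V^T
k <- 2                               # number of latent factors to keep

# Rank-k reconstruction from the k largest singular values.
A_k <- s$u[, 1:k] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k])

# Frobenius-norm reconstruction error.
err <- sqrt(sum((A - A_k)^2))

# For a truncated SVD this error equals the root sum of squares of the
# discarded singular values.
discarded <- sqrt(sum(s$d[(k + 1):length(s$d)]^2))
all.equal(err, discarded)
```

Larger k reproduces the original matrix more closely; smaller k gives a more compressed representation.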

Recommendations for the first five users (each value is the column index of a recommended movie).

svd_predict@items[1:5]
## $`1`
## [1] 237 185 276 216 246 255
## 
## $`6`
## [1] 132 119 135 299 144 152
## 
## $`21`
## [1] 110 100 234  48 105  98
## 
## $`24`
## [1]  76  90  20 308 209 309
## 
## $`26`
## [1]  46 107 259  48 152 138
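
The indices above are easier to read once they are mapped to item labels. A minimal base-R sketch of that lookup, using made-up labels and indices (`item_labels` and `items` are hypothetical stand-ins for the prediction object's label vector and index list):

```r
# Hypothetical item labels, in column order of the rating matrix.
item_labels <- c("Movie A", "Movie B", "Movie C", "Movie D")

# Hypothetical top-2 recommendations per user, stored as column indices.
items <- list(`1` = c(3, 1), `6` = c(2, 4))

# Replace each index by its label.
titles <- lapply(items, function(idx) item_labels[idx])
titles
```

With recommenderlab, the same lookup should be possible through the `itemLabels` slot of the prediction object, or by coercing it with `as(svd_predict, "list")`.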