The goal of this assignment is to give you practice working with matrix factorization techniques. Your task is to implement a matrix factorization method, such as singular value decomposition (SVD) or Alternating Least Squares (ALS), in the context of a recommender system.
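The assignment allows ALS as an alternative to SVD, so here is a minimal base-R sketch of Alternating Least Squares for comparison. It assumes a small dense ratings matrix with `NA` for unrated cells. The helper name `als_factorize` and its parameters (`k` latent factors, ridge penalty `lambda`, `iters` sweeps) are hypothetical illustrations, not a recommenderlab function. Each sweep fixes the item factors and solves a ridge regression for every user over that user's observed ratings only, then does the same for the items with the user factors fixed.

```r
# Minimal ALS sketch (hypothetical helper, not part of recommenderlab).
# R: ratings matrix, NA = unrated; k: latent factors;
# lambda: L2 penalty; iters: number of alternating sweeps.
als_factorize <- function(R, k = 2, lambda = 0.1, iters = 50, seed = 1) {
  set.seed(seed)
  n <- nrow(R); m <- ncol(R)
  U <- matrix(rnorm(n * k, sd = 0.1), n, k)  # user factors
  V <- matrix(rnorm(m * k, sd = 0.1), m, k)  # item factors
  observed <- !is.na(R)
  reg <- lambda * diag(k)
  for (it in seq_len(iters)) {
    # Fix V: ridge regression for each user over the items they rated
    for (i in seq_len(n)) {
      cols <- which(observed[i, ])
      if (length(cols) == 0) next
      Vc <- V[cols, , drop = FALSE]
      U[i, ] <- solve(crossprod(Vc) + reg, crossprod(Vc, R[i, cols]))
    }
    # Fix U: ridge regression for each item over the users who rated it
    for (j in seq_len(m)) {
      rows <- which(observed[, j])
      if (length(rows) == 0) next
      Ur <- U[rows, , drop = FALSE]
      V[j, ] <- solve(crossprod(Ur) + reg, crossprod(Ur, R[rows, j]))
    }
  }
  list(U = U, V = V, pred = U %*% t(V))
}

# Toy example: 4 users x 4 items, NA marks unrated cells
R <- matrix(c(5, 4, NA, 1,
              4, NA, 1, 1,
              1, 1, NA, 5,
              NA, 1, 5, 4), nrow = 4, byrow = TRUE)
fit <- als_factorize(R, k = 2)
round(fit$pred, 1)  # predicted ratings, including the NA cells
```

Because every half-step solves its subproblem exactly, the regularized squared error on the observed cells never increases from one sweep to the next. Unlike plain SVD, ALS needs no imputation of the missing cells.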
“A typical machine learning problem might have several hundred or more variables, while many machine learning algorithms will break down if presented with more than a few dozen. This makes singular value decomposition indispensable in ML for variable reduction.”
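To make the quoted variable-reduction claim concrete, here is a minimal sketch: projecting a column-centered n × p matrix onto its first k right singular vectors replaces the p original variables with k derived ones. The data below are simulated (5 latent signals mixed into 50 variables plus a little noise), and n = 200, p = 50, k = 5 are arbitrary illustrative sizes.

```r
set.seed(42)
n <- 200; p <- 50; k <- 5
# Simulated data: k latent signals mixed into p observed variables, plus noise
latent <- matrix(rnorm(n * k), n, k)
mixing <- matrix(rnorm(k * p), k, p)
X <- latent %*% mixing + matrix(rnorm(n * p, sd = 0.1), n, p)

Xc <- scale(X, center = TRUE, scale = FALSE)  # center each column
s <- svd(Xc)

# Project onto the top-k right singular vectors: an n x k representation
X_reduced <- Xc %*% s$v[, 1:k]
dim(X_reduced)  # 200 5

# Share of the total variance captured by the first k components
sum(s$d[1:k]^2) / sum(s$d^2)
```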
library(recommenderlab)
library(tidyverse)
library(kableExtra)
library(knitr)
library("ggplot2")
Loading the datasets
# Read the ratings and movie-title files (paths point to the author's local machine)
ratings <- read.csv("C:\\Users\\Violet\\Desktop\\ratings.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
movies <- read.csv("C:\\Users\\Violet\\Desktop\\movies.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE) %>% select(movieId, title)
head(ratings)
## userId movieId rating timestamp
## 1 3 1371 2 1306463561
## 2 4 1883 3 964622642
## 3 4 4166 4 986848700
## 4 6 61 5 845555454
## 5 6 310 2 845556119
## 6 7 4844 1 1161738352
head(movies)
## movieId title
## 1 1 Toy Story (1995)
## 2 2 Jumanji (1995)
## 3 3 Grumpier Old Men (1995)
## 4 4 Waiting to Exhale (1995)
## 5 5 Father of the Bride Part II (1995)
## 6 6 Heat (1995)
# Check whether the ratings table contains any missing values
sum(is.na(ratings))
## [1] 0
There are no missing values in the ratings table.

Singular value decomposition (SVD) factorizes a rectangular n × p matrix A. In the classic gene-expression application, the n rows represent genes and the p columns represent experimental conditions; in a recommender system, the rows are users and the columns are items. The SVD theorem states:
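$$A_{n \times p} = U_{n \times n}\, S_{n \times p}\, V^{T}_{p \times p}$$

where

$$U^{T}U = I_{n \times n}, \qquad V^{T}V = I_{p \times p}$$

(i.e. U and V are orthogonal) and S is a diagonal matrix whose non-negative diagonal entries, the singular values, appear in decreasing order.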
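These identities can be checked numerically. Note that base R's `svd()` returns the thin (economy) decomposition: for an n × p matrix with n > p, `u` is n × p, `d` holds the p singular values, and `v` is p × p. That is why `u` and `v` for the four-column ratings table have four columns each. A small sketch on a random 6 × 4 matrix:

```r
set.seed(1)
A <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)  # n = 6, p = 4
s <- svd(A)

dim(s$u)     # 6 4  (thin U, not 6 x 6)
length(s$d)  # 4
dim(s$v)     # 4 4

# Reconstruction A = U S V^T with S = diag(d)
A_hat <- s$u %*% diag(s$d) %*% t(s$v)
max(abs(A - A_hat))  # effectively zero (floating-point round-off)

# Orthonormal columns: U^T U = I and V^T V = I (both p x p for the thin form)
max(abs(crossprod(s$u) - diag(4)))
max(abs(crossprod(s$v) - diag(4)))
```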
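# Compute the SVD of the ratings data frame. Note: this decomposes the raw
# 4-column table (userId, movieId, rating, timestamp), not a user x movie
# rating matrix; svd() would coerce the data frame with as.matrix() anyway.
ratings_svd <- svd(as.matrix(ratings))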
# get U, V, Sigma
u <- ratings_svd$u
v <- ratings_svd$v
sigma <- ratings_svd$d
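The `svd()` call above factorizes the long-format `ratings` table itself, whose four columns are `userId`, `movieId`, `rating` and `timestamp`, rather than a user × movie rating matrix. A minimal base-R sketch of building such a matrix follows. The toy data frame `toy` and the helper name `to_rating_matrix` are hypothetical, and the imputation step (center each user's ratings on that user's mean, then fill unrated cells with 0) is one simple, common choice assumed here rather than prescribed by the assignment. For the full dataset a dense matrix may be very large, so a sparse representation such as recommenderlab's `realRatingMatrix` class may be more practical.

```r
# Pivot long-format (userId, movieId, rating) rows into a user x item matrix.
# If a (user, item) pair appears twice, the later row overwrites the earlier one.
to_rating_matrix <- function(df) {
  users <- sort(unique(df$userId))
  items <- sort(unique(df$movieId))
  m <- matrix(NA_real_, nrow = length(users), ncol = length(items),
              dimnames = list(users, items))
  m[cbind(match(df$userId, users), match(df$movieId, items))] <- df$rating
  m
}

# Hypothetical toy data with the same column names as `ratings`
toy <- data.frame(userId  = c(1, 1, 2, 2, 3, 3, 3),
                  movieId = c(10, 20, 10, 30, 20, 30, 40),
                  rating  = c(4, 2, 5, 1, 3, 4, 5))
R <- to_rating_matrix(toy)
mean(is.na(R))  # 5 of the 12 cells are unrated

# Center each user's ratings on that user's mean, then treat unrated cells as 0
user_means <- rowMeans(R, na.rm = TRUE)
R_centered <- sweep(R, 1, user_means)
R_centered[is.na(R_centered)] <- 0

R_svd <- svd(R_centered)
R_svd$d  # singular values of the centered user x item matrix
```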
# Inspect the first rows of u and v
head(u)
## [,1] [,2] [,3] [,4]
## [1,] -0.003358060 -0.002085816 0.006131694 0.0046533763
## [2,] -0.002479411 -0.001458963 0.004485879 -0.0006280368
## [3,] -0.002536540 -0.001284000 0.004562636 -0.0031611118
## [4,] -0.002173368 -0.001426917 0.003909578 -0.0068148534
## [5,] -0.002173369 -0.001403729 0.003906463 0.0011514496
## [6,] -0.002986066 -0.001517166 0.005332888 0.0061364171
head(v)
## [,1] [,2] [,3] [,4]
## [1,] -2.645632e-07 -7.367654e-04 -9.999997e-01 2.700543e-04
## [2,] -1.819234e-05 9.999997e-01 -7.367675e-04 -7.579146e-06
## [3,] -2.812234e-09 -7.778111e-06 -2.700487e-04 -1.000000e+00
## [4,] -1.000000e+00 -1.819214e-05 2.779674e-07 2.878670e-09
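The pattern in `v` above, where each column has one entry of magnitude close to 1 and the rest near 0, is what one would expect when the input columns differ enormously in scale: `timestamp` values are around 10^9 and dwarf `movieId`, which in turn dwarfs `userId` and `rating`. The leading singular direction is then essentially the `timestamp` axis, the next essentially `movieId`, and so on. Standardizing the columns with `scale()` before `svd()` removes that effect. A sketch on simulated data with the same column names (random values, not the real ratings):

```r
set.seed(7)
n <- 1000
# Simulated table with the same columns as `ratings` (values are random)
sim <- data.frame(
  userId    = sample.int(5000, n, replace = TRUE),
  movieId   = sample.int(100000, n, replace = TRUE),
  rating    = sample.int(5, n, replace = TRUE),
  timestamp = round(runif(n, 8e8, 1.3e9))
)

raw <- svd(as.matrix(sim))
round(abs(raw$v[, 1]), 3)  # weight on timestamp (4th entry) is essentially 1
raw$d                      # first singular value dominates by orders of magnitude

std <- svd(scale(as.matrix(sim)))
std$d                      # comparable magnitudes once columns are standardized
```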
# Inspect sigma: svd() returns min(n, p) = 4 singular values for the 4-column table
head(sigma, 5) %>% kable(col.names = "Strength of Sigma") %>% kable_styling(full_width = FALSE)
| Strength of Sigma |
|---|
| 3.890531e+11 |
| 1.073746e+07 |
| 5.857169e+04 |
| 3.763496e+02 |
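A common way to read singular values is the share of the total "energy" (the sum of squared singular values) retained by the first k of them, which is one heuristic for choosing how many latent factors to keep. Applied to the four values in the table above, the first singular value alone accounts for essentially all of it, again reflecting the `timestamp` scale rather than structure in the ratings. The helper name `energy_retained` and the 90% threshold are illustrative choices:

```r
# Cumulative share of squared singular values retained by the first k
energy_retained <- function(d) cumsum(d^2) / sum(d^2)

# Singular values as reported in the table above
d <- c(3.890531e+11, 1.073746e+07, 5.857169e+04, 3.763496e+02)
energy_retained(d)

# Smallest k that keeps at least 90% of the energy
which(energy_retained(d) >= 0.9)[1]  # 1
```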
# Visualize the singular values
plot(sigma)
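Once the SVD of a centered, filled user × item matrix is available, keeping only the top k singular triples gives a rank-k approximation whose entries can be read as predicted ratings; recommending then means ranking each user's unrated items by prediction. A minimal base-R sketch follows, assuming the hypothetical helper name `recommend_top_n`, a toy matrix with `NA` for unrated cells, and mean-centering plus zero-filling as a simplified imputation:

```r
# Hypothetical helper: rank-k SVD predictions and top-n unrated items per user
recommend_top_n <- function(R, k = 2, n = 1) {
  user_means <- rowMeans(R, na.rm = TRUE)
  Rc <- sweep(R, 1, user_means)  # center each user's ratings
  Rc[is.na(Rc)] <- 0             # unrated cells set to the user's mean
  s <- svd(Rc)
  k <- min(k, length(s$d))
  # Rank-k reconstruction U_k S_k V_k^T, then add the user means back
  low_rank <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*%
    t(s$v[, 1:k, drop = FALSE])
  pred <- low_rank + user_means  # adds user_means[i] to row i
  dimnames(pred) <- dimnames(R)
  # For each user, rank only the items they have not rated
  recs <- lapply(seq_len(nrow(R)), function(i) {
    unrated <- which(is.na(R[i, ]))
    ranked <- unrated[order(pred[i, unrated], decreasing = TRUE)]
    ranked[seq_len(min(n, length(ranked)))]
  })
  names(recs) <- rownames(R)
  list(pred = pred, recs = recs)
}

R <- matrix(c(5, 4, NA, 1,
              4, 5, 1, NA,
              1, NA, 5, 4,
              NA, 1, 4, 5),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("m", 1:4)))
out <- recommend_top_n(R, k = 2, n = 1)
round(out$pred, 2)
out$recs  # each toy user has one unrated movie, so that is the recommendation
```

With k equal to the full rank, the reconstruction reproduces the observed ratings exactly; smaller k trades that fit for smoother predictions on the unrated cells.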