The goal of this assignment is to give you practice working with matrix factorization techniques. Your task is to implement a matrix factorization method, such as singular value decomposition (SVD) or alternating least squares (ALS), in the context of a recommender system.

“A typical machine learning problem might have several hundred or more variables, while many machine learning algorithms will break down if presented with more than a few dozen. This makes singular value decomposition indispensable in ML for variable reduction.”

library(recommenderlab)
library(tidyverse)
library(kableExtra)
library(knitr)
library("ggplot2")

Loading the datasets

ratings <- read.csv("C:\\Users\\Violet\\Desktop\\ratings.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)

movies <- read.csv("C:\\Users\\Violet\\Desktop\\movies.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE) %>%
  select(movieId, title)
head(ratings)
##   userId movieId rating  timestamp
## 1      3    1371      2 1306463561
## 2      4    1883      3  964622642
## 3      4    4166      4  986848700
## 4      6      61      5  845555454
## 5      6     310      2  845556119
## 6      7    4844      1 1161738352
head(movies)
##   movieId                              title
## 1       1                   Toy Story (1995)
## 2       2                     Jumanji (1995)
## 3       3            Grumpier Old Men (1995)
## 4       4           Waiting to Exhale (1995)
## 5       5 Father of the Bride Part II (1995)
## 6       6                        Heat (1995)
# check whether there are any missing values
sum(is.na(ratings))
## [1] 0

There are no missing values.

Singular value decomposition (SVD) factors a rectangular matrix A of size n x p into three matrices. The definition below is often stated for gene expression data, where the n rows represent the genes and the p columns represent the experimental conditions; in a recommender system the rows would typically be users and the columns items. The SVD theorem states:

$$A_{n \times p} = U_{n \times n} \, S_{n \times p} \, V^T_{p \times p}$$

where

$$U^T U = I_{n \times n}$$

$$V^T V = I_{p \times p}$$

(i.e. U and V are orthogonal)
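The identities above can be checked numerically. The following is a minimal sketch on a toy 4 x 3 matrix whose values are made up for illustration and are not taken from ratings.csv. One caveat: base R's `svd()` returns the thin form by default, so `u` is n x min(n, p) rather than n x n as in the theorem. The product t(U) U is still the identity, and the reconstruction still holds.

```r
# Toy 4 x 3 matrix (hypothetical values, not from ratings.csv)
A <- matrix(c(5, 3, 0,
              4, 0, 1,
              1, 1, 5,
              0, 2, 4), nrow = 4, byrow = TRUE)

s <- svd(A)  # thin SVD: u is 4 x 3, d has 3 values, v is 3 x 3

# Rebuild A from its factors: U %*% diag(d) %*% t(V)
A_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)

# Orthonormality checks: both products should be the 3 x 3 identity
utu <- crossprod(s$u)  # t(U) %*% U
vtv <- crossprod(s$v)  # t(V) %*% V

# Each of these should be close to 0, up to floating-point error
max(abs(A - A_rebuilt))
max(abs(utu - diag(3)))
max(abs(vtv - diag(3)))
```

Passing `nu = nrow(A)` to `svd()` would return the full n x n U if the square form is needed.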

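# Convert the ratings data frame to a numeric matrix and compute its SVD.
# Note: `ratings` is still in long format (userId, movieId, rating, timestamp),
# so this decomposes a 4-column table rather than a user-by-movie rating matrix,
# which is why v is 4 x 4 and there are only four singular values.
ratings_mat <- as.matrix(ratings)
ratings_svd <- svd(ratings_mat)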
# get U, V, Sigma
u <- ratings_svd$u
v <- ratings_svd$v
sigma <- ratings_svd$d
#inspect u and v
head(u)
##              [,1]         [,2]        [,3]          [,4]
## [1,] -0.003358060 -0.002085816 0.006131694  0.0046533763
## [2,] -0.002479411 -0.001458963 0.004485879 -0.0006280368
## [3,] -0.002536540 -0.001284000 0.004562636 -0.0031611118
## [4,] -0.002173368 -0.001426917 0.003909578 -0.0068148534
## [5,] -0.002173369 -0.001403729 0.003906463  0.0011514496
## [6,] -0.002986066 -0.001517166 0.005332888  0.0061364171
head(v)
##               [,1]          [,2]          [,3]          [,4]
## [1,] -2.645632e-07 -7.367654e-04 -9.999997e-01  2.700543e-04
## [2,] -1.819234e-05  9.999997e-01 -7.367675e-04 -7.579146e-06
## [3,] -2.812234e-09 -7.778111e-06 -2.700487e-04 -1.000000e+00
## [4,] -1.000000e+00 -1.819214e-05  2.779674e-07  2.878670e-09
# inspect sigma
head(sigma, 5) %>% kable(col.names = "Strength of Sigma") %>% kable_styling(full_width = F)
Strength of Sigma
3.890531e+11
1.073746e+07
5.857169e+04
3.763496e+02
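Only four singular values appear because the decomposed table has four columns, and the first right singular vector printed earlier loads almost entirely on the fourth column (timestamp), so the largest value appears to reflect timestamp magnitude rather than rating structure. For a recommender, the usual input is a users x movies matrix instead. Below is a minimal base R sketch of that pivot on a made-up data frame with the same column names as ratings.csv. The toy values, the variable names (`toy`, `R`, `R_filled`, `R_centered`), and the choice to fill unrated cells with each user's mean are all assumptions made for illustration, not the method used in this report.

```r
# Hypothetical long-format ratings with the same columns as ratings.csv
toy <- data.frame(
  userId  = c(1, 1, 2, 2, 3, 3, 3),
  movieId = c(10, 20, 10, 30, 20, 30, 40),
  rating  = c(4, 5, 3, 2, 4, 1, 5)
)

# Pivot to a users x movies matrix; unrated cells stay NA
users     <- sort(unique(toy$userId))
movie_ids <- sort(unique(toy$movieId))
R <- matrix(NA_real_, nrow = length(users), ncol = length(movie_ids),
            dimnames = list(as.character(users), as.character(movie_ids)))
R[cbind(match(toy$userId, users), match(toy$movieId, movie_ids))] <- toy$rating

# svd() cannot handle NA, so fill gaps with each user's mean rating
# (one simple imputation choice among several)
user_means <- rowMeans(R, na.rm = TRUE)
R_filled <- R
missing <- which(is.na(R), arr.ind = TRUE)
R_filled[missing] <- user_means[missing[, "row"]]

# Centre each row so the SVD models deviations from a user's average
R_centered <- R_filled - user_means
toy_svd <- svd(R_centered)

dim(toy_svd$u)      # users x components
dim(toy_svd$v)      # movies x components
length(toy_svd$d)   # number of singular values
```

Note that the long-format table can have no missing values while the pivoted matrix is mostly empty, which is the sparsity a recommender has to deal with.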
# plot the singular values
plot(sigma)
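A plot of the singular values is typically used to decide how many components to keep. The sketch below shows one common way to do that: compute the cumulative share of squared singular values and keep the smallest rank k that reaches a threshold. The simulated matrix `M`, the 90% threshold, and all variable names are illustrative assumptions and do not come from the ratings data.

```r
# Hypothetical 6 x 5 matrix with approximate rank-2 structure plus small noise
set.seed(42)
left  <- matrix(rnorm(6 * 2), nrow = 6)
right <- matrix(rnorm(5 * 2), nrow = 5)
M <- left %*% t(right) + matrix(rnorm(6 * 5, sd = 0.01), nrow = 6)

s <- svd(M)

# Share of the total squared singular values captured by the first k components
energy <- cumsum(s$d^2) / sum(s$d^2)

# Smallest k that keeps at least 90% (the threshold is an arbitrary illustrative choice)
k <- which(energy >= 0.90)[1]

# Rank-k approximation: keep only the first k singular triplets
M_k <- s$u[, 1:k, drop = FALSE] %*%
  diag(s$d[1:k], nrow = k) %*%
  t(s$v[, 1:k, drop = FALSE])

# Relative Frobenius error of the approximation
rel_err <- sqrt(sum((M - M_k)^2)) / sqrt(sum(M^2))
```

The squared relative error equals one minus the retained energy share, so the threshold directly bounds how much of the matrix is lost by truncating.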