Data612 Project3

The goal of this assignment is to give you practice working with matrix factorization techniques. Your task is to implement a matrix factorization method, such as singular value decomposition (SVD) or alternating least squares (ALS), in the context of a recommender system.

“A typical machine learning problem might have several hundred or more variables, while many machine learning algorithms will break down if presented with more than a few dozen. This makes singular value decomposition indispensable in ML for variable reduction.”

library(recommenderlab)

library(tidyverse)

library(kableExtra)

library(knitr)
library("ggplot2")

Loading the datasets

ratings <- read.csv("C:\\Users\\Violet\\Desktop\\ratings.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)

movies <- read.csv("C:\\Users\\Violet\\Desktop\\movies.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE) %>% select(movieId, title)
head(ratings)

##   userId movieId rating  timestamp
## 1      3    1371      2 1306463561
## 2      4    1883      3  964622642
## 3      4    4166      4  986848700
## 4      6      61      5  845555454
## 5      6     310      2  845556119
## 6      7    4844      1 1161738352

head(movies)

##   movieId                              title
## 1       1                   Toy Story (1995)
## 2       2                     Jumanji (1995)
## 3       3            Grumpier Old Men (1995)
## 4       4           Waiting to Exhale (1995)
## 5       5 Father of the Bride Part II (1995)
## 6       6                        Heat (1995)

# check whether there are any missing values
sum(is.na(ratings))

## [1] 0

There are no missing values.

Singular value decomposition takes a rectangular matrix A of size n x p. The description here is phrased for gene expression data, in which the n rows represent the genes and the p columns represent the experimental conditions; for a recommender system, the rows correspond to users and the columns to movies. The SVD theorem states:

$$A_{n \times p} = U_{n \times n} \, S_{n \times p} \, V^T_{p \times p}$$

Where

$$U^T U = I_{n \times n}$$

$$V^T V = I_{p \times p}$$

(i.e. U and V are orthogonal)
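The identities above can be checked numerically. The following is a minimal sketch in base R on a small made-up matrix (the values and the names `A`, `dec`, and `S` are assumptions for illustration, not part of the assignment data). It requests the full-size U and V from `svd()` so that the shapes match the theorem as stated.

```r
# Toy 4 x 3 matrix; the values are made up for illustration.
A <- matrix(c(5, 3, 0,
              4, 0, 1,
              1, 1, 5,
              0, 2, 4),
            nrow = 4, byrow = TRUE)

# nu = nrow(A) and nv = ncol(A) request the full-size U (n x n) and V (p x p);
# by default svd() returns the thinner n x min(n, p) version of U.
dec <- svd(A, nu = nrow(A), nv = ncol(A))

# Build the n x p matrix S with the singular values on its diagonal.
S <- matrix(0, nrow = nrow(A), ncol = ncol(A))
diag(S) <- dec$d

# Reconstruction: U S V^T should reproduce A up to floating-point error.
A_rebuilt <- dec$u %*% S %*% t(dec$v)
recon_ok <- isTRUE(all.equal(A, A_rebuilt))

# Orthogonality: U^T U = I (n x n) and V^T V = I (p x p).
u_ok <- isTRUE(all.equal(crossprod(dec$u), diag(nrow(A))))
v_ok <- isTRUE(all.equal(crossprod(dec$v), diag(ncol(A))))

c(reconstruction = recon_ok, U_orthogonal = u_ok, V_orthogonal = v_ok)
```

Comparisons use `all.equal()` rather than `==` because the factors are only exact up to floating-point rounding.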

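# compute the SVD of the ratings table
# note: `ratings` is the long-format table (userId, movieId, rating, timestamp),
# so this decomposes an n x 4 matrix rather than a user-by-movie rating matrix
ratings_mat <- as.matrix(ratings)
ratings_svd <- svd(ratings_mat)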

# get U, V, Sigma
u <- ratings_svd$u
v <- ratings_svd$v
sigma <- ratings_svd$d

# inspect u and v
head(u)

##              [,1]         [,2]        [,3]          [,4]
## [1,] -0.003358060 -0.002085816 0.006131694  0.0046533763
## [2,] -0.002479411 -0.001458963 0.004485879 -0.0006280368
## [3,] -0.002536540 -0.001284000 0.004562636 -0.0031611118
## [4,] -0.002173368 -0.001426917 0.003909578 -0.0068148534
## [5,] -0.002173369 -0.001403729 0.003906463  0.0011514496
## [6,] -0.002986066 -0.001517166 0.005332888  0.0061364171

head(v)

##               [,1]          [,2]          [,3]          [,4]
## [1,] -2.645632e-07 -7.367654e-04 -9.999997e-01  2.700543e-04
## [2,] -1.819234e-05  9.999997e-01 -7.367675e-04 -7.579146e-06
## [3,] -2.812234e-09 -7.778111e-06 -2.700487e-04 -1.000000e+00
## [4,] -1.000000e+00 -1.819214e-05  2.779674e-07  2.878670e-09
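The v shown above is 4 x 4 because `svd()` was applied to the long-format table, whose four columns are userId, movieId, rating, and timestamp. For a recommender system, the matrix to factor would normally have one row per user and one column per movie. The following is a hedged sketch of that pivot in base R on a tiny made-up data frame; `toy_ratings`, its values, and the choice to fill unrated cells with each user's mean rating are all assumptions of this example, not the method used in the report.

```r
# Hypothetical long-format ratings with the same column names as ratings.csv
# (timestamp omitted); the values are made up.
toy_ratings <- data.frame(
  userId  = c(1, 1, 2, 2, 3, 3, 3),
  movieId = c(10, 20, 10, 30, 20, 30, 40),
  rating  = c(4, 5, 3, 2, 4, 1, 5)
)

# Pivot to a user x movie matrix; cells with no rating become NA.
rating_mat <- tapply(toy_ratings$rating,
                     list(user = toy_ratings$userId,
                          movie = toy_ratings$movieId),
                     mean)

# svd() cannot handle NA, so fill each user's missing cells with that user's
# mean rating (an assumption of this sketch; other imputations are possible).
user_means <- rowMeans(rating_mat, na.rm = TRUE)
filled <- rating_mat
missing <- is.na(filled)
filled[missing] <- user_means[row(filled)[missing]]

# Factor the filled user x movie matrix.
toy_svd <- svd(filled)
dim(toy_svd$u)  # users  x min(users, movies)
dim(toy_svd$v)  # movies x min(users, movies)
```

With real data the user-by-movie matrix is large and mostly empty, so a sparse representation is usually preferable to the dense `tapply()` pivot used here.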

# inspect sigma
head(sigma, 5) %>% kable(col.names = "Strength of Sigma") %>% kable_styling(full_width = F)

Strength of Sigma
3.890531e+11
1.073746e+07
5.857169e+04
3.763496e+02

# visualize the singular values
plot(sigma)
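A common use of the singular values is to decide how many components to keep, which is the variable reduction mentioned in the opening quote. The sketch below, in base R on a made-up matrix, keeps the smallest number of components k whose squared singular values account for a chosen share of the total. The matrix `M`, the seed, and the 0.9 threshold are arbitrary assumptions for illustration.

```r
# Toy matrix standing in for a filled user x movie matrix (values made up).
set.seed(612)
M <- matrix(sample(1:5, 6 * 5, replace = TRUE), nrow = 6)

dec <- svd(M)

# Share of the total sum of squared singular values captured by the first
# k components, for k = 1, 2, ...
energy <- cumsum(dec$d^2) / sum(dec$d^2)

# Smallest k whose cumulative share reaches the threshold (0.9 is arbitrary).
k <- which(energy >= 0.9)[1]

# Rank-k approximation built from the leading k singular triplets.
# diag(..., nrow = k) keeps the k = 1 case a 1 x 1 matrix.
M_k <- dec$u[, 1:k, drop = FALSE] %*%
  diag(dec$d[1:k], nrow = k) %*%
  t(dec$v[, 1:k, drop = FALSE])

# Relative Frobenius-norm error of the approximation.
rel_err <- norm(M - M_k, type = "F") / norm(M, type = "F")
```

The squared relative error equals the share of squared singular values that was discarded, so the threshold directly bounds how much of the matrix is lost.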

Violet Stoyanova

3/16/2020