Data 612 ass 3

The goal of this assignment is give you practice working with Matrix Factorization techniques. The task is implement a matrix factorization method-such as singular value decomposition (SVD) or Alternating Least Squares (ALS)-in the context of a recommender system.

First, we will load the data from the Movie Lense site. We will only deal with movies that have been rated more that 100 times, because it will make the data slightly more accurate.

data(MovieLense)
# use only users with more than 100 ratings
MovieLense100 <- MovieLense[rowCounts(MovieLense) >100,]
MovieLense100

## 358 x 1664 rating matrix of class 'realRatingMatrix' with 73610 ratings.

m <- (MovieLense100@data)

Then, I will apply the SVD method to the data.

svd <- svd(m)
U <- svd$u
D <- diag(svd$d)
V <- t(svd$v)

Any low values in D correspond to weaker variables in U and V. So, for any value in D that is less than 2, I want to remove that value from the diagonal matrix and the corresponding rows from V and columns from U. This reduction reduces the error in the data in the new matrix, m1.

for (i in 1:nrow(D)) {
  for (j in 1:ncol(D)) {
    D[i, j] <- ifelse(D[i, j] < -12, D[i,j],
                      ifelse(D[i, j] > 12, D[i,j],
                             0))
  }
}

ds <- rowSums(D)
ds

##   [1] 606.59788 211.76954 169.45361 142.92725 122.59383 114.54857 110.54390
##   [8]  98.18352  89.38512  88.57463  79.92661  78.85833  76.64809  72.53838
##  [15]  71.53005  70.47636  68.22213  66.96664  66.40485  65.91301  64.97195
##  [22]  64.82929  63.62658  62.91665  62.61400  62.12962  61.35277  60.28441
##  [29]  60.07714  59.88122  59.41470  58.93947  58.80220  58.11954  57.85017
##  [36]  57.74264  56.94006  56.74116  56.09044  55.94141  55.40316  55.14509
##  [43]  54.57156  54.38956  54.24445  53.88474  53.41853  53.24322  52.89310
##  [50]  52.67108  52.48966  52.16037  51.77744  51.66522  51.40997  51.33912
##  [57]  50.97528  50.38898  50.37301  50.23717  50.09776  49.90872  49.63490
##  [64]  49.37046  48.90782  48.74286  48.57046  48.42741  48.32198  48.19053
##  [71]  48.03818  47.75636  47.50418  47.31968  47.26752  47.13101  46.87892
##  [78]  46.77185  46.45994  46.30057  46.02110  45.75202  45.49751  45.36188
##  [85]  45.23521  45.12420  44.94628  44.81221  44.56059  44.27310  44.22280
##  [92]  44.02076  43.75539  43.63721  43.60555  43.35414  43.19032  42.97900
##  [99]  42.72413  42.62203  42.30772  42.22184  41.91670  41.84787  41.80917
## [106]  41.56525  41.53070  41.35085  41.11721  41.05680  40.78122  40.63843
## [113]  40.61193  40.50964  40.31866  40.12253  40.05905  39.89596  39.74860
## [120]  39.54459  39.42228  39.27615  39.18718  39.05142  38.73227  38.65508
## [127]  38.52068  38.35508  38.27219  38.20061  37.92703  37.80293  37.67237
## [134]  37.54252  37.33984  37.18749  37.10762  36.80339  36.68501  36.61067
## [141]  36.48582  36.32870  36.20390  36.09063  35.99219  35.86134  35.64939
## [148]  35.52571  35.36280  35.17469  35.03419  34.90962  34.80718  34.62589
## [155]  34.49020  34.45981  34.31102  34.22245  34.13452  33.98136  33.93024
## [162]  33.76545  33.67185  33.54921  33.44014  33.32043  33.19651  33.14644
## [169]  32.73714  32.67642  32.59891  32.51929  32.39743  32.19904  32.13502
## [176]  32.02702  31.95976  31.73464  31.72730  31.51546  31.39149  31.15636
## [183]  31.11848  30.93236  30.88598  30.63740  30.59034  30.48165  30.36402
## [190]  30.23450  30.09323  30.04595  29.94130  29.72950  29.62705  29.57681
## [197]  29.43327  29.33714  29.30067  29.17522  29.12107  28.92980  28.87468
## [204]  28.77251  28.60940  28.44704  28.33007  28.23837  28.12929  27.97092
## [211]  27.90417  27.87464  27.76637  27.57490  27.51590  27.35042  27.16629
## [218]  27.09779  27.02893  26.93440  26.85133  26.76281  26.71627  26.55293
## [225]  26.44628  26.37665  26.23725  26.18979  26.12733  25.97286  25.71827
## [232]  25.61436  25.58751  25.45281  25.31112  25.18680  25.07476  24.91415
## [239]  24.81204  24.78151  24.64187  24.52625  24.41682  24.37756  24.26572
## [246]  24.14400  24.02681  23.96622  23.86477  23.63236  23.55891  23.47775
## [253]  23.45228  23.38505  23.23397  23.05518  23.02862  22.91867  22.81720
## [260]  22.78807  22.66259  22.50099  22.46303  22.44020  22.36694  22.19566
## [267]  22.12478  22.01430  21.89996  21.78882  21.65358  21.63006  21.52969
## [274]  21.35822  21.25001  21.14968  21.01590  20.95653  20.87615  20.84683
## [281]  20.80078  20.56660  20.54972  20.45157  20.39622  20.32412  20.14306
## [288]  19.96548  19.91265  19.72891  19.66726  19.62558  19.53170  19.42862
## [295]  19.20141  19.13867  19.09014  18.92807  18.87900  18.80071  18.73227
## [302]  18.59207  18.51582  18.40531  18.25312  18.17892  18.10764  18.03573
## [309]  17.90838  17.82909  17.70632  17.59977  17.42134  17.31358  17.24759
## [316]  17.08303  16.98805  16.86636  16.77547  16.68054  16.60219  16.56498
## [323]  16.47224  16.32488  16.26156  16.16614  16.09433  15.95678  15.91664
## [330]  15.83083  15.80589  15.65079  15.59060  15.36385  15.31586  15.19587
## [337]  15.06080  15.00936  14.95648  14.78125  14.70372  14.59309  14.35830
## [344]  14.29015  14.28805  14.04227  13.91317  13.75149  13.53965  13.41681
## [351]  13.26802  13.21649  13.09160  12.92293  12.61393  12.43940  12.23970
## [358]   0.00000

#remove any row in V and any column in U where the row sum is 0 in D, which in this case was row 358
D <- D[-358,-358]
U <- U[,-358]
V <- V[-358,]
#re multiply UDV to get new matrix m1
m1 <- U%*%D%*%V
dimnames(m1) <- dimnames(m)
#compare m and m1
m1[1:5,1:5] %>% kable()

	Toy Story (1995)	GoldenEye (1995)	Four Rooms (1995)	Get Shorty (1995)	Copycat (1995)
1	5.0035247	2.9956137	4.0008953	2.9951227	3.0017912
5	3.9949725	3.0062565	-0.0012770	0.0069568	-0.0025549
6	4.0100210	-0.0124706	0.0025454	-0.0138666	0.0050926
7	-0.0013897	0.0017294	-0.0003530	5.0019230	-0.0007062
10	3.9950791	0.0061237	-0.0012499	4.0068092	-0.0025007

m[1:5,1:5]

## 5 x 5 sparse Matrix of class "dgCMatrix"
##    Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995)
## 1                 5                3                 4                 3
## 5                 4                3                 .                 .
## 6                 4                .                 .                 .
## 7                 .                .                 .                 5
## 10                4                .                 .                 4
##    Copycat (1995)
## 1               3
## 5               .
## 6               .
## 7               .
## 10              .

As we can see, our adjusted m1 matrix is very similar to our original m matrix.

Let’s look at the similarity heatmap to see which movies are more similar.

similar_items <- similarity(MovieLense[, 1:25], method = "pearson", which = "items")
heatmap(as.matrix(similar_items))

Now I will create train and test data sets using the adjusted data and a recommender system.

#change m1 to real rating matrix
m1 <- as(m1, "realRatingMatrix")

#creat training set and test set
mtrain <- as(m1@data[1:80,], "realRatingMatrix")
mtest <- as(m1@data[90:91,], "realRatingMatrix")

#recommender systemusing the training set
rec <- Recommender(mtrain, method = "IBCF")

Now I will create a recommendation table using the recommender system we just created and the test data set.

predicted <- predict(object = rec, newdata = mtest, n = 10, type = "ratings")

reco <- sapply(predicted@data, function(x) {colnames(m)}) 
reco[1:5,1:5] %>% kable() %>% kable_styling(full_width = T)

Toy Story (1995)	Toy Story (1995)	Toy Story (1995)	Toy Story (1995)	Toy Story (1995)
GoldenEye (1995)	GoldenEye (1995)	GoldenEye (1995)	GoldenEye (1995)	GoldenEye (1995)
Four Rooms (1995)	Four Rooms (1995)	Four Rooms (1995)	Four Rooms (1995)	Four Rooms (1995)
Get Shorty (1995)	Get Shorty (1995)	Get Shorty (1995)	Get Shorty (1995)	Get Shorty (1995)
Copycat (1995)	Copycat (1995)	Copycat (1995)	Copycat (1995)	Copycat (1995)

My intent was to thee the top recommended movies for each user. We used the SVD method so our adjusted matrix is a more accurate predictor than the original rating matrix. and I wanted to compare to the previous assignments results. But, I’m having issues with the data almost disappearing. I’m not sure at what step, but after line 69, I believe, the ratings all turn to 0, which makes for a bland recommender system. I’m not sure how to fix this, but I do have a firm understanding of the SVD method. Just not the debugging method I needed for this assignment, I suppose.