The goal of this assignment is give you practice working with Matrix Factorization techniques. The task is implement a matrix factorization method-such as singular value decomposition (SVD) or Alternating Least Squares (ALS)-in the context of a recommender system.
First, we will load the data from the Movie Lense site. We will only deal with movies that have been rated more that 100 times, because it will make the data slightly more accurate.
data(MovieLense)
# use only users with more than 100 ratings
MovieLense100 <- MovieLense[rowCounts(MovieLense) >100,]
MovieLense100
## 358 x 1664 rating matrix of class 'realRatingMatrix' with 73610 ratings.
m <- (MovieLense100@data)
Then, I will apply the SVD method to the data.
svd <- svd(m)
U <- svd$u
D <- diag(svd$d)
V <- t(svd$v)
Any low values in D correspond to weaker variables in U and V. So, for any value in D that is less than 2, I want to remove that value from the diagonal matrix and the corresponding rows from V and columns from U. This reduction reduces the error in the data in the new matrix, m1.
for (i in 1:nrow(D)) {
for (j in 1:ncol(D)) {
D[i, j] <- ifelse(D[i, j] < -12, D[i,j],
ifelse(D[i, j] > 12, D[i,j],
0))
}
}
ds <- rowSums(D)
ds
## [1] 606.59788 211.76954 169.45361 142.92725 122.59383 114.54857 110.54390
## [8] 98.18352 89.38512 88.57463 79.92661 78.85833 76.64809 72.53838
## [15] 71.53005 70.47636 68.22213 66.96664 66.40485 65.91301 64.97195
## [22] 64.82929 63.62658 62.91665 62.61400 62.12962 61.35277 60.28441
## [29] 60.07714 59.88122 59.41470 58.93947 58.80220 58.11954 57.85017
## [36] 57.74264 56.94006 56.74116 56.09044 55.94141 55.40316 55.14509
## [43] 54.57156 54.38956 54.24445 53.88474 53.41853 53.24322 52.89310
## [50] 52.67108 52.48966 52.16037 51.77744 51.66522 51.40997 51.33912
## [57] 50.97528 50.38898 50.37301 50.23717 50.09776 49.90872 49.63490
## [64] 49.37046 48.90782 48.74286 48.57046 48.42741 48.32198 48.19053
## [71] 48.03818 47.75636 47.50418 47.31968 47.26752 47.13101 46.87892
## [78] 46.77185 46.45994 46.30057 46.02110 45.75202 45.49751 45.36188
## [85] 45.23521 45.12420 44.94628 44.81221 44.56059 44.27310 44.22280
## [92] 44.02076 43.75539 43.63721 43.60555 43.35414 43.19032 42.97900
## [99] 42.72413 42.62203 42.30772 42.22184 41.91670 41.84787 41.80917
## [106] 41.56525 41.53070 41.35085 41.11721 41.05680 40.78122 40.63843
## [113] 40.61193 40.50964 40.31866 40.12253 40.05905 39.89596 39.74860
## [120] 39.54459 39.42228 39.27615 39.18718 39.05142 38.73227 38.65508
## [127] 38.52068 38.35508 38.27219 38.20061 37.92703 37.80293 37.67237
## [134] 37.54252 37.33984 37.18749 37.10762 36.80339 36.68501 36.61067
## [141] 36.48582 36.32870 36.20390 36.09063 35.99219 35.86134 35.64939
## [148] 35.52571 35.36280 35.17469 35.03419 34.90962 34.80718 34.62589
## [155] 34.49020 34.45981 34.31102 34.22245 34.13452 33.98136 33.93024
## [162] 33.76545 33.67185 33.54921 33.44014 33.32043 33.19651 33.14644
## [169] 32.73714 32.67642 32.59891 32.51929 32.39743 32.19904 32.13502
## [176] 32.02702 31.95976 31.73464 31.72730 31.51546 31.39149 31.15636
## [183] 31.11848 30.93236 30.88598 30.63740 30.59034 30.48165 30.36402
## [190] 30.23450 30.09323 30.04595 29.94130 29.72950 29.62705 29.57681
## [197] 29.43327 29.33714 29.30067 29.17522 29.12107 28.92980 28.87468
## [204] 28.77251 28.60940 28.44704 28.33007 28.23837 28.12929 27.97092
## [211] 27.90417 27.87464 27.76637 27.57490 27.51590 27.35042 27.16629
## [218] 27.09779 27.02893 26.93440 26.85133 26.76281 26.71627 26.55293
## [225] 26.44628 26.37665 26.23725 26.18979 26.12733 25.97286 25.71827
## [232] 25.61436 25.58751 25.45281 25.31112 25.18680 25.07476 24.91415
## [239] 24.81204 24.78151 24.64187 24.52625 24.41682 24.37756 24.26572
## [246] 24.14400 24.02681 23.96622 23.86477 23.63236 23.55891 23.47775
## [253] 23.45228 23.38505 23.23397 23.05518 23.02862 22.91867 22.81720
## [260] 22.78807 22.66259 22.50099 22.46303 22.44020 22.36694 22.19566
## [267] 22.12478 22.01430 21.89996 21.78882 21.65358 21.63006 21.52969
## [274] 21.35822 21.25001 21.14968 21.01590 20.95653 20.87615 20.84683
## [281] 20.80078 20.56660 20.54972 20.45157 20.39622 20.32412 20.14306
## [288] 19.96548 19.91265 19.72891 19.66726 19.62558 19.53170 19.42862
## [295] 19.20141 19.13867 19.09014 18.92807 18.87900 18.80071 18.73227
## [302] 18.59207 18.51582 18.40531 18.25312 18.17892 18.10764 18.03573
## [309] 17.90838 17.82909 17.70632 17.59977 17.42134 17.31358 17.24759
## [316] 17.08303 16.98805 16.86636 16.77547 16.68054 16.60219 16.56498
## [323] 16.47224 16.32488 16.26156 16.16614 16.09433 15.95678 15.91664
## [330] 15.83083 15.80589 15.65079 15.59060 15.36385 15.31586 15.19587
## [337] 15.06080 15.00936 14.95648 14.78125 14.70372 14.59309 14.35830
## [344] 14.29015 14.28805 14.04227 13.91317 13.75149 13.53965 13.41681
## [351] 13.26802 13.21649 13.09160 12.92293 12.61393 12.43940 12.23970
## [358] 0.00000
#remove any row in V and any column in U where the row sum is 0 in D, which in this case was row 358
D <- D[-358,-358]
U <- U[,-358]
V <- V[-358,]
#re multiply UDV to get new matrix m1
m1 <- U%*%D%*%V
dimnames(m1) <- dimnames(m)
#compare m and m1
m1[1:5,1:5] %>% kable()
| Toy Story (1995) | GoldenEye (1995) | Four Rooms (1995) | Get Shorty (1995) | Copycat (1995) | |
|---|---|---|---|---|---|
| 1 | 5.0035247 | 2.9956137 | 4.0008953 | 2.9951227 | 3.0017912 |
| 5 | 3.9949725 | 3.0062565 | -0.0012770 | 0.0069568 | -0.0025549 |
| 6 | 4.0100210 | -0.0124706 | 0.0025454 | -0.0138666 | 0.0050926 |
| 7 | -0.0013897 | 0.0017294 | -0.0003530 | 5.0019230 | -0.0007062 |
| 10 | 3.9950791 | 0.0061237 | -0.0012499 | 4.0068092 | -0.0025007 |
m[1:5,1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
## Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995)
## 1 5 3 4 3
## 5 4 3 . .
## 6 4 . . .
## 7 . . . 5
## 10 4 . . 4
## Copycat (1995)
## 1 3
## 5 .
## 6 .
## 7 .
## 10 .
As we can see, our adjusted m1 matrix is very similar to our original m matrix.
Let’s look at the similarity heatmap to see which movies are more similar.
similar_items <- similarity(MovieLense[, 1:25], method = "pearson", which = "items")
heatmap(as.matrix(similar_items))
Now I will create train and test data sets using the adjusted data and a recommender system.
#change m1 to real rating matrix
m1 <- as(m1, "realRatingMatrix")
#creat training set and test set
mtrain <- as(m1@data[1:80,], "realRatingMatrix")
mtest <- as(m1@data[90:91,], "realRatingMatrix")
#recommender systemusing the training set
rec <- Recommender(mtrain, method = "IBCF")
Now I will create a recommendation table using the recommender system we just created and the test data set.
predicted <- predict(object = rec, newdata = mtest, n = 10, type = "ratings")
reco <- sapply(predicted@data, function(x) {colnames(m)})
reco[1:5,1:5] %>% kable() %>% kable_styling(full_width = T)
| Toy Story (1995) | Toy Story (1995) | Toy Story (1995) | Toy Story (1995) | Toy Story (1995) |
| GoldenEye (1995) | GoldenEye (1995) | GoldenEye (1995) | GoldenEye (1995) | GoldenEye (1995) |
| Four Rooms (1995) | Four Rooms (1995) | Four Rooms (1995) | Four Rooms (1995) | Four Rooms (1995) |
| Get Shorty (1995) | Get Shorty (1995) | Get Shorty (1995) | Get Shorty (1995) | Get Shorty (1995) |
| Copycat (1995) | Copycat (1995) | Copycat (1995) | Copycat (1995) | Copycat (1995) |
My intent was to thee the top recommended movies for each user. We used the SVD method so our adjusted matrix is a more accurate predictor than the original rating matrix. and I wanted to compare to the previous assignments results. But, I’m having issues with the data almost disappearing. I’m not sure at what step, but after line 69, I believe, the ratings all turn to 0, which makes for a bland recommender system. I’m not sure how to fix this, but I do have a firm understanding of the SVD method. Just not the debugging method I needed for this assignment, I suppose.