Project 3. I would like to use the Jester5k dataset to try out the SVD method for this project. Missing ratings will be replaced with 0.

library(recommenderlab)
library(MASS)
library(dplyr)
data(Jester5k)
ratings_small <- Jester5k@data[1:6, 1:5]
m <- as.matrix(ratings_small)  # unrated entries of the sparse matrix become 0
m
##           j1    j2    j3    j4    j5
## u2841   7.91  9.17  5.34  8.16 -8.74
## u15547 -3.20 -3.50 -9.56 -8.74 -6.36
## u15221 -1.70  1.21  1.55  2.77  5.58
## u15573 -7.38 -8.93 -3.88 -7.23 -4.90
## u21505  0.10  4.17  4.90  1.55  5.53
## u15994  0.83 -4.90  0.68 -7.18  0.34
s <- svd(m)
D <- diag(s$d)
s$u
##            [,1]        [,2]       [,3]       [,4]        [,5]
## [1,] -0.5494799  0.72062507 -0.1859783 -0.1541509 -0.22025276
## [2,]  0.5136803  0.42520374  0.3326147  0.5491099 -0.32993562
## [3,] -0.1167465 -0.38365425  0.2827983 -0.1043534 -0.08118277
## [4,]  0.5528455  0.12057500 -0.1052661 -0.7521554 -0.31817155
## [5,] -0.2335625 -0.36371465 -0.1305911  0.1379519 -0.85497796
## [6,]  0.2457336 -0.07670743 -0.8641005  0.2811721  0.06180716
D
##          [,1]     [,2]     [,3]    [,4]     [,5]
## [1,] 25.76416  0.00000 0.000000 0.00000 0.000000
## [2,]  0.00000 14.87809 0.000000 0.00000 0.000000
## [3,]  0.00000  0.00000 6.638025 0.00000 0.000000
## [4,]  0.00000  0.00000 0.000000 5.62734 0.000000
## [5,]  0.00000  0.00000 0.000000 0.00000 3.212596
v <- s$v
v
##            [,1]        [,2]        [,3]       [,4]       [,5]
## [1,] -0.3761462  0.26897418 -0.44736373  0.5329321  0.5495598
## [2,] -0.5469938  0.16387443  0.31668561  0.4358271 -0.6194416
## [3,] -0.4327081 -0.20927871 -0.68599258 -0.4351754 -0.3301555
## [4,] -0.5985125  0.01455442  0.47026258 -0.4821208  0.4335708
## [5,] -0.1177214 -0.92562879  0.08856188  0.3228312  0.1314908

The u matrix is the user-to-concept similarity matrix, and D holds the strength of each concept (the singular values, in decreasing order). In our case D55 is the smallest value, so it is the first one we could set to 0 later to get a lower-rank approximation. v is the joke-to-concept similarity matrix: each row corresponds to a joke (a column of m) and each column to a concept, so v11 and v12 are the weights of joke 1 on concepts 1 and 2. Now let's take users 7-10, who are outside our small matrix, and predict which concepts they relate to. We first map them into concept space by multiplying their rating matrix by the joke-to-concept matrix v, which gives the following results.
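To check the decomposition, here is a minimal sketch in base R that rebuilds m from U D V^T and then sets D55 to 0 to get a rank-4 approximation. It copies the 6 x 5 ratings printed above, which are rounded to two decimals, so the numbers will differ slightly from the original run; the variable names are illustrative only.

```r
# Ratings of users 1-6 on jokes 1-5, copied from the printed matrix m
# (rounded to 2 decimals, so results differ slightly from the original).
ratings <- matrix(c( 7.91,  9.17,  5.34,  8.16, -8.74,
                    -3.20, -3.50, -9.56, -8.74, -6.36,
                    -1.70,  1.21,  1.55,  2.77,  5.58,
                    -7.38, -8.93, -3.88, -7.23, -4.90,
                     0.10,  4.17,  4.90,  1.55,  5.53,
                     0.83, -4.90,  0.68, -7.18,  0.34),
                  nrow = 6, byrow = TRUE)

sv <- svd(ratings)

# Full reconstruction: ratings = U D V^T
m_full <- sv$u %*% diag(sv$d) %*% t(sv$v)

# Rank-4 approximation: set the smallest singular value (D55) to 0
d4 <- sv$d
d4[length(d4)] <- 0
m_rank4 <- sv$u %*% diag(d4) %*% t(sv$v)

# The Frobenius-norm error equals the singular value that was dropped
err <- sqrt(sum((ratings - m_rank4)^2))
c(error = err, dropped = sv$d[5])
```

Dropping the smallest singular value removes the least information, which is why D55 is the natural first candidate to set to 0.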

# Users 7-10: map their ratings on jokes 1-5 into concept space
u7_10 <- Jester5k@data[7:10,1:5]
u7_10 <- as.matrix(u7_10)

u7_10 %*% v
##             [,1]      [,2]       [,3]        [,4]        [,5]
## u238   -3.698276  1.562757  0.2055951  3.62536959 -1.34932288
## u5809   3.506150 -2.421667 -2.2720254 -5.13539445  1.67513232
## u16636  8.535836 -2.310339  2.2210927  0.08461509  2.60333030
## u12843  6.835218  2.369505 -5.2379484 -4.42379964 -0.07638406

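Taking the largest value in each row, u238 relates most strongly to concept 4, while u5809, u16636 and u12843 all relate most strongly to concept 1. Note that these column indices are concepts, not jokes. Since most of users 7-10 fall on concept 1, they may share a similar taste in jokes; this is the advantage of comparing users in concept space, where users can look alike even when they have few ratings in common. So when we make recommendations, we can recommend the jokes most strongly associated with concept 1 to these users. One caveat: the sign of each singular vector is arbitrary (all entries of v[,1] are negative here), so a large positive score does not automatically mean the user likes those jokes.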
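To turn a concept index into candidate jokes, one option is to list the jokes with the largest absolute loading on that concept in v. The sketch below hardcodes the v printed above; `jokes_for_concept` is a hypothetical helper, and ranking by absolute value is an assumption made because singular-vector signs are arbitrary.

```r
# v copied from the printed output: rows = jokes j1-j5, columns = concepts.
v_small <- matrix(c(-0.3761462,  0.26897418, -0.44736373,  0.5329321,  0.5495598,
                    -0.5469938,  0.16387443,  0.31668561,  0.4358271, -0.6194416,
                    -0.4327081, -0.20927871, -0.68599258, -0.4351754, -0.3301555,
                    -0.5985125,  0.01455442,  0.47026258, -0.4821208,  0.4335708,
                    -0.1177214, -0.92562879,  0.08856188,  0.3228312,  0.1314908),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("j", 1:5), NULL))

# Hypothetical helper: the n jokes with the largest absolute loading on concept k.
jokes_for_concept <- function(v, k, n = 3) {
  loadings <- abs(v[, k])
  names(sort(loadings, decreasing = TRUE))[seq_len(n)]
}

jokes_for_concept(v_small, 1)  # returns c("j4", "j2", "j3")
jokes_for_concept(v_small, 4)  # returns c("j1", "j4", "j2")
```

With these five jokes, concept 1 is dominated by jokes 4 and 2, so those would be the first candidates for u5809, u16636 and u12843.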

Now, let’s find the similarities of jokes.

similarity_items <- similarity(Jester5k[ ,1:20], method = "cosine", which = "items")
as.matrix(similarity_items)
##              j1         j2         j3        j4         j5         j6
## j1  0.000000000 0.38393179 0.39165632 0.2357251 0.21152767 0.24153003
## j2  0.383931795 0.00000000 0.27646389 0.3020971 0.22346564 0.23110510
## j3  0.391656324 0.27646389 0.00000000 0.3532990 0.21790252 0.22771818
## j4  0.235725103 0.30209706 0.35329896 0.0000000 0.18745880 0.19149993
## j5  0.211527673 0.22346564 0.21790252 0.1874588 0.00000000 0.18491730
## j6  0.241530026 0.23110510 0.22771818 0.1914999 0.18491730 0.00000000
## j7  0.168238641 0.19006045 0.16334908 0.1956888 0.18639396 0.08013409
## j8  0.193242378 0.11543525 0.20761578 0.1382612 0.14616123 0.06302279
## j9  0.292743590 0.25072299 0.32055990 0.4018848 0.12001660 0.23935861
## j10 0.359134402 0.26909332 0.28351712 0.1794243 0.18222383 0.28430838
## j11 0.343916460 0.29458043 0.45788631 0.2096112 0.26954705 0.40363945
## j12 0.289463520 0.31572300 0.29771696 0.2346522 0.21776854 0.46628119
## j13 0.129854737 0.17078156 0.14749156 0.2338249 0.12925004 0.00363323
## j14 0.237825927 0.30662441 0.21740754 0.2487826 0.20864490 0.44280407
## j15 0.101299554 0.11438478 0.10471017 0.1674108 0.08275926 0.06594991
## j16 0.002838974 0.02058043 0.03008794 0.2403652 0.07463251 0.10706753
## j17 0.021933156 0.09059244 0.11381000 0.3284677 0.19543428 0.09956409
## j18 0.153766581 0.11866627 0.08818456 0.1474768 0.15742454 0.09009739
## j19 0.138108172 0.11167287 0.11363153 0.2003521 0.21789993 0.25053338
## j20 0.064158449 0.08704760 0.16651544 0.2595146 0.18777826 0.16743049
##             j7         j8         j9        j10        j11        j12
## j1  0.16823864 0.19324238 0.29274359 0.35913440 0.34391646 0.28946352
## j2  0.19006045 0.11543525 0.25072299 0.26909332 0.29458043 0.31572300
## j3  0.16334908 0.20761578 0.32055990 0.28351712 0.45788631 0.29771696
## j4  0.19568885 0.13826117 0.40188485 0.17942433 0.20961120 0.23465218
## j5  0.18639396 0.14616123 0.12001660 0.18222383 0.26954705 0.21776854
## j6  0.08013409 0.06302279 0.23935861 0.28430838 0.40363945 0.46628119
## j7  0.00000000 0.20713865 0.11789743 0.12269677 0.14825554 0.12802683
## j8  0.20713865 0.00000000 0.17060498 0.16456091 0.17006435 0.06303139
## j9  0.11789743 0.17060498 0.00000000 0.32335579 0.19418189 0.27399679
## j10 0.12269677 0.16456091 0.32335579 0.00000000 0.38851004 0.31962718
## j11 0.14825554 0.17006435 0.19418189 0.38851004 0.00000000 0.45080726
## j12 0.12802683 0.06303139 0.27399679 0.31962718 0.45080726 0.00000000
## j13 0.18466320 0.25398672 0.20912358 0.05859746 0.01755421 0.04451035
## j14 0.13434229 0.08619025 0.22036764 0.25302209 0.38660354 0.52456928
## j15 0.23527096 0.23940342 0.16457469 0.03275018 0.01943678 0.04610056
## j16 0.17862981 0.18023209 0.25295040 0.01127270 0.12446200 0.10485485
## j17 0.21420352 0.26912581 0.09883005 0.03518699 0.09485327 0.11119028
## j18 0.24175775 0.15407281 0.24927202 0.14765923 0.02823828 0.09221122
## j19 0.25873457 0.09770614 0.19270960 0.10435320 0.14663097 0.19305927
## j20 0.27328622 0.16473483 0.22806003 0.07761260 0.15779120 0.18947384
##            j13        j14        j15         j16        j17        j18
## j1  0.12985474 0.23782593 0.10129955 0.002838974 0.02193316 0.15376658
## j2  0.17078156 0.30662441 0.11438478 0.020580425 0.09059244 0.11866627
## j3  0.14749156 0.21740754 0.10471017 0.030087936 0.11381000 0.08818456
## j4  0.23382490 0.24878263 0.16741081 0.240365182 0.32846767 0.14747677
## j5  0.12925004 0.20864490 0.08275926 0.074632511 0.19543428 0.15742454
## j6  0.00363323 0.44280407 0.06594991 0.107067528 0.09956409 0.09009739
## j7  0.18466320 0.13434229 0.23527096 0.178629812 0.21420352 0.24175775
## j8  0.25398672 0.08619025 0.23940342 0.180232093 0.26912581 0.15407281
## j9  0.20912358 0.22036764 0.16457469 0.252950404 0.09883005 0.24927202
## j10 0.05859746 0.25302209 0.03275018 0.011272696 0.03518699 0.14765923
## j11 0.01755421 0.38660354 0.01943678 0.124461997 0.09485327 0.02823828
## j12 0.04451035 0.52456928 0.04610056 0.104854853 0.11119028 0.09221122
## j13 0.00000000 0.04118729 0.35530753 0.301165864 0.26600753 0.18870890
## j14 0.04118729 0.00000000 0.05210241 0.107832997 0.17589078 0.06464369
## j15 0.35530753 0.05210241 0.00000000 0.331022875 0.17419098 0.26137723
## j16 0.30116586 0.10783300 0.33102287 0.000000000 0.19416262 0.32562549
## j17 0.26600753 0.17589078 0.17419098 0.194162623 0.00000000 0.10821787
## j18 0.18870890 0.06464369 0.26137723 0.325625492 0.10821787 0.00000000
## j19 0.12283031 0.19049044 0.11308620 0.179495862 0.17722367 0.30754674
## j20 0.14127688 0.17344579 0.13798220 0.242707102 0.22199130 0.26264929
##            j19        j20
## j1  0.13810817 0.06415845
## j2  0.11167287 0.08704760
## j3  0.11363153 0.16651544
## j4  0.20035213 0.25951459
## j5  0.21789993 0.18777826
## j6  0.25053338 0.16743049
## j7  0.25873457 0.27328622
## j8  0.09770614 0.16473483
## j9  0.19270960 0.22806003
## j10 0.10435320 0.07761260
## j11 0.14663097 0.15779120
## j12 0.19305927 0.18947384
## j13 0.12283031 0.14127688
## j14 0.19049044 0.17344579
## j15 0.11308620 0.13798220
## j16 0.17949586 0.24270710
## j17 0.17722367 0.22199130
## j18 0.30754674 0.26264929
## j19 0.00000000 0.38500304
## j20 0.38500304 0.00000000
image(as.matrix(similarity_items),main="cosine for Item similarity")

Among jokes 1 to 20, the jokes most similar to joke 1 are j3 (0.39), j2 (0.38), j10 (0.36) and j11 (0.34). So when we build the recommender system, we can recommend these four jokes to users who like joke 1.
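The lookup above can be automated. Below is a minimal sketch of a hypothetical helper `top_similar` that returns the k jokes most similar to a given joke, excluding the joke itself. To keep it self-contained it uses a 5 x 5 excerpt of the cosine similarities printed above (jokes j1, j2, j3, j10, j11), with 0 on the diagonal as in the printed matrix.

```r
ids <- c("j1", "j2", "j3", "j10", "j11")
sim <- matrix(c(0,          0.38393179, 0.39165632, 0.35913440, 0.34391646,
                0.38393179, 0,          0.27646389, 0.26909332, 0.29458043,
                0.39165632, 0.27646389, 0,          0.28351712, 0.45788631,
                0.35913440, 0.26909332, 0.28351712, 0,          0.38851004,
                0.34391646, 0.29458043, 0.45788631, 0.38851004, 0),
              nrow = 5, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical helper: the k items most similar to `item`, excluding itself.
top_similar <- function(sim, item, k = 4) {
  scores <- sim[item, ]
  scores <- scores[names(scores) != item]
  head(sort(scores, decreasing = TRUE), k)
}

top_similar(sim, "j1")
```

With the full 20-joke matrix, the same call would presumably be `top_similar(as.matrix(similarity_items), "j1")`, since the printed matrix carries the joke names as dimnames.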

In my opinion, SVD is more reliable than the other methods we learned, and it makes sense for a rating system. However, as many people have mentioned, it can be very expensive to compute, and the results take a lot of time to analyze.

Let's try it for the first 100 users and the first 100 jokes. I will use 60% of the users as my training set and the remaining 40% as my testing set.

u100 <- Jester5k@data[1:100,1:100]
set.seed(1)
n <- nrow(u100)
shuffled_df <- u100[sample(n), ]
train_indices <- 1: round(0.6*n)
train <- shuffled_df[train_indices, ]
test_indices <- (round(0.6*n)+1):n
test <- shuffled_df[test_indices, ]
train <- as.matrix(train)
test <- as.matrix(test)
s <- svd(train)
D <- diag(s$d)
#s$u
#D
v <- s$v

max_position <- as.vector(max.col(test %*% v))
max_position
##  [1]  2  6 55  2 10  2  2  1  2  1  2  2  2 29  1  2 10  1 15  2  1  1  2
## [24] 12 16  2  2  2  1  2 25 14  2  2  1 11 11  2  2  2
table(max_position)
## max_position
##  1  2  6 10 11 12 14 15 16 25 29 55 
##  8 20  1  2  2  1  1  1  1  1  1  1
similarity_items <- similarity(Jester5k[ ,1:100], method = "cosine", which = "items")
similarity_items<- as.matrix(similarity_items)
image(as.matrix(similarity_items),main="cosine for Item similarity")

max_position_joke <- max.col(similarity_items)
max_position_joke[1]
## [1] 3
max_position_joke[2]
## [1] 25

From here, we find that many of the 40 users in the test group have their largest score on concept 2 (20 users), followed by concept 1 (8 users). As before, these indices are concepts rather than jokes, so they still need to be mapped to the jokes that load most heavily on them. From the item-similarity matrix, joke 3 is the most similar to joke 1 and joke 25 is the most similar to joke 2, so we can recommend joke 3 to users who like joke 1 and joke 25 to users who like joke 2.
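The `max.col(test %*% v)` step uses all 60 concepts, including weak ones with small singular values, which is how indices such as 29 and 55 show up. A sketch of an alternative that restricts the assignment to the first k concepts is shown below; `assign_concept` is hypothetical, k = 10 is an arbitrary assumption, and random ratings in [-10, 10] stand in for the Jester data so the block runs without recommenderlab.

```r
# Hypothetical helper: assign each user to the concept with the largest score,
# using only the first k concepts. by_abs = TRUE compares magnitudes instead,
# since the sign of each singular vector is arbitrary.
assign_concept <- function(ratings, v, k, by_abs = FALSE) {
  k <- min(k, ncol(v))
  scores <- ratings %*% v[, seq_len(k), drop = FALSE]
  if (by_abs) scores <- abs(scores)
  max.col(scores, ties.method = "first")
}

# Synthetic stand-in data (NOT Jester5k): 60 training and 40 test users, 100 jokes
set.seed(1)
train_demo <- matrix(runif(60 * 100, -10, 10), nrow = 60)
test_demo  <- matrix(runif(40 * 100, -10, 10), nrow = 40)

v_demo <- svd(train_demo)$v  # 100 x 60 joke-to-concept matrix
concept <- assign_concept(test_demo, v_demo, k = 10)
table(concept)
```

Keeping only the strongest concepts is the same idea as setting D55 to 0 in the small example: the weak concepts carry little signal, so assignments to them are less meaningful.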