DATA 643 Recommender Systems Assignment 3

Sarah Wigodsky

2018-06-24


Matrix Factorization Methods

I will build a recommender system to recommend movies to users. I am using the MovieLens data set.
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

The list of ratings can be downloaded from http://files.grouplens.org/datasets/movielens/ml-100k/u.data

The list of movie titles and genres can be downloaded from http://files.grouplens.org/datasets/movielens/ml-100k/u.item

The MovieLens data set has 943 users and 1682 movies. The movie ratings range from 1 to 5. There are 100,000 ratings.
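As a rough sketch of how the two files could be read into R: the column layout below follows the README shipped with ml-100k (u.data is tab-separated, u.item is pipe-separated), while the function and object names, the placeholder URL in the demonstration line, and the latin1 encoding are assumptions made for illustration. The demonstration reads two small local files written in the same layout rather than downloading anything.

```r
# Genre names in the order used by u.item.
genres <- c("unknown", "Action", "Adventure", "Animation", "Children's",
            "Comedy", "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir",
            "Horror", "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller",
            "War", "Western")

# u.data: tab-separated, no header row (hypothetical helper name).
read_ratings <- function(file) {
  read.delim(file, header = FALSE,
             col.names = c("user_id", "movie_id", "rating", "timestamp"))
}

# u.item: pipe-separated, no header row. Titles contain apostrophes, so
# quoting is switched off; the latin1 encoding is an assumption.
read_movies <- function(file) {
  read.delim(file, sep = "|", header = FALSE, quote = "",
             fileEncoding = "latin1", check.names = FALSE,
             col.names = c("movie_id", "movie_name", "release_date",
                           "video_release_date", "imdb_url", genres))
}

# Demonstration on one line per file, written in the same layouts.
ratings_file <- tempfile()
writeLines("196\t242\t3\t881250949", ratings_file)

movies_file <- tempfile()
toy_story <- c("1", "Toy Story (1995)", "01-Jan-1995", "",
               "http://example.org/placeholder",  # placeholder, not the real URL
               c(0, 0, 0, 1, 1, 1, rep(0, 13)))   # genre flags from the table
writeLines(paste(toy_story, collapse = "|"), movies_file)

ratings <- read_ratings(ratings_file)
movies  <- read_movies(movies_file)
print(ratings)
print(movies[, c("movie_id", "movie_name", "Animation", "Comedy")])
```

To read the real data, the two download URLs given above can be passed to `read_ratings()` and `read_movies()` in place of the temporary files.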

Each movie is tagged with one or more of the following genres: unknown, Action, Adventure, Animation, Children’s, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western.

The first 6 movies and their genres are shown below. A one indicates that the movie belongs to that genre; a zero indicates that it does not.

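| movie_id | movie_name | unknown | Action | Adventure | Animation | Children’s | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Toy Story (1995) | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | GoldenEye (1995) | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | Four Rooms (1995) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | Get Shorty (1995) | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Copycat (1995) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 6 | Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |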

The data frame needs to be reorganized into a user-by-movie layout. The dcast command creates a row for each user and a column for each movie. If a user did not rate a movie, a value of zero is entered. If a user rated the same movie more than once, the mean of those ratings is used.

| ’Til There Was You (1997) | 1-900 (1994) | 101 Dalmatians (1996) | 12 Angry Men (1957) | 187 (1997) |
|---|---|---|---|---|
| 0 | 0 | 2 | 5 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 2 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 2 | 0 | 0 |
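The reshaping step can be illustrated on a toy example. The sketch below uses base R's `tapply` as a stand-in for dcast, and the ratings are made up; the exact call used in this report is not shown, but with reshape2 it would presumably be something like `dcast(ratings_long, user_id ~ movie_name, value.var = "rating", fun.aggregate = mean, fill = 0)`.

```r
# Toy long-format ratings; the values are made up for illustration.
# User 3 rated movie "C" twice (4 and 2).
ratings_long <- data.frame(
  user_id    = c(1, 1, 2, 3, 3, 3),
  movie_name = c("A", "B", "A", "B", "C", "C"),
  rating     = c(5, 3, 4, 2, 4, 2)
)

# One row per user, one column per movie; repeated ratings are averaged.
wide <- tapply(ratings_long$rating,
               list(user = ratings_long$user_id,
                    movie = ratings_long$movie_name),
               mean)

# User-movie pairs with no rating come back as NA; store them as zero.
wide[is.na(wide)] <- 0
print(wide)
```

User 3's two ratings of movie C are averaged into a single value of 3, and every unrated pair ends up as zero.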

The data frame will be stored as a realRatingMatrix, which supports the compact storage of sparse matrices.

Similarity Between the First 10 Users

##             1          2          3          4          5          6
## 2  0.16893670                                                       
## 3  0.04838832 0.11339323                                            
## 4  0.06456101 0.17969404 0.34978137                                 
## 5  0.37967009 0.07362338 0.02159212 0.03180425                      
## 6  0.42968246 0.24210639 0.07401814 0.06843135 0.23863583           
## 7  0.44309651 0.10860441 0.06742251 0.09150706 0.37473288 0.49352888
## 8  0.32007948 0.10425672 0.08441897 0.18806031 0.24892997 0.20251387
## 9  0.07838506 0.16246959 0.06203868 0.10128356 0.05684700 0.18499701
## 10 0.37773263 0.16127340 0.06621711 0.06085923 0.20142701 0.55485085
##             7          8          9
## 2                                  
## 3                                  
## 4                                  
## 5                                  
## 6                                  
## 7                                  
## 8  0.28581467                      
## 9  0.14609199 0.08594195           
## 10 0.48850114 0.23328945 0.19822253

The larger the similarity, the more closely related the users, and the more red the corresponding box appears. User 1 is most similar to users 5, 6, 7 and 10 (all above 0.37) and least similar to users 3, 4 and 9 (all below 0.08). Users 3 and 4 are more similar to each other (0.35) than either is to any other user in this group. The strongest similarity in the table is between users 6 and 10 (0.55).
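To make the meaning of a user-to-user score concrete, here is a minimal sketch of cosine similarity in base R on a made-up rating matrix. The report does not state which measure produced the table above or how unrated movies were treated, so both the choice of cosine and the treatment of zeros as ordinary values are assumptions; `cosine_similarity` is a hypothetical helper name.

```r
# Cosine similarity between every pair of rows of a numeric matrix.
# A row of all zeros has norm 0 and would produce NaN.
cosine_similarity <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / (norms %o% norms)
}

# Made-up ratings: rows are users, columns are movies, 0 means "not rated".
ratings <- matrix(c(5, 3, 0, 1,
                    4, 0, 0, 1,
                    0, 0, 5, 4),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("u1", "u2", "u3"),
                                  c("m1", "m2", "m3", "m4")))

sim <- cosine_similarity(ratings)
print(round(sim, 3))
```

Users u1 and u2 rate the same movies in a similar way and score close to 1, while u1 and u3 share only one rated movie and score close to 0. Passing `t(ratings)` instead gives movie-to-movie similarities.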

Similarity Between the First 10 Movies

##                                             'Til There Was You (1997)
## 1-900 (1994)                                              0.000000000
## 101 Dalmatians (1996)                                     0.024560685
## 12 Angry Men (1957)                                       0.099560619
## 187 (1997)                                                0.185236404
## 2 Days in the Valley (1996)                               0.159264620
## 20,000 Leagues Under the Sea (1954)                       0.000000000
## 2001: A Space Odyssey (1968)                              0.052202521
## 3 Ninjas: High Noon At Mega Mountain (1998)               0.000000000
## 39 Steps, The (1935)                                      0.033325524
##                                             1-900 (1994)
## 1-900 (1994)                                            
## 101 Dalmatians (1996)                        0.014138837
## 12 Angry Men (1957)                          0.009294164
## 187 (1997)                                   0.007354134
## 2 Days in the Valley (1996)                  0.004701732
## 20,000 Leagues Under the Sea (1954)          0.010055458
## 2001: A Space Odyssey (1968)                 0.067037734
## 3 Ninjas: High Noon At Mega Mountain (1998)  0.000000000
## 39 Steps, The (1935)                         0.000000000
##                                             101 Dalmatians (1996)
## 1-900 (1994)                                                     
## 101 Dalmatians (1996)                                            
## 12 Angry Men (1957)                                   0.167005813
## 187 (1997)                                            0.061104935
## 2 Days in the Valley (1996)                           0.143878445
## 20,000 Leagues Under the Sea (1954)                   0.203780555
## 2001: A Space Odyssey (1968)                          0.225802764
## 3 Ninjas: High Noon At Mega Mountain (1998)           0.027642139
## 39 Steps, The (1935)                                  0.092336772
##                                             12 Angry Men (1957)
## 1-900 (1994)                                                   
## 101 Dalmatians (1996)                                          
## 12 Angry Men (1957)                                            
## 187 (1997)                                          0.056822071
## 2 Days in the Valley (1996)                         0.167234811
## 20,000 Leagues Under the Sea (1954)                 0.304078162
## 2001: A Space Odyssey (1968)                        0.422506078
## 3 Ninjas: High Noon At Mega Mountain (1998)         0.072682236
## 39 Steps, The (1935)                                0.394853680
##                                              187 (1997)
## 1-900 (1994)                                           
## 101 Dalmatians (1996)                                  
## 12 Angry Men (1957)                                    
## 187 (1997)                                             
## 2 Days in the Valley (1996)                 0.132326830
## 20,000 Leagues Under the Sea (1954)         0.042927503
## 2001: A Space Odyssey (1968)                0.065059592
## 3 Ninjas: High Noon At Mega Mountain (1998) 0.043133109
## 39 Steps, The (1935)                        0.027300003
##                                             2 Days in the Valley (1996)
## 1-900 (1994)                                                           
## 101 Dalmatians (1996)                                                  
## 12 Angry Men (1957)                                                    
## 187 (1997)                                                             
## 2 Days in the Valley (1996)                                            
## 20,000 Leagues Under the Sea (1954)                         0.133158689
## 2001: A Space Odyssey (1968)                                0.227602301
## 3 Ninjas: High Noon At Mega Mountain (1998)                 0.041364557
## 39 Steps, The (1935)                                        0.055270247
##                                             20,000 Leagues Under the Sea (1954)
## 1-900 (1994)                                                                   
## 101 Dalmatians (1996)                                                          
## 12 Angry Men (1957)                                                            
## 187 (1997)                                                                     
## 2 Days in the Valley (1996)                                                    
## 20,000 Leagues Under the Sea (1954)                                            
## 2001: A Space Odyssey (1968)                                        0.456281038
## 3 Ninjas: High Noon At Mega Mountain (1998)                         0.073720978
## 39 Steps, The (1935)                                                0.322471173
##                                             2001: A Space Odyssey (1968)
## 1-900 (1994)                                                            
## 101 Dalmatians (1996)                                                   
## 12 Angry Men (1957)                                                     
## 187 (1997)                                                              
## 2 Days in the Valley (1996)                                             
## 20,000 Leagues Under the Sea (1954)                                     
## 2001: A Space Odyssey (1968)                                            
## 3 Ninjas: High Noon At Mega Mountain (1998)                  0.067790768
## 39 Steps, The (1935)                                         0.346588693
##                                             3 Ninjas: High Noon At Mega Mountain (1998)
## 1-900 (1994)                                                                           
## 101 Dalmatians (1996)                                                                  
## 12 Angry Men (1957)                                                                    
## 187 (1997)                                                                             
## 2 Days in the Valley (1996)                                                            
## 20,000 Leagues Under the Sea (1954)                                                    
## 2001: A Space Odyssey (1968)                                                           
## 3 Ninjas: High Noon At Mega Mountain (1998)                                            
## 39 Steps, The (1935)                                                        0.098454928

The heat map above displays the similarity between the first 10 movies. The most similar pairs are 2001: A Space Odyssey and 20,000 Leagues Under the Sea (0.46), followed by 12 Angry Men and 2001: A Space Odyssey (0.42). Several pairs, such as ’Til There Was You and 1-900, have a similarity of zero.

Find Bias For Each Movie and User

##        1        2        3        4        5        6 
## 3.605166 3.704918 2.773585 4.333333 2.874286 3.639423
’Til There Was You (1997) 2.333333
1-900 (1994) 2.600000
101 Dalmatians (1996) 2.908257
12 Angry Men (1957) 4.344000
187 (1997) 3.024390
2 Days in the Valley (1996) 3.225807
## [1] 3.529907
##           1           2           3           4           5           6 
##  0.07525924  0.17501122 -0.75632191  0.80342652 -0.65562110  0.10951626
’Til There Was You (1997) -1.1965735
1-900 (1994) -0.9299068
101 Dalmatians (1996) -0.6216499
12 Angry Men (1957) 0.8140932
187 (1997) -0.5055166
2 Days in the Valley (1996) -0.3041004

The biases of movies and users were calculated from actual ratings only; entries for movies that a user did not rate were removed first. User 4 tends to give high ratings, while user 3 tends to give low ones. ’Til There Was You is rated well below average, and 12 Angry Men is rated well above average. The overall mean rating, from which the biases are measured, is 3.529907.
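The bias calculation can be sketched on a small made-up matrix. In the output above, each bias equals a user's or movie's mean rating minus 3.529907 (for example, 3.605166 - 3.529907 = 0.075259 for user 1), and the sketch follows that pattern. The object names are illustrative.

```r
# Made-up ratings: rows are users, columns are movies, 0 means "not rated".
ratings <- matrix(c(5, 3, 0,
                    4, 0, 2,
                    0, 1, 3),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("1", "2", "3"), c("A", "B", "C")))

# Turn unrated entries into NA so they are left out of every mean.
observed <- ratings
observed[observed == 0] <- NA

global_mean <- mean(observed, na.rm = TRUE)

# Bias = own mean rating minus the overall mean rating.
user_bias  <- rowMeans(observed, na.rm = TRUE) - global_mean
movie_bias <- colMeans(observed, na.rm = TRUE) - global_mean

print(global_mean)
print(user_bias)
print(movie_bias)
```

Here the overall mean is 3, so user 1 (mean 4) has a bias of +1 and movie B (mean 2) has a bias of -1. Leaving the zeros in place would drag every mean toward zero.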

Singular Value Decomposition (SVD) With Different Features

The irlba function was used to perform the singular value decomposition. The number of features was varied from 2 to 20, and the root mean square error (RMSE) was calculated each time.

The root mean square error is lowest for two features. However, two features do not seem like a meaningful way to distinguish between different types of movies. I will therefore build the model on 15 features, which still gives a relatively low RMSE.
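A comparison of feature counts can be sketched as follows. Several things here are assumptions: base R's `svd` stands in for irlba (a call such as `irlba::irlba(train, nv = k)` should return the same `u`, `d` and `v` components for the leading k features), the ratings are random toy data, and the RMSE is measured on ratings hidden before the decomposition. The report does not show how its RMSE was computed, so this is one plausible setup rather than a reproduction of it.

```r
# Rank-k approximation of a matrix from its truncated SVD.
truncated_svd <- function(m, k) {
  s <- svd(m, nu = k, nv = k)
  s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
}

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Random toy ratings (20 users, 12 movies); real data has far more structure.
set.seed(643)
ratings <- matrix(sample(1:5, 20 * 12, replace = TRUE), nrow = 20)

# Hide 40 ratings by setting them to zero, as if they had never been given.
held_out <- sample(length(ratings), 40)
train <- ratings
train[held_out] <- 0

# RMSE on the hidden ratings for each number of features.
feature_counts <- 2:10
rmse_by_k <- sapply(feature_counts, function(k) {
  rmse(ratings[held_out], truncated_svd(train, k)[held_out])
})
names(rmse_by_k) <- feature_counts
print(round(rmse_by_k, 3))
```

Measured this way, adding features does not have to lower the error: a higher-rank approximation reproduces the zeros in the training matrix more faithfully, including the ones that stand in for hidden ratings.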

Singular Value Decomposition (SVD) With 15 Features

Rating Predictor

The following function takes a user number and a movie title and returns either the rating the user already gave the movie or, if the user has not rated it, a predicted rating.

## [1] "Already rated"
## [1] 5
## Home Alone 3 (1997) 
##            1.936554
## Toy Story (1995) 
##         4.435248
## Home Alone 3 (1997) 
##            1.124432

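User 1 has already rated Toy Story, giving it a 5. User 1 has not rated Home Alone 3 and, with a predicted rating of about 1.9, probably wouldn’t like it.

User 3 has not rated Toy Story and, with a predicted rating of about 4.4, would probably like it. User 3 has not rated Home Alone 3 either and, with a predicted rating of about 1.1, probably wouldn’t like it.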
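A rating predictor of the kind described above could look roughly like the sketch below. The function name `predict_rating`, its arguments, and the numbers in the two toy matrices are all made up for illustration; in the report the predictions would come from the 15-feature SVD model, and the actual function is not shown.

```r
# Return the existing rating if there is one, otherwise the model's prediction.
# `ratings` holds actual ratings with 0 for "not rated"; `predicted` holds the
# model's reconstruction. Both are user-by-movie matrices with matching names.
predict_rating <- function(user, movie, ratings, predicted) {
  user <- as.character(user)
  actual <- ratings[user, movie]
  if (actual != 0) {
    print("Already rated")
    return(actual)
  }
  predicted[user, movie]
}

movies <- c("Toy Story (1995)", "Home Alone 3 (1997)")

# Made-up actual ratings: only user 1 has rated Toy Story.
ratings <- matrix(c(5, 0,
                    0, 0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("1", "3"), movies))

# Made-up model output for the same users and movies.
predicted <- matrix(c(4.8, 1.9,
                      4.4, 1.1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("1", "3"), movies))

print(predict_rating(1, "Toy Story (1995)", ratings, predicted))
print(predict_rating(3, "Toy Story (1995)", ratings, predicted))
```

Looking rows up by name rather than by position keeps the user number and the row in step even when some users are missing from the matrix.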