Project 3

Your task is to implement a matrix factorization method, such as singular value decomposition (SVD) or alternating least squares (ALS), in the context of a recommender system. You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system written by yourself or someone else. Remember, as always, to cite your sources, so that you can be graded on what you added, not what you found. SVD can be thought of as a pre-processing step for feature engineering: you might start with thousands or millions of items and use SVD to create a much smaller set of “k” items (e.g. 20 or 70).
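As a minimal illustration of the reduction described above, the following base-R sketch factorizes a small rating matrix and keeps only the k largest singular values. The matrix values and the choice k = 2 are invented for illustration and are not taken from the project data.

```r
# Toy user-by-item rating matrix (values invented for illustration).
R <- matrix(c(5, 4, 1, 1,
              4, 5, 1, 2,
              1, 1, 5, 4,
              2, 1, 4, 5),
            nrow = 4, byrow = TRUE)

k <- 2          # number of latent features to keep (assumed)
s <- svd(R)     # R = U %*% diag(d) %*% t(V)

# Rank-k approximation built from the k largest singular values.
R_k <- s$u[, 1:k] %*% diag(s$d[1:k], k, k) %*% t(s$v[, 1:k])

# Each user is now described by k numbers instead of ncol(R) ratings.
user_features <- s$u[, 1:k] %*% diag(s$d[1:k], k, k)

# Frobenius norm of what the rank-k approximation leaves out.
approx_error <- norm(R - R_k, type = "F")
```

The approximation error equals the square root of the sum of the squared discarded singular values, so a larger k trades compactness for fidelity.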

I chose the ml-latest-small dataset from MovieLens, a movie recommendation service. The dataset describes 5-star rating and free-text tagging activity. It contains 100,836 ratings and 3,683 tag applications across 9,742 movies. The data were created by 610 users between March 29, 1996 and September 24, 2018, and the dataset was generated on September 26, 2018.

Ratings Data File Structure (ratings.csv)

All ratings are contained in the file ratings.csv. Each line of this file after the header row represents one rating of one movie by one user, and has the following format:

userId,movieId,rating,timestamp

Movies Data File Structure (movies.csv)

Movie information is contained in the file movies.csv. Each line of this file after the header row represents one movie, and has the following format:

movieId,title,genres
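A short sketch of how the two file layouts can be parsed and joined in R. The rows are copied from the previews shown in this report and are read from inline text rather than from the real files. The conversion of `timestamp` assumes it holds seconds since the Unix epoch (UTC), as the MovieLens documentation describes.

```r
# A few rows in the ratings.csv layout (taken from the preview in this report).
ratings_txt <- "userId,movieId,rating,timestamp
1,1,4,964982703
1,3,4,964981247
1,6,4,964982224"
ratings <- read.csv(text = ratings_txt)

# A few rows in the movies.csv layout; genres are pipe-separated.
movies_txt <- "movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
6,Heat (1995),Action|Crime|Thriller"
movies <- read.csv(text = movies_txt, stringsAsFactors = FALSE)

# Assumption: timestamp is seconds since 1970-01-01 UTC.
ratings$date <- as.POSIXct(ratings$timestamp, origin = "1970-01-01", tz = "UTC")

# Attach titles and genres to each rating via the shared movieId key.
rated <- merge(ratings, movies, by = "movieId")

# Split the genre string of the first movie into individual genres.
first_genres <- strsplit(movies$genres[1], "|", fixed = TRUE)[[1]]
```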

Download Data
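The download step itself is not shown in this report. One possible sketch is given below; the helper name `download_movielens`, the target directory, and the use of the GroupLens archive URL are assumptions, not the author's code. The function is only defined here, not called, so nothing is downloaded until it is invoked.

```r
# Hypothetical helper: fetch and read ml-latest-small (assumed URL and layout).
download_movielens <- function(dest_dir = "data") {
  url <- "https://files.grouplens.org/datasets/movielens/ml-latest-small.zip"
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, "ml-latest-small.zip")

  # Skip the download if the archive is already present.
  if (!file.exists(zip_path)) {
    download.file(url, zip_path, mode = "wb")
  }
  unzip(zip_path, exdir = dest_dir)

  # The archive is assumed to unpack into an ml-latest-small/ folder.
  base <- file.path(dest_dir, "ml-latest-small")
  list(
    ratings = read.csv(file.path(base, "ratings.csv")),
    movies  = read.csv(file.path(base, "movies.csv"), stringsAsFactors = FALSE)
  )
}
```

Calling `ml <- download_movielens()` would then give `ml$ratings` and `ml$movies` as data frames.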
Data Exploration
##      userId         movieId           rating        timestamp        
##  Min.   :  1.0   Min.   :     1   Min.   :0.500   Min.   :8.281e+08  
##  1st Qu.:177.0   1st Qu.:  1199   1st Qu.:3.000   1st Qu.:1.019e+09  
##  Median :325.0   Median :  2991   Median :3.500   Median :1.186e+09  
##  Mean   :326.1   Mean   : 19435   Mean   :3.502   Mean   :1.206e+09  
##  3rd Qu.:477.0   3rd Qu.:  8122   3rd Qu.:4.000   3rd Qu.:1.436e+09  
##  Max.   :610.0   Max.   :193609   Max.   :5.000   Max.   :1.538e+09
##   userId movieId rating timestamp
## 1      1       1      4 964982703
## 2      1       3      4 964981247
## 3      1       6      4 964982224
## 4      1      47      5 964983815
## 5      1      50      5 964982931
## 6      1      70      3 964982400
##     movieId                                          title     
##  Min.   :     1   Confessions of a Dangerous Mind (2002):   2  
##  1st Qu.:  3248   Emma (1996)                           :   2  
##  Median :  7300   Eros (2004)                           :   2  
##  Mean   : 42200   Saturn 3 (1980)                       :   2  
##  3rd Qu.: 76232   War of the Worlds (2005)              :   2  
##  Max.   :193609   ¡Three Amigos! (1986)                 :   1  
##                   (Other)                               :9731  
##             genres    
##  Drama         :1053  
##  Comedy        : 946  
##  Comedy|Drama  : 435  
##  Comedy|Romance: 363  
##  Drama|Romance : 349  
##  Documentary   : 339  
##  (Other)       :6257
##   movieId                              title
## 1       1                   Toy Story (1995)
## 2       2                     Jumanji (1995)
## 3       3            Grumpier Old Men (1995)
## 4       4           Waiting to Exhale (1995)
## 5       5 Father of the Bride Part II (1995)
## 6       6                        Heat (1995)
##                                        genres
## 1 Adventure|Animation|Children|Comedy|Fantasy
## 2                  Adventure|Children|Fantasy
## 3                              Comedy|Romance
## 4                        Comedy|Drama|Romance
## 5                                      Comedy
## 6                       Action|Crime|Thriller

Convert into a recommenderlab sparse matrix
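The conversion code is not shown. The sketch below illustrates the underlying idea with the Matrix package: user/movie/rating triplets become a sparse user-by-movie matrix in which only observed ratings are stored. The first six rows come from the ratings preview in this report; the two rows for user 2 are invented for illustration.

```r
library(Matrix)

ratings <- data.frame(
  userId  = c(1, 1, 1, 1, 1, 1, 2, 2),      # user 2 rows are invented
  movieId = c(1, 3, 6, 47, 50, 70, 3, 47),
  rating  = c(4, 4, 4, 5, 5, 3, 3.5, 5)
)

# Map raw ids to consecutive row/column positions.
users <- factor(ratings$userId)
items <- factor(ratings$movieId)

# Sparse user-by-movie matrix; unrated pairs are simply not stored.
m <- sparseMatrix(
  i = as.integer(users),
  j = as.integer(items),
  x = ratings$rating,
  dimnames = list(levels(users), levels(items))
)

# With recommenderlab loaded, as(ratings, "realRatingMatrix") is the usual
# one-line coercion for such triplets (assumed here, not taken from the report).
```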

##  [1] "HYBRID_realRatingMatrix"       "ALS_realRatingMatrix"         
##  [3] "ALS_implicit_realRatingMatrix" "IBCF_realRatingMatrix"        
##  [5] "LIBMF_realRatingMatrix"        "POPULAR_realRatingMatrix"     
##  [7] "RANDOM_realRatingMatrix"       "RERECOMMEND_realRatingMatrix" 
##  [9] "SVD_realRatingMatrix"          "SVDF_realRatingMatrix"        
## [11] "UBCF_realRatingMatrix"
## $HYBRID_realRatingMatrix
## [1] "Hybrid recommender that aggegates several recommendation strategies using weighted averages."
## 
## $ALS_realRatingMatrix
## [1] "Recommender for explicit ratings based on latent factors, calculated by alternating least squares algorithm."
## 
## $ALS_implicit_realRatingMatrix
## [1] "Recommender for implicit data based on latent factors, calculated by alternating least squares algorithm."
## 
## $IBCF_realRatingMatrix
## [1] "Recommender based on item-based collaborative filtering."
## 
## $LIBMF_realRatingMatrix
## [1] "Matrix factorization with LIBMF via package recosystem (https://cran.r-project.org/web/packages/recosystem/vignettes/introduction.html)."
## 
## $POPULAR_realRatingMatrix
## [1] "Recommender based on item popularity."
## 
## $RANDOM_realRatingMatrix
## [1] "Produce random recommendations (real ratings)."
## 
## $RERECOMMEND_realRatingMatrix
## [1] "Re-recommends highly rated items (real ratings)."
## 
## $SVD_realRatingMatrix
## [1] "Recommender based on SVD approximation with column-mean imputation."
## 
## $SVDF_realRatingMatrix
## [1] "Recommender based on Funk SVD with gradient descend (https://sifter.org/~simon/journal/20061211.html)."
## 
## $UBCF_realRatingMatrix
## [1] "Recommender based on user-based collaborative filtering."

SVD Parameters

## $k
## [1] 10
## 
## $maxiter
## [1] 100
## 
## $normalize
## [1] "center"
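To make these parameters concrete, here is a base-R sketch of how row centering, column-mean imputation, and a rank-k SVD might fit together. It is an illustration under stated assumptions, not recommenderlab's actual implementation; the toy matrix and k = 2 are invented.

```r
# Toy ratings with missing values (invented for illustration).
R <- matrix(c(4, NA, 5, 3,
              2, 3, NA, 1,
              5, 4, 4, NA),
            nrow = 3, byrow = TRUE)
k <- 2  # number of latent factors (the report's default is 10)

# "center": subtract each user's mean rating from that user's ratings.
row_means <- rowMeans(R, na.rm = TRUE)
Rc <- sweep(R, 1, row_means)

# Column-mean imputation: fill each missing cell with its column mean.
col_means <- colMeans(Rc, na.rm = TRUE)
Rf <- Rc
missing <- is.na(Rf)
Rf[missing] <- col_means[col(Rf)[missing]]

# Rank-k SVD approximation of the filled matrix.
s <- svd(Rf, nu = k, nv = k)
approx <- s$u %*% diag(s$d[1:k], k, k) %*% t(s$v)

# Undo the centering to get predictions on the original rating scale.
pred <- sweep(approx, 1, row_means, "+")
```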

Determine similarity between users (first 4 users)

##           1  2         3         4
## 1 0.0000000  1 0.7919033 0.9328096
## 2 1.0000000  0        NA 1.0000000
## 3 0.7919033 NA 0.0000000 1.0000000
## 4 0.9328096  1 1.0000000 0.0000000

Determine similarity between items (first 4 movies)

##           1         2         3         4
## 1 0.0000000 0.9644641 0.9715415 0.9838699
## 2 0.9644641 0.0000000 0.9389013 0.9609877
## 3 0.9715415 0.9389013 0.0000000 1.0000000
## 4 0.9838699 0.9609877 1.0000000 0.0000000
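The similarity measure used for the matrices above is not stated. A common choice is cosine similarity over co-rated entries, sketched below in base R as an assumption. The helper names and toy data are hypothetical. Unlike the output above, this sketch does not set the diagonal to 0.

```r
# Cosine similarity using only positions where both vectors have a rating.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)   # no co-rated entries: undefined
  sum(a[both] * b[both]) /
    (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Pairwise similarity between the rows of M.
pairwise_cosine <- function(M) {
  n <- nrow(M)
  S <- matrix(NA_real_, n, n, dimnames = list(rownames(M), rownames(M)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      S[i, j] <- cosine_sim(M[i, ], M[j, ])
    }
  }
  S
}

# Toy user-by-movie ratings (invented for illustration).
R <- matrix(c(4, 5, NA,
              2, NA, 3,
              NA, 1, NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("u1", "u2", "u3"), c("m1", "m2", "m3")))

user_sim <- pairwise_cosine(R)      # rows are users
item_sim <- pairwise_cosine(t(R))   # transpose so rows are movies
```

Under this definition, two users with no movie in common get NA, and two users sharing a single rated movie get exactly 1, which may explain values of that kind in sparse data.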

Explore ratings_data distribution

## vector_ratings_data
##       0     0.5       1     1.5       2     2.5       3     3.5       4     4.5 
## 5830804    1370    2811    1791    7551    5550   20047   13136   26818    8551 
##       5 
##   13211
##  [1] 4.0 0.0 4.5 2.5 3.5 3.0 5.0 0.5 2.0 1.5 1.0
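The 0 entry in the table above appears to count user-movie pairs with no rating rather than actual zero ratings, since the scale starts at 0.5. The sketch below uses only the counts printed above to check this and to derive the matrix density and the mean observed rating.

```r
# Counts copied from the rating distribution table above.
counts <- c("0" = 5830804, "0.5" = 1370, "1" = 2811, "1.5" = 1791,
            "2" = 7551, "2.5" = 5550, "3" = 20047, "3.5" = 13136,
            "4" = 26818, "4.5" = 8551, "5" = 13211)

# Treat the "0" bucket as missing ratings and drop it.
observed <- counts[names(counts) != "0"]
n_observed <- sum(observed)

# Share of user-movie cells that actually hold a rating.
density <- n_observed / sum(counts)

# Mean of the observed ratings, weighted by how often each value occurs.
mean_rating <- sum(as.numeric(names(observed)) * observed) / n_observed
```

The observed counts add up to the 100,836 ratings quoted for the dataset, and the weighted mean agrees with the 3.502 reported in the summary, so under 2% of the matrix cells are filled.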

Explore movie performance

##      movie views title
## 356    356   329    NA
## 318    318   317    NA
## 296    296   307    NA
## 593    593   279    NA
## 2571  2571   278    NA
## 260    260   251    NA
##      movie views                                     title
## 356    356   329                       Forrest Gump (1994)
## 318    318   317          Shawshank Redemption, The (1994)
## 296    296   307                       Pulp Fiction (1994)
## 593    593   279          Silence of the Lambs, The (1991)
## 2571  2571   278                        Matrix, The (1999)
## 260    260   251 Star Wars: Episode IV - A New Hope (1977)

Consider only movies with more than 50 views

Only 436 movies have more than 50 views.

Keep only users with a minimum of 50 rated movies and movies with a minimum of 50 views.

## 378 x 436 rating matrix of class 'realRatingMatrix' with 36214 ratings.
## vector_ratings_data_relevant
##      0    0.5      1    1.5      2    2.5      3    3.5      4    4.5      5 
## 128594    322    694    367   1833   1479   6279   4605  10552   3742   6341
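The filtering step can be sketched in base R as follows. The toy matrix and the small thresholds are invented so the effect is visible; the report uses 50 for both. Whether the original code applied both filters at once, as done here, is an assumption.

```r
# Toy user-by-movie ratings; NA means the user did not rate the movie.
R <- matrix(c(5, 4, NA, 3,
              4, NA, NA, 5,
              NA, 2, 1, 4,
              3, NA, NA, NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("m", 1:4)))

views_per_movie  <- colSums(!is.na(R))   # how many users rated each movie
ratings_per_user <- rowSums(!is.na(R))   # how many movies each user rated

min_views   <- 3   # illustrative threshold (50 in the report)
min_ratings <- 2   # illustrative threshold (50 in the report)

# Apply both filters at once, based on counts from the full matrix.
R_relevant <- R[ratings_per_user >= min_ratings,
                views_per_movie >= min_views,
                drop = FALSE]
```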

Create the recommender model, based on SVD approximation

## Warning in .local(x, ...): x was already normalized by row!
## Warning in .local(x, ...): x was already normalized by row!
##                                                 1                          2
## 1                        Full Metal Jacket (1987)        Pat and Mike (1952)
## 2                                 Phantoms (1998)        Dr. Dolittle (1998)
## 3                             If Lucy Fell (1996)    Aristocrats, The (2005)
## 4                         Monster in a Box (1992) Usual Suspects, The (1995)
## 5 Haunted World of Edward D. Wood Jr., The (1996)      Jupiter's Wife (1994)
## 6                         Mighty Aphrodite (1995)       Pete's Dragon (1977)
##                                                     3
## 1                        Great White Hype, The (1996)
## 2 City Slickers II: The Legend of Curly's Gold (1994)
## 3                                 Dr. Dolittle (1998)
## 4                             Fierce Creatures (1997)
## 5                               Kiss Me, Guido (1997)
## 6                                   Sgt. Bilko (1996)
##                                                   4
## 1                                   Fog, The (1980)
## 2 Star Wars: Episode VI - Return of the Jedi (1983)
## 3                                   Out Cold (2001)
## 4                             Kiss the Girls (1997)
## 5                           Dead Man Walking (1995)
## 6                                  Dobermann (1997)
##                              5
## 1            Wonderland (1999)
## 2 Welcome to Collinwood (2002)
## 3              Cop Land (1997)
## 4         Flesh & Blood (1985)
## 5            Cat People (1982)
## 6                Kissed (1996)
##                                                     6
## 1                   NeverEnding Story III, The (1994)
## 2                        Great White Hype, The (1996)
## 3 City Slickers II: The Legend of Curly's Gold (1994)
## 4      Bread and Chocolate (Pane e cioccolata) (1973)
## 5                       Angels in the Outfield (1994)
## 6                        Age of Innocence, The (1993)
##                                             7
## 1 Time Masters (Maîtres du temps, Les) (1982)
## 2                       Kiss Me, Guido (1997)
## 3        Kid in King Arthur's Court, A (1995)
## 4                               Snatch (2000)
## 5                Whole Wide World, The (1996)
## 6                   Twelve Chairs, The (1970)
##                                             8
## 1 Time Masters (Maîtres du temps, Les) (1982)
## 2                             Monsters (2010)
## 3                 Vanya on 42nd Street (1994)
## 4                    Conspiracy Theory (1997)
## 5                Welcome to Collinwood (2002)
## 6                                        <NA>
##                                                9
## 1             What's Eating Gilbert Grape (1993)
## 2                    Addams Family Values (1993)
## 3                              Wonderland (1999)
## 4 Bread and Chocolate (Pane e cioccolata) (1973)
## 5                   Great White Hype, The (1996)
## 6              Nightmare on Elm Street, A (1984)
##                                10
## 1    Welcome to Collinwood (2002)
## 2 Children of a Lesser God (1986)
## 3              Thumbsucker (2005)
## 4                Dobermann (1997)
## 5           Kiss the Girls (1997)
## 6                   Casino (1995)
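The code that produced the table above is not shown. The following is a hedged sketch of a comparable recommenderlab workflow, not the author's code. It uses the `MovieLense` data bundled with the package instead of ml-latest-small, and the train/test split, the thresholds, and k = 10 are assumptions. The block only runs if recommenderlab is installed.

```r
if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)

  # Bundled example data (a different MovieLens snapshot than the report's).
  data(MovieLense)

  # Keep active users and frequently rated movies, as in the report.
  rel <- MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 50]

  # Illustrative split: first 100 users to train, next 5 to recommend for.
  train <- rel[1:100, ]
  test  <- rel[101:105, ]

  # SVD-based recommender with k latent factors.
  model <- Recommender(train, method = "SVD", parameter = list(k = 10))

  # Top-6 recommendations per test user, as a list of movie titles.
  top6 <- predict(model, test, n = 6)
  top6_list <- as(top6, "list")
}
```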

Ratings assigned to the movies

## Warning in .local(x, ...): x was already normalized by row!
##             1        2        3        6        7
## [1,] 3.784563 3.880402 3.845442 3.731974 3.833721
## [2,] 3.603780 3.672544 3.656996 3.611870 3.644504
## [3,] 4.305907 4.323836 4.323536 4.297794 4.327453
## [4,] 3.255318 3.066650 3.199115 3.280091 3.095823
## [5,] 3.767017 3.527568 3.620133 3.955714 3.620279
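Predicted ratings such as those above become recommendations by ranking each user's scores. The sketch below copies the printed values and picks the two highest-scoring columns per row. Treating the column labels as item identifiers is an assumption, and `top_n` is a hypothetical helper.

```r
# Predicted ratings copied from the output above (rows = users).
pred <- matrix(c(3.784563, 3.880402, 3.845442, 3.731974, 3.833721,
                 3.603780, 3.672544, 3.656996, 3.611870, 3.644504,
                 4.305907, 4.323836, 4.323536, 4.297794, 4.327453,
                 3.255318, 3.066650, 3.199115, 3.280091, 3.095823,
                 3.767017, 3.527568, 3.620133, 3.955714, 3.620279),
               nrow = 5, byrow = TRUE,
               dimnames = list(NULL, c("1", "2", "3", "6", "7")))

# Labels of the n highest-scoring items for one user.
top_n <- function(scores, n) {
  names(sort(scores, decreasing = TRUE))[seq_len(n)]
}

# One row per user, one column per rank.
top2 <- t(apply(pred, 1, top_n, n = 2))
```

For the first user this yields items 2 and 3, and for the fourth user items 6 and 1.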

Performance index of the whole model

##           TP           FP           FN           TN    precision       recall 
##   3.56250000   6.43750000  84.38541667 331.61458333   0.35625000   0.05086994
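The figures above can be sanity-checked with a little arithmetic. The sketch below recomputes precision from the averaged counts and shows that the reported recall is not simply TP / (TP + FN) of those averages. Presumably recall is computed per user and then averaged, though the report does not say so.

```r
# Averaged confusion-matrix entries copied from the output above.
cm <- c(TP = 3.5625, FP = 6.4375, FN = 84.38541667, TN = 331.61458333)

# Number of items recommended per user (TP + FP).
n_recommended <- cm[["TP"]] + cm[["FP"]]

# Number of candidate items evaluated per user.
items_evaluated <- sum(cm)

# Precision from the averages matches the reported value because
# TP + FP is the same for every user.
precision <- cm[["TP"]] / n_recommended

# Recall from the averages; this differs from the reported 0.0509,
# which is consistent with averaging per-user recall instead.
recall_from_averages <- cm[["TP"]] / (cm[["TP"]] + cm[["FN"]])
```

On these numbers, roughly 3.6 of every 10 recommended movies were relevant, out of 426 candidate items per user.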