• Item-based collaborative filtering, according to Kitts, Freed, and Vrieze (2000), is a model-based approach that produces recommendations based on the relationships between items inferred from the rating matrix.
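
To make "relationships between items inferred from the rating matrix" concrete, here is a minimal base-R sketch that computes cosine similarity between item columns of a toy rating matrix. The matrix values and the helper name cosine_sim are invented for illustration; this is not how recommenderlab implements item-based filtering internally.

```r
# Toy rating matrix: rows are users, columns are items; NA means "not rated".
# All values are invented for illustration.
ratings <- matrix(
  c( 5,  4, NA,  1,
     4,  5,  1, NA,
     1, NA,  5,  4,
    NA,  1,  4,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), c("A", "B", "C", "D"))
)

# Cosine similarity between two item columns, using only users who rated both.
# (Hypothetical helper, not a recommenderlab function.)
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(NA_real_)
  sum(x[both] * y[both]) / sqrt(sum(x[both]^2) * sum(y[both]^2))
}

# Item-by-item similarity matrix.
items <- colnames(ratings)
item_sim <- sapply(items, function(i)
  sapply(items, function(j) cosine_sim(ratings[, i], ratings[, j])))

round(item_sim, 2)
```

In this toy data, items A and B are rated alike by the users who rated both, so their similarity is higher than that of A and C. An item-based recommender would use such similarities to suggest B to someone who liked A.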

  • Load the libraries below (install any that you do not already have).

options(warn=-1)

suppressMessages(library(recommenderlab))
suppressMessages(library(knitr))
suppressMessages(require(Amelia))
#suppressMessages(library(ggthemes))
suppressMessages(library(ggplot2))
suppressMessages(library(plyr))
suppressMessages(library(plotly))
  • We will use the MovieLense data set that ships with the recommenderlab package.

DATA

Let’s check the structure and the dimensions of the data.

data(MovieLense)
str(MovieLense)
## Formal class 'realRatingMatrix' [package "recommenderlab"] with 2 slots
##   ..@ data     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. ..@ i       : int [1:99392] 0 1 4 5 9 12 14 15 16 17 ...
##   .. .. ..@ p       : int [1:1665] 0 452 583 673 882 968 994 1386 1605 1904 ...
##   .. .. ..@ Dim     : int [1:2] 943 1664
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : chr [1:943] "1" "2" "3" "4" ...
##   .. .. .. ..$ : chr [1:1664] "Toy Story (1995)" "GoldenEye (1995)" "Four Rooms (1995)" "Get Shorty (1995)" ...
##   .. .. ..@ x       : num [1:99392] 5 4 4 4 4 3 1 5 4 5 ...
##   .. .. ..@ factors : list()
##   ..@ normalize: NULL
dim(MovieLense)
## [1]  943 1664
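
The str() and dim() output above is enough to work out how sparse the rating matrix is. The short calculation below copies the three numbers from that output (943 users, 1664 movies, 99392 stored ratings) rather than reloading the data, so it runs without recommenderlab.

```r
# Numbers copied from the str() and dim() output above.
n_users   <- 943     # rows of the rating matrix
n_items   <- 1664    # columns of the rating matrix
n_ratings <- 99392   # length of the @x slot, i.e. stored ratings

# Share of user-movie cells that actually hold a rating.
density <- n_ratings / (n_users * n_items)

# Average number of ratings per user.
ratings_per_user <- n_ratings / n_users

round(100 * density, 2)      # density as a percentage
round(ratings_per_user, 1)
```

Only about 6% of the cells are filled, which is why the data is stored as a sparse dgCMatrix.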

We now look more closely at a recommender that generates recommendations based solely on item popularity across the user population.

THE ANALYSIS

movies <- Recommender(MovieLense[1:900], method="POPULAR")
movies
## Recommender of type 'POPULAR' for 'realRatingMatrix' 
## learned using 900 users.
names(getModel(movies))
## [1] "topN"                  "ratings"               "normalize"            
## [4] "aggregationRatings"    "aggregationPopularity" "verbose"
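
As a rough picture of what a popularity-based model does, the base-R sketch below ranks items by how many users rated them and recommends the top-ranked items a user has not rated yet. This is a simplified illustration under assumed definitions (popularity as a rating count, toy data, and a hypothetical helper recommend_popular); recommenderlab's POPULAR method has its own normalization and aggregation settings, as the model slots above suggest.

```r
# Toy rating matrix: rows are users, columns are items; NA means "not rated".
# All values are invented for illustration.
ratings <- matrix(
  c( 5,  4, NA,  3,
     4, NA, NA, NA,
     3,  5,  2,  4,
     5,  3, NA, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), c("A", "B", "C", "D"))
)

# Assumed definition: popularity = number of users who rated the item.
popularity <- colSums(!is.na(ratings))

# Global ranking, most popular item first.
ranking <- names(sort(popularity, decreasing = TRUE))

# Recommend the n most popular items the user has not rated yet.
# (Hypothetical helper, not a recommenderlab function.)
recommend_popular <- function(user, n = 2) {
  unseen <- colnames(ratings)[is.na(ratings[user, ])]
  head(ranking[ranking %in% unseen], n)
}

recommend_popular("u2")
recommend_popular("u4")
```

The ranking is the same for everyone, but each user's list differs because items they have already rated are skipped. That matches the pattern in the output that follows, where users 901 to 903 receive overlapping but not identical lists.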

Here are the top-N predictions from the recommender.

getModel(movies)$topN
## Recommendations as 'topNList' with n = 1664 for 1 users.
recom <- predict(movies, MovieLense[901:903], n=10)
recom
## Recommendations as 'topNList' with n = 10 for 3 users.
as(recom, "list")
## $`901`
##  [1] "Godfather, The (1972)"                 
##  [2] "Fargo (1996)"                          
##  [3] "Silence of the Lambs, The (1991)"      
##  [4] "Titanic (1997)"                        
##  [5] "Schindler's List (1993)"               
##  [6] "Shawshank Redemption, The (1994)"      
##  [7] "L.A. Confidential (1997)"              
##  [8] "Casablanca (1942)"                     
##  [9] "Princess Bride, The (1987)"            
## [10] "One Flew Over the Cuckoo's Nest (1975)"
## 
## $`902`
##  [1] "Fargo (1996)"                          
##  [2] "Raiders of the Lost Ark (1981)"        
##  [3] "Silence of the Lambs, The (1991)"      
##  [4] "Titanic (1997)"                        
##  [5] "Shawshank Redemption, The (1994)"      
##  [6] "Usual Suspects, The (1995)"            
##  [7] "Pulp Fiction (1994)"                   
##  [8] "Princess Bride, The (1987)"            
##  [9] "One Flew Over the Cuckoo's Nest (1975)"
## [10] "Braveheart (1995)"                     
## 
## $`903`
##  [1] "Raiders of the Lost Ark (1981)"        
##  [2] "Titanic (1997)"                        
##  [3] "Empire Strikes Back, The (1980)"       
##  [4] "Casablanca (1942)"                     
##  [5] "Princess Bride, The (1987)"            
##  [6] "Braveheart (1995)"                     
##  [7] "Monty Python and the Holy Grail (1974)"
##  [8] "Rear Window (1954)"                    
##  [9] "Contact (1997)"                        
## [10] "Full Monty, The (1997)"

The best five recommendations for each of the 3 users:

recom2 <- bestN(recom, n=5)
as(recom2, "list")
## $`901`
## [1] "Godfather, The (1972)"            "Fargo (1996)"                    
## [3] "Silence of the Lambs, The (1991)" "Titanic (1997)"                  
## [5] "Schindler's List (1993)"         
## 
## $`902`
## [1] "Fargo (1996)"                     "Raiders of the Lost Ark (1981)"  
## [3] "Silence of the Lambs, The (1991)" "Titanic (1997)"                  
## [5] "Shawshank Redemption, The (1994)"
## 
## $`903`
## [1] "Raiders of the Lost Ark (1981)"  "Titanic (1997)"                 
## [3] "Empire Strikes Back, The (1980)" "Casablanca (1942)"              
## [5] "Princess Bride, The (1987)"
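
Because a top-N list is already ordered from best to worst, taking the best five amounts to keeping the first five entries. The sketch below shows this with the ten titles printed for user 901 earlier, copied by hand into a plain character vector; it uses head() as a stand-in and does not call bestN itself.

```r
# Top-10 list for user 901, copied from the output above.
top10_901 <- c(
  "Godfather, The (1972)",
  "Fargo (1996)",
  "Silence of the Lambs, The (1991)",
  "Titanic (1997)",
  "Schindler's List (1993)",
  "Shawshank Redemption, The (1994)",
  "L.A. Confidential (1997)",
  "Casablanca (1942)",
  "Princess Bride, The (1987)",
  "One Flew Over the Cuckoo's Nest (1975)"
)

# Keeping the first five entries of the ranked list.
top5_901 <- head(top10_901, 5)
top5_901
```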

The complete rating matrix, including the users' original ratings:

recom3 <- predict(movies, MovieLense[901:903], type="ratingMatrix")
as(recom3, "matrix")[, 1:10]
##     Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995)
## 901        1.1451613         3.587685          3.471984         3.8294402
## 902        1.5000000         3.232847          3.117145         3.4746015
## 903       -0.8529412         3.585788          3.470087         0.1470588
##     Copycat (1995) Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)
## 901       3.621587                                             3.903914
## 902       3.266748                                             3.549076
## 903       3.619689                                             3.902017
##     Twelve Monkeys (1995) Babe (1995) Dead Man Walking (1995)
## 901              4.102889  -0.8548387               4.1699507
## 902              3.748051   1.5000000               3.8151120
## 903             -1.852941   4.2414786              -0.8529412
##     Richard III (1995)
## 901           4.108844
## 902           3.754006
## 903           4.106947
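
One pattern in this matrix is worth checking by hand. For the eight movies where users 901 and 902 both show values above 3, the two rows differ by the same amount in every column. The sketch below copies those values from the output above and computes the gap. Reading the gap as the difference between the two users' average ratings is an inference from the numbers, not something the output states.

```r
# Values copied from the rating matrix above for the eight movies where both
# users show values above 3 (Toy Story and Babe are left out).
movies_checked <- c("GoldenEye", "Four Rooms", "Get Shorty", "Copycat",
                    "Shanghai Triad", "Twelve Monkeys", "Dead Man Walking",
                    "Richard III")
u901 <- c(3.587685, 3.471984, 3.8294402, 3.621587,
          3.903914, 4.102889, 4.1699507, 4.108844)
u902 <- c(3.232847, 3.117145, 3.4746015, 3.266748,
          3.549076, 3.748051, 3.8151120, 3.754006)

# Per-movie gap between the two users' values.
gap <- setNames(u901 - u902, movies_checked)
round(gap, 4)

# The spread of the gap is essentially zero: one shared offset for all movies.
max(gap) - min(gap)
```

A constant gap is what one would expect if each prediction is a per-movie score shared by all users plus a per-user offset, so the ordering of unrated movies is identical for every user.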

We evaluate the recommender with the “POPULAR” method on top-1, 3, 5, 10, 15, and 20 recommendation lists, using 4-fold cross-validation. We then obtain the confusion matrix for the first fold and the results averaged over all folds.

eval2 <- evaluationScheme(MovieLense[1:900], method="cross", k=4, given=3, goodRating=5)
results <- evaluate(eval2, method = "POPULAR", type="topNList", n=c(1,3,5,10,15,20))
## POPULAR run fold/sample [model time/prediction time]
##   1  [0.02sec/0.85sec] 
##   2  [0sec/0.81sec] 
##   3  [0sec/0.93sec] 
##   4  [0sec/0.8sec]
confusionmatrix <- getConfusionMatrix(results)[[1]]
kable(confusionmatrix)
| n  | TP        | FP         | FN       | TN       | precision | recall    | TPR       | FPR       |
|----|-----------|------------|----------|----------|-----------|-----------|-----------|-----------|
| 1  | 0.3511111 | 0.6488889  | 23.93333 | 1636.067 | 0.3511111 | 0.0193578 | 0.0193578 | 0.0003942 |
| 3  | 0.7600000 | 2.2400000  | 23.52444 | 1634.476 | 0.2533333 | 0.0367864 | 0.0367864 | 0.0013628 |
| 5  | 1.2755556 | 3.7244444  | 23.00889 | 1632.991 | 0.2551111 | 0.0634717 | 0.0634717 | 0.0022661 |
| 10 | 2.2711111 | 7.7288889  | 22.01333 | 1628.987 | 0.2271111 | 0.1184005 | 0.1184005 | 0.0047047 |
| 15 | 3.1377778 | 11.8622222 | 21.14667 | 1624.853 | 0.2091852 | 0.1635445 | 0.1635445 | 0.0072229 |
| 20 | 3.7911111 | 16.2088889 | 20.49333 | 1620.507 | 0.1895556 | 0.1912116 | 0.1912116 | 0.0098722 |
kable(avg(results))
| n  | TP        | FP         | FN       | TN       | precision | recall    | TPR       | FPR       |
|----|-----------|------------|----------|----------|-----------|-----------|-----------|-----------|
| 1  | 0.3366667 | 0.6633333  | 21.44889 | 1638.551 | 0.3366667 | 0.0216077 | 0.0216077 | 0.0004030 |
| 3  | 0.7344444 | 2.2655556  | 21.05111 | 1636.949 | 0.2448148 | 0.0502713 | 0.0502713 | 0.0013782 |
| 5  | 1.1911111 | 3.8088889  | 20.59444 | 1635.406 | 0.2382222 | 0.0725328 | 0.0725328 | 0.0023164 |
| 10 | 2.0922222 | 7.9077778  | 19.69333 | 1631.307 | 0.2092222 | 0.1262098 | 0.1262098 | 0.0048112 |
| 15 | 2.8566667 | 12.1433333 | 18.92889 | 1627.071 | 0.1904444 | 0.1675452 | 0.1675452 | 0.0073895 |
| 20 | 3.4400000 | 16.5600000 | 18.34556 | 1622.654 | 0.1720000 | 0.1959332 | 0.1959332 | 0.0100790 |
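
The precision column can be checked directly against the counts, using precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below copies the averaged values from the table above. Precision recomputed from the averaged counts matches the reported column, since TP + FP always equals the list length n. Recall recomputed the same way does not match. A likely explanation, which is an assumption here and not confirmed by the output, is that recall is computed per user and then averaged.

```r
# Averaged results copied from the table above.
n  <- c(1, 3, 5, 10, 15, 20)
TP <- c(0.3366667, 0.7344444, 1.1911111, 2.0922222, 2.8566667, 3.4400000)
FP <- c(0.6633333, 2.2655556, 3.8088889, 7.9077778, 12.1433333, 16.5600000)
FN <- c(21.44889, 21.05111, 20.59444, 19.69333, 18.92889, 18.34556)
reported_precision <- c(0.3366667, 0.2448148, 0.2382222,
                        0.2092222, 0.1904444, 0.1720000)
reported_recall    <- c(0.0216077, 0.0502713, 0.0725328,
                        0.1262098, 0.1675452, 0.1959332)

# Precision from the averaged counts: share of recommended items that are hits.
precision <- TP / (TP + FP)

# Recall from the averaged counts: share of relevant items that were recommended.
recall_from_means <- TP / (TP + FN)

# Side-by-side comparison with the reported columns.
round(data.frame(n, precision, reported_precision,
                 recall_from_means, reported_recall), 4)
```

Precision falls and recall rises as the list grows from 1 to 20 items, which is the trade-off the precision-recall plot visualizes.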

PLOTS

A histogram of the distribution of the values in the rating matrix.

hist(getRatings(recom3), breaks = 100)

A precision-recall plot.

plot(results, "prec/rec", annotate=TRUE)