IS 643 CURRENT TOPICS IN DATA ANALYTICS - PROJECT 2
Item-based collaborative filtering, according to Kitts, Freed and Vrieze (2000), is a model-based approach that produces recommendations based on the relationships between items inferred from the rating matrix.
Load the libraries below (install them first if you don't already have them).
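To make the idea of item relationships inferred from the rating matrix concrete, here is a minimal base-R sketch. It assumes cosine similarity between item columns and a similarity-weighted average for prediction, which is one common choice and not necessarily what Kitts, Freed and Vrieze or recommenderlab use. The toy matrix and the helper names `cosine_sim` and `predict_rating` are hypothetical.

```r
# Toy user-by-item rating matrix (hypothetical data); NA means "not rated".
ratings <- matrix(
  c( 5,  4, NA,  1,
     4,  5,  1, NA,
     1, NA,  5,  4,
    NA,  1,  4,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("item", 1:4))
)

# Cosine similarity between two item columns, using only users who rated both.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Item-item similarity matrix: this is the "model" learned from the ratings.
n_items <- ncol(ratings)
item_sim <- matrix(NA_real_, n_items, n_items,
                   dimnames = list(colnames(ratings), colnames(ratings)))
for (i in seq_len(n_items)) {
  for (j in seq_len(n_items)) {
    item_sim[i, j] <- cosine_sim(ratings[, i], ratings[, j])
  }
}

# Predict a missing rating as the similarity-weighted mean of the
# ratings the user has already given to other items.
predict_rating <- function(user, item) {
  rated <- !is.na(ratings[user, ])
  w <- item_sim[item, rated]
  sum(w * ratings[user, rated]) / sum(abs(w))
}

round(item_sim, 2)
predict_rating("u1", "item3")
```

Because the similarity matrix is computed once and then reused for every prediction, the approach counts as model-based rather than memory-based.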
options(warn=-1)
suppressMessages(library(recommenderlab))
suppressMessages(library(knitr))
suppressMessages(require(Amelia))
#suppressMessages(library(ggthemes))
suppressMessages(library(ggplot2))
suppressMessages(library(plyr))
suppressMessages(library(plotly))
Let’s also check the structure and dimensions of the data.
data(MovieLense)
str(MovieLense)
## Formal class 'realRatingMatrix' [package "recommenderlab"] with 2 slots
## ..@ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## .. .. ..@ i : int [1:99392] 0 1 4 5 9 12 14 15 16 17 ...
## .. .. ..@ p : int [1:1665] 0 452 583 673 882 968 994 1386 1605 1904 ...
## .. .. ..@ Dim : int [1:2] 943 1664
## .. .. ..@ Dimnames:List of 2
## .. .. .. ..$ : chr [1:943] "1" "2" "3" "4" ...
## .. .. .. ..$ : chr [1:1664] "Toy Story (1995)" "GoldenEye (1995)" "Four Rooms (1995)" "Get Shorty (1995)" ...
## .. .. ..@ x : num [1:99392] 5 4 4 4 4 3 1 5 4 5 ...
## .. .. ..@ factors : list()
## ..@ normalize: NULL
dim(MovieLense)
## [1] 943 1664
Next, we look more closely at a recommender that generates recommendations based solely on item popularity.
movies <- Recommender(MovieLense[1:900], method="POPULAR")
movies
## Recommender of type 'POPULAR' for 'realRatingMatrix'
## learned using 900 users.
names(getModel(movies))
## [1] "topN" "ratings" "normalize"
## [4] "aggregationRatings" "aggregationPopularity" "verbose"
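The idea behind a popularity-based recommender can be sketched in a few lines of base R. This is a simplified toy, not recommenderlab's implementation: it assumes popularity means the number of users who rated an item, whereas the package's own scoring (see the `aggregationPopularity` entry in the model above) may differ. The data and the helper `recommend_popular` are hypothetical.

```r
# Toy rating matrix (hypothetical data); NA means "not rated".
ratings <- matrix(
  c( 5,  4, NA, NA,
     4, NA,  3, NA,
     5,  2, NA,  1,
     2,  3,  4, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), c("A", "B", "C", "D"))
)

# Popularity here = number of users who rated each item (an assumption of this sketch).
popularity <- colSums(!is.na(ratings))
ranked <- names(sort(popularity, decreasing = TRUE))

# Recommend the n most popular items that the user has not rated yet.
recommend_popular <- function(user, n = 2) {
  unseen <- colnames(ratings)[is.na(ratings[user, ])]
  head(ranked[ranked %in% unseen], n)
}

popularity
recommend_popular("u1")
```

Every user is scored against the same global ranking, so the lists differ between users only because items they have already rated are filtered out.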
Here is the top-N list stored in the model.
getModel(movies)$topN
## Recommendations as 'topNList' with n = 1664 for 1 users.
recom <- predict(movies, MovieLense[901:903], n=10)
recom
## Recommendations as 'topNList' with n = 10 for 3 users.
as(recom, "list")
## $`901`
## [1] "Godfather, The (1972)"
## [2] "Fargo (1996)"
## [3] "Silence of the Lambs, The (1991)"
## [4] "Titanic (1997)"
## [5] "Schindler's List (1993)"
## [6] "Shawshank Redemption, The (1994)"
## [7] "L.A. Confidential (1997)"
## [8] "Casablanca (1942)"
## [9] "Princess Bride, The (1987)"
## [10] "One Flew Over the Cuckoo's Nest (1975)"
##
## $`902`
## [1] "Fargo (1996)"
## [2] "Raiders of the Lost Ark (1981)"
## [3] "Silence of the Lambs, The (1991)"
## [4] "Titanic (1997)"
## [5] "Shawshank Redemption, The (1994)"
## [6] "Usual Suspects, The (1995)"
## [7] "Pulp Fiction (1994)"
## [8] "Princess Bride, The (1987)"
## [9] "One Flew Over the Cuckoo's Nest (1975)"
## [10] "Braveheart (1995)"
##
## $`903`
## [1] "Raiders of the Lost Ark (1981)"
## [2] "Titanic (1997)"
## [3] "Empire Strikes Back, The (1980)"
## [4] "Casablanca (1942)"
## [5] "Princess Bride, The (1987)"
## [6] "Braveheart (1995)"
## [7] "Monty Python and the Holy Grail (1974)"
## [8] "Rear Window (1954)"
## [9] "Contact (1997)"
## [10] "Full Monty, The (1997)"
The best five recommendations for each of the 3 users:
recom2 <- bestN(recom, n=5)
as(recom2, "list")
## $`901`
## [1] "Godfather, The (1972)" "Fargo (1996)"
## [3] "Silence of the Lambs, The (1991)" "Titanic (1997)"
## [5] "Schindler's List (1993)"
##
## $`902`
## [1] "Fargo (1996)" "Raiders of the Lost Ark (1981)"
## [3] "Silence of the Lambs, The (1991)" "Titanic (1997)"
## [5] "Shawshank Redemption, The (1994)"
##
## $`903`
## [1] "Raiders of the Lost Ark (1981)" "Titanic (1997)"
## [3] "Empire Strikes Back, The (1980)" "Casablanca (1942)"
## [5] "Princess Bride, The (1987)"
The predicted rating matrix, which also includes the users' original ratings:
recom3 <- predict(movies, MovieLense[901:903], type="ratingMatrix")
as(recom3, "matrix")[, 1:10]
##     Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995)
## 901        1.1451613         3.587685          3.471984         3.8294402
## 902        1.5000000         3.232847          3.117145         3.4746015
## 903       -0.8529412         3.585788          3.470087         0.1470588
##     Copycat (1995) Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)
## 901       3.621587                                             3.903914
## 902       3.266748                                             3.549076
## 903       3.619689                                             3.902017
##     Twelve Monkeys (1995) Babe (1995) Dead Man Walking (1995)
## 901              4.102889  -0.8548387               4.1699507
## 902              3.748051   1.5000000               3.8151120
## 903             -1.852941   4.2414786              -0.8529412
##     Richard III (1995)
## 901           4.108844
## 902           3.754006
## 903           4.106947
To evaluate the recommender built with the “POPULAR” method, we generate top-1, 3, 5, 10, 15 and 20 recommendation lists and obtain their confusion matrices and averaged results.
eval2 <- evaluationScheme(MovieLense[1:900], method="cross", k=4, given=3, goodRating=5)
results <- evaluate(eval2, method = "POPULAR", type="topNList", n=c(1,3,5,10,15,20))
## POPULAR run fold/sample [model time/prediction time]
## 1 [0.02sec/0.85sec]
## 2 [0sec/0.81sec]
## 3 [0sec/0.93sec]
## 4 [0sec/0.8sec]
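The arguments passed to `evaluationScheme` above (`k=4`, `given=3`, `goodRating=5`) can be illustrated with a toy base-R sketch. This is only an illustration of the concepts under simplifying assumptions, not the package's internal procedure; the users, the ratings and all variable names are hypothetical.

```r
set.seed(1)  # reproducible toy example

# k = 4: assign 12 hypothetical users to four folds of equal size.
k <- 4
users <- paste0("u", 1:12)
fold <- sample(rep(seq_len(k), length.out = length(users)))

# Ratings of one hypothetical test user (item -> rating).
user_ratings <- c(A = 5, B = 3, C = 5, D = 4, E = 2, F = 5)

# given = 3: reveal three randomly chosen ratings to the recommender
# and withhold the rest for scoring.
given <- 3
known_items <- sample(names(user_ratings), given)
withheld <- user_ratings[setdiff(names(user_ratings), known_items)]

# goodRating = 5: withheld items rated 5 or higher count as relevant,
# so recommending one of them is a true positive.
good_rating <- 5
relevant <- names(withheld)[withheld >= good_rating]

table(fold)
known_items
relevant
```

Each fold serves once as the test set while the other three are used for training, which is why the log above reports four runs.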
confusionmatrix <- getConfusionMatrix(results)[[1]]
kable(confusionmatrix)
| n | TP | FP | FN | TN | precision | recall | TPR | FPR |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.3511111 | 0.6488889 | 23.93333 | 1636.067 | 0.3511111 | 0.0193578 | 0.0193578 | 0.0003942 |
| 3 | 0.7600000 | 2.2400000 | 23.52444 | 1634.476 | 0.2533333 | 0.0367864 | 0.0367864 | 0.0013628 |
| 5 | 1.2755556 | 3.7244444 | 23.00889 | 1632.991 | 0.2551111 | 0.0634717 | 0.0634717 | 0.0022661 |
| 10 | 2.2711111 | 7.7288889 | 22.01333 | 1628.987 | 0.2271111 | 0.1184005 | 0.1184005 | 0.0047047 |
| 15 | 3.1377778 | 11.8622222 | 21.14667 | 1624.853 | 0.2091852 | 0.1635445 | 0.1635445 | 0.0072229 |
| 20 | 3.7911111 | 16.2088889 | 20.49333 | 1620.507 | 0.1895556 | 0.1912116 | 0.1912116 | 0.0098722 |
kable(avg(results))
| n | TP | FP | FN | TN | precision | recall | TPR | FPR |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.3366667 | 0.6633333 | 21.44889 | 1638.551 | 0.3366667 | 0.0216077 | 0.0216077 | 0.0004030 |
| 3 | 0.7344444 | 2.2655556 | 21.05111 | 1636.949 | 0.2448148 | 0.0502713 | 0.0502713 | 0.0013782 |
| 5 | 1.1911111 | 3.8088889 | 20.59444 | 1635.406 | 0.2382222 | 0.0725328 | 0.0725328 | 0.0023164 |
| 10 | 2.0922222 | 7.9077778 | 19.69333 | 1631.307 | 0.2092222 | 0.1262098 | 0.1262098 | 0.0048112 |
| 15 | 2.8566667 | 12.1433333 | 18.92889 | 1627.071 | 0.1904444 | 0.1675452 | 0.1675452 | 0.0073895 |
| 20 | 3.4400000 | 16.5600000 | 18.34556 | 1622.654 | 0.1720000 | 0.1959332 | 0.1959332 | 0.0100790 |
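As a sanity check on how to read these tables, precision is TP / (TP + FP) and recall (identical to TPR) is TP / (TP + FN). The short sketch below recomputes precision from the averaged TP and FP counts in the table above; the variable names are my own.

```r
# Averaged TP and FP per list length n, copied from the averaged table above.
n  <- c(1, 3, 5, 10, 15, 20)
tp <- c(0.3366667, 0.7344444, 1.1911111, 2.0922222, 2.8566667, 3.4400000)
fp <- c(0.6633333, 2.2655556, 3.8088889, 7.9077778, 12.1433333, 16.5600000)
reported_precision <- c(0.3366667, 0.2448148, 0.2382222,
                        0.2092222, 0.1904444, 0.1720000)

# TP + FP equals the list length n, so precision is simply TP / n.
precision <- tp / (tp + fp)

round(precision, 7)
all.equal(precision, reported_precision, tolerance = 1e-6)
```

Recall cannot be reproduced the same way from the averaged counts: for n = 1, TP / (TP + FN) computed from the averages gives roughly 0.0155 rather than the reported 0.0216, presumably because the metric is computed per user before averaging.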
A histogram of the predicted ratings.
hist(getRatings(recom3), breaks = 100)
A plot showing the precision-recall curve.
plot(results, "prec/rec", annotate=TRUE)