Project 4 | Accuracy and Beyond
Project Objectives
The goal of this assignment is to give you practice working with accuracy and other recommender-system metrics.
In this assignment you’re asked to do at least one or (if you like) both of the following:
- Work in a small group, and/or
- Choose a different dataset to work with from your previous projects.
Deliverables
- As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.
- Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.
- Compare and report on any change in accuracy before and after you’ve made the change in #2.
- As part of your textual conclusion, discuss one or more additional experiments that could be performed and/or metrics that could be evaluated only if online evaluation were possible. Also, briefly propose how you would design a reasonable online evaluation environment.
You’ll find some of the material discussed in this week’s reading helpful in completing this project. You may also want to look at papers online, or search for “recsys” on YouTube or slideshare.net.
Libraries
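The original library chunk was not echoed, so the following is a minimal reconstruction, inferred from the functions called later in this report:
# Packages inferred from the calls used throughout this report
library(Matrix)          # sparseMatrix()
library(recommenderlab)  # Recommender(), evaluationScheme(), calcPredictionAccuracy(), ...
library(tictoc)          # tic()/toc() timing
library(ggplot2)         # qplot()
library(knitr)           # kable()
library(kableExtra)      # kable_styling(), row_spec(), scroll_box()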
Data Preparation and Exploration
We gathered data from https://www.kaggle.com/skillsmuggler/amazon-ratings?select=ratings_Beauty.csv. This is an Amazon dataset obtained from Kaggle.com, described as follows:
This is a dataset of over 2 million customer reviews and ratings of Beauty-related products sold on Amazon’s website.
It contains:
the unique UserId (Customer Identification),
the product ASIN (Amazon’s unique product identification code for each product),
Ratings (ranging from 1-5 based on customer satisfaction) and
the Timestamp of the rating (in UNIX time)
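The loading chunk was likewise not echoed; below is a sketch of how the CSV might have been read. The file path is an assumption, and stringsAsFactors = TRUE matters because the matrix-building step later relies on UserId and ProductId being factors.
# Assumed loading step; the path is illustrative
amzonratings <- read.csv("ratings_Beauty.csv", stringsAsFactors = TRUE)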
Preview data
# Preview the ratings data
kable(head(amzonratings, n = 10L)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")

| UserId | ProductId | Rating | Timestamp |
|---|---|---|---|
| A39HTATAQ9V7YF | 0205616461 | 5 | 1369699200 |
| A3JM6GV9MNOF9X | 0558925278 | 3 | 1355443200 |
| A1Z513UWSAAO0F | 0558925278 | 5 | 1404691200 |
| A1WMRR494NWEWV | 0733001998 | 4 | 1382572800 |
| A3IAAVS479H7M7 | 0737104473 | 1 | 1274227200 |
| AKJHHD5VEH7VG | 0762451459 | 5 | 1404518400 |
| A1BG8QW55XHN6U | 1304139212 | 5 | 1371945600 |
| A22VW0P4VZHDE3 | 1304139220 | 5 | 1373068800 |
| A3V3RE4132GKRO | 130414089X | 5 | 1401840000 |
| A327B0I7CYTEJC | 130414643X | 4 | 1389052800 |
A dimension check confirms 2,023,070 rows and 4 columns:
## [1] 2023070 4
Create Matrix
We'll convert the dataset to an object of class realRatingMatrix. In the following code-chunk, through a couple of steps, we convert amzonratings into a realRatingMatrix called amazon for downstream analysis (see page 33 of Building a Recommendation System with R).
amazonmatrix <- sparseMatrix(as.integer(amzonratings$UserId), as.integer(amzonratings$ProductId),
x = amzonratings$Rating)
colnames(amazonmatrix) <- levels(amzonratings$ProductId)
rownames(amazonmatrix) <- levels(amzonratings$UserId)
(amazon <- as(amazonmatrix, "realRatingMatrix"))

## 1210271 x 249274 rating matrix of class 'realRatingMatrix' with 2023070 ratings.
At this point, we’ll take stock of the important characteristics of amazon: its dimensions and a random sample of its ratings.
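The chunk that produced the output below was not echoed; one plausible sketch is shown here. getData.frame() is recommenderlab's converter to a user/item/rating data frame, which matches the columns shown.
dim(amazon)
amazon_df <- getData.frame(amazon)       # columns: user, item, rating
amazon_df[sample(nrow(amazon_df), 6), ]  # a random sample of six ratings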
## [1] 1210271 249274
## user item rating
## 1609862 A00008821J0F472NDY6A2 B007T8XPC6 5
## 299511 A000186437REL8X2RW8UW B000JD4N9M 5
## 1791795 A0002574WYJMBWKNCPY8 B00AH4TJNS 3
## 1557548 A00029263J863WSR0TDRS B00766SGE8 5
## 898365 A00031961JI1CBNV98TW B002OO16QC 5
## 1544065 A000325234LCBTFVL1QK4 B006ZN1B1S 5
Check and Remove Empty Lines
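The hidden chunk behind the output below was presumably along these lines, tabulating ratings per user and per product with recommenderlab's rowCounts()/colCounts() and then dropping any empty rows or columns:
# Distribution of ratings per user and per product (output condensed below)
table(rowCounts(amazon))
table(colCounts(amazon))
# Drop empty rows/columns, if any; here none exist, so dimensions are unchanged
amazon <- amazon[rowCounts(amazon) > 0, colCounts(amazon) > 0]
amazon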
##      1      2      3      4      5      6      7      8      9     10
## 887401 175875  64336  30285  16187   9827   6324   4260   3181   2275
## ... (ratings-per-user distribution truncated; maximum 389 ratings by a single user)
##
##      1      2      3      4      5      6      7      8      9     10
## 103484  42209  22334  13902   9623   7214   5592   4404   3574   3059
## ... (ratings-per-product distribution truncated; maximum 7533 ratings for a single product)
Both distributions are heavily long-tailed: 887,401 users rated only a single product, and 103,484 products received only a single rating. Since every row and column has at least one rating, nothing is removed and the matrix dimensions are unchanged:
## 1210271 x 249274 rating matrix of class 'realRatingMatrix' with 2023070 ratings.
Subset
Since the original matrix holds over 2 million ratings from 1.2 million users, working with it directly consumes too much processing time. To reduce processing time, we subset the matrix in the code-chunk below, keeping products with more than 30 ratings and users with more than 10 ratings.
## 10320 x 12057 rating matrix of class 'realRatingMatrix' with 111871 ratings.
amazonsub <- amazon[, colCounts(amazon) > 30]
amazonsub <- amazonsub[rowCounts(amazonsub) > 10, ]
amazonsub

## 3562 x 12057 rating matrix of class 'realRatingMatrix' with 68565 ratings.
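As a quick sanity check on sparsity, the density of the retained matrix can be computed; a sketch, using recommenderlab's nratings() count of observed ratings:
# Fraction of user-item cells that are observed: roughly 0.16%
nratings(amazonsub) / (nrow(amazonsub) * ncol(amazonsub))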
Data Visualization
Exploring the rating values
# Vectorize the ratings and list the unique values
vector_ratings <- as.vector(amazonsub@data)
unique(vector_ratings)

## [1] 0 5 2 1 4 3
# The ratings are in the range 0-5. Let's count the occurrences of each of them.
table_ratings <- table(vector_ratings)
kable(table_ratings)

| vector_ratings | Freq |
|---|---|
| 0 | 42878469 |
| 1 | 3114 |
| 2 | 3966 |
| 3 | 8273 |
| 4 | 16020 |
| 5 | 37192 |
A rating equal to 0 represents a missing value, so we’ll purge the zero ratings from vector_ratings.
vector_ratings <- vector_ratings[vector_ratings != 0]
vector_ratings <- factor(vector_ratings)
qplot(vector_ratings) + ggtitle("Distribution of the ratings")

Recommendation algorithms
Split the dataset into a training set (80%) and a testing set (20%). In the scheme below, given = 5 means five ratings from each test user are revealed to the recommender (the "known" set) while the rest are held out for evaluation (the "unknown" set), and goodRating = 3 treats ratings of 3 and above as positive.
# Train/test split
set.seed(88)
evaluation <- evaluationScheme(amazonsub, method = "split", train = 0.8, given = 5, goodRating = 3)
train <- getData(evaluation, "train")
known <- getData(evaluation, "known")
unknown <- getData(evaluation, "unknown")
# Set up data frame for timing
timing <- data.frame(Model = factor(), Training = double(), Predicting = double())

User-Based Collaborative Filtering
method_name <- "UBCF"
# Training
tic()
UBCF_model <- Recommender(train, method = method_name)
t <- toc(quiet = TRUE)
train_time <- round(t$toc - t$tic, 2)
# Predicting
tic()
UBCF_prediction <- predict(UBCF_model, newdata = known, type = "ratings")
t <- toc(quiet = TRUE)
predict_time <- round(t$toc - t$tic, 2)
timing <- rbind(timing, data.frame(Model = as.factor(method_name), Training = as.double(train_time), Predicting = as.double(predict_time)))
# Accuracy
UBCF_accuracy <- calcPredictionAccuracy(UBCF_prediction, unknown)

Random
method_name <- "RANDOM"
# Training
tic()
Random_model <- Recommender(train, method = method_name)
t <- toc(quiet = TRUE)
train_time <- round(t$toc - t$tic, 2)
# Predicting
tic()
Random_prediction <- predict(Random_model, newdata = known, type = "ratings")
t <- toc(quiet = TRUE)
predict_time <- round(t$toc - t$tic, 2)
timing <- rbind(timing, data.frame(Model = as.factor(method_name), Training = as.double(train_time),
Predicting = as.double(predict_time)))
# Accuracy
Random_accuracy <- calcPredictionAccuracy(Random_prediction, unknown)

SVD
method_name <- "SVD"
# Training
tic()
SVD_model <- Recommender(train, method = method_name, parameter = list(k = 50))
t <- toc(quiet = TRUE)
train_time <- round(t$toc - t$tic, 2)
# Predicting
tic()
SVD_prediction <- predict(SVD_model, newdata = known, type = "ratings")
t <- toc(quiet = TRUE)
predict_time <- round(t$toc - t$tic, 2)
timing <- rbind(timing, data.frame(Model = as.factor(method_name), Training = as.double(train_time), Predicting = as.double(predict_time)))
# Accuracy
SVD_accuracy <- calcPredictionAccuracy(SVD_prediction, unknown)

Comparing Models
accuracy <- rbind(UBCF_accuracy, Random_accuracy)
accuracy <- rbind(accuracy, SVD_accuracy)
rownames(accuracy) <- c("UBCF", "Random", "SVD")
#
kable(accuracy) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")

|  | RMSE | MSE | MAE |
|---|---|---|---|
| UBCF | 1.080407 | 1.167279 | 0.7939583 |
| Random | 1.355398 | 1.837103 | 0.9711710 |
| SVD | 1.076807 | 1.159515 | 0.7809779 |
In the comparison table, we see that Random has the highest RMSE, implying the lowest accuracy, while UBCF and SVD have nearly identical accuracy. This is expected: random recommendations, not being based on historical data, cannot match the accuracy of models that learn from it.
ROC Curve
In the following, we’ll review the ROC curve and precision-recall plot for all three models.
model_list <- list(UBCF = list(name = "UBCF", param = NULL), Random = list(name = "RANDOM", param = NULL), SVD = list(name = "SVD", param = list(k = 50)))
evaluate_results <- evaluate(x = evaluation, method = model_list, n = c(1, 5, 10, 30, 60))

## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/232.24sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/9.03sec]
## SVD run fold/sample [model time/prediction time]
## 1 [19.19sec/8.63sec]
# Precision-Recall Plot
plot(evaluate_results, "prec/rec", annotate = TRUE, legend = "topright", main = "Precision-Recall")

We observe that UBCF performs better than SVD, and considerably better than Random.
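Since this subsection is titled "ROC Curve", the curve itself can be drawn from the same evaluation results; recommenderlab's default plot method for evaluation results plots TPR against FPR:
# ROC curve (TPR vs. FPR) for the same three models
plot(evaluate_results, annotate = TRUE, legend = "bottomright", main = "ROC Curve")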
rownames(timing) <- timing$Model
knitr::kable(timing[, 2:3], format = "html") %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))

|  | Training | Predicting |
|---|---|---|
| UBCF | 0.02 | 212.44 |
| RANDOM | 0.00 | 4.45 |
| SVD | 21.97 | 7.09 |
The table shows that while the UBCF model trains almost instantly, its prediction is slow, because UBCF defers the real work to prediction time. The Random model has essentially no training cost, so the table effectively shows only its prediction time. SVD, by contrast, takes longer to train but predicts quickly; overall, SVD takes far less total time than UBCF.
Implementation Support for Business or User Experience Goal
We observed that the UBCF and SVD models had similar accuracy scores, both well ahead of the Random model. Let's now create a hybrid model combining the UBCF and Random models.
Recommending products that a user is likely to rate highly is not always desirable on its own. Users also appreciate recommendations for products they would not have consciously imagined, so serendipitous recommendations may be welcome. We therefore want to inject some entropy (randomness) and mildly disrupt the routine of recommending only predictably high-rated products. To accommodate the best of both worlds, high predicted ratings for preferred items and mild serendipity, we distribute the weights 0.99 vs. 0.01 between the UBCF and Random models.
hybrid_model <- HybridRecommender(UBCF_model, Random_model, weights = c(0.99, 0.01))
Hybrid_prediction <- predict(hybrid_model, newdata = known, type = "ratings")
(Hybrid_accuracy <- calcPredictionAccuracy(Hybrid_prediction, unknown))

##      RMSE       MSE       MAE
## 1.4017672 1.9649514 0.9435311
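To tune the accuracy/serendipity trade-off, one could sweep the weight assigned to the Random component; a minimal sketch, with an illustrative weight grid:
# Illustrative sweep over the Random component's weight
for (w in c(0.01, 0.05, 0.10)) {
  h <- HybridRecommender(UBCF_model, Random_model, weights = c(1 - w, w))
  p <- predict(h, newdata = known, type = "ratings")
  print(c(weight = w, RMSE = unname(calcPredictionAccuracy(p, unknown)["RMSE"])))
}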
Accuracy Comparison
As a consequence of the injected randomness, accuracy has decreased (RMSE rose from 1.08 to 1.40), as expected. However, the goal here is not exclusively the highest accuracy, but a balance between accuracy and a good user experience.
Now, let's take the top-10 recommendations for the first user in the test set.
UBCF_recom <- predict(UBCF_model, newdata = known[1], type = "topNList")
Hybrid_recom <- predict(hybrid_model, newdata = known[1], type = "topNList")

## $A103WXT3CHVY0H
## [1] 3234 46 580 3609 3624 37 6415 520 2883 3674

## $A103WXT3CHVY0H
## [1] 3234 3624 3609 46 37 10521 2883 7715 7576 2749

The two lists (UBCF first, hybrid second) agree on their leading items, but the hybrid swaps in a few different products, which is exactly the mild serendipity we built in.
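The printing chunk was not echoed; the item indices shown above can be pulled from the topNList objects' items slot, which also allows a crude novelty check on how many products the hybrid swaps in:
UBCF_recom@items    # column indices of the top-10 items for the first test user
Hybrid_recom@items
# Crude novelty proxy: how many hybrid picks are absent from the UBCF list?
length(setdiff(Hybrid_recom@items[[1]], UBCF_recom@items[[1]]))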
Conclusion
In this project, we built four recommender systems (UBCF, Random, SVD, and a hybrid of UBCF and Random) and compared their accuracies. We could likewise build additional models, compare them, and settle on the best-performing one.
Additional experiments
One approach to online evaluation is measuring diversification in choices through A/B testing. On Amazon, users are offered a set of recommendations, and it is possible to record how many of the recommended products a user clicks, or even purchases. The set of clicked products is a clue to a person's tendencies. For example, we could serve the pure UBCF model to a control group and the hybrid model to a treatment group, and compare click-through and conversion rates between the two.
It would also be possible to show a different, attractive pop-up ad to interrupt the user's clicking activity. If the user waits for the ad to finish or clicks it away, and persists in clicking one or another of the offered items, then his or her desire to evaluate the products is strong.
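A minimal sketch of how click-through logs from such an A/B test might be summarized, assuming a hypothetical ab_log data frame with one row per recommendation impression and columns arm, clicked, and purchased:
# Hypothetical log: arm ("control"/"hybrid"), clicked (0/1), purchased (0/1)
ab_summary <- aggregate(cbind(clicked, purchased) ~ arm, data = ab_log, FUN = mean)
ab_summary  # per-arm click-through and conversion rates
# Test whether the click-through rates differ significantly between arms
prop.test(xtabs(clicked ~ arm, ab_log), xtabs(~ arm, ab_log))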