Objective

The goal of this assignment is give you practice working with accuracy and other recommender system metrics.

Deliverables

  1. As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.

  2. Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.

  3. Compare and report on any change in accuracy before and after you’ve made the change in #2.

  4. As part of your textual conclusion, discuss one or more additional experiments that could be performed and/or metrics that could be evaluated only if online evaluation was possible. Also, briefly propose how you would design a reasonable online evaluation environment.

Dataset Information

• The dataset is a product ratings for beauty products sold on Amazon.com. The dataset was downloaded from Kaggle.com.

• Original set contains 2,023,070 observations and 4 variables - User ID, Product ID, Rating (from 1 to 5), and Time Stamp. It covers 1,210,271 users and 249,274 products. In order to make the set more manageable it has been reduced to a smaller subset.

• The final ratings dataset used consists of 3562 x 12057 rating matrix of class ‘realRatingMatrix’ with 68565 ratings.

Subsetting Dataset

Now let’s see the process of reducing the data from the main dataset

Step 1) Import original file and select sample for project

Step-2) Explore

          UserId  ProductId Rating  Timestamp
1 A39HTATAQ9V7YF 0205616461      5 1369699200
2 A3JM6GV9MNOF9X 0558925278      3 1355443200
3 A1Z513UWSAAO0F 0558925278      5 1404691200
4 A1WMRR494NWEWV 0733001998      4 1382572800
5 A3IAAVS479H7M7 0737104473      1 1274227200
6  AKJHHD5VEH7VG 0762451459      5 1404518400
[1] "factor"
[1] "factor"
[1] "numeric"
[1] "integer"

Step-3) Convert to realRatingMatrix

Step-4) Explore

1210271 x 249274 rating matrix of class 'realRatingMatrix' with 2023070 ratings.


     1      2      3      4      5      6      7      8      9     10 
887401 175875  64336  30285  16187   9827   6324   4260   3181   2275 
    11     12     13     14     15     16     17     18     19     20 
  1745   1402   1012    912    677    550    462    411    323    277 
    21     22     23     24     25     26     27     28     29     30 
   262    238    198    160    132    105    129     95     94     82 
    31     32     33     34     35     36     37     38     39     40 
    68     53     64     51     49     45     32     41     37     38 
    41     42     43     44     45     46     47     48     49     50 
    30     35     29     20     13     25     25     19     19     17 
    51     52     53     54     55     56     57     58     59     60 
    13     18     13     15     14     11     15      6      6     13 
    61     62     63     64     65     66     67     68     69     70 
     6      9     11      3     10      4      4      5      6      6 
    71     72     73     74     75 
     5      2      6      5      7 
 [ reached getOption("max.print") -- omitted 78 entries ]


     1      2      3      4      5      6      7      8      9     10 
103484  42209  22334  13902   9623   7214   5592   4404   3574   3059 
    11     12     13     14     15     16     17     18     19     20 
  2542   2267   2024   1657   1526   1410   1208   1096   1054    912 
    21     22     23     24     25     26     27     28     29     30 
   869    810    723    663    591    545    508    517    468    432 
    31     32     33     34     35     36     37     38     39     40 
   381    388    361    338    332    347    276    269    267    261 
    41     42     43     44     45     46     47     48     49     50 
   261    245    221    226    177    204    188    177    167    174 
    51     52     53     54     55     56     57     58     59     60 
   156    138    151    157    132    142    112    119    105    120 
    61     62     63     64     65     66     67     68     69     70 
   103    125    116    111     83     94     99     86     89     92 
    71     72     73     74     75 
    89     78     84     85     79 
 [ reached getOption("max.print") -- omitted 513 entries ]

Step-5) Select Subset 1 and Subset 2

10320 x 12057 rating matrix of class 'realRatingMatrix' with 111871 ratings.
3562 x 12057 rating matrix of class 'realRatingMatrix' with 68565 ratings.

Step-6) Check and Remove Empty Lines


 11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28 
598 479 335 274 264 184 163 135 118  99  84  73  73  66  50  54  26  34 
 29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46 
 36  31  31  30  22  18  17  15  11  20  15  13  11   8   7   9   8   5 
 47  48  49  50  51  52  53  54  55  56  57  58  60  61  62  63  64  65 
 10  10   4   5   8   4   7   6   4   2   3   3   1   2   4   4   1   2 
 66  67  68  69  70  71  72  73  74  75  76  77  78  79  81  85  86  90 
  4   4   2   2   1   1   1   3   2   5   2   5   1   3   3   2   1   1 
 91  93  94 
  2   1   1 
 [ reached getOption("max.print") -- omitted 18 entries ]

   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
2410 2404 1849 1231  918  587  451  323  217  192  147  114   96   77   51 
  15   16   17   18   19   20   21   22   23   24   25   26   27   28   29 
  60   48   48   49   30   36   39   24   18   26   20   20   17   23   18 
  30   31   32   33   34   35   36   37   38   39   40   41   42   43   44 
  19   20   21   15   16   10   14   16   10   14    8   12   10    8    9 
  45   46   47   48   49   50   51   52   53   54   55   56   57   58   59 
  15   11    9    9   15   13    5    4    6   10    4    9   13   11    5 
  60   61   62   63   64   65   66   68   69   70   71   72   73   74   75 
   6   12    6    6    5    8    4    7    2    6    2    6    5    3    8 
 [ reached getOption("max.print") -- omitted 41 entries ]
3562 x 9647 rating matrix of class 'realRatingMatrix' with 68565 ratings.

Step-7) Convert to data frame and save as CSV file

Recommender Models

Now, Let’s build three different models

USER BASED COLLABORATIVE FILTERING

RANDOM

SVD

Compairing Models

As we have build all three models for the dataset, now we can proceed with compairing the accuracy for all three models

RMSE MSE MAE
UBCF 1.129307 1.275334 0.838039
Random 1.423150 2.025355 1.025171
SVD 1.123959 1.263285 0.827643

As we review the accuracy scores above for UBCF, Random, SVD models, we see that Random has the lowest accuracy than UBCF and SVD. Whereas, UBCF and SVD models accuracy figures are quite close to each other. It is not surprising that random recommendations are not as accurate as recommendations based on prior ratings.

ROC curve

Now we can we can review ROC curve and Precision-Recall plot for all three models.

UBCF run fold/sample [model time/prediction time]
     1  [0.006sec/109.981sec] 
RANDOM run fold/sample [model time/prediction time]
     1  [0.001sec/4.169sec] 
SVD run fold/sample [model time/prediction time]
     1  [12.857sec/4.995sec] 

UBCF performs better than SVD and considerably better than the Random model.

• Now, Let us see the training and prediction time.

• From the table below we can see that the UBCF model can be created fairly quickly, but predicting results takes considerable time. The Random model is pretty efficient all around. The SVD model takes longer to build than to predict, but altogether it is quicker than the UBCF model.

Training Predicting
UBCF 0.02 108.18
RANDOM 0.00 2.31
SVD 12.70 3.44

Implement Support for Business/User Experiance Goal

• Since UBCF and SVD models’s accuracy scores were similar and they also performed better compared to Random model, let’s create a hybrid model consisting of UBCF and SVD models.

• It may not always be desirable to recommend products that are likely to be most highly rated by a user. Recommending somewhat unexpected products may improve user experience, expand user preferences, provide additional knowledge about a user.

• In order to make sure that most of recommendations are still likely to be highly rated we only allow very minor influence of the Random model (0.99 vs. 0.01 weight between UBCF and Random models).

     RMSE       MSE       MAE 
1.3008284 1.6921544 0.9147718 

Comprison of the accuracy

The accuracy has gone down. It is not as bad as with purely random model, but clearly not as good as UBCF or SVD models. However, the goal here is to influence user experience rather than make the most accurate model, so we need to employ different metrics.

Let us look at top 10 recommendations for the first user in the test set.

$A100ZQDV7L8PVV
 [1] 8871 4298 8747 9564 8868 2076 1181 9547 7390 9560
$A100ZQDV7L8PVV
 [1] 4298 8871 8747 8868 2076 9547 9560 1181 8643 9564

Now as we see, the Hybrid model includes most of the items recommended by the UBCF model, but there are new items and the order is different.

Conclusion

• In this project we have build three different recommender system algorithms and compared the accuracy of all the three different models. Similar process can be employed to compare additional models or to adjust model parameters to find the most optimal model.

Additional experiments that could be performed

• One of the approaches in measuring success of diversification may be A/B testing. Users are randomly divided into two groups and each group is offered a slightly different experience. For instance, one group may get recommendations only from the UBCF model while the other group will get recommendations from the hybrid model.

• User experience is measured in some way. The least instrusive way is to monitor user interaction. In this example of Amazon products, a click on a recommendation suggested by the random element of the model will point to the fact that the hybrid model provides valuable recommendations.

• Of course, it is possible to track other metrics - products bought, time spent on product page, amount spent, etc. The basic idea is to see if the hybrid model provides meaningful improvement to the basic model.

• It is important to have objective measures when building and optimizing data science models. Evaluation of a model that returns highly relevant, but redundant recommendations should reflect that the model may score poorly in user experience.

• One of the approaches to measure diversity is described in Novelty and Diversity in Information Retrieval Evaluation. This or similar measurement should be incorporated in projects of this type.