Apriori algorithm and collaborative filtering

MARKET BASKET ANALYSIS

#install.packages("arules")
#install.packages("recommenderlab")
#install.packages("tidyverse")
library(arules)
library(recommenderlab)
library(tidyverse)

1, 2

retail <- read.transactions("retail_transactions_2.csv", sep = ",")
summary(retail)
transactions as itemMatrix in sparse format with
 10000 rows (elements/itemsets/transactions) and
 5471 columns (items) and a density of 0.002797642 

most frequent items:
WHITE HANGING HEART T-LIGHT HOLDER           REGENCY CAKESTAND 3 TIER 
                               823                                777 
           JUMBO BAG RED RETROSPOT                      PARTY BUNTING 
                               644                                577 
     ASSORTED COLOUR BIRD ORNAMENT                            (Other) 
                               558                             149680 

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1660  727  492  408  396  330  290  307  281  258  279  262  227  239  262  246 
  17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
 201  197  219  194  164  148  138  128  109  110   95   90  109   86   76   66 
  33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
  56   56   59   44   41   46   57   44   33   41   39   31   31   27   29   25 
  49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
  27   24   29   19   23   27   24   16   21   19   19   15   17    7   11   13 
  65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
   7   16   16   13    4   10    9    6    7    6    5   10    8    1    2    4 
  81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   97 
   8    4    3    5    5    6    6    5    1    2    3    7    4    2    2    3 
  98   99  101  102  103  105  107  108  109  111  113  114  116  117  119  120 
   4    1    1    2    3    1    2    1    2    1    1    1    1    3    1    1 
 121  122  125  126  127  134  135  143  146  147  158  168  178  235  249  285 
   1    2    1    1    2    1    1    1    1    1    1    1    1    1    1    1 
 320  400 
   1    1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.00   10.00   15.31   21.00  400.00 

includes extended item information - examples:
                      labels
1                   1 HANGER
2     10 COLOUR SPACEBOY PEN
3 12 COLOURED PARTY BALLOONS
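  1. 10,000 transactions.
  2. 5,471 items.
  3. The sparse matrix has 10,000 × 5,471 = 54,710,000 cells. A density of 0.002797642 means that about 0.28% of the cells (roughly 153,059) contain a non-zero value. That number is the total count of item occurrences across all transactions, i.e. how many items were purchased (each item counted once per transaction).
  4. 400 (the largest transaction contains 400 items).
  5. 15.31 (the mean number of items per transaction).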
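The arithmetic behind answers 3 to 5 can be checked directly from the printed summary. A minimal base-R sketch that uses only the figures reported by `summary(retail)` above (no arules objects are needed):

```r
# Figures copied from the summary(retail) output above
n_transactions <- 10000
n_items        <- 5471
density        <- 0.002797642

# One cell per (transaction, item) pair in the sparse matrix
n_cells <- n_transactions * n_items            # 54,710,000 cells

# Density = share of non-zero cells, i.e. item occurrences across all baskets
n_nonzero <- round(n_cells * density)          # about 153,059

# Cross-check: the "most frequent items" counts plus (Other) add up to the same total
stopifnot(sum(c(823, 777, 644, 577, 558, 149680)) == n_nonzero)

# Mean basket size = item occurrences / transactions (the reported Mean of 15.31)
mean_basket <- n_nonzero / n_transactions

cat(sprintf("cells = %.0f, non-zero = %.0f (%.2f%% of cells), mean basket = %.2f\n",
            n_cells, n_nonzero, 100 * density, mean_basket))
```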

3

itemFrequencyPlot(retail, topN = 20, horiz = TRUE)

4

retail_rules <- apriori(retail, parameter = list(support = 0.01, 
                                                       confidence = 0.5, 
                                                       minlen = 2))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5    0.01      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 100 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[5471 item(s), 10000 transaction(s)] done [0.02s].
sorting and recoding items ... [405 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [72 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
retail_rules
set of 72 rules 
  1. 72 rules were discovered.
  2. Minimum support is the smallest share of all transactions in which an itemset must appear before its pattern is considered interesting. A support of 0.01 means the items have to be bought together in at least 1% of all transactions, i.e. 100 of the 10,000 transactions (the "Absolute minimum support count: 100" line in the output).
  3. A low confidence threshold may produce too many unreliable rules, while a high threshold may leave only rules that are too obvious. A confidence threshold of 0.5 means that Y must appear in at least 50% of the transactions that contain X.
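The support, confidence and lift definitions can be made concrete with a small base-R sketch. The baskets below are hypothetical toy data, not the retail transactions, and `support()`, `confidence()`, `lift()` and `keep_rule()` are illustrative helper names, not arules functions; `apriori()` computes the same quantities far more efficiently.

```r
# Hypothetical toy baskets (not the retail data)
baskets <- list(
  c("SUGAR", "TEA", "COFFEE"),
  c("SUGAR", "TEA", "COFFEE"),
  c("TEA", "COFFEE"),
  c("COFFEE"),
  c("BUNTING", "TEA"),
  c("BUNTING"),
  c("BUNTING", "CAKESTAND"),
  c("CAKESTAND"),
  c("CAKESTAND", "TEA"),
  c("BUNTING")
)

# support(X): share of baskets containing every item in X
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# confidence(X => Y) = support(X and Y) / support(X)
confidence <- function(lhs, rhs) support(c(lhs, rhs)) / support(lhs)

# lift(X => Y) = confidence(X => Y) / support(Y); 1 means "no association"
lift <- function(lhs, rhs) confidence(lhs, rhs) / support(rhs)

# A rule survives only if it clears both thresholds (hypothetical values here)
keep_rule <- function(lhs, rhs, min_support = 0.1, min_confidence = 0.5) {
  support(c(lhs, rhs)) >= min_support && confidence(lhs, rhs) >= min_confidence
}

keep_rule("SUGAR", "TEA")   # support 0.2, confidence 1.0
keep_rule("TEA", "SUGAR")   # support 0.2, confidence 0.4
```

Confidence is direction-dependent (the same kind of asymmetry as rules [3] and [4] in the retail output, 1.00 vs 0.69), while lift is the same in both directions (2 for both toy rules).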

5

summary(retail_rules)
set of 72 rules

rule length distribution (lhs + rhs):sizes
 2  3 
54 18 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    2.00    2.00    2.25    2.25    3.00 

summary of quality measures:
    support          confidence        coverage            lift       
 Min.   :0.01000   Min.   :0.5020   Min.   :0.01080   Min.   : 6.461  
 1st Qu.:0.01080   1st Qu.:0.5502   1st Qu.:0.01680   1st Qu.:14.406  
 Median :0.01200   Median :0.6226   Median :0.01970   Median :22.160  
 Mean   :0.01351   Mean   :0.6637   Mean   :0.02126   Mean   :27.032  
 3rd Qu.:0.01673   3rd Qu.:0.7307   3rd Qu.:0.02515   3rd Qu.:26.647  
 Max.   :0.02280   Max.   :1.0000   Max.   :0.03530   Max.   :92.593  
     count      
 Min.   :100.0  
 1st Qu.:108.0  
 Median :120.0  
 Mean   :135.1  
 3rd Qu.:167.2  
 Max.   :228.0  

mining info:
   data ntransactions support confidence
 retail         10000    0.01        0.5
                                                                                   call
 apriori(data = retail, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
  1. 54 rules have length 2 (one item on each side) and 18 rules have length 3 (two items on the left-hand side, one on the right-hand side).
  2. The minimum lift value is 6.461 and the maximum is 92.593.

6

inspect(retail_rules[1:72])
     lhs                                      rhs                                   support confidence coverage      lift count
[1]  {SUGAR}                               => {SET 3 RETROSPOT TEA}                  0.0108  1.0000000   0.0108 92.592593   108
[2]  {SET 3 RETROSPOT TEA}                 => {SUGAR}                                0.0108  1.0000000   0.0108 92.592593   108
[3]  {SUGAR}                               => {COFFEE}                               0.0108  1.0000000   0.0108 64.102564   108
[4]  {COFFEE}                              => {SUGAR}                                0.0108  0.6923077   0.0156 64.102564   108
[5]  {SET 3 RETROSPOT TEA}                 => {COFFEE}                               0.0108  1.0000000   0.0108 64.102564   108
[6]  {COFFEE}                              => {SET 3 RETROSPOT TEA}                  0.0108  0.6923077   0.0156 64.102564   108
[7]  {PINK HAPPY BIRTHDAY BUNTING}         => {BLUE HAPPY BIRTHDAY BUNTING}          0.0104  0.7074830   0.0147 45.940454   104
[8]  {BLUE HAPPY BIRTHDAY BUNTING}         => {PINK HAPPY BIRTHDAY BUNTING}          0.0104  0.6753247   0.0154 45.940454   104
[9]  {BAKING SET SPACEBOY DESIGN}          => {BAKING SET 9 PIECE RETROSPOT}         0.0109  0.6942675   0.0157 20.848874   109
[10] {SET OF TEA COFFEE SUGAR TINS PANTRY} => {SET OF 3 CAKE TINS PANTRY DESIGN}     0.0103  0.5852273   0.0176 11.276055   103
[11] {HAND WARMER SCOTTY DOG DESIGN}       => {HAND WARMER OWL DESIGN}               0.0106  0.6057143   0.0175 27.040816   106
[12] {JUMBO BAG PEARS}                     => {JUMBO BAG APPLES}                     0.0115  0.6318681   0.0182 22.977023   115
[13] {RED KITCHEN SCALES}                  => {IVORY KITCHEN SCALES}                 0.0113  0.5566502   0.0203 21.492288   113
[14] {JUMBO BAG WOODLAND ANIMALS}          => {JUMBO BAG RED RETROSPOT}              0.0109  0.5505051   0.0198  8.548215   109
[15] {ALARM CLOCK BAKELIKE IVORY}          => {ALARM CLOCK BAKELIKE GREEN}           0.0105  0.5526316   0.0190 18.861146   105
[16] {ALARM CLOCK BAKELIKE IVORY}          => {ALARM CLOCK BAKELIKE RED}             0.0130  0.6842105   0.0190 19.832189   130
[17] {ROUND SNACK BOXES SET OF 4 FRUITS}   => {ROUND SNACK BOXES SET OF4 WOODLAND}   0.0102  0.5454545   0.0187 22.083180   102
[18] {WOODEN STAR CHRISTMAS SCANDINAVIAN}  => {WOODEN HEART CHRISTMAS SCANDINAVIAN}  0.0139  0.7679558   0.0181 40.207110   139
[19] {WOODEN HEART CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN}   0.0139  0.7277487   0.0191 40.207110   139
[20] {HOT WATER BOTTLE TEA AND SYMPATHY}   => {CHOCOLATE HOT WATER BOTTLE}           0.0104  0.5073171   0.0205 18.053988   104
[21] {HAND WARMER BIRD DESIGN}             => {HAND WARMER OWL DESIGN}               0.0100  0.5494505   0.0182 24.529042   100
[22] {STRAWBERRY CHARLOTTE BAG}            => {RED RETROSPOT CHARLOTTE BAG}          0.0103  0.5988372   0.0172 20.438130   103
[23] {HOT WATER BOTTLE I AM SO POORLY}     => {CHOCOLATE HOT WATER BOTTLE}           0.0117  0.5879397   0.0199 20.923121   117
[24] {LARGE WHITE HEART OF WICKER}         => {SMALL WHITE HEART OF WICKER}          0.0110  0.5238095   0.0210 23.280423   110
[25] {JUMBO BAG SPACEBOY DESIGN}           => {JUMBO BAG RED RETROSPOT}              0.0105  0.5440415   0.0193  8.447849   105
[26] {JUMBO BAG PINK VINTAGE PAISLEY}      => {JUMBO BAG RED RETROSPOT}              0.0125  0.5122951   0.0244  7.954893   125
[27] {PINK REGENCY TEACUP AND SAUCER}      => {GREEN REGENCY TEACUP AND SAUCER}      0.0197  0.7848606   0.0251 26.515559   197
[28] {GREEN REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0197  0.6655405   0.0296 26.515559   197
[29] {PINK REGENCY TEACUP AND SAUCER}      => {ROSES REGENCY TEACUP AND SAUCER}      0.0192  0.7649402   0.0251 22.236635   192
[30] {ROSES REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0192  0.5581395   0.0344 22.236635   192
[31] {PINK REGENCY TEACUP AND SAUCER}      => {REGENCY CAKESTAND 3 TIER}             0.0126  0.5019920   0.0251  6.460644   126
[32] {CHARLOTTE BAG PINK POLKADOT}         => {RED RETROSPOT CHARLOTTE BAG}          0.0114  0.5876289   0.0194 20.055593   114
[33] {LUNCH BAG VINTAGE LEAF DESIGN}       => {LUNCH BAG APPLE DESIGN}               0.0121  0.5377778   0.0225 16.247063   121
[34] {ALARM CLOCK BAKELIKE PINK}           => {ALARM CLOCK BAKELIKE GREEN}           0.0123  0.5082645   0.0242 17.346910   123
[35] {ALARM CLOCK BAKELIKE PINK}           => {ALARM CLOCK BAKELIKE RED}             0.0148  0.6115702   0.0242 17.726674   148
[36] {JUMBO  BAG BAROQUE BLACK WHITE}      => {JUMBO BAG RED RETROSPOT}              0.0141  0.5529412   0.0255  8.586043   141
[37] {JUMBO BAG STRAWBERRY}                => {JUMBO BAG RED RETROSPOT}              0.0171  0.6263736   0.0273  9.726299   171
[38] {DOLLY GIRL LUNCH BOX}                => {SPACEBOY LUNCH BOX}                   0.0140  0.6635071   0.0211 26.225577   140
[39] {SPACEBOY LUNCH BOX}                  => {DOLLY GIRL LUNCH BOX}                 0.0140  0.5533597   0.0253 26.225577   140
[40] {LUNCH BAG DOLLY GIRL DESIGN}         => {LUNCH BAG SPACEBOY DESIGN}            0.0122  0.5754717   0.0212 16.029852   122
[41] {RED HANGING HEART T-LIGHT HOLDER}    => {WHITE HANGING HEART T-LIGHT HOLDER}   0.0162  0.6303502   0.0257  7.659176   162
[42] {GARDENERS KNEELING PAD CUP OF TEA}   => {GARDENERS KNEELING PAD KEEP CALM}     0.0167  0.7260870   0.0230 25.931677   167
[43] {GARDENERS KNEELING PAD KEEP CALM}    => {GARDENERS KNEELING PAD CUP OF TEA}    0.0167  0.5964286   0.0280 25.931677   167
[44] {WOODEN FRAME ANTIQUE WHITE}          => {WOODEN PICTURE FRAME WHITE FINISH}    0.0175  0.5520505   0.0317 16.236779   175
[45] {WOODEN PICTURE FRAME WHITE FINISH}   => {WOODEN FRAME ANTIQUE WHITE}           0.0175  0.5147059   0.0340 16.236779   175
[46] {GREEN REGENCY TEACUP AND SAUCER}     => {ROSES REGENCY TEACUP AND SAUCER}      0.0228  0.7702703   0.0296 22.391578   228
[47] {ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0228  0.6627907   0.0344 22.391578   228
[48] {ALARM CLOCK BAKELIKE GREEN}          => {ALARM CLOCK BAKELIKE RED}             0.0184  0.6279863   0.0293 18.202503   184
[49] {ALARM CLOCK BAKELIKE RED}            => {ALARM CLOCK BAKELIKE GREEN}           0.0184  0.5333333   0.0345 18.202503   184
[50] {JUMBO STORAGE BAG SUKI}              => {JUMBO BAG RED RETROSPOT}              0.0168  0.5472313   0.0307  8.497380   168
[51] {JUMBO BAG PINK POLKADOT}             => {JUMBO BAG RED RETROSPOT}              0.0211  0.6187683   0.0341  9.608204   211
[52] {LUNCH BAG WOODLAND}                  => {LUNCH BAG RED RETROSPOT}              0.0155  0.5115512   0.0303  9.894606   155
[53] {LUNCH BAG SUKI DESIGN}               => {LUNCH BAG RED RETROSPOT}              0.0172  0.5043988   0.0341  9.756264   172
[54] {LUNCH BAG PINK POLKADOT}             => {LUNCH BAG RED RETROSPOT}              0.0195  0.5524079   0.0353 10.684873   195
[55] {SET 3 RETROSPOT TEA,                                                                                                     
      SUGAR}                               => {COFFEE}                               0.0108  1.0000000   0.0108 64.102564   108
[56] {COFFEE,                                                                                                                  
      SUGAR}                               => {SET 3 RETROSPOT TEA}                  0.0108  1.0000000   0.0108 92.592593   108
[57] {COFFEE,                                                                                                                  
      SET 3 RETROSPOT TEA}                 => {SUGAR}                                0.0108  1.0000000   0.0108 92.592593   108
[58] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      PINK REGENCY TEACUP AND SAUCER}      => {ROSES REGENCY TEACUP AND SAUCER}      0.0170  0.8629442   0.0197 25.085586   170
[59] {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0170  0.8854167   0.0192 29.912725   170
[60] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0170  0.7456140   0.0228 29.705738   170
[61] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      PINK REGENCY TEACUP AND SAUCER}      => {REGENCY CAKESTAND 3 TIER}             0.0108  0.5482234   0.0197  7.055642   108
[62] {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      REGENCY CAKESTAND 3 TIER}            => {GREEN REGENCY TEACUP AND SAUCER}      0.0108  0.8571429   0.0126 28.957529   108
[63] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      REGENCY CAKESTAND 3 TIER}            => {PINK REGENCY TEACUP AND SAUCER}       0.0108  0.7397260   0.0146 29.471156   108
[64] {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      ROSES REGENCY TEACUP AND SAUCER}     => {REGENCY CAKESTAND 3 TIER}             0.0107  0.5572917   0.0192  7.172351   107
[65] {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      REGENCY CAKESTAND 3 TIER}            => {ROSES REGENCY TEACUP AND SAUCER}      0.0107  0.8492063   0.0126 24.686231   107
[66] {REGENCY CAKESTAND 3 TIER,                                                                                                
      ROSES REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0107  0.6369048   0.0168 25.374692   107
[67] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}     => {REGENCY CAKESTAND 3 TIER}             0.0120  0.5263158   0.0228  6.773691   120
[68] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      REGENCY CAKESTAND 3 TIER}            => {ROSES REGENCY TEACUP AND SAUCER}      0.0120  0.8219178   0.0146 23.892960   120
[69] {REGENCY CAKESTAND 3 TIER,                                                                                                
      ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0120  0.7142857   0.0168 24.131274   120
[70] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG RED RETROSPOT}              0.0101  0.6601307   0.0153 12.768486   101
[71] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG  BLACK SKULL}               0.0101  0.5179487   0.0195 12.916427   101
[72] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG PINK POLKADOT}              0.0101  0.5260417   0.0192 14.902030   101
  1. i. Rule [1] reads: if a customer buys SUGAR, they also buy SET 3 RETROSPOT TEA.
     ii. Support = 0.0108, so the rule covers about 1.08% of all transactions (108 of 10,000). Confidence = 1, so the rule holds in 100% of the transactions that contain SUGAR.
     iii. Lift = 92.59, so a transaction containing SUGAR is about 92.6 times more likely to also contain SET 3 RETROSPOT TEA than transactions in general.
  2. The rules linking COFFEE, SUGAR and SET 3 RETROSPOT TEA are trivial associations, as these are a well-known combination that is often bought together.

  3. Colour and design variants of the same product (pink and blue birthday bunting; ivory, pink, green and red alarm clocks; hand warmers in different designs; Dolly Girl and Spaceboy lunch boxes) are often bought together, which suggests an opportunity for bundle deals (e.g. girl and boy sets).
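The quality measures in the table are linked: confidence = support / coverage (coverage being the support of the left-hand side), and lift = confidence / support of the right-hand side. A short base-R check using numbers copied from rules [1] and [4]; the support of each right-hand side is read off the coverage of rule [2] (for SET 3 RETROSPOT TEA) and rule [1] (for SUGAR).

```r
# Rule [1] {SUGAR} => {SET 3 RETROSPOT TEA}, values copied from inspect()
support_rule1  <- 0.0108   # support of {SUGAR, SET 3 RETROSPOT TEA}
coverage_rule1 <- 0.0108   # support of the lhs {SUGAR}
support_tea    <- 0.0108   # support of the rhs (coverage of rule [2])

confidence_rule1 <- support_rule1 / coverage_rule1   # 1
lift_rule1       <- confidence_rule1 / support_tea   # about 92.59

# Rule [4] {COFFEE} => {SUGAR}
support_rule4  <- 0.0108   # support of {COFFEE, SUGAR}
coverage_rule4 <- 0.0156   # support of the lhs {COFFEE}
support_sugar  <- 0.0108   # support of the rhs (coverage of rule [1])

confidence_rule4 <- support_rule4 / coverage_rule4    # about 0.6923
lift_rule4       <- confidence_rule4 / support_sugar  # about 64.10

round(c(conf1 = confidence_rule1, lift1 = lift_rule1,
        conf4 = confidence_rule4, lift4 = lift_rule4), 4)
```

Rule [3] {SUGAR} => {COFFEE} has the same lift as rule [4] (64.10) because lift is symmetric in X and Y, even though confidence is not.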

7

teacup_rules <- subset(retail_rules, items %in% "GREEN REGENCY TEACUP AND SAUCER")
inspect(teacup_rules)
     lhs                                   rhs                               support confidence coverage      lift count
[1]  {PINK REGENCY TEACUP AND SAUCER}   => {GREEN REGENCY TEACUP AND SAUCER}  0.0197  0.7848606   0.0251 26.515559   197
[2]  {GREEN REGENCY TEACUP AND SAUCER}  => {PINK REGENCY TEACUP AND SAUCER}   0.0197  0.6655405   0.0296 26.515559   197
[3]  {GREEN REGENCY TEACUP AND SAUCER}  => {ROSES REGENCY TEACUP AND SAUCER}  0.0228  0.7702703   0.0296 22.391578   228
[4]  {ROSES REGENCY TEACUP AND SAUCER}  => {GREEN REGENCY TEACUP AND SAUCER}  0.0228  0.6627907   0.0344 22.391578   228
[5]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      PINK REGENCY TEACUP AND SAUCER}   => {ROSES REGENCY TEACUP AND SAUCER}  0.0170  0.8629442   0.0197 25.085586   170
[6]  {PINK REGENCY TEACUP AND SAUCER,                                                                                   
      ROSES REGENCY TEACUP AND SAUCER}  => {GREEN REGENCY TEACUP AND SAUCER}  0.0170  0.8854167   0.0192 29.912725   170
[7]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      ROSES REGENCY TEACUP AND SAUCER}  => {PINK REGENCY TEACUP AND SAUCER}   0.0170  0.7456140   0.0228 29.705738   170
[8]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      PINK REGENCY TEACUP AND SAUCER}   => {REGENCY CAKESTAND 3 TIER}         0.0108  0.5482234   0.0197  7.055642   108
[9]  {PINK REGENCY TEACUP AND SAUCER,                                                                                   
      REGENCY CAKESTAND 3 TIER}         => {GREEN REGENCY TEACUP AND SAUCER}  0.0108  0.8571429   0.0126 28.957529   108
[10] {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      REGENCY CAKESTAND 3 TIER}         => {PINK REGENCY TEACUP AND SAUCER}   0.0108  0.7397260   0.0146 29.471156   108
[11] {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      ROSES REGENCY TEACUP AND SAUCER}  => {REGENCY CAKESTAND 3 TIER}         0.0120  0.5263158   0.0228  6.773691   120
[12] {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      REGENCY CAKESTAND 3 TIER}         => {ROSES REGENCY TEACUP AND SAUCER}  0.0120  0.8219178   0.0146 23.892960   120
[13] {REGENCY CAKESTAND 3 TIER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}  => {GREEN REGENCY TEACUP AND SAUCER}  0.0120  0.7142857   0.0168 24.131274   120

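Customers who buy the GREEN REGENCY TEACUP AND SAUCER are far more likely than average to also buy the PINK and ROSES versions (lift of about 22 to 27). Customers who buy it together with the pink or roses version are also more likely to buy the REGENCY CAKESTAND 3 TIER (lift of about 7).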
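`subset(retail_rules, items %in% "...")` keeps the rules whose combined left- and right-hand sides contain the given item. A base-R sketch of that filtering logic on four rules copied from the output above (this mimics the idea, not arules' internal implementation):

```r
# Four rules copied from inspect(retail_rules): [27], [46], [61], [38]
lhs <- list("PINK REGENCY TEACUP AND SAUCER",
            "GREEN REGENCY TEACUP AND SAUCER",
            c("GREEN REGENCY TEACUP AND SAUCER", "PINK REGENCY TEACUP AND SAUCER"),
            "DOLLY GIRL LUNCH BOX")
rhs  <- c("GREEN REGENCY TEACUP AND SAUCER",
          "ROSES REGENCY TEACUP AND SAUCER",
          "REGENCY CAKESTAND 3 TIER",
          "SPACEBOY LUNCH BOX")
lift <- c(26.515559, 22.391578, 7.055642, 26.225577)

target <- "GREEN REGENCY TEACUP AND SAUCER"

# Keep a rule when the target item appears anywhere in it (lhs or rhs)
keep <- vapply(seq_along(lhs), function(i) target %in% c(lhs[[i]], rhs[i]), logical(1))

data.frame(lhs  = vapply(lhs[keep], paste, character(1), collapse = ", "),
           rhs  = rhs[keep],
           lift = lift[keep])
```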

COLLABORATIVE FILTERING

1,2.

steam_ratings <- read_csv("steam_ratings.csv")
steam_ratings <- as(steam_ratings, "matrix")
steam_ratings <- as(steam_ratings, "realRatingMatrix")
steam_ratings
2080 x 1581 rating matrix of class 'realRatingMatrix' with 52414 ratings.

3

a.

vector_ratings <- as.vector(steam_ratings@data)
table(vector_ratings)
vector_ratings
      0       1       2       3       4       5 
3236066    4773   12500   19762   10655    4724 

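In total there are 52,414 ratings. The 3,236,066 zeros are not ratings but empty cells of the 2,080 × 1,581 matrix (3,288,480 cells), so only about 1.6% of user-game pairs have a rating. Of the actual ratings, 4,773 are 1, 12,500 are 2, 19,762 are 3 (the most common value), 10,655 are 4 and 4,724 are 5.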
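These counts can be cross-checked with a few lines of base R; the figures are copied from `table(vector_ratings)` and the `2080 x 1581` dimensions printed earlier.

```r
# Counts copied from table(vector_ratings); "0" marks an empty cell, not a rating
counts <- c(`0` = 3236066, `1` = 4773, `2` = 12500, `3` = 19762, `4` = 10655, `5` = 4724)

n_users <- 2080
n_games <- 1581

ratings   <- counts[names(counts) != "0"]   # the genuine 1-5 ratings
n_ratings <- sum(ratings)
n_cells   <- n_users * n_games              # every user x game pair

# The zeros plus the real ratings fill the whole matrix
stopifnot(n_ratings == 52414, sum(counts) == n_cells)

fill_rate   <- n_ratings / n_cells          # share of pairs that carry a rating
most_common <- names(which.max(ratings))    # most frequent rating value

cat(sprintf("%.0f ratings in %.0f cells (%.2f%%); most common rating: %s\n",
            n_ratings, n_cells, 100 * fill_rate, most_common))
```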

colMeans(steam_ratings) %>%                       # average rating of each game
    tibble::enframe(name = "game", value = "mean_rating") %>%
    ggplot() +
    geom_histogram(mapping = aes(x = mean_rating), color = "white") +
    labs(x = "Average rating per game", y = "Number of games")

hist(rowCounts(steam_ratings), main = "Number of Ratings per User",
     col = "lightblue", xlab = "Ratings per user")
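The two plots summarise the matrix along different axes: the first averages each game's ratings (columns), the second counts how many ratings each user gave (rows, via `rowCounts()`). A toy base-R analogue on a hypothetical 4 × 3 matrix, with `NA` standing in for a missing rating:

```r
# Hypothetical 4 users x 3 games; NA = the user has not rated that game
r <- matrix(c(5, NA, 3,
              4,  2, NA,
              NA, 1, 3,
              2, NA, NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("user", 1:4), c("gameA", "gameB", "gameC")))

# Per-game average over the ratings that exist (analogous to the first histogram)
game_means <- colMeans(r, na.rm = TRUE)

# Per-user number of ratings (analogous to what rowCounts() feeds to hist())
ratings_per_user <- rowSums(!is.na(r))

print(game_means)
print(ratings_per_user)
```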

4

a) b) i, ii, iii, iv)

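set.seed(101)
eval_steam <- evaluationScheme(data = steam_ratings,
                               method = "split",  # random split of users
                               train = 0.8,       # 80% of users go to the training set
                               given = 6,         # ratings per test user given to the model
                               goodRating = 3)    # ratings >= 3 count as "good"
train_steam <- getData(eval_steam, "train")      # training users
known_steam <- getData(eval_steam, "known")      # the 6 given ratings of each test user
unknown_steam <- getData(eval_steam, "unknown")  # withheld ratings used for evaluation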
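A rough base-R sketch of what a `method = "split"` scheme with `given = 6` does, under the assumption that it works roughly as follows: users are split 80/20, and for each test user 6 ratings are handed to the model as "known" while the rest are held back as "unknown". The function `split_given()` and the toy matrix are hypothetical, and recommenderlab's actual sampling details may differ.

```r
set.seed(101)

# Hypothetical 10 users x 12 games; every user has rated 11 of the 12 games
r <- matrix(rep(1:5, length.out = 120), nrow = 10)
r[cbind(1:10, 1:10)] <- NA

# Assumed behaviour of a "split" scheme with `given` ratings per test user
split_given <- function(r, train = 0.8, given = 6) {
  users     <- seq_len(nrow(r))
  train_ids <- sample(users, round(train * length(users)))
  test_ids  <- setdiff(users, train_ids)

  known   <- r[test_ids, , drop = FALSE]
  unknown <- r[test_ids, , drop = FALSE]
  for (i in seq_along(test_ids)) {
    rated <- which(!is.na(known[i, ]))
    keep  <- rated[sample.int(length(rated), given)]  # ratings the model may see
    known[i, -keep]  <- NA                            # hide the rest from the model
    unknown[i, keep] <- NA                            # score only the hidden ratings
  }
  list(train = r[train_ids, , drop = FALSE], known = known, unknown = unknown)
}

s <- split_given(r)
```

"known" and "unknown" partition each test user's ratings, which is why predictions are made from `known_steam` but scored against `unknown_steam`.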

5

ubcf1_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = "center", method = "Cosine"))

ubcf1_predict <- predict(object = ubcf1_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf1_eval <- calcPredictionAccuracy(x = ubcf1_predict,
                                      data = unknown_steam)
ubcf1_eval
     RMSE       MSE       MAE 
1.1691627 1.3669415 0.9175286 

The predicted ratings from the UBCF model are off by approximately 0.92 of a rating.
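`calcPredictionAccuracy()` reports three error measures. A minimal base-R sketch of how they relate, on hypothetical actual and predicted ratings; it also checks, with the UBCF model 1 output above, that MSE is simply RMSE squared.

```r
# Hypothetical held-out ratings and the model's predictions for five user-game pairs
actual    <- c(3, 4, 2, 5, 3)
predicted <- c(3.5, 3.0, 2.5, 4.0, 3.0)

err  <- predicted - actual
mae  <- mean(abs(err))   # average size of a miss, in rating points (0.6 here)
mse  <- mean(err^2)      # squared misses, so large errors weigh more
rmse <- sqrt(mse)        # square root brings MSE back to the rating scale

# MSE is RMSE squared: check with the UBCF model 1 output above
rmse_ubcf1 <- 1.1691627
mse_ubcf1  <- 1.3669415
stopifnot(abs(rmse_ubcf1^2 - mse_ubcf1) < 1e-6)

c(RMSE = rmse, MSE = mse, MAE = mae)
```

Because RMSE squares errors before averaging, it is always at least as large as MAE; a wide gap between the two (for example 1.08 vs 0.82 for UBCF model 2) suggests some larger misses.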

ubcf2_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = NULL, method = "Cosine"))

ubcf2_predict <- predict(object = ubcf2_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf2_eval <- calcPredictionAccuracy(x = ubcf2_predict,
                                      data = unknown_steam)
ubcf2_eval
     RMSE       MSE       MAE 
1.0793268 1.1649463 0.8189319 

The predicted ratings from the UBCF model are off by approximately 0.82 of a rating.

ubcf3_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = "Z-score", method = "Cosine"))

ubcf3_predict <- predict(object = ubcf3_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf3_eval <- calcPredictionAccuracy(x = ubcf3_predict,
                                      data = unknown_steam)
ubcf3_eval
     RMSE       MSE       MAE 
1.1843251 1.4026258 0.9230579 

The predicted ratings from the UBCF model are off by approximately 0.92 of a rating.
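The `normalize` parameter changes the ratings before users are compared: `"center"` subtracts each user's mean rating, and `"Z-score"` additionally divides by the user's standard deviation. A minimal base-R sketch for one hypothetical user, assuming row-wise (per-user) normalisation, which is what we take recommenderlab to apply here:

```r
# One hypothetical user's ratings; NA = not rated
u <- c(gameA = 5, gameB = 1, gameC = NA, gameD = 3)

user_mean <- mean(u, na.rm = TRUE)   # 3
user_sd   <- sd(u, na.rm = TRUE)     # 2

centered <- u - user_mean               # normalize = "center"
zscore   <- (u - user_mean) / user_sd   # normalize = "Z-score"

rbind(raw = u, center = centered, z_score = zscore)
```

With `normalize = NULL` the raw ratings are used as-is; in these experiments that option gave the lowest UBCF errors for all three similarity measures.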

ubcf4_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = "center", method = "Euclidean"))

ubcf4_predict <- predict(object = ubcf4_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf4_eval <- calcPredictionAccuracy(x = ubcf4_predict,
                                      data = unknown_steam)
ubcf4_eval
     RMSE       MSE       MAE 
1.1892017 1.4142006 0.9145427 

The predicted ratings from the UBCF model are off by approximately 0.91 of a rating.

ubcf5_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = NULL, method = "Euclidean"))

ubcf5_predict <- predict(object = ubcf5_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf5_eval <- calcPredictionAccuracy(x = ubcf5_predict,
                                      data = unknown_steam)
ubcf5_eval
     RMSE       MSE       MAE 
1.0990975 1.2080152 0.8294308 

The predicted ratings from the UBCF model are off by approximately 0.83 of a rating.

ubcf6_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = "Z-score", method = "Euclidean"))

ubcf6_predict <- predict(object = ubcf6_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf6_eval <- calcPredictionAccuracy(x = ubcf6_predict,
                                      data = unknown_steam)
ubcf6_eval
     RMSE       MSE       MAE 
1.2103043 1.4648366 0.9309755 

The predicted ratings from the UBCF model are off by approximately 0.93 of a rating.

ubcf7_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = "center", method = "pearson"))

ubcf7_predict <- predict(object = ubcf7_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf7_eval <- calcPredictionAccuracy(x = ubcf7_predict,
                                      data = unknown_steam)
ubcf7_eval
     RMSE       MSE       MAE 
1.1209660 1.2565649 0.8720308 

The predicted ratings from the UBCF model are off by approximately 0.87 of a rating.

ubcf8_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = NULL, method = "pearson"))

ubcf8_predict <- predict(object = ubcf8_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf8_eval <- calcPredictionAccuracy(x = ubcf8_predict,
                                      data = unknown_steam)
ubcf8_eval
     RMSE       MSE       MAE 
1.0949035 1.1988137 0.8284463 

The predicted ratings from the UBCF model are off by approximately 0.83 of a rating.

ubcf9_model <- Recommender(data = train_steam,
                      method = "UBCF", 
                      parameter = list(normalize = "Z-score", method = "pearson"))

ubcf9_predict <- predict(object = ubcf9_model,
                          newdata = known_steam, 
                          type = "ratings")

ubcf9_eval <- calcPredictionAccuracy(x = ubcf9_predict,
                                      data = unknown_steam)
ubcf9_eval
     RMSE       MSE       MAE 
1.1308570 1.2788376 0.8754739 

The predicted ratings from the UBCF model are off by approximately 0.88 of a rating.

6

ibcf1_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = "center", method = "Cosine"))

ibcf1_predict <- predict(object = ibcf1_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf1_eval <- calcPredictionAccuracy(x = ibcf1_predict,
                                      data = unknown_steam)
ibcf1_eval
    RMSE      MSE      MAE 
1.500975 2.252927 1.165031 

The predicted ratings from the IBCF model are off by approximately 1.17 of a rating.
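IBCF reverses the roles of users and games: it compares games by the ratings they received and predicts a user's rating of an unseen game from that user's ratings of similar games. A simplified base-R sketch on a hypothetical 4 × 4 matrix, treating unrated cells as 0 in the cosine similarity; recommenderlab's IBCF additionally keeps only the k most similar items per game, so its numbers will differ.

```r
# Hypothetical ratings: rows = users, columns = games, 0 = not rated
r <- matrix(c(5, 4, 0, 1,
              4, 5, 1, 0,
              1, 0, 5, 4,
              0, 1, 4, 5),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("g", 1:4)))

# Plain cosine similarity between two rating vectors (0 treated as "no rating")
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Game-by-game similarity matrix, computed from the columns of r
n_games <- ncol(r)
sim <- matrix(0, n_games, n_games, dimnames = list(colnames(r), colnames(r)))
for (i in seq_len(n_games)) {
  for (j in seq_len(n_games)) {
    sim[i, j] <- cosine(r[, i], r[, j])
  }
}

# Predict user u's rating of an unrated game g from the games u has rated,
# weighting each rated game by its similarity to g
predict_ibcf <- function(u, g) {
  rated <- which(r[u, ] > 0)
  w <- sim[g, rated]
  sum(w * r[u, rated]) / sum(abs(w))
}

p <- predict_ibcf(u = 1, g = 3)
```

User u1 likes g1 and g2 but rated g4 low, and g3 is most similar to g4, so the prediction for g3 comes out low (about 2.09).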

ibcf2_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = "Z-score", method = "Cosine"))

ibcf2_predict <- predict(object = ibcf2_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf2_eval <- calcPredictionAccuracy(x = ibcf2_predict,
                                      data = unknown_steam)
ibcf2_eval
    RMSE      MSE      MAE 
1.500865 2.252596 1.166651 

The predicted ratings from the IBCF model are off by approximately 1.17 of a rating.

ibcf3_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = NULL, method = "Cosine"))

ibcf3_predict <- predict(object = ibcf3_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf3_eval <- calcPredictionAccuracy(x = ibcf3_predict,
                                      data = unknown_steam)
ibcf3_eval
    RMSE      MSE      MAE 
1.587257 2.519385 1.239649 

The predicted ratings from the IBCF model are off by approximately 1.24 of a rating.

ibcf4_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = "center", method = "Euclidean"))

ibcf4_predict <- predict(object = ibcf4_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf4_eval <- calcPredictionAccuracy(x = ibcf4_predict,
                                      data = unknown_steam)
ibcf4_eval
    RMSE      MSE      MAE 
1.476175 2.179092 1.140654 

The predicted ratings from the IBCF model are off by approximately 1.14 of a rating.

ibcf5_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = "Z-score", method = "Euclidean"))

ibcf5_predict <- predict(object = ibcf5_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf5_eval <- calcPredictionAccuracy(x = ibcf5_predict,
                                      data = unknown_steam)
ibcf5_eval
    RMSE      MSE      MAE 
1.474962 2.175512 1.140897 

The predicted ratings from the IBCF model are off by approximately 1.14 of a rating.

ibcf6_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = NULL, method = "Euclidean"))

ibcf6_predict <- predict(object = ibcf6_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf6_eval <- calcPredictionAccuracy(x = ibcf6_predict,
                                      data = unknown_steam)
ibcf6_eval
    RMSE      MSE      MAE 
1.476175 2.179092 1.140654 

The predicted ratings from the IBCF model are off by approximately 1.14 of a rating.

ibcf7_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = "center", method = "pearson"))

ibcf7_predict <- predict(object = ibcf7_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf7_eval <- calcPredictionAccuracy(x = ibcf7_predict,
                                      data = unknown_steam)
ibcf7_eval
    RMSE      MSE      MAE 
1.473200 2.170317 1.162027 

The predicted ratings from the IBCF model are off by approximately 1.16 of a rating.

ibcf8_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = "Z-score", method = "pearson"))

ibcf8_predict <- predict(object = ibcf8_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf8_eval <- calcPredictionAccuracy(x = ibcf8_predict,
                                      data = unknown_steam)
ibcf8_eval
    RMSE      MSE      MAE 
1.473052 2.169883 1.158043 

The predicted ratings from the IBCF model are off by approximately 1.16 of a rating.

ibcf9_model <- Recommender(data = train_steam,
                      method = "IBCF", 
                      parameter = list(normalize = NULL, method = "pearson"))

ibcf9_predict <- predict(object = ibcf9_model,
                          newdata = known_steam, 
                          type = "ratings")
ibcf9_eval <- calcPredictionAccuracy(x = ibcf9_predict,
                                      data = unknown_steam)
ibcf9_eval
    RMSE      MSE      MAE 
1.465543 2.147817 1.154197 

The predicted ratings from the IBCF model are off by approximately 1.15 of a rating.

7

The best model to generate recommendations from is UBCF model 2 (cosine similarity, no normalisation), which has the lowest MAE (0.82) and also the lowest RMSE (1.08) of all 18 models.
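The model choice can be reproduced by collecting the reported errors and taking the minimum. A base-R sketch with the MAE and RMSE values copied from the 18 outputs above:

```r
# Errors copied from the calcPredictionAccuracy() outputs above
mae <- c(ubcf1 = 0.9175286, ubcf2 = 0.8189319, ubcf3 = 0.9230579,
         ubcf4 = 0.9145427, ubcf5 = 0.8294308, ubcf6 = 0.9309755,
         ubcf7 = 0.8720308, ubcf8 = 0.8284463, ubcf9 = 0.8754739,
         ibcf1 = 1.165031,  ibcf2 = 1.166651,  ibcf3 = 1.239649,
         ibcf4 = 1.140654,  ibcf5 = 1.140897,  ibcf6 = 1.140654,
         ibcf7 = 1.162027,  ibcf8 = 1.158043,  ibcf9 = 1.154197)

rmse <- c(ubcf1 = 1.1691627, ubcf2 = 1.0793268, ubcf3 = 1.1843251,
          ubcf4 = 1.1892017, ubcf5 = 1.0990975, ubcf6 = 1.2103043,
          ubcf7 = 1.1209660, ubcf8 = 1.0949035, ubcf9 = 1.1308570,
          ibcf1 = 1.500975,  ibcf2 = 1.500865,  ibcf3 = 1.587257,
          ibcf4 = 1.476175,  ibcf5 = 1.474962,  ibcf6 = 1.476175,
          ibcf7 = 1.473200,  ibcf8 = 1.473052,  ibcf9 = 1.465543)

best_mae  <- names(which.min(mae))
best_rmse <- names(which.min(rmse))

# Three lowest-MAE models, rounded for reading
round(sort(mae)[1:3], 2)
```

The runners-up (UBCF models 8 and 5, also without normalisation) are within about 0.01 MAE of model 2, so the choice among the three is close.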

ubcf2_recs <- predict(object = ubcf2_model,
                       newdata = known_steam,
                       type = "topNList",
                       n = 3)
head(as(ubcf2_recs, "list"), 5)
$`0`
[1] "Bridge Constructor"          "Car Mechanic Simulator 2014"
[3] "Democracy 3"                

$`1`
[1] "8BitMMO"                        "Airline Tycoon 2"              
[3] "Alan Wake's American Nightmare"

$`2`
[1] "Cogs"              "FINAL FANTASY VII" "Frozen Hearth"    

$`3`
[1] "12 Labours of Hercules"                   
[2] "12 Labours of Hercules II The Cretan Bull"
[3] "Age of Empires Online"                    

$`4`
[1] "Airline Tycoon 2"    "BattleBlock Theater" "Bridge Constructor" 

User 1 top 3 recommendations are “Bridge Constructor”, “Car Mechanic Simulator 2014” and “Democracy 3”.
User 2 top 3 recommendations are “8BitMMO”, “Airline Tycoon 2” and “Alan Wake’s American Nightmare”.
User 3 top 3 recommendations are “Cogs”, “FINAL FANTASY VII” and “Frozen Hearth”.
User 4 top 3 recommendations are “12 Labours of Hercules”, “12 Labours of Hercules II The Cretan Bull” and “Age of Empires Online”.
User 5 top 3 recommendations are “Airline Tycoon 2”, “BattleBlock Theater” and “Bridge Constructor”.

8

Most of the rating matrix is empty (only about 1.6% of user-game pairs have a rating), so encouraging users to rate more games would likely lead to more accurate predictions. The collaborative filtering models show that Steam can use users' past rating behaviour to predict which games a user is most likely to enjoy next. The best-performing model was User-Based Collaborative Filtering (UBCF) model 2, using cosine similarity and no normalisation, with the lowest MAE of 0.82: on average, its predicted ratings were off by about 0.82 rating points.
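The core UBCF idea (predict a user's rating from similar users' ratings) can be sketched in a few lines of base R. This is a simplified illustration on a hypothetical 4 × 4 matrix: ratings are mean-centred per user, similarity is cosine on co-rated games, and the prediction is the user's mean plus a similarity-weighted average of the k most similar neighbours' centred ratings. recommenderlab's UBCF is based on the same general idea, but its neighbourhood size (`nn`), missing-value handling and weighting may differ, so treat this as an assumption-laden sketch; `predict_ubcf()` is a hypothetical helper.

```r
# Hypothetical ratings: rows = users, columns = games, NA = not rated
r <- matrix(c(5, 4, NA, 1,
              4, 5, 2,  1,
              1, 2, 5,  4,
              5, 5, 1,  NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("g", 1:4)))

# Cosine similarity restricted to the games both users have rated
cosine <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Predict user u's rating of game g (row/column indices)
predict_ubcf <- function(r, u, g, k = 2) {
  user_means <- rowMeans(r, na.rm = TRUE)
  centred <- r - user_means          # "center": subtract each user's mean rating
  # candidate neighbours: other users who have rated game g
  others <- setdiff(which(!is.na(r[, g])), u)
  sims <- vapply(others, function(v) cosine(centred[u, ], centred[v, ]), numeric(1))
  # keep the k most similar neighbours
  top <- order(sims, decreasing = TRUE)[seq_len(min(k, length(others)))]
  w  <- sims[top]
  nb <- others[top]
  # user mean + similarity-weighted average of neighbours' centred ratings
  unname(user_means[u] + sum(w * centred[nb, g]) / sum(abs(w)))
}

p <- predict_ubcf(r, u = 1, g = 3)
```

Here the two nearest neighbours (u4 and u2) both rate g3 below their own average, so the prediction for u1 lands well below u1's mean of about 3.33.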

The model will be useful for cross-selling and upselling, as well as for user retention.