Recommendation Engines, Assignment 3

library(arules)
library(recommenderlab)
library(tidyverse)

Question 1

1

retail <- read.transactions("retail_transactions_3.csv", sep = ",")

2

summary(retail)
transactions as itemMatrix in sparse format with
 10000 rows (elements/itemsets/transactions) and
 5479 columns (items) and a density of 0.002744552 

most frequent items:
WHITE HANGING HEART T-LIGHT HOLDER           REGENCY CAKESTAND 3 TIER 
                               822                                776 
           JUMBO BAG RED RETROSPOT                      PARTY BUNTING 
                               663                                561 
     ASSORTED COLOUR BIRD ORNAMENT                            (Other) 
                               544                             147008 

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1658  707  498  413  365  341  316  310  309  290  261  227  227  242  260  227 
  17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
 204  199  233  189  184  149  138  125  104  111  112   98  113   99   78   68 
  33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
  55   62   65   44   48   41   52   41   44   26   45   27   27   35   30   24 
  49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
  26   23   22   21   16   19   21   13   11   16   13   14   15   10   13   13 
  65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
   3   13   13    7    6    9    9    6    6    4    3    8    5    6    3    5 
  81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96 
   3    4    5    8    3    5    8    2    4    4    1    3    2    3    1    2 
  97   98  100  101  102  103  104  105  107  108  109  110  111  112  113  119 
   5    1    2    2    2    1    2    1    1    3    1    2    1    1    2    1 
 120  121  122  123  125  127  134  142  146  147  150  154  157  171  193  204 
   1    1    1    1    1    1    1    2    1    1    1    2    1    2    1    1 
 235  249 
   1    1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.00   10.00   15.04   21.00  249.00 

includes extended item information - examples:
                      labels
1                   1 HANGER
2     10 COLOUR SPACEBOY PEN
3 12 COLOURED PARTY BALLOONS

(A)

10,000

(B)

5,479

(C)

10,000 x 5,479

= 54,790,000

((i))

54,790,000 x 0.002744552

Non-zero values = 150,374

Zero values = 54,790,000 - 150,374 = 54,639,626
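
As a quick check of this arithmetic, the density printed by summary(retail) can be turned back into cell counts. This is a minimal sketch that uses only the figures shown in the summary; the variable names are illustrative and not part of the assignment.

```r
# Figures taken from the summary(retail) output
n_transactions <- 10000
n_items <- 5479
density <- 0.002744552

n_cells <- n_transactions * n_items      # total cells in the sparse matrix
n_nonzero <- round(n_cells * density)    # cells that hold an item
n_zero <- n_cells - n_nonzero            # empty cells

c(cells = n_cells, nonzero = n_nonzero, zero = n_zero)

# Non-zero cells divided by transactions gives the mean basket size
n_nonzero / n_transactions
```

The non-zero count should agree with the sum of the item frequencies in the summary (822 + 776 + 663 + 561 + 544 + 147008), and the final ratio with the reported mean transaction length.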

(D)

249

(E)

15.04 (about 15 items per transaction)

3

itemFrequencyPlot(retail, topN = 20, horiz = T)

4

retail_rules <- apriori(retail, parameter = list(support = 0.01, 
                                                       confidence = 0.5, 
                                                       minlen = 2))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5    0.01      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 100 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[5479 item(s), 10000 transaction(s)] done [0.06s].
sorting and recoding items ... [384 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [86 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
summary(retail_rules)
set of 86 rules

rule length distribution (lhs + rhs):sizes
 2  3 
47 39 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   2.000   2.000   2.453   3.000   3.000 

summary of quality measures:
    support          confidence        coverage            lift       
 Min.   :0.01000   Min.   :0.5021   Min.   :0.01200   Min.   : 6.585  
 1st Qu.:0.01063   1st Qu.:0.5510   1st Qu.:0.01733   1st Qu.:12.347  
 Median :0.01160   Median :0.6022   Median :0.02000   Median :15.266  
 Mean   :0.01315   Mean   :0.6198   Mean   :0.02152   Mean   :17.144  
 3rd Qu.:0.01515   3rd Qu.:0.6652   3rd Qu.:0.02428   3rd Qu.:21.864  
 Max.   :0.02270   Max.   :0.8814   Max.   :0.03770   Max.   :44.055  
     count      
 Min.   :100.0  
 1st Qu.:106.2  
 Median :116.0  
 Mean   :131.5  
 3rd Qu.:151.5  
 Max.   :227.0  

mining info:
   data ntransactions support confidence
 retail         10000    0.01        0.5
                                                                                   call
 apriori(data = retail, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))

(A)

86

(B)

A support threshold of 0.01 means that the items in a rule must appear together in at least 1% of all transactions (here at least 100 of the 10,000 transactions) for the rule to be kept.

(C)

A confidence threshold of 0.5 means that a rule is kept only if at least 50% of the transactions that contain the “if” items (the left-hand side) also contain the “then” item (the right-hand side).
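
The two thresholds can be checked by hand against any rule in the output. The sketch below is an illustration only, with made-up variable names; it takes the count (116) and coverage (0.0227) of rule [86] from the sorted inspect() output in part 6 and tests them against the thresholds passed to apriori().

```r
n_transactions <- 10000
min_support <- 0.01
min_confidence <- 0.5

# Smallest number of transactions an itemset needs to meet the support threshold
min_count <- min_support * n_transactions

# Rule [86]: count = 116, coverage = 0.0227 (taken from the inspect() output)
rule_count <- 116
lhs_count <- 0.0227 * n_transactions   # transactions containing the left-hand side

rule_support <- rule_count / n_transactions   # share of all transactions
rule_confidence <- rule_count / lhs_count     # share of left-hand-side transactions

rule_support >= min_support
rule_confidence >= min_confidence
```

Both comparisons should return TRUE, which is why this rule survives even though it has the lowest lift of the 86.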

5

(A)

47 rules have 2 items. 39 rules have 3 items.

(B)

Minimum = 6.585 Maximum = 44.055

6

inspect(sort(retail_rules, by = "lift"))
     lhs                                      rhs                                   support confidence coverage      lift count
[1]  {WOODEN STAR CHRISTMAS SCANDINAVIAN}  => {WOODEN HEART CHRISTMAS SCANDINAVIAN}  0.0113  0.7533333   0.0150 44.054581   113
[2]  {WOODEN HEART CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN}   0.0113  0.6608187   0.0171 44.054581   113
[3]  {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0156  0.8813559   0.0177 28.708662   156
[4]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      REGENCY CAKESTAND 3 TIER}            => {PINK REGENCY TEACUP AND SAUCER}       0.0102  0.6938776   0.0147 28.672626   102
[5]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0156  0.6872247   0.0227 28.397714   156
[6]  {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      REGENCY CAKESTAND 3 TIER}            => {GREEN REGENCY TEACUP AND SAUCER}      0.0102  0.8500000   0.0120 27.687296   102
[7]  {GARDENERS KNEELING PAD KEEP CALM}    => {GARDENERS KNEELING PAD CUP OF TEA}    0.0163  0.5821429   0.0280 26.105061   163
[8]  {GARDENERS KNEELING PAD CUP OF TEA}   => {GARDENERS KNEELING PAD KEEP CALM}     0.0163  0.7309417   0.0223 26.105061   163
[9]  {PINK REGENCY TEACUP AND SAUCER}      => {GREEN REGENCY TEACUP AND SAUCER}      0.0192  0.7933884   0.0242 25.843271   192
[10] {GREEN REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0192  0.6254072   0.0307 25.843271   192
[11] {DOLLY GIRL LUNCH BOX}                => {SPACEBOY LUNCH BOX}                   0.0138  0.6359447   0.0217 25.642931   138
[12] {SPACEBOY LUNCH BOX}                  => {DOLLY GIRL LUNCH BOX}                 0.0138  0.5564516   0.0248 25.642931   138
[13] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      PINK REGENCY TEACUP AND SAUCER}      => {ROSES REGENCY TEACUP AND SAUCER}      0.0156  0.8125000   0.0192 24.399399   156
[14] {REGENCY CAKESTAND 3 TIER,                                                                                                
      ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0116  0.7483871   0.0155 24.377430   116
[15] {JUMBO BAG PEARS}                     => {JUMBO BAG APPLES}                     0.0111  0.6529412   0.0170 24.272906   111
[16] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      REGENCY CAKESTAND 3 TIER}            => {ROSES REGENCY TEACUP AND SAUCER}      0.0116  0.7891156   0.0147 23.697167   116
[17] {GREEN REGENCY TEACUP AND SAUCER}     => {ROSES REGENCY TEACUP AND SAUCER}      0.0227  0.7394137   0.0307 22.204615   227
[18] {ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0227  0.6816817   0.0333 22.204615   227
[19] {ROUND SNACK BOXES SET OF 4 FRUITS}   => {ROUND SNACK BOXES SET OF4 WOODLAND}   0.0100  0.5524862   0.0181 22.188200   100
[20] {PINK REGENCY TEACUP AND SAUCER}      => {ROSES REGENCY TEACUP AND SAUCER}      0.0177  0.7314050   0.0242 21.964113   177
[21] {ROSES REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0177  0.5315315   0.0333 21.964113   177
[22] {ALARM CLOCK BAKELIKE GREEN,                                                                                              
      ALARM CLOCK BAKELIKE PINK}           => {ALARM CLOCK BAKELIKE RED}             0.0100  0.7575758   0.0132 21.895253   100
[23] {LARGE WHITE HEART OF WICKER}         => {SMALL WHITE HEART OF WICKER}          0.0113  0.5159817   0.0219 21.771381   113
[24] {ALARM CLOCK BAKELIKE PINK,                                                                                               
      ALARM CLOCK BAKELIKE RED}            => {ALARM CLOCK BAKELIKE GREEN}           0.0100  0.6666667   0.0150 21.299255   100
[25] {CHARLOTTE BAG PINK POLKADOT}         => {RED RETROSPOT CHARLOTTE BAG}          0.0127  0.6256158   0.0203 20.923604   127
[26] {HOT WATER BOTTLE I AM SO POORLY}     => {CHOCOLATE HOT WATER BOTTLE}           0.0102  0.5454545   0.0187 20.661157   102
[27] {STRAWBERRY CHARLOTTE BAG}            => {RED RETROSPOT CHARLOTTE BAG}          0.0117  0.6157895   0.0190 20.594966   117
[28] {ALARM CLOCK BAKELIKE GREEN,                                                                                              
      ALARM CLOCK BAKELIKE RED}            => {ALARM CLOCK BAKELIKE PINK}            0.0100  0.5025126   0.0199 20.344638   100
[29] {ALARM CLOCK BAKELIKE IVORY}          => {ALARM CLOCK BAKELIKE RED}             0.0120  0.6896552   0.0174 19.932230   120
[30] {BAKING SET SPACEBOY DESIGN}          => {BAKING SET 9 PIECE RETROSPOT}         0.0115  0.6388889   0.0180 19.597819   115
[31] {ALARM CLOCK BAKELIKE GREEN}          => {ALARM CLOCK BAKELIKE RED}             0.0199  0.6357827   0.0313 18.375224   199
[32] {ALARM CLOCK BAKELIKE RED}            => {ALARM CLOCK BAKELIKE GREEN}           0.0199  0.5751445   0.0346 18.375224   199
[33] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG WOODLAND}                   0.0101  0.5343915   0.0189 17.578669   101
[34] {ALARM CLOCK BAKELIKE PINK}           => {ALARM CLOCK BAKELIKE RED}             0.0150  0.6072874   0.0247 17.551660   150
[35] {CHARLOTTE BAG SUKI DESIGN}           => {RED RETROSPOT CHARLOTTE BAG}          0.0114  0.5181818   0.0220 17.330496   114
[36] {WOODLAND CHARLOTTE BAG}              => {RED RETROSPOT CHARLOTTE BAG}          0.0111  0.5115207   0.0217 17.107717   111
[37] {ALARM CLOCK BAKELIKE PINK}           => {ALARM CLOCK BAKELIKE GREEN}           0.0132  0.5344130   0.0247 17.073896   132
[38] {WOODEN PICTURE FRAME WHITE FINISH}   => {WOODEN FRAME ANTIQUE WHITE}           0.0197  0.5487465   0.0359 16.283279   197
[39] {WOODEN FRAME ANTIQUE WHITE}          => {WOODEN PICTURE FRAME WHITE FINISH}    0.0197  0.5845697   0.0337 16.283279   197
[40] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG PINK POLKADOT}              0.0116  0.6137566   0.0189 16.280016   116
[41] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG CARS BLUE}                 => {LUNCH BAG PINK POLKADOT}              0.0108  0.6000000   0.0180 15.915119   108
[42] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG SUKI DESIGN}               => {LUNCH BAG PINK POLKADOT}              0.0103  0.5953757   0.0173 15.792459   103
[43] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG WOODLAND}                  => {LUNCH BAG CARS BLUE}                  0.0101  0.5906433   0.0171 15.301639   101
[44] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG PINK POLKADOT}              0.0120  0.5741627   0.0209 15.229779   120
[45] {LUNCH BAG DOLLY GIRL DESIGN}         => {LUNCH BAG SPACEBOY DESIGN}            0.0131  0.5796460   0.0226 15.213806   131
[46] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG CARS BLUE}                  0.0108  0.5775401   0.0187 14.962179   108
[47] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG  BLACK SKULL}               0.0118  0.6243386   0.0189 14.829896   118
[48] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG SPACEBOY DESIGN}           => {LUNCH BAG PINK POLKADOT}              0.0101  0.5580110   0.0181 14.801354   101
[49] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG CARS BLUE}                  0.0116  0.5686275   0.0204 14.731281   116
[50] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG CARS BLUE}                  0.0118  0.5645933   0.0209 14.626769   118
[51] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG  BLACK SKULL}               0.0108  0.6101695   0.0177 14.493337   108
[52] {LUNCH BAG VINTAGE LEAF DESIGN}       => {LUNCH BAG APPLE DESIGN}               0.0122  0.5020576   0.0243 14.468519   122
[53] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG WOODLAND}                  => {LUNCH BAG  BLACK SKULL}               0.0102  0.5964912   0.0171 14.168438   102
[54] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG SUKI DESIGN}                0.0103  0.5049020   0.0204 14.064121   103
[55] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG SPACEBOY DESIGN}           => {LUNCH BAG  BLACK SKULL}               0.0107  0.5911602   0.0181 14.041810   107
[56] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG SUKI DESIGN}               => {LUNCH BAG  BLACK SKULL}               0.0102  0.5895954   0.0173 14.004641   102
[57] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG  BLACK SKULL}               0.0120  0.5882353   0.0204 13.972335   120
[58] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG WOODLAND}                  => {LUNCH BAG RED RETROSPOT}              0.0101  0.7266187   0.0139 13.735703   101
[59] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG WOODLAND}                  => {LUNCH BAG RED RETROSPOT}              0.0102  0.7183099   0.0142 13.578636   102
[60] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG SPACEBOY DESIGN}            0.0107  0.5119617   0.0209 13.437316   107
[61] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG SPACEBOY DESIGN}           => {LUNCH BAG RED RETROSPOT}              0.0101  0.7013889   0.0144 13.258769   101
[62] {PAINTED METAL PEARS ASSORTED}        => {ASSORTED COLOUR BIRD ORNAMENT}        0.0106  0.7210884   0.0147 13.255302   106
[63] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG CARS BLUE}                 => {LUNCH BAG RED RETROSPOT}              0.0118  0.6555556   0.0180 12.392355   118
[64] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG RED RETROSPOT}              0.0116  0.6553672   0.0177 12.388795   116
[65] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG SPACEBOY DESIGN}           => {LUNCH BAG RED RETROSPOT}              0.0107  0.6524390   0.0164 12.333441   107
[66] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG SUKI DESIGN}               => {LUNCH BAG RED RETROSPOT}              0.0103  0.6518987   0.0158 12.323227   103
[67] {60 TEATIME FAIRY CAKE CASES}         => {PACK OF 72 RETROSPOT CAKE CASES}      0.0134  0.5056604   0.0265 12.184587   134
[68] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG RED RETROSPOT}              0.0120  0.6417112   0.0187 12.130647   120
[69] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG SUKI DESIGN}               => {LUNCH BAG RED RETROSPOT}              0.0102  0.6144578   0.0166 11.615460   102
[70] {LUNCH BAG WOODLAND}                  => {LUNCH BAG RED RETROSPOT}              0.0171  0.5625000   0.0304 10.633270   171
[71] {LUNCH BAG PINK POLKADOT}             => {LUNCH BAG RED RETROSPOT}              0.0204  0.5411141   0.0377 10.228999   204
[72] {LUNCH BAG DOLLY GIRL DESIGN}         => {LUNCH BAG RED RETROSPOT}              0.0121  0.5353982   0.0226 10.120950   121
[73] {JUMBO BAG STRAWBERRY}                => {JUMBO BAG RED RETROSPOT}              0.0172  0.6515152   0.0264  9.826775   172
[74] {JUMBO BAG SCANDINAVIAN BLUE PAISLEY} => {JUMBO BAG RED RETROSPOT}              0.0108  0.6352941   0.0170  9.582113   108
[75] {JUMBO BAG PINK POLKADOT}             => {JUMBO BAG RED RETROSPOT}              0.0223  0.6043360   0.0369  9.115174   223
[76] {CANDLEHOLDER PINK HANGING HEART}     => {WHITE HANGING HEART T-LIGHT HOLDER}   0.0111  0.7449664   0.0149  9.062852   111
[77] {JUMBO BAG SPACEBOY DESIGN}           => {JUMBO BAG RED RETROSPOT}              0.0113  0.5978836   0.0189  9.017852   113
[78] {JUMBO STORAGE BAG SUKI}              => {JUMBO BAG RED RETROSPOT}              0.0186  0.5904762   0.0315  8.906127   186
[79] {RED HANGING HEART T-LIGHT HOLDER}    => {WHITE HANGING HEART T-LIGHT HOLDER}   0.0186  0.6888889   0.0270  8.380643   186
[80] {JUMBO  BAG BAROQUE BLACK WHITE}      => {JUMBO BAG RED RETROSPOT}              0.0147  0.5505618   0.0267  8.304100   147
[81] {JUMBO BAG PINK VINTAGE PAISLEY}      => {JUMBO BAG RED RETROSPOT}              0.0128  0.5493562   0.0233  8.285916   128
[82] {JUMBO BAG VINTAGE DOILY}             => {JUMBO BAG RED RETROSPOT}              0.0116  0.5155556   0.0225  7.776102   116
[83] {JUMBO SHOPPER VINTAGE RED PAISLEY}   => {JUMBO BAG RED RETROSPOT}              0.0152  0.5033113   0.0302  7.591422   152
[84] {JUMBO BAG WOODLAND ANIMALS}          => {JUMBO BAG RED RETROSPOT}              0.0101  0.5024876   0.0201  7.578998   101
[85] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      PINK REGENCY TEACUP AND SAUCER}      => {REGENCY CAKESTAND 3 TIER}             0.0102  0.5312500   0.0192  6.846005   102
[86] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}     => {REGENCY CAKESTAND 3 TIER}             0.0116  0.5110132   0.0227  6.585222   116

((i))

If a customer buys a Wooden Star Christmas Scandinavian, they are also likely to buy a Wooden Heart Christmas Scandinavian.

((ii))

The support value (0.0113) shows how often both items are bought together out of all transactions: 113 of 10,000, or 1.13%.

The confidence value (0.7533333) shows how often the second item is bought when the first item is bought: about 75% of transactions containing the Wooden Star also contain the Wooden Heart.

((iii))

A lift of 44.05 means that customers who purchase the Wooden Star Christmas Scandinavian are about 44 times more likely to also purchase the Wooden Heart Christmas Scandinavian than would be expected if the two items were bought independently of each other.
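
The confidence and lift of this rule can be reproduced from the counts in the inspect() output. This is a sketch with illustrative variable names: the left-hand-side count comes from the coverage of rule [1] (0.0150), and the right-hand-side count from the coverage of rule [2] (0.0171), where the Wooden Heart is on the left.

```r
n_transactions <- 10000

both_count <- 113   # transactions with Wooden Star and Wooden Heart (count column)
star_count <- 150   # transactions with Wooden Star (coverage 0.0150 of rule [1])
heart_count <- 171  # transactions with Wooden Heart (coverage 0.0171 of rule [2])

support <- both_count / n_transactions
confidence <- both_count / star_count

# Lift compares the confidence with the baseline share of the right-hand-side item
heart_support <- heart_count / n_transactions
lift <- confidence / heart_support

round(c(support = support, confidence = confidence, lift = lift), 4)
```

Because lift is symmetric in the two items, rules [1] and [2] share the same lift even though their confidences differ.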

(B)

{GARDENERS KNEELING PAD KEEP CALM} => {GARDENERS KNEELING PAD CUP OF TEA}

This is considered a trivial rule: the two items are variants of the same product, so it is unsurprising that they are purchased together.

(C)

{JUMBO BAG STRAWBERRY} => {JUMBO BAG RED RETROSPOT}

Actionable rules are rules that a store or business can use to improve. The two items may not seem closely related at first, but the fact that customers buy them together offers insight into their spending habits.

7

rose_rule <- subset(retail_rules, items %in% "ROSES REGENCY TEACUP AND SAUCER")
inspect(rose_rule)
     lhs                                   rhs                               support confidence coverage      lift count
[1]  {PINK REGENCY TEACUP AND SAUCER}   => {ROSES REGENCY TEACUP AND SAUCER}  0.0177  0.7314050   0.0242 21.964113   177
[2]  {ROSES REGENCY TEACUP AND SAUCER}  => {PINK REGENCY TEACUP AND SAUCER}   0.0177  0.5315315   0.0333 21.964113   177
[3]  {GREEN REGENCY TEACUP AND SAUCER}  => {ROSES REGENCY TEACUP AND SAUCER}  0.0227  0.7394137   0.0307 22.204615   227
[4]  {ROSES REGENCY TEACUP AND SAUCER}  => {GREEN REGENCY TEACUP AND SAUCER}  0.0227  0.6816817   0.0333 22.204615   227
[5]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      PINK REGENCY TEACUP AND SAUCER}   => {ROSES REGENCY TEACUP AND SAUCER}  0.0156  0.8125000   0.0192 24.399399   156
[6]  {PINK REGENCY TEACUP AND SAUCER,                                                                                   
      ROSES REGENCY TEACUP AND SAUCER}  => {GREEN REGENCY TEACUP AND SAUCER}  0.0156  0.8813559   0.0177 28.708662   156
[7]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      ROSES REGENCY TEACUP AND SAUCER}  => {PINK REGENCY TEACUP AND SAUCER}   0.0156  0.6872247   0.0227 28.397714   156
[8]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      ROSES REGENCY TEACUP AND SAUCER}  => {REGENCY CAKESTAND 3 TIER}         0.0116  0.5110132   0.0227  6.585222   116
[9]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      REGENCY CAKESTAND 3 TIER}         => {ROSES REGENCY TEACUP AND SAUCER}  0.0116  0.7891156   0.0147 23.697167   116
[10] {REGENCY CAKESTAND 3 TIER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}  => {GREEN REGENCY TEACUP AND SAUCER}  0.0116  0.7483871   0.0155 24.377430   116

Customers who buy the Roses Regency Teacup and Saucer are also likely to buy:

Green Regency Teacup and Saucer

Pink Regency Teacup and Saucer

The Regency Cakestand 3 Tier is another item that some customers are likely to buy with the Regency teacups and saucers.
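
The subset above uses items %in%, which matches an item on either side of a rule. To look only at what customers buy in addition to a given item, the match can be restricted to the left-hand side with lhs %in%. The sketch below shows this on a made-up toy basket list; the items a, b and c and the thresholds are assumptions for illustration, not the retail data.

```r
library(arules)

# Hypothetical toy baskets, not taken from retail_transactions_3.csv
toy_baskets <- list(
  c("a", "b"),
  c("a", "b", "c"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b")
)
toy_trans <- as(toy_baskets, "transactions")

toy_rules <- apriori(toy_trans,
                     parameter = list(support = 0.2, confidence = 0.5, minlen = 2),
                     control = list(verbose = FALSE))

# Keep only rules where "a" is part of the "if" side
lhs_a_rules <- subset(toy_rules, lhs %in% "a")
inspect(sort(lhs_a_rules, by = "lift"))
```

Applied to the retail rules, the same pattern with "ROSES REGENCY TEACUP AND SAUCER" would leave only the rules whose right-hand side is a candidate companion item.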

Question 2

(1)

library(recommenderlab)
library(tidyverse)

(2)

steam_ratings <- read_csv("steam_ratings.csv")
steam_ratings <- as(steam_ratings, "matrix")
steam_ratings <- as(steam_ratings, "realRatingMatrix")

(3)

vector_ratings <- as.vector(steam_ratings@data)
table(vector_ratings)
vector_ratings
      0       1       2       3       4       5 
3236066    4773   12500   19762   10655    4724 

(B)

colMeans(steam_ratings) %>%
  tibble::enframe(name = "games", value = "steam_ratings") %>% 
  ggplot() +
  geom_histogram(mapping = aes(x = steam_ratings), color = "blue") +
  scale_x_continuous(limits = c(1, 5), breaks = c(1, 2, 3, 4, 5), 
                     labels = c('1','2', '3', '4', '5'))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).

(C)

counts <- rowCounts(steam_ratings, value = TRUE, na.rm = FALSE) 

  ggplot() +
  geom_histogram(mapping = aes(x = counts), color = "blue")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

(4)

set.seed(101)
eval_games <- evaluationScheme(data = steam_ratings, 
                                method = "split",  
                                train = 0.8,      
                                given = 6,        
                                goodRating = 3)  

train_games <- getData(eval_games, "train")
known_games <- getData(eval_games, "known")
unknown_games <- getData(eval_games, "unknown")

(5)

(a)

ubcf_model_cc <- Recommender(data = train_games,
                          method = "UBCF", 
                          parameter = list(normalize = "center", method = "Cosine"))

ubcf_model_ce <- Recommender(data = train_games,
                            method = "UBCF", 
                            parameter = list(normalize = "center", method = "Euclidean"))

ubcf_model_cp <- Recommender(data = train_games,
                            method = "UBCF", 
                            parameter = list(normalize = "center", method = "Pearson"))
ubcf_model_zc <- Recommender(data = train_games,
                             method = "UBCF", 
                             parameter = list(normalize = "z-score", method = "Cosine"))

ubcf_model_ze <- Recommender(data = train_games,
                             method = "UBCF", 
                             parameter = list(normalize = "z-score", method = "Euclidean"))

ubcf_model_zp <- Recommender(data = train_games,
                             method = "UBCF", 
                             parameter = list(normalize = "z-score", method = "Pearson"))
ubcf_model_nc <- Recommender(data = train_games,
                             method = "UBCF", 
                             parameter = list(normalize = NULL, method = "Cosine"))

ubcf_model_ne <- Recommender(data = train_games,
                             method = "UBCF", 
                             parameter = list(normalize = NULL, method = "Euclidean"))

ubcf_model_np <- Recommender(data = train_games,
                             method = "UBCF", 
                             parameter = list(normalize = NULL, method = "Pearson"))

(b)

ubcf_predict_cc <- predict(object = ubcf_model_cc,
                        newdata = known_games, 
                        type = "ratings")

ubcf_cc_eval <- calcPredictionAccuracy(x = ubcf_predict_cc,
                                    data = unknown_games)
ubcf_cc_eval
     RMSE       MSE       MAE 
1.1697655 1.3683514 0.9183398 
ubcf_predict_ce <- predict(object = ubcf_model_ce,
                           newdata = known_games, 
                           type = "ratings")

ubcf_ce_eval <- calcPredictionAccuracy(x = ubcf_predict_ce,
                                       data = unknown_games)
ubcf_ce_eval 
     RMSE       MSE       MAE 
1.1910345 1.4185633 0.9163087 
ubcf_predict_cp <- predict(object = ubcf_model_cp,
                           newdata = known_games, 
                           type = "ratings")

ubcf_cp_eval <- calcPredictionAccuracy(x = ubcf_predict_cp,
                                       data = unknown_games)
ubcf_cp_eval 
     RMSE       MSE       MAE 
1.1212624 1.2572293 0.8702777 
ubcf_predict_zc <- predict(object = ubcf_model_zc,
                           newdata = known_games, 
                           type = "ratings")

ubcf_zc_eval <- calcPredictionAccuracy(x = ubcf_predict_zc,
                                       data = unknown_games)
ubcf_zc_eval 
    RMSE      MSE      MAE 
1.184555 1.403170 0.923375 
ubcf_predict_ze <- predict(object = ubcf_model_ze,
                             newdata = known_games, 
                             type = "ratings")

ubcf_ze_eval <- calcPredictionAccuracy(x = ubcf_predict_ze,
                                       data = unknown_games)
ubcf_ze_eval 
     RMSE       MSE       MAE 
1.2103032 1.4648339 0.9309624 
ubcf_predict_zp <- predict(object = ubcf_model_zp,
                             newdata = known_games, 
                             type = "ratings")

ubcf_zp_eval <- calcPredictionAccuracy(x = ubcf_predict_zp,
                                       data = unknown_games)
ubcf_zp_eval 
     RMSE       MSE       MAE 
1.1345807 1.2872733 0.8790968 
ubcf_predict_nc <- predict(object = ubcf_model_nc,
                             newdata = known_games, 
                             type = "ratings")

ubcf_nc_eval <- calcPredictionAccuracy(x = ubcf_predict_nc,
                                       data = unknown_games)
ubcf_nc_eval 
     RMSE       MSE       MAE 
1.0793268 1.1649463 0.8189319 
ubcf_predict_ne <- predict(object = ubcf_model_ne,
                             newdata = known_games, 
                             type = "ratings")

ubcf_ne_eval <- calcPredictionAccuracy(x = ubcf_predict_ne,
                                       data = unknown_games)
ubcf_ne_eval 
     RMSE       MSE       MAE 
1.0990975 1.2080152 0.8294308 
ubcf_predict_np <- predict(object = ubcf_model_np,
                             newdata = known_games, 
                             type = "ratings")

ubcf_np_eval <- calcPredictionAccuracy(x = ubcf_predict_np,
                                       data = unknown_games)
ubcf_np_eval 
     RMSE       MSE       MAE 
1.1086429 1.2290892 0.8349371 
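
To compare the nine UBCF models side by side, the accuracy figures printed above can be gathered into one table and sorted. This is a minimal sketch that copies the RMSE and MAE values from the output by hand; the data frame and column names are illustrative.

```r
# RMSE and MAE copied from the nine calcPredictionAccuracy() results above
ubcf_results <- data.frame(
  normalize  = rep(c("center", "z-score", "none"), each = 3),
  similarity = rep(c("Cosine", "Euclidean", "Pearson"), times = 3),
  RMSE = c(1.1697655, 1.1910345, 1.1212624,
           1.1845550, 1.2103032, 1.1345807,
           1.0793268, 1.0990975, 1.1086429),
  MAE  = c(0.9183398, 0.9163087, 0.8702777,
           0.9233750, 0.9309624, 0.8790968,
           0.8189319, 0.8294308, 0.8349371)
)

# Sort so the lowest-error model comes first
ubcf_ranked <- ubcf_results[order(ubcf_results$RMSE), ]
ubcf_ranked

best_ubcf <- ubcf_ranked[1, ]
```

On this split, the model with no normalization and cosine similarity has the lowest RMSE and the lowest MAE; a different seed or split could change the ordering.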

(6)

(A)

#centering#

ibcf_model_cc <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = "center", method = "Cosine"))

ibcf_model_ce <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = "center", method = "Euclidean"))

ibcf_model_cp <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = "center", method = "Pearson"))
#z-score#


ibcf_model_zc <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = "z-score", method = "Cosine"))

ibcf_model_ze <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = "z-score", method = "Euclidean"))

ibcf_model_zp <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = "z-score", method = "Pearson"))

#null#

ibcf_model_nc <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = NULL, method = "Cosine"))

ibcf_model_ne <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = NULL, method = "Euclidean"))

ibcf_model_np <- Recommender(data = train_games,
                             method = "IBCF", 
                             parameter = list(normalize = NULL, method = "Pearson"))
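
The nine models above differ only in their `normalize` and `method` settings, so the same grid could also be built and scored in a loop. The sketch below is one possible way to do that, not the code used for the results in this report. It assumes the `Jester5k` ratings that ship with recommenderlab as stand-in data, and the names `scheme`, `norms`, `sims`, and `ibcf_grid` are introduced only for illustration.

```r
library(recommenderlab)

# Stand-in data (assumption): Jester5k from recommenderlab, not the games data.
data(Jester5k)
set.seed(1)
scheme <- evaluationScheme(Jester5k[1:300], method = "split",
                           train = 0.8, given = 10)

# The two settings that vary across the nine IBCF models.
norms <- list(c = "center", z = "z-score", n = NULL)
sims  <- c(c = "Cosine", e = "Euclidean", p = "Pearson")

results <- list()
for (n in names(norms)) {
  for (s in names(sims)) {
    # Fit one IBCF model per normalize/method combination.
    model <- Recommender(data = getData(scheme, "train"),
                         method = "IBCF",
                         parameter = list(normalize = norms[[n]],
                                          method = sims[[s]]))
    # Predict the held-out ratings and score them.
    pred <- predict(object = model,
                    newdata = getData(scheme, "known"),
                    type = "ratings")
    results[[paste0("ibcf_", n, s)]] <-
      calcPredictionAccuracy(x = pred, data = getData(scheme, "unknown"))
  }
}

# One row per model, with columns RMSE, MSE, and MAE.
ibcf_grid <- do.call(rbind, results)
ibcf_grid
```

Swapping `method = "IBCF"` for `"UBCF"` would give the matching user-based grid.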

(b)

ibcf_predict_cc <- predict(object = ibcf_model_cc,
                        newdata = known_games, 
                        type = "ratings")

ibcf_cc_eval <- calcPredictionAccuracy(x = ibcf_predict_cc,
                                    data = unknown_games)
ibcf_cc_eval
    RMSE      MSE      MAE 
1.500713 2.252139 1.165198 
ibcf_predict_ce <- predict(object = ibcf_model_ce,
                        newdata = known_games, 
                        type = "ratings")

ibcf_ce_eval <- calcPredictionAccuracy(x = ibcf_predict_ce,
                                    data = unknown_games)
ibcf_ce_eval
    RMSE      MSE      MAE 
1.477274 2.182339 1.142542 
ibcf_predict_cp <- predict(object = ibcf_model_cp,
                        newdata = known_games, 
                        type = "ratings")

ibcf_cp_eval <- calcPredictionAccuracy(x = ibcf_predict_cp,
                                    data = unknown_games)
ibcf_cp_eval
    RMSE      MSE      MAE 
1.470169 2.161397 1.158908 
ibcf_predict_zc <- predict(object = ibcf_model_zc,
                        newdata = known_games, 
                        type = "ratings")

ibcf_zc_eval <- calcPredictionAccuracy(x = ibcf_predict_zc,
                                    data = unknown_games)
ibcf_zc_eval
    RMSE      MSE      MAE 
1.500976 2.252928 1.163775 
ibcf_predict_ze <- predict(object = ibcf_model_ze,
                        newdata = known_games, 
                        type = "ratings")

ibcf_ze_eval <- calcPredictionAccuracy(x = ibcf_predict_ze,
                                    data = unknown_games)
ibcf_ze_eval
    RMSE      MSE      MAE 
1.475157 2.176087 1.141132 
ibcf_predict_zp <- predict(object = ibcf_model_zp,
                        newdata = known_games, 
                        type = "ratings")

ibcf_zp_eval <- calcPredictionAccuracy(x = ibcf_predict_zp,
                                    data = unknown_games)
ibcf_zp_eval
    RMSE      MSE      MAE 
1.467355 2.153130 1.158796 
ibcf_predict_nc <- predict(object = ibcf_model_nc,
                        newdata = known_games, 
                        type = "ratings")

ibcf_nc_eval <- calcPredictionAccuracy(x = ibcf_predict_nc,
                                    data = unknown_games)
ibcf_nc_eval
    RMSE      MSE      MAE 
1.587257 2.519385 1.239649 
ibcf_predict_ne <- predict(object = ibcf_model_ne,
                        newdata = known_games, 
                        type = "ratings")

ibcf_ne_eval <- calcPredictionAccuracy(x = ibcf_predict_ne,
                                    data = unknown_games)
ibcf_ne_eval
    RMSE      MSE      MAE 
1.476175 2.179092 1.140654 
ibcf_predict_np <- predict(object = ibcf_model_np,
                        newdata = known_games, 
                        type = "ratings")

ibcf_np_eval <- calcPredictionAccuracy(x = ibcf_predict_np,
                                    data = unknown_games)
ibcf_np_eval
    RMSE      MSE      MAE 
1.456788 2.122230 1.152312 
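
To compare the nine IBCF results side by side, the metrics printed above can be gathered into one data frame and sorted. This is only a convenience sketch: the values are copied by hand from the output in this section, and the names `ibcf_results`, `best_mae`, and `best_rmse` are introduced here for illustration.

```r
# RMSE and MAE copied from the calcPredictionAccuracy() output above.
ibcf_results <- data.frame(
  model = c("ibcf_cc", "ibcf_ce", "ibcf_cp",
            "ibcf_zc", "ibcf_ze", "ibcf_zp",
            "ibcf_nc", "ibcf_ne", "ibcf_np"),
  RMSE  = c(1.500713, 1.477274, 1.470169,
            1.500976, 1.475157, 1.467355,
            1.587257, 1.476175, 1.456788),
  MAE   = c(1.165198, 1.142542, 1.158908,
            1.163775, 1.141132, 1.158796,
            1.239649, 1.140654, 1.152312)
)

# Sort so the lowest-error model comes first (lower is better for both metrics).
ibcf_results[order(ibcf_results$MAE), ]

# Names of the models with the lowest MAE and the lowest RMSE.
best_mae  <- ibcf_results$model[which.min(ibcf_results$MAE)]
best_rmse <- ibcf_results$model[which.min(ibcf_results$RMSE)]
```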

(7)

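Selected models and their average error scores:

UBCF_ze - 0.9309624

IBCF_nc - 1.239649

These are error measures, so lower is better. The IBCF_nc figure is its MAE from (6)(b), which is the highest of the nine IBCF models; ibcf_ne has the lowest MAE (1.140654) and ibcf_np the lowest RMSE (1.456788).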

ubcf_ze_recs <- predict(object = ubcf_model_ze,
                     newdata = known_games,
                     type = "topNList",
                     n = 3) 


recommendation_list <- as(ubcf_ze_recs, "list")
recommendation_list[1:5]
$`0`
[1] "Pro Evolution Soccer 2015" "Deadpool"                 
[3] "Guns of Icarus Online"    

$`1`
[1] "Valkyria Chronicles"                 
[2] "Lara Croft and the Guardian of Light"
[3] "Panzar"                              

$`2`
[1] "Duke Nukem 3D Megaton Edition" "The Ultimate DOOM"            
[3] "Synergy"                      

$`3`
[1] "Sparkle 2 Evo"                    "Sang-Froid - Tales of Werewolves"
[3] "The Journey Down Chapter One"    

$`4`
[1] "Assassin's Creed"         "Sonic Adventure 2"       
[3] "Galaxy on Fire 2 Full HD"

user 0: Pro Evolution Soccer 2015, Deadpool, Guns of Icarus Online

user 1: Valkyria Chronicles, Lara Croft and the Guardian of Light, Panzar

user 2: Duke Nukem 3D Megaton Edition, The Ultimate DOOM, Synergy

user 3: Sparkle 2 Evo, Sang-Froid - Tales of Werewolves, The Journey Down Chapter One

user 4: Assassin's Creed, Sonic Adventure 2, Galaxy on Fire 2 Full HD

ibcf_nc_recs <- predict(object = ibcf_model_nc,
                     newdata = known_games,
                     type = "topNList",
                     n = 3) 


recommendation_list2 <- as(ibcf_nc_recs, "list")
recommendation_list2[1:5]
$`0`
[1] "Wind of Luck Arena" "404Sight"           "8BitMMO"           

$`1`
[1] "3DMark"               "AdVenture Capitalist" "Age of Wonders III"  

$`2`
[1] "Heroes of Might & Magic III - HD Edition"
[2] "Age of Conan Unchained - EU version"     
[3] "Alien Rage - Unlimited"                  

$`3`
[1] "60 Seconds!" "Batla"       "Bus Driver" 

$`4`
[1] "Anno 1404"                      "Axis Game Factory's AGFPRO 3.0"
[3] "Blood Bowl Chaos Edition"      

user 0: Wind of Luck Arena, 404Sight, 8BitMMO

user 1: 3DMark, AdVenture Capitalist, Age of Wonders III

user 2: Heroes of Might & Magic III - HD Edition, Age of Conan Unchained - EU version, Alien Rage - Unlimited

user 3: 60 Seconds!, Batla, Bus Driver

user 4: Anno 1404, Axis Game Factory's AGFPRO 3.0, Blood Bowl Chaos Edition
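
The per-user summaries above can also be generated from the recommendation list instead of being typed out, which keeps the titles identical to the model output. The sketch below assumes a list shaped like the result of `as(ubcf_ze_recs, "list")`; the object `recs` is a hand-copied stand-in holding the first two users from the UBCF output, and `titles` and `summary_lines` are illustrative names.

```r
# Stand-in (assumption) for as(ubcf_ze_recs, "list"): the first two users
# from the UBCF output above, copied by hand.
recs <- list(
  `0` = c("Pro Evolution Soccer 2015", "Deadpool", "Guns of Icarus Online"),
  `1` = c("Valkyria Chronicles", "Lara Croft and the Guardian of Light", "Panzar")
)

# Collapse each user's titles into one comma-separated string.
titles <- vapply(recs, paste, character(1), collapse = ", ")

# Prefix each string with the user id taken from the list names.
summary_lines <- paste0("user ", names(recs), ": ", titles)
writeLines(summary_lines)
```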

(8)

Steam could use this output to increase user engagement and drive sales by showing each user a personalized list of recommended games. A good recommendation engine could help Steam turn one-time buyers into repeat customers and build a longer-term relationship with each of them.