Assignment 2 - Adv Data Analysis 2

Part A

Question 1

library(arules)
retail <- read.transactions("retail_transactions_1.csv", sep = ",")

Question 2

summary(retail)
transactions as itemMatrix in sparse format with
 10000 rows (elements/itemsets/transactions) and
 5497 columns (items) and a density of 0.00277837 

most frequent items:
WHITE HANGING HEART T-LIGHT HOLDER           REGENCY CAKESTAND 3 TIER 
                               838                                775 
           JUMBO BAG RED RETROSPOT                      PARTY BUNTING 
                               671                                551 
     ASSORTED COLOUR BIRD ORNAMENT                            (Other) 
                               543                             149349 

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1622  684  518  406  399  352  304  313  291  265  268  234  236  221  249  259 
  17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
 220  201  223  185  176  138  145  115  110  114   96  101  109  100   74   73 
  33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
  68   66   62   40   59   42   50   43   43   55   36   28   31   28   29   23 
  49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
  29   21   34   13   15   25   17   20   13   16   20   14   11   11   13   14 
  65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
   9   15   12    8    2   10    4    7    6    9    1    7    6    3    3    4 
  81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96 
   7    6    3    2    4    3    5    5    1    2    1    4    2    1    1    2 
  97   98   99  101  103  105  107  108  109  110  111  113  114  116  117  118 
   1    3    1    2    2    1    2    2    3    1    1    2    1    1    3    2 
 120  121  122  123  125  126  135  143  147  149  154  157  158  168  171  177 
   1    1    1    1    2    3    1    1    1    1    4    1    1    1    1    1 
 202  204  249  320  428 
   1    1    1    1    1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.00   10.00   15.27   21.00  428.00 

includes extended item information - examples:
                      labels
1                   1 HANGER
2     10 COLOUR SPACEBOY PEN
3 12 COLOURED PARTY BALLOONS

a) There are 10,000 transactions in the dataset

b) There are 5,497 distinct items available to purchase

c) The sparse matrix contains 54,970,000 cells (10,000 rows x 5,497 columns), of which 152,727 are non-zero.
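
The figures in (c) can be reproduced from the summary output. The following is a minimal sketch that uses only the dimensions and density printed above; the variable names are illustrative.

```r
# Dimensions and density as printed by summary(retail)
n_transactions <- 10000
n_items <- 5497
density <- 0.00277837

# Total number of cells in the sparse matrix
cells <- n_transactions * n_items

# Non-zero cells: density is the share of cells that hold an item
non_zero <- round(cells * density)

# Mean items per transaction, which should match the Mean in the
# length distribution of the summary output
mean_items <- non_zero / n_transactions

print(c(cells = cells, non_zero = non_zero))
print(round(mean_items, 2))
```

The non-zero count also equals the sum of the "most frequent items" counts including (Other), which gives a second check on the same number.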

d) The largest number of items purchased in a single transaction is 428

e) The mean number of items purchased in a single transaction is 15.27

Question 3

itemFrequencyPlot(retail, topN = 20, horiz = T)

Question 4

retail_rules <- apriori(retail, parameter = list(support = 0.01, 
                                               confidence = 0.5,
                                               minlen = 2))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5    0.01      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 100 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[5497 item(s), 10000 transaction(s)] done [0.02s].
sorting and recoding items ... [401 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [90 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
retail_rules
set of 90 rules 

a) There are 90 rules

b) A support threshold of 0.01 means that a rule is only generated if all of its items (left-hand and right-hand side combined) appear together in at least 1% of the 10,000 transactions, i.e. in at least 100 transactions

c) A confidence threshold of 0.5 means that a rule X -> Y is only included in the results if Y appears in at least 50% of the transactions that contain X
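
To illustrate how the two thresholds are applied, the sketch below computes support and confidence by hand in base R. The transactions are invented toy data, not taken from the retail dataset, and the helper function `contains` is hypothetical.

```r
# Toy transactions (invented for illustration, not from the retail data)
transactions <- list(
  c("tea", "cup"),
  c("tea", "cup", "cake"),
  c("tea"),
  c("cake"),
  c("tea", "cup")
)

# TRUE for each transaction that contains every item in `items`
contains <- function(items) {
  vapply(transactions, function(tr) all(items %in% tr), logical(1))
}

support_x  <- mean(contains("tea"))            # share containing X
support_xy <- mean(contains(c("tea", "cup")))  # share containing X and Y
confidence <- support_xy / support_x           # share of X transactions that also contain Y

# The rule {tea} -> {cup} is kept only if it clears both thresholds
keep_rule <- support_xy >= 0.01 && confidence >= 0.5

print(c(support = support_xy, confidence = confidence))
print(keep_rule)
```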

Question 5

summary(retail_rules)
set of 90 rules

rule length distribution (lhs + rhs):sizes
 2  3 
58 32 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   2.000   2.000   2.356   3.000   3.000 

summary of quality measures:
    support          confidence        coverage            lift       
 Min.   :0.01000   Min.   :0.5000   Min.   :0.01000   Min.   : 6.452  
 1st Qu.:0.01050   1st Qu.:0.5507   1st Qu.:0.01695   1st Qu.:12.805  
 Median :0.01175   Median :0.6124   Median :0.01965   Median :18.167  
 Mean   :0.01309   Mean   :0.6294   Mean   :0.02131   Mean   :20.694  
 3rd Qu.:0.01530   3rd Qu.:0.6733   3rd Qu.:0.02368   3rd Qu.:24.416  
 Max.   :0.02230   Max.   :1.0000   Max.   :0.03810   Max.   :66.667  
     count      
 Min.   :100.0  
 1st Qu.:105.0  
 Median :117.5  
 Mean   :130.9  
 3rd Qu.:153.0  
 Max.   :223.0  

mining info:
   data ntransactions support confidence
 retail         10000    0.01        0.5
                                                                                   call
 apriori(data = retail, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))

a) 58 of the rules have 2 items, while 32 of the rules have 3 items

b) The minimum lift value for a rule is 6.452, while the maximum lift value for a rule is 66.667

Question 6

inspect(sort(retail_rules, by = "lift"))
     lhs                                      rhs                                   support confidence coverage      lift count
[1]  {SHED}                                => {KEY FOB}                              0.0100  1.0000000   0.0100 66.666667   100
[2]  {BACK DOOR}                           => {KEY FOB}                              0.0101  1.0000000   0.0101 66.666667   101
[3]  {KEY FOB}                             => {SHED}                                 0.0100  0.6666667   0.0150 66.666667   100
[4]  {KEY FOB}                             => {BACK DOOR}                            0.0101  0.6733333   0.0150 66.666667   101
[5]  {WOODEN STAR CHRISTMAS SCANDINAVIAN}  => {WOODEN HEART CHRISTMAS SCANDINAVIAN}  0.0114  0.7755102   0.0147 47.577313   114
[6]  {WOODEN HEART CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN}   0.0114  0.6993865   0.0163 47.577313   114
[7]  {PINK HAPPY BIRTHDAY BUNTING}         => {BLUE HAPPY BIRTHDAY BUNTING}          0.0101  0.6778523   0.0149 46.748438   101
[8]  {BLUE HAPPY BIRTHDAY BUNTING}         => {PINK HAPPY BIRTHDAY BUNTING}          0.0101  0.6965517   0.0145 46.748438   101
[9]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      REGENCY CAKESTAND 3 TIER}            => {PINK REGENCY TEACUP AND SAUCER}       0.0108  0.7105263   0.0152 30.494692   108
[10] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0153  0.6860987   0.0223 29.446294   153
[11] {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0153  0.8793103   0.0174 29.408373   153
[12] {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      REGENCY CAKESTAND 3 TIER}            => {GREEN REGENCY TEACUP AND SAUCER}      0.0108  0.8709677   0.0124 29.129356   108
[13] {HAND WARMER SCOTTY DOG DESIGN}       => {HAND WARMER OWL DESIGN}               0.0104  0.5621622   0.0185 27.556969   104
[14] {HAND WARMER OWL DESIGN}              => {HAND WARMER SCOTTY DOG DESIGN}        0.0104  0.5098039   0.0204 27.556969   104
[15] {PINK REGENCY TEACUP AND SAUCER}      => {GREEN REGENCY TEACUP AND SAUCER}      0.0188  0.8068670   0.0233 26.985517   188
[16] {GREEN REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0188  0.6287625   0.0299 26.985517   188
[17] {REGENCY CAKESTAND 3 TIER,                                                                                                
      ROSES REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0102  0.6107784   0.0167 26.213667   102
[18] {GARDENERS KNEELING PAD CUP OF TEA}   => {GARDENERS KNEELING PAD KEEP CALM}     0.0186  0.7440000   0.0250 25.220339   186
[19] {GARDENERS KNEELING PAD KEEP CALM}    => {GARDENERS KNEELING PAD CUP OF TEA}    0.0186  0.6305085   0.0295 25.220339   186
[20] {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      REGENCY CAKESTAND 3 TIER}            => {ROSES REGENCY TEACUP AND SAUCER}      0.0102  0.8225806   0.0124 24.628163   102
[21] {SPACEBOY LUNCH BOX}                  => {DOLLY GIRL LUNCH BOX}                 0.0161  0.5812274   0.0277 24.524364   161
[22] {DOLLY GIRL LUNCH BOX}                => {SPACEBOY LUNCH BOX}                   0.0161  0.6793249   0.0237 24.524364   161
[23] {REGENCY CAKESTAND 3 TIER,                                                                                                
      ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0122  0.7305389   0.0167 24.432740   122
[24] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      PINK REGENCY TEACUP AND SAUCER}      => {ROSES REGENCY TEACUP AND SAUCER}      0.0153  0.8138298   0.0188 24.366161   153
[25] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      REGENCY CAKESTAND 3 TIER}            => {ROSES REGENCY TEACUP AND SAUCER}      0.0122  0.8026316   0.0152 24.030886   122
[26] {PINK REGENCY TEACUP AND SAUCER}      => {ROSES REGENCY TEACUP AND SAUCER}      0.0174  0.7467811   0.0233 22.358716   174
[27] {ROSES REGENCY TEACUP AND SAUCER}     => {PINK REGENCY TEACUP AND SAUCER}       0.0174  0.5209581   0.0334 22.358716   174
[28] {ROSES REGENCY TEACUP AND SAUCER}     => {GREEN REGENCY TEACUP AND SAUCER}      0.0223  0.6676647   0.0334 22.329922   223
[29] {GREEN REGENCY TEACUP AND SAUCER}     => {ROSES REGENCY TEACUP AND SAUCER}      0.0223  0.7458194   0.0299 22.329922   223
[30] {PLASTERS IN TIN CIRCUS PARADE}       => {PLASTERS IN TIN WOODLAND ANIMALS}     0.0110  0.5263158   0.0209 22.114109   110
[31] {CHARLOTTE BAG PINK POLKADOT}         => {RED RETROSPOT CHARLOTTE BAG}          0.0120  0.6315789   0.0190 22.083180   120
[32] {ROUND SNACK BOXES SET OF 4 FRUITS}   => {ROUND SNACK BOXES SET OF4 WOODLAND}   0.0106  0.5520833   0.0192 21.995352   106
[33] {HOT WATER BOTTLE I AM SO POORLY}     => {CHOCOLATE HOT WATER BOTTLE}           0.0123  0.6275510   0.0196 21.865889   123
[34] {RED KITCHEN SCALES}                  => {IVORY KITCHEN SCALES}                 0.0119  0.6010101   0.0198 21.854913   119
[35] {JUMBO BAG PEARS}                     => {JUMBO BAG APPLES}                     0.0122  0.6455026   0.0189 21.233640   122
[36] {STRAWBERRY CHARLOTTE BAG}            => {RED RETROSPOT CHARLOTTE BAG}          0.0115  0.5502392   0.0209 19.239134   115
[37] {BAKING SET SPACEBOY DESIGN}          => {BAKING SET 9 PIECE RETROSPOT}         0.0110  0.6508876   0.0169 19.087612   110
[38] {ALARM CLOCK BAKELIKE GREEN}          => {ALARM CLOCK BAKELIKE RED}             0.0195  0.6414474   0.0304 18.866099   195
[39] {ALARM CLOCK BAKELIKE RED}            => {ALARM CLOCK BAKELIKE GREEN}           0.0195  0.5735294   0.0340 18.866099   195
[40] {ALARM CLOCK BAKELIKE IVORY}          => {ALARM CLOCK BAKELIKE RED}             0.0115  0.6388889   0.0180 18.790850   115
[41] {ALARM CLOCK BAKELIKE ORANGE}         => {ALARM CLOCK BAKELIKE RED}             0.0101  0.6352201   0.0159 18.682945   101
[42] {ALARM CLOCK BAKELIKE IVORY}          => {ALARM CLOCK BAKELIKE GREEN}           0.0102  0.5666667   0.0180 18.640351   102
[43] {ALARM CLOCK BAKELIKE PINK}           => {ALARM CLOCK BAKELIKE RED}             0.0148  0.6271186   0.0236 18.444666   148
[44] {LOVE BUILDING BLOCK WORD}            => {HOME BUILDING BLOCK WORD}             0.0107  0.5270936   0.0203 18.429846   107
[45] {HOT WATER BOTTLE TEA AND SYMPATHY}   => {CHOCOLATE HOT WATER BOTTLE}           0.0103  0.5228426   0.0197 18.217514   103
[46] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG SPACEBOY DESIGN}           => {LUNCH BAG CARS BLUE}                  0.0105  0.7046980   0.0149 18.115629   105
[47] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG SPACEBOY DESIGN}           => {LUNCH BAG PINK POLKADOT}              0.0105  0.6730769   0.0156 17.666061   105
[48] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG SPACEBOY DESIGN}           => {LUNCH BAG WOODLAND}                   0.0105  0.6069364   0.0173 17.643500   105
[49] {ALARM CLOCK BAKELIKE PINK}           => {ALARM CLOCK BAKELIKE GREEN}           0.0123  0.5211864   0.0236 17.144291   123
[50] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG PINK POLKADOT}              0.0118  0.6519337   0.0181 17.111121   118
[51] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG CARS BLUE}                 => {LUNCH BAG PINK POLKADOT}              0.0106  0.6385542   0.0166 16.759953   106
[52] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG WOODLAND}                  => {LUNCH BAG SPACEBOY DESIGN}            0.0105  0.6140351   0.0171 16.158818   105
[53] {LUNCH BAG RED RETROSPOT,                                                                                                 
      LUNCH BAG WOODLAND}                  => {LUNCH BAG PINK POLKADOT}              0.0103  0.6023392   0.0171 15.809427   103
[54] {WOODEN FRAME ANTIQUE WHITE}          => {WOODEN PICTURE FRAME WHITE FINISH}    0.0202  0.5821326   0.0347 15.523535   202
[55] {WOODEN PICTURE FRAME WHITE FINISH}   => {WOODEN FRAME ANTIQUE WHITE}           0.0202  0.5386667   0.0375 15.523535   202
[56] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG CARS BLUE}                  0.0106  0.5888889   0.0180 15.138532   106
[57] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG SPACEBOY DESIGN}            0.0105  0.5737705   0.0183 15.099223   105
[58] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG CARS BLUE}                  0.0118  0.5841584   0.0202 15.016926   118
[59] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG PINK POLKADOT}              0.0118  0.5673077   0.0208 14.889966   118
[60] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG WOODLAND}                   0.0103  0.5099010   0.0202 14.822703   103
[61] {LUNCH BAG DOLLY GIRL DESIGN}         => {LUNCH BAG SPACEBOY DESIGN}            0.0126  0.5478261   0.0230 14.416476   126
[62] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG WOODLAND}                  => {LUNCH BAG RED RETROSPOT}              0.0103  0.7463768   0.0138 14.381056   103
[63] {LUNCH BAG VINTAGE LEAF DESIGN}       => {LUNCH BAG APPLE DESIGN}               0.0114  0.5112108   0.0223 14.279630   114
[64] {LUNCH BAG PINK POLKADOT,                                                                                                 
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG  BLACK SKULL}               0.0118  0.5841584   0.0202 13.680525   118
[65] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG  BLACK SKULL}               0.0106  0.5792350   0.0183 13.565222   106
[66] {PAINTED METAL PEARS ASSORTED}        => {ASSORTED COLOUR BIRD ORNAMENT}        0.0111  0.7302632   0.0152 13.448677   111
[67] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG RED RETROSPOT}             => {LUNCH BAG  BLACK SKULL}               0.0103  0.5690608   0.0181 13.326950   103
[68] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG RED RETROSPOT}              0.0118  0.6555556   0.0180 12.631128   118
[69] {LUNCH BAG CARS BLUE,                                                                                                     
      LUNCH BAG PINK POLKADOT}             => {LUNCH BAG RED RETROSPOT}              0.0118  0.6448087   0.0183 12.424061   118
[70] {60 TEATIME FAIRY CAKE CASES}         => {PACK OF 72 RETROSPOT CAKE CASES}      0.0145  0.5350554   0.0271 12.414277   145
[71] {LUNCH BAG SPACEBOY DESIGN,                                                                                               
      LUNCH BAG WOODLAND}                  => {LUNCH BAG RED RETROSPOT}              0.0105  0.6250000   0.0168 12.042389   105
[72] {LUNCH BAG  BLACK SKULL,                                                                                                  
      LUNCH BAG CARS BLUE}                 => {LUNCH BAG RED RETROSPOT}              0.0103  0.6204819   0.0166 11.955336   103
[73] {PACK OF 72 SKULL CAKE CASES}         => {PACK OF 72 RETROSPOT CAKE CASES}      0.0108  0.5046729   0.0214 11.709348   108
[74] {LUNCH BAG PINK POLKADOT}             => {LUNCH BAG RED RETROSPOT}              0.0202  0.5301837   0.0381 10.215486   202
[75] {JUMBO BAG STRAWBERRY}                => {JUMBO BAG RED RETROSPOT}              0.0176  0.6641509   0.0265  9.897928   176
[76] {LUNCH BAG DOLLY GIRL DESIGN}         => {LUNCH BAG RED RETROSPOT}              0.0117  0.5086957   0.0230  9.801458   117
[77] {JUMBO BAG PINK POLKADOT}             => {JUMBO BAG RED RETROSPOT}              0.0222  0.5951743   0.0373  8.869959   222
[78] {JUMBO BAG SCANDINAVIAN BLUE PAISLEY} => {JUMBO BAG RED RETROSPOT}              0.0104  0.5683060   0.0183  8.469538   104
[79] {JUMBO  BAG BAROQUE BLACK WHITE}      => {JUMBO BAG RED RETROSPOT}              0.0149  0.5539033   0.0269  8.254893   149
[80] {JUMBO STORAGE BAG SUKI}              => {JUMBO BAG RED RETROSPOT}              0.0161  0.5457627   0.0295  8.133572   161
[81] {RED HANGING HEART T-LIGHT HOLDER}    => {WHITE HANGING HEART T-LIGHT HOLDER}   0.0175  0.6481481   0.0270  7.734465   175
[82] {JUMBO BAG SPACEBOY DESIGN}           => {JUMBO BAG RED RETROSPOT}              0.0102  0.5125628   0.0199  7.638790   102
[83] {PINK REGENCY TEACUP AND SAUCER,                                                                                          
      ROSES REGENCY TEACUP AND SAUCER}     => {REGENCY CAKESTAND 3 TIER}             0.0102  0.5862069   0.0174  7.563960   102
[84] {JUMBO BAG PINK VINTAGE PAISLEY}      => {JUMBO BAG RED RETROSPOT}              0.0125  0.5040323   0.0248  7.511658   125
[85] {JUMBO SHOPPER VINTAGE RED PAISLEY}   => {JUMBO BAG RED RETROSPOT}              0.0157  0.5000000   0.0314  7.451565   157
[86] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      PINK REGENCY TEACUP AND SAUCER}      => {REGENCY CAKESTAND 3 TIER}             0.0108  0.5744681   0.0188  7.412491   108
[87] {GREEN REGENCY TEACUP AND SAUCER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}     => {REGENCY CAKESTAND 3 TIER}             0.0122  0.5470852   0.0223  7.059164   122
[88] {PINK REGENCY TEACUP AND SAUCER}      => {REGENCY CAKESTAND 3 TIER}             0.0124  0.5321888   0.0233  6.866953   124
[89] {GREEN REGENCY TEACUP AND SAUCER}     => {REGENCY CAKESTAND 3 TIER}             0.0152  0.5083612   0.0299  6.559499   152
[90] {ROSES REGENCY TEACUP AND SAUCER}     => {REGENCY CAKESTAND 3 TIER}             0.0167  0.5000000   0.0334  6.451613   167

a)

i) The rule {SHED} -> {KEY FOB} means that transactions containing a SHED item tend to also contain a KEY FOB item.

ii) The support value for this rule is 0.0100, meaning that SHED and KEY FOB appear together in 1% of all transactions (100 of 10,000). The confidence value for this rule is 1.0000, meaning that every transaction containing SHED also contains KEY FOB.

iii) The lift value for this rule is 66.67, meaning that a transaction containing SHED is about 66.67 times as likely to contain KEY FOB as a transaction chosen at random (KEY FOB appears in only 1.5% of all transactions).
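
The three measures for {SHED} -> {KEY FOB} can be recomputed from the counts implied by the output above. This is a minimal sketch; the KEY FOB count of 150 is inferred from the coverage of rule [3] (0.0150 x 10,000) rather than printed directly.

```r
n <- 10000          # transactions in the dataset
count_both <- 100   # transactions containing SHED and KEY FOB (count column of rule [1])
count_shed <- 100   # transactions containing SHED (coverage of rule [1]: 0.0100 * n)
count_fob  <- 150   # transactions containing KEY FOB (coverage of rule [3]: 0.0150 * n)

support    <- count_both / n                # share of all transactions with both items
confidence <- count_both / count_shed       # share of SHED transactions with KEY FOB
lift       <- confidence / (count_fob / n)  # confidence relative to KEY FOB's baseline share

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```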

b) Trivial rules may describe things that are already well-known or common sense. Some rules that would be trivial include purchasing Christmas decorations together, and colour pairings of items such as Green Teacups & Saucers with Pink Teacups & Saucers.

c) Actionable rules may describe things that the business can use to their advantage to make changes to their strategy. This includes Sheds with Key Fobs and Back Doors with Key Fobs. The business could offer a free key fob with their Sheds and Doors to entice customers to shop with them in comparison to their competitors.

Question 7

pink_regency_rules <- subset(retail_rules, items %in% "PINK REGENCY TEACUP AND SAUCER")
inspect(pink_regency_rules)
     lhs                                   rhs                               support confidence coverage      lift count
[1]  {PINK REGENCY TEACUP AND SAUCER}   => {GREEN REGENCY TEACUP AND SAUCER}  0.0188  0.8068670   0.0233 26.985517   188
[2]  {GREEN REGENCY TEACUP AND SAUCER}  => {PINK REGENCY TEACUP AND SAUCER}   0.0188  0.6287625   0.0299 26.985517   188
[3]  {PINK REGENCY TEACUP AND SAUCER}   => {ROSES REGENCY TEACUP AND SAUCER}  0.0174  0.7467811   0.0233 22.358716   174
[4]  {ROSES REGENCY TEACUP AND SAUCER}  => {PINK REGENCY TEACUP AND SAUCER}   0.0174  0.5209581   0.0334 22.358716   174
[5]  {PINK REGENCY TEACUP AND SAUCER}   => {REGENCY CAKESTAND 3 TIER}         0.0124  0.5321888   0.0233  6.866953   124
[6]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      PINK REGENCY TEACUP AND SAUCER}   => {ROSES REGENCY TEACUP AND SAUCER}  0.0153  0.8138298   0.0188 24.366161   153
[7]  {PINK REGENCY TEACUP AND SAUCER,                                                                                   
      ROSES REGENCY TEACUP AND SAUCER}  => {GREEN REGENCY TEACUP AND SAUCER}  0.0153  0.8793103   0.0174 29.408373   153
[8]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      ROSES REGENCY TEACUP AND SAUCER}  => {PINK REGENCY TEACUP AND SAUCER}   0.0153  0.6860987   0.0223 29.446294   153
[9]  {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      PINK REGENCY TEACUP AND SAUCER}   => {REGENCY CAKESTAND 3 TIER}         0.0108  0.5744681   0.0188  7.412491   108
[10] {PINK REGENCY TEACUP AND SAUCER,                                                                                   
      REGENCY CAKESTAND 3 TIER}         => {GREEN REGENCY TEACUP AND SAUCER}  0.0108  0.8709677   0.0124 29.129356   108
[11] {GREEN REGENCY TEACUP AND SAUCER,                                                                                  
      REGENCY CAKESTAND 3 TIER}         => {PINK REGENCY TEACUP AND SAUCER}   0.0108  0.7105263   0.0152 30.494692   108
[12] {PINK REGENCY TEACUP AND SAUCER,                                                                                   
      ROSES REGENCY TEACUP AND SAUCER}  => {REGENCY CAKESTAND 3 TIER}         0.0102  0.5862069   0.0174  7.563960   102
[13] {PINK REGENCY TEACUP AND SAUCER,                                                                                   
      REGENCY CAKESTAND 3 TIER}         => {ROSES REGENCY TEACUP AND SAUCER}  0.0102  0.8225806   0.0124 24.628163   102
[14] {REGENCY CAKESTAND 3 TIER,                                                                                         
      ROSES REGENCY TEACUP AND SAUCER}  => {PINK REGENCY TEACUP AND SAUCER}   0.0102  0.6107784   0.0167 26.213667   102

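Based on the rules with only PINK REGENCY TEACUP AND SAUCER on the left-hand side (rules 1, 3 and 5), a customer who purchases it is most likely to also buy the following, in descending order of confidence (0.81, 0.75 and 0.53):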

  1. GREEN REGENCY TEACUP AND SAUCER

  2. ROSES REGENCY TEACUP AND SAUCER

  3. REGENCY CAKESTAND 3 TIER
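
The ranking above can be reproduced by sorting the single-item rules by confidence. The sketch below copies the confidence and lift values of rules 1, 3 and 5 from the inspect() output into a small data frame; the data frame and its name `pink_lhs` are illustrative only.

```r
# Rules with only PINK REGENCY TEACUP AND SAUCER on the left-hand side,
# values copied from the inspect() output (rules 5, 1 and 3)
pink_lhs <- data.frame(
  rhs = c("REGENCY CAKESTAND 3 TIER",
          "GREEN REGENCY TEACUP AND SAUCER",
          "ROSES REGENCY TEACUP AND SAUCER"),
  confidence = c(0.5321888, 0.8068670, 0.7467811),
  lift = c(6.866953, 26.985517, 22.358716)
)

# Rank the right-hand-side items from most to least likely
ranked <- pink_lhs[order(pink_lhs$confidence, decreasing = TRUE), ]
print(ranked, row.names = FALSE)
```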

Part B

Question 1

library(tidyverse)
library(recommenderlab)

Question 2

steam_ratings <- read_csv("steam_ratings.csv")
steam_ratings <- as(steam_ratings, "matrix")
steam_ratings <- as(steam_ratings, "realRatingMatrix")

Question 3

a)

vector_ratings <- as.vector(steam_ratings@data)
table(vector_ratings)

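The ratings are roughly bell-shaped and centred on 3, which is the most common rating (19,762). Ratings of 2 (12,500) are somewhat more common than ratings of 4 (10,655), while ratings of 1 (4,773) and 5 (4,724) are almost equally common. Because ratings are discrete values from 1 to 5, the distribution is only approximately Normal. Any 0 values in the table are unrated user-game pairs stored in the sparse matrix, not actual ratings.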
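
As a quick check on the shape of the distribution, the sketch below converts the counts quoted above into shares and a mean rating. It uses only those five counts; the variable names are illustrative.

```r
# Rating counts as reported above (ratings 1 to 5)
counts <- c(`1` = 4773, `2` = 12500, `3` = 19762, `4` = 10655, `5` = 4724)

# Share of all ratings at each level
shares <- counts / sum(counts)

# Mean rating, weighting each level by its count
mean_rating <- sum(as.numeric(names(counts)) * counts) / sum(counts)

print(round(shares, 3))
print(round(mean_rating, 2))
```

A mean slightly below 3 would be consistent with 2 being more common than 4.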

b)

colMeans(steam_ratings) %>%
  tibble::enframe(name = "game", value = "game_rating") %>%
  ggplot() +
  geom_histogram(mapping = aes(x = game_rating), color = "white")

c)

rowCounts(steam_ratings) %>%
  tibble::enframe(name = "user", value = "ratings_count") %>%
  ggplot() +
  geom_histogram(mapping = aes(x = ratings_count), color = "white")

Question 4

a)

set.seed(101)

b)

eval_games <- evaluationScheme(data = steam_ratings,
                               method = "split",
                               train = 0.8,
                               given = 6,
                               goodRating = 3)

c)

train_games <- getData(eval_games, "train")
known_games <- getData(eval_games, "known")
unknown_games <- getData(eval_games, "unknown")
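
The settings train = 0.8 and given = 6 can be pictured with a small base R sketch. This is a conceptual illustration under stated assumptions, not recommenderlab's internal code: the user count of 100 and the single user's ten ratings are hypothetical.

```r
set.seed(101)

# Hypothetical number of users, split 80% train / 20% test
n_users <- 100
train_idx <- sample(n_users, size = round(0.8 * n_users))
test_idx <- setdiff(seq_len(n_users), train_idx)

# One hypothetical test user with 10 rated games
user_ratings <- c(g1 = 3, g2 = 4, g3 = 2, g4 = 5, g5 = 3,
                  g6 = 1, g7 = 4, g8 = 3, g9 = 2, g10 = 5)

# given = 6: six ratings are shown to the model ("known"),
# the remaining ratings are held back for scoring ("unknown")
known_idx <- sample(length(user_ratings), size = 6)
known <- user_ratings[known_idx]
unknown <- user_ratings[-known_idx]

print(c(train = length(train_idx), test = length(test_idx)))
print(c(known = length(known), unknown = length(unknown)))
```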

Question 5

a)

ubcf_model <- Recommender(data = train_games,
                          method = "UBCF",
                          parameter = list(normalize = "center", method = "cosine"))

ubcf_predict <- predict(object = ubcf_model,
                        newdata = known_games,
                        type = "ratings")

ubcf_eval <- calcPredictionAccuracy(x = ubcf_predict,
                                    data = unknown_games)
ubcf_eval
     RMSE       MSE       MAE 
1.1697655 1.3683514 0.9183398 

b) The Mean Absolute Error (MAE) of the model is 0.9183
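
For reference, the three accuracy measures reported above can be computed by hand as follows. The actual and predicted ratings are invented toy values, not taken from the Steam data.

```r
# Toy ratings (invented for illustration)
actual    <- c(3, 4, 2, 5)
predicted <- c(2.5, 4, 3, 4)

errors <- predicted - actual

mae  <- mean(abs(errors))  # Mean Absolute Error
mse  <- mean(errors^2)     # Mean Squared Error
rmse <- sqrt(mse)          # Root Mean Squared Error

print(c(RMSE = rmse, MSE = mse, MAE = mae))
```

Because RMSE squares the errors before averaging, it penalises large misses more heavily than MAE, which is why the RMSE reported above is larger than the MAE.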

Question 6

a)

ibcf_model <- Recommender(data = train_games,
                          method = "IBCF",
                          parameter = list(normalize = "center", method = "cosine"))

ibcf_predict <- predict(object = ibcf_model,
                        newdata = known_games,
                        type = "ratings")

ibcf_eval <- calcPredictionAccuracy(x = ibcf_predict,
                                    data = unknown_games)
ibcf_eval
    RMSE      MSE      MAE 
1.500713 2.252139 1.165198 

b) The Mean Absolute Error (MAE) for this model is 1.1652

Question 7

ubcf_recs <- predict(object = ubcf_model,
                     newdata = known_games,
                     type = "topNList",
                     n = 3)
rec_list <- as(ubcf_recs, "list")
rec_list[1:5]
$`0`
[1] "Frozen Hearth"     "FINAL FANTASY VII" "HAWKEN"           

$`1`
[1] "Loadout Campaign Beta" "Royal Quest"           "Villagers and Heroes" 

$`2`
[1] "Hitman Blood Money" "Sonic Adventure 2"  "Time Clickers"     

$`3`
[1] "The Ultimate DOOM" "Door Kickers"      "Train Fever"      

$`4`
[1] "Sonic Adventure 2" "Quake Live"        "Royal Quest"      

The top 3 game recommendations for the first 5 test users (user IDs 0 to 4 in the output above, listed here as User 1 to User 5) are the following:

User 1

  1. Frozen Hearth

  2. FINAL FANTASY VII

  3. HAWKEN

User 2

  1. Loadout Campaign Beta

  2. Royal Quest

  3. Villagers and Heroes

User 3

  1. Hitman Blood Money

  2. Sonic Adventure 2

  3. Time Clickers

User 4

  1. The Ultimate DOOM

  2. Door Kickers

  3. Train Fever

User 5

  1. Sonic Adventure 2

  2. Quake Live

  3. Royal Quest

Question 8

Steam could use this Collaborative Filtering model to increase user engagement and sales by promoting specific games to each user based on their existing ratings and those of similar users. Personalised recommendations would keep users playing and encourage them to purchase games they are likely to enjoy, giving each user a customised experience.