#install.packages("arules")
#install.packages("recommenderlab")
#install.packages("tidyverse")
library(arules)
library(recommenderlab)
library(tidyverse)
Apriori algorithm and collaborative filtering
MARKET BASKET ANALYSIS
1, 2
retail <- read.transactions("retail_transactions_2.csv", sep = ",")
summary(retail)
transactions as itemMatrix in sparse format with
10000 rows (elements/itemsets/transactions) and
5471 columns (items) and a density of 0.002797642
most frequent items:
WHITE HANGING HEART T-LIGHT HOLDER REGENCY CAKESTAND 3 TIER
823 777
JUMBO BAG RED RETROSPOT PARTY BUNTING
644 577
ASSORTED COLOUR BIRD ORNAMENT (Other)
558 149680
element (itemset/transaction) length distribution:
sizes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1660 727 492 408 396 330 290 307 281 258 279 262 227 239 262 246
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
201 197 219 194 164 148 138 128 109 110 95 90 109 86 76 66
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
56 56 59 44 41 46 57 44 33 41 39 31 31 27 29 25
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
27 24 29 19 23 27 24 16 21 19 19 15 17 7 11 13
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
7 16 16 13 4 10 9 6 7 6 5 10 8 1 2 4
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 97
8 4 3 5 5 6 6 5 1 2 3 7 4 2 2 3
98 99 101 102 103 105 107 108 109 111 113 114 116 117 119 120
4 1 1 2 3 1 2 1 2 1 1 1 1 3 1 1
121 122 125 126 127 134 135 143 146 147 158 168 178 235 249 285
1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1
320 400
1 1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.00 10.00 15.31 21.00 400.00
includes extended item information - examples:
labels
1 1 HANGER
2 10 COLOUR SPACEBOY PEN
3 12 COLOURED PARTY BALLOONS
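- 10 000 transactions.
- 5 471 items.
- The sparse matrix contains 10 000 × 5 471 = 54 710 000 cells. A density of 0.002797642 means that about 0.28% of the cells (≈153 059) contain a non-zero value, which is the total number of items purchased across all transactions.
- The largest transaction contains 400 items.
- The average transaction contains 15.31 items.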
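The density arithmetic above can be checked with a few lines of R. This sketch only reuses figures printed by `summary(retail)`. It does not read the data file, and the variable names are illustrative.

```r
# Figures copied from the summary(retail) output above
n_transactions <- 10000
n_items <- 5471
density <- 0.002797642

# Every transaction-by-item pair is one cell of the sparse incidence matrix
n_cells <- n_transactions * n_items

# Non-zero cells = total number of item occurrences (items purchased)
n_nonzero <- round(n_cells * density)

# Cross-check: the "most frequent items" counts, including (Other), sum to the same total
item_total <- sum(c(823, 777, 644, 577, 558, 149680))

# Spreading the item occurrences over all transactions gives the mean basket size
mean_basket <- n_nonzero / n_transactions

print(c(cells = n_cells, non_zero = n_nonzero, item_total = item_total))
print(round(mean_basket, 2))
```

The rounded mean basket size matches the `Mean` of 15.31 in the summary, so the density and the mean transaction length describe the same quantity from two angles.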
3
itemFrequencyPlot(retail, topN = 20, horiz = TRUE)
4
retail_rules <- apriori(retail, parameter = list(support = 0.01,
confidence = 0.5,
minlen = 2))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.5 0.1 1 none FALSE TRUE 5 0.01 2
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 100
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[5471 item(s), 10000 transaction(s)] done [0.02s].
sorting and recoding items ... [405 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [72 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
retail_rules
set of 72 rules
- 72 rules were discovered.
- Support is the minimum share of transactions in which an itemset must appear before it is considered interesting. A support of 0.01 means that the items in a rule must be bought together in at least 1% of all transactions, i.e. 100 of the 10 000 transactions (the "Absolute minimum support count: 100" line above).
- A low confidence threshold can produce many unreliable rules, while a high threshold tends to return only obvious ones. A confidence threshold of 0.5 means that Y must appear in at least 50% of the transactions that contain X.
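The definitions of support and confidence can be made concrete on a tiny example. The baskets below are hypothetical toy data, not taken from `retail_transactions_2.csv`. The helper names `count_baskets`, `support` and `confidence` are illustrative and are not the arules API.

```r
# Hypothetical toy baskets, not taken from retail_transactions_2.csv
baskets <- list(
  c("SUGAR", "TEA", "COFFEE"),
  c("SUGAR", "TEA"),
  c("TEA", "MUG"),
  c("COFFEE", "MUG"),
  c("SUGAR", "TEA", "MUG")
)

# Number of baskets that contain every item of `itemset`
count_baskets <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# support(X) = share of all baskets that contain X
support <- function(itemset, baskets) {
  count_baskets(itemset, baskets) / length(baskets)
}

# confidence(X => Y) = count(X and Y together) / count(X)
confidence <- function(lhs, rhs, baskets) {
  count_baskets(c(lhs, rhs), baskets) / count_baskets(lhs, baskets)
}

support(c("SUGAR", "TEA"), baskets)   # 3 of 5 baskets
confidence("SUGAR", "TEA", baskets)   # TEA is in all 3 SUGAR baskets
confidence("TEA", "SUGAR", baskets)   # SUGAR is in 3 of the 4 TEA baskets
```

With thresholds of 0.5 for both support and confidence, both directions of the SUGAR/TEA pair pass. With a confidence threshold of 0.8, only {SUGAR} => {TEA} survives. This shows that confidence depends on the direction of the rule, while support does not.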
5
summary(retail_rules)
set of 72 rules
rule length distribution (lhs + rhs):
sizes
2 3
54 18
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.00 2.00 2.00 2.25 2.25 3.00
summary of quality measures:
support confidence coverage lift
Min. :0.01000 Min. :0.5020 Min. :0.01080 Min. : 6.461
1st Qu.:0.01080 1st Qu.:0.5502 1st Qu.:0.01680 1st Qu.:14.406
Median :0.01200 Median :0.6226 Median :0.01970 Median :22.160
Mean :0.01351 Mean :0.6637 Mean :0.02126 Mean :27.032
3rd Qu.:0.01673 3rd Qu.:0.7307 3rd Qu.:0.02515 3rd Qu.:26.647
Max. :0.02280 Max. :1.0000 Max. :0.03530 Max. :92.593
count
Min. :100.0
1st Qu.:108.0
Median :120.0
Mean :135.1
3rd Qu.:167.2
Max. :228.0
mining info:
data ntransactions support confidence
retail 10000 0.01 0.5
call
apriori(data = retail, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
- 54 rules have length 2 (one item on each side) and 18 rules have length 3 (two items in the lhs, one in the rhs).
- Lift ranges from 6.461 to 92.593.
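Rules of length 2 and 3 come from frequent itemsets of size 2 and 3. Apriori finds these level by level and then splits each frequent itemset into an lhs and an rhs. The base-R sketch below shows the level-wise join, prune and count steps on hypothetical baskets. It is a simplified teaching version, not the arules implementation. `min_count` plays the role of the "Absolute minimum support count" in the `apriori()` log, and the function names are illustrative.

```r
# Hypothetical toy baskets, not taken from the retail data
baskets <- list(
  c("COFFEE", "SUGAR", "TEA"),
  c("COFFEE", "SUGAR", "TEA"),
  c("COFFEE", "MUG"),
  c("SUGAR", "TEA"),
  c("MUG")
)

# Number of baskets that contain every item of `itemset`
count_baskets <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level-wise frequent-itemset search (simplified Apriori: join, prune, count)
frequent_itemsets <- function(baskets, min_count) {
  is_frequent <- function(s) count_baskets(s, baskets) >= min_count
  key <- function(s) paste(s, collapse = "|")
  # Level 1: frequent single items
  level <- Filter(is_frequent, as.list(sort(unique(unlist(baskets)))))
  result <- level
  k <- 2
  while (length(level) >= 2) {
    # Join: merge pairs of frequent (k-1)-itemsets whose union has k items
    candidates <- list()
    for (i in seq_len(length(level) - 1)) {
      for (j in (i + 1):length(level)) {
        merged <- sort(union(level[[i]], level[[j]]))
        if (length(merged) == k) candidates[[length(candidates) + 1]] <- merged
      }
    }
    candidates <- unique(candidates)
    # Prune: every (k-1)-subset of a candidate must already be frequent
    level_keys <- vapply(level, key, character(1))
    candidates <- Filter(function(cand) {
      all(vapply(combn(cand, k - 1, simplify = FALSE), key, character(1)) %in% level_keys)
    }, candidates)
    # Count: keep candidates that reach the minimum support count
    level <- Filter(is_frequent, candidates)
    result <- c(result, level)
    k <- k + 1
  }
  result
}

itemsets <- frequent_itemsets(baskets, min_count = 2)
table(lengths(itemsets))   # number of frequent itemsets of size 1, 2 and 3
```

The prune step is what makes Apriori efficient. A candidate such as {COFFEE, MUG, SUGAR} is never counted, because {MUG, SUGAR} was already infrequent.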
6
inspect(retail_rules)
lhs rhs support confidence coverage lift count
[1] {SUGAR} => {SET 3 RETROSPOT TEA} 0.0108 1.0000000 0.0108 92.592593 108
[2] {SET 3 RETROSPOT TEA} => {SUGAR} 0.0108 1.0000000 0.0108 92.592593 108
[3] {SUGAR} => {COFFEE} 0.0108 1.0000000 0.0108 64.102564 108
[4] {COFFEE} => {SUGAR} 0.0108 0.6923077 0.0156 64.102564 108
[5] {SET 3 RETROSPOT TEA} => {COFFEE} 0.0108 1.0000000 0.0108 64.102564 108
[6] {COFFEE} => {SET 3 RETROSPOT TEA} 0.0108 0.6923077 0.0156 64.102564 108
[7] {PINK HAPPY BIRTHDAY BUNTING} => {BLUE HAPPY BIRTHDAY BUNTING} 0.0104 0.7074830 0.0147 45.940454 104
[8] {BLUE HAPPY BIRTHDAY BUNTING} => {PINK HAPPY BIRTHDAY BUNTING} 0.0104 0.6753247 0.0154 45.940454 104
[9] {BAKING SET SPACEBOY DESIGN} => {BAKING SET 9 PIECE RETROSPOT} 0.0109 0.6942675 0.0157 20.848874 109
[10] {SET OF TEA COFFEE SUGAR TINS PANTRY} => {SET OF 3 CAKE TINS PANTRY DESIGN} 0.0103 0.5852273 0.0176 11.276055 103
[11] {HAND WARMER SCOTTY DOG DESIGN} => {HAND WARMER OWL DESIGN} 0.0106 0.6057143 0.0175 27.040816 106
[12] {JUMBO BAG PEARS} => {JUMBO BAG APPLES} 0.0115 0.6318681 0.0182 22.977023 115
[13] {RED KITCHEN SCALES} => {IVORY KITCHEN SCALES} 0.0113 0.5566502 0.0203 21.492288 113
[14] {JUMBO BAG WOODLAND ANIMALS} => {JUMBO BAG RED RETROSPOT} 0.0109 0.5505051 0.0198 8.548215 109
[15] {ALARM CLOCK BAKELIKE IVORY} => {ALARM CLOCK BAKELIKE GREEN} 0.0105 0.5526316 0.0190 18.861146 105
[16] {ALARM CLOCK BAKELIKE IVORY} => {ALARM CLOCK BAKELIKE RED} 0.0130 0.6842105 0.0190 19.832189 130
[17] {ROUND SNACK BOXES SET OF 4 FRUITS} => {ROUND SNACK BOXES SET OF4 WOODLAND} 0.0102 0.5454545 0.0187 22.083180 102
[18] {WOODEN STAR CHRISTMAS SCANDINAVIAN} => {WOODEN HEART CHRISTMAS SCANDINAVIAN} 0.0139 0.7679558 0.0181 40.207110 139
[19] {WOODEN HEART CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN} 0.0139 0.7277487 0.0191 40.207110 139
[20] {HOT WATER BOTTLE TEA AND SYMPATHY} => {CHOCOLATE HOT WATER BOTTLE} 0.0104 0.5073171 0.0205 18.053988 104
[21] {HAND WARMER BIRD DESIGN} => {HAND WARMER OWL DESIGN} 0.0100 0.5494505 0.0182 24.529042 100
[22] {STRAWBERRY CHARLOTTE BAG} => {RED RETROSPOT CHARLOTTE BAG} 0.0103 0.5988372 0.0172 20.438130 103
[23] {HOT WATER BOTTLE I AM SO POORLY} => {CHOCOLATE HOT WATER BOTTLE} 0.0117 0.5879397 0.0199 20.923121 117
[24] {LARGE WHITE HEART OF WICKER} => {SMALL WHITE HEART OF WICKER} 0.0110 0.5238095 0.0210 23.280423 110
[25] {JUMBO BAG SPACEBOY DESIGN} => {JUMBO BAG RED RETROSPOT} 0.0105 0.5440415 0.0193 8.447849 105
[26] {JUMBO BAG PINK VINTAGE PAISLEY} => {JUMBO BAG RED RETROSPOT} 0.0125 0.5122951 0.0244 7.954893 125
[27] {PINK REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0197 0.7848606 0.0251 26.515559 197
[28] {GREEN REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.0197 0.6655405 0.0296 26.515559 197
[29] {PINK REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} 0.0192 0.7649402 0.0251 22.236635 192
[30] {ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.0192 0.5581395 0.0344 22.236635 192
[31] {PINK REGENCY TEACUP AND SAUCER} => {REGENCY CAKESTAND 3 TIER} 0.0126 0.5019920 0.0251 6.460644 126
[32] {CHARLOTTE BAG PINK POLKADOT} => {RED RETROSPOT CHARLOTTE BAG} 0.0114 0.5876289 0.0194 20.055593 114
[33] {LUNCH BAG VINTAGE LEAF DESIGN} => {LUNCH BAG APPLE DESIGN} 0.0121 0.5377778 0.0225 16.247063 121
[34] {ALARM CLOCK BAKELIKE PINK} => {ALARM CLOCK BAKELIKE GREEN} 0.0123 0.5082645 0.0242 17.346910 123
[35] {ALARM CLOCK BAKELIKE PINK} => {ALARM CLOCK BAKELIKE RED} 0.0148 0.6115702 0.0242 17.726674 148
[36] {JUMBO BAG BAROQUE BLACK WHITE} => {JUMBO BAG RED RETROSPOT} 0.0141 0.5529412 0.0255 8.586043 141
[37] {JUMBO BAG STRAWBERRY} => {JUMBO BAG RED RETROSPOT} 0.0171 0.6263736 0.0273 9.726299 171
[38] {DOLLY GIRL LUNCH BOX} => {SPACEBOY LUNCH BOX} 0.0140 0.6635071 0.0211 26.225577 140
[39] {SPACEBOY LUNCH BOX} => {DOLLY GIRL LUNCH BOX} 0.0140 0.5533597 0.0253 26.225577 140
[40] {LUNCH BAG DOLLY GIRL DESIGN} => {LUNCH BAG SPACEBOY DESIGN} 0.0122 0.5754717 0.0212 16.029852 122
[41] {RED HANGING HEART T-LIGHT HOLDER} => {WHITE HANGING HEART T-LIGHT HOLDER} 0.0162 0.6303502 0.0257 7.659176 162
[42] {GARDENERS KNEELING PAD CUP OF TEA} => {GARDENERS KNEELING PAD KEEP CALM} 0.0167 0.7260870 0.0230 25.931677 167
[43] {GARDENERS KNEELING PAD KEEP CALM} => {GARDENERS KNEELING PAD CUP OF TEA} 0.0167 0.5964286 0.0280 25.931677 167
[44] {WOODEN FRAME ANTIQUE WHITE} => {WOODEN PICTURE FRAME WHITE FINISH} 0.0175 0.5520505 0.0317 16.236779 175
[45] {WOODEN PICTURE FRAME WHITE FINISH} => {WOODEN FRAME ANTIQUE WHITE} 0.0175 0.5147059 0.0340 16.236779 175
[46] {GREEN REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} 0.0228 0.7702703 0.0296 22.391578 228
[47] {ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0228 0.6627907 0.0344 22.391578 228
[48] {ALARM CLOCK BAKELIKE GREEN} => {ALARM CLOCK BAKELIKE RED} 0.0184 0.6279863 0.0293 18.202503 184
[49] {ALARM CLOCK BAKELIKE RED} => {ALARM CLOCK BAKELIKE GREEN} 0.0184 0.5333333 0.0345 18.202503 184
[50] {JUMBO STORAGE BAG SUKI} => {JUMBO BAG RED RETROSPOT} 0.0168 0.5472313 0.0307 8.497380 168
[51] {JUMBO BAG PINK POLKADOT} => {JUMBO BAG RED RETROSPOT} 0.0211 0.6187683 0.0341 9.608204 211
[52] {LUNCH BAG WOODLAND} => {LUNCH BAG RED RETROSPOT} 0.0155 0.5115512 0.0303 9.894606 155
[53] {LUNCH BAG SUKI DESIGN} => {LUNCH BAG RED RETROSPOT} 0.0172 0.5043988 0.0341 9.756264 172
[54] {LUNCH BAG PINK POLKADOT} => {LUNCH BAG RED RETROSPOT} 0.0195 0.5524079 0.0353 10.684873 195
[55] {SET 3 RETROSPOT TEA,
SUGAR} => {COFFEE} 0.0108 1.0000000 0.0108 64.102564 108
[56] {COFFEE,
SUGAR} => {SET 3 RETROSPOT TEA} 0.0108 1.0000000 0.0108 92.592593 108
[57] {COFFEE,
SET 3 RETROSPOT TEA} => {SUGAR} 0.0108 1.0000000 0.0108 92.592593 108
[58] {GREEN REGENCY TEACUP AND SAUCER,
PINK REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} 0.0170 0.8629442 0.0197 25.085586 170
[59] {PINK REGENCY TEACUP AND SAUCER,
ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0170 0.8854167 0.0192 29.912725 170
[60] {GREEN REGENCY TEACUP AND SAUCER,
ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.0170 0.7456140 0.0228 29.705738 170
[61] {GREEN REGENCY TEACUP AND SAUCER,
PINK REGENCY TEACUP AND SAUCER} => {REGENCY CAKESTAND 3 TIER} 0.0108 0.5482234 0.0197 7.055642 108
[62] {PINK REGENCY TEACUP AND SAUCER,
REGENCY CAKESTAND 3 TIER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0108 0.8571429 0.0126 28.957529 108
[63] {GREEN REGENCY TEACUP AND SAUCER,
REGENCY CAKESTAND 3 TIER} => {PINK REGENCY TEACUP AND SAUCER} 0.0108 0.7397260 0.0146 29.471156 108
[64] {PINK REGENCY TEACUP AND SAUCER,
ROSES REGENCY TEACUP AND SAUCER} => {REGENCY CAKESTAND 3 TIER} 0.0107 0.5572917 0.0192 7.172351 107
[65] {PINK REGENCY TEACUP AND SAUCER,
REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.0107 0.8492063 0.0126 24.686231 107
[66] {REGENCY CAKESTAND 3 TIER,
ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.0107 0.6369048 0.0168 25.374692 107
[67] {GREEN REGENCY TEACUP AND SAUCER,
ROSES REGENCY TEACUP AND SAUCER} => {REGENCY CAKESTAND 3 TIER} 0.0120 0.5263158 0.0228 6.773691 120
[68] {GREEN REGENCY TEACUP AND SAUCER,
REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.0120 0.8219178 0.0146 23.892960 120
[69] {REGENCY CAKESTAND 3 TIER,
ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0120 0.7142857 0.0168 24.131274 120
[70] {LUNCH BAG BLACK SKULL,
LUNCH BAG PINK POLKADOT} => {LUNCH BAG RED RETROSPOT} 0.0101 0.6601307 0.0153 12.768486 101
[71] {LUNCH BAG PINK POLKADOT,
LUNCH BAG RED RETROSPOT} => {LUNCH BAG BLACK SKULL} 0.0101 0.5179487 0.0195 12.916427 101
[72] {LUNCH BAG BLACK SKULL,
LUNCH BAG RED RETROSPOT} => {LUNCH BAG PINK POLKADOT} 0.0101 0.5260417 0.0192 14.902030 101
- i. Rule [1]: customers who buy SUGAR also buy SET 3 RETROSPOT TEA.
- Support = 0.0108, so the rule covers about 1.1% of transactions (108 of 10 000). Confidence = 1, so tea was bought in 100% of the transactions that contain sugar.
- Lift = 92.59, so tea is about 92.6 times more likely to be bought in a transaction that contains sugar than in a transaction chosen at random.
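The three measures for rule [1] can be reproduced from counts in the `inspect()` output. The count for the tea set is taken from the coverage of rule [2], whose lhs is SET 3 RETROSPOT TEA. This is a worked check, not part of the original analysis.

```r
# Counts taken from the inspect(retail_rules) output
n_transactions <- 10000
count_sugar    <- 108   # coverage of rule [1] (0.0108) x 10000
count_tea      <- 108   # coverage of rule [2], whose lhs is SET 3 RETROSPOT TEA
count_both     <- 108   # count column of rule [1]

support    <- count_both / n_transactions
confidence <- count_both / count_sugar
# lift = confidence / support(rhs) = support(X and Y) / (support(X) * support(Y))
lift       <- confidence / (count_tea / n_transactions)

round(c(support = support, confidence = confidence, lift = lift), 4)
```

Sugar and the tea set appear in exactly the same 108 transactions. The rule therefore holds in both directions with confidence 1 (rules [1] and [2]), and lift reduces to 1 / 0.0108.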
The connection between coffee, tea and sugar is a fairly trivial association, since they are a well-known combination that is often bought together.
Different colours or designs of the same product (pink and blue birthday bunting, alarm clocks in several colours, hand warmers in different designs, dolly girl and spaceboy lunch boxes) are often bought together. This suggests an opportunity for bundle deals (e.g. girl and boy sets).
7
teacup_rules <- subset(retail_rules, items %in% "GREEN REGENCY TEACUP AND SAUCER")
inspect(teacup_rules)
lhs rhs support confidence coverage lift count
[1] {PINK REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0197 0.7848606 0.0251 26.515559 197
[2] {GREEN REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.0197 0.6655405 0.0296 26.515559 197
[3] {GREEN REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} 0.0228 0.7702703 0.0296 22.391578 228
[4] {ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0228 0.6627907 0.0344 22.391578 228
[5] {GREEN REGENCY TEACUP AND SAUCER,
PINK REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} 0.0170 0.8629442 0.0197 25.085586 170
[6] {PINK REGENCY TEACUP AND SAUCER,
ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0170 0.8854167 0.0192 29.912725 170
[7] {GREEN REGENCY TEACUP AND SAUCER,
ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.0170 0.7456140 0.0228 29.705738 170
[8] {GREEN REGENCY TEACUP AND SAUCER,
PINK REGENCY TEACUP AND SAUCER} => {REGENCY CAKESTAND 3 TIER} 0.0108 0.5482234 0.0197 7.055642 108
[9] {PINK REGENCY TEACUP AND SAUCER,
REGENCY CAKESTAND 3 TIER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0108 0.8571429 0.0126 28.957529 108
[10] {GREEN REGENCY TEACUP AND SAUCER,
REGENCY CAKESTAND 3 TIER} => {PINK REGENCY TEACUP AND SAUCER} 0.0108 0.7397260 0.0146 29.471156 108
[11] {GREEN REGENCY TEACUP AND SAUCER,
ROSES REGENCY TEACUP AND SAUCER} => {REGENCY CAKESTAND 3 TIER} 0.0120 0.5263158 0.0228 6.773691 120
[12] {GREEN REGENCY TEACUP AND SAUCER,
REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.0120 0.8219178 0.0146 23.892960 120
[13] {REGENCY CAKESTAND 3 TIER,
ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.0120 0.7142857 0.0168 24.131274 120
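Customers who buy the GREEN REGENCY TEACUP AND SAUCER are more likely to also buy the PINK and ROSES versions (lift 26.5 and 22.4). When the green teacup is bought together with the pink or roses version, customers are also more likely to buy the REGENCY CAKESTAND 3 TIER (lift of about 7).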
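`items %in%` keeps rules that mention the green teacup on either side. To ask what green-teacup buyers go on to buy, a filter on the lhs only is more targeted. The base-R sketch below illustrates the difference on a few rows copied from `teacup_rules`, with item names shortened. In arules the counterpart should be `subset(retail_rules, lhs %in% "GREEN REGENCY TEACUP AND SAUCER")` followed by `sort(..., by = "lift")`.

```r
# A few of the teacup_rules rows above, with item names shortened for readability
rules <- data.frame(
  lhs  = c("PINK", "GREEN", "GREEN", "ROSES", "GREEN,PINK", "GREEN,ROSES"),
  rhs  = c("GREEN", "PINK", "ROSES", "GREEN", "CAKESTAND", "CAKESTAND"),
  lift = c(26.515559, 26.515559, 22.391578, 22.391578, 7.055642, 6.773691),
  stringsAsFactors = FALSE
)
lhs_items <- strsplit(rules$lhs, ",", fixed = TRUE)

# Like `items %in%`: keep rules that mention GREEN on either side
any_side <- rules[vapply(seq_len(nrow(rules)), function(i)
  "GREEN" %in% c(lhs_items[[i]], rules$rhs[i]), logical(1)), ]

# Keep only rules where GREEN is already in the basket (lhs), strongest lift first
green_lhs <- rules[vapply(lhs_items, function(x) "GREEN" %in% x, logical(1)), ]
green_lhs <- green_lhs[order(green_lhs$lift, decreasing = TRUE), ]

print(nrow(any_side))
print(green_lhs)
```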
COLLABORATIVE FILTERING
1,2.
steam_ratings <- read_csv("steam_ratings.csv")
steam_ratings <- as(steam_ratings, "matrix")
steam_ratings <- as(steam_ratings, "realRatingMatrix")
steam_ratings
2080 x 1581 rating matrix of class 'realRatingMatrix' with 52414 ratings.
3
a.
vector_ratings <- as.vector(steam_ratings@data)
table(vector_ratings)
vector_ratings
0 1 2 3 4 5
3236066 4773 12500 19762 10655 4724
In total there are 52 414 ratings. The 3 236 066 zeros are missing ratings (unrated user-game pairs). 4 773 ratings are 1, 12 500 are 2, 19 762 are 3 (the most common value), 10 655 are 4 and 4 724 are 5.
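The counts above fully determine how sparse the Steam matrix is. This sketch only reuses the printed numbers.

```r
# Counts copied from table(vector_ratings) and the matrix dimensions above
n_users <- 2080
n_games <- 1581
rating_counts <- c("1" = 4773, "2" = 12500, "3" = 19762, "4" = 10655, "5" = 4724)

n_ratings   <- sum(rating_counts)
n_cells     <- n_users * n_games
n_missing   <- n_cells - n_ratings            # the zeros in steam_ratings@data
sparsity    <- n_missing / n_cells
mean_rating <- sum(as.numeric(names(rating_counts)) * rating_counts) / n_ratings

round(c(ratings = n_ratings, missing = n_missing,
        sparsity = sparsity, mean_rating = mean_rating), 3)
```

About 98.4% of user-game pairs are unrated. The mean of the observed ratings is about 2.96, consistent with 3 being the most common value.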
# Histogram of the average rating of each game
colMeans(steam_ratings) %>%
  tibble::enframe(name = "game", value = "mean_rating") %>%
  ggplot() +
  geom_histogram(mapping = aes(x = mean_rating), color = "white")
# Histogram of the number of ratings given by each user
hist(rowCounts(steam_ratings), main = "Number of Ratings per User",
     col = "lightblue", xlab = "Ratings per user")
4
a) b) i, ii, iii, iv)
set.seed(101)
eval_steam <- evaluationScheme(data = steam_ratings,
method = "split",
train = 0.8,
given = 6,
goodRating = 3)
train_steam <- getData(eval_steam, "train")
known_steam <- getData(eval_steam, "known")
unknown_steam <- getData(eval_steam, "unknown")
5
ubcf1_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = "center", method = "Cosine"))
ubcf1_predict <- predict(object = ubcf1_model,
newdata = known_steam,
type = "ratings")
ubcf1_eval <- calcPredictionAccuracy(x = ubcf1_predict,
data = unknown_steam)
ubcf1_eval
RMSE MSE MAE
1.1691627 1.3669415 0.9175286
On average (MAE), the predicted ratings from UBCF model 1 (Cosine, normalize = "center") are off by approximately 0.92 rating points.
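`calcPredictionAccuracy()` reports RMSE, MSE and MAE. The sketch below computes the same three measures on a handful of hypothetical held-out ratings and skips pairs without a value. It is a simplified illustration. recommenderlab's own averaging (for example per user versus over all ratings) may differ, and `prediction_errors` is an illustrative name.

```r
# Hypothetical held-out ratings and predictions for five user-game pairs
actual    <- c(3, 4, NA, 2, 5)
predicted <- c(2.5, 4.5, 3.0, 3.5, 4.0)

prediction_errors <- function(actual, predicted) {
  keep <- !is.na(actual) & !is.na(predicted)   # score only pairs that have both values
  err <- predicted[keep] - actual[keep]
  c(RMSE = sqrt(mean(err^2)), MSE = mean(err^2), MAE = mean(abs(err)))
}

prediction_errors(actual, predicted)
```

MAE is the average size of the error. RMSE squares errors before averaging, so it weights the one 1.5-point miss more heavily and is always at least as large as MAE.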
ubcf2_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = NULL, method = "Cosine"))
ubcf2_predict <- predict(object = ubcf2_model,
newdata = known_steam,
type = "ratings")
ubcf2_eval <- calcPredictionAccuracy(x = ubcf2_predict,
data = unknown_steam)
ubcf2_eval
RMSE MSE MAE
1.0793268 1.1649463 0.8189319
On average (MAE), the predicted ratings from UBCF model 2 (Cosine, normalize = NULL) are off by approximately 0.82 rating points.
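User-based collaborative filtering predicts a user's rating from the ratings of the most similar users. The sketch below is a simplified base-R version on a hypothetical 4 × 4 matrix. It uses cosine similarity over co-rated games, a similarity-weighted average of the `k` nearest neighbours, and optional mean-centring. It is not recommenderlab's exact UBCF implementation, and `predict_ubcf` is an illustrative name.

```r
# Hypothetical 4-user x 4-game rating matrix; NA = not rated
ratings <- matrix(
  c(5,  3, NA,  1,
    4, NA,  4,  1,
    1,  1, NA,  5,
    5,  4,  4, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("g", 1:4))
)

# Cosine similarity computed over the games both users have rated
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(0)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Predict `user`'s rating of `item` from the k most similar users who rated it
predict_ubcf <- function(ratings, user, item, k = 2, center = FALSE) {
  user_means <- rowMeans(ratings, na.rm = TRUE)
  r <- if (center) sweep(ratings, 1, user_means) else ratings
  raters <- setdiff(rownames(r)[!is.na(r[, item])], user)
  sims <- vapply(raters, function(o) cosine_sim(r[user, ], r[o, ]), numeric(1))
  top <- head(sims[order(sims, decreasing = TRUE)], k)
  pred <- sum(top * r[names(top), item]) / sum(abs(top))
  if (center) pred + user_means[[user]] else pred   # undo the centring
}

predict_ubcf(ratings, "u1", "g3")                  # no normalisation
predict_ubcf(ratings, "u1", "g3", center = TRUE)   # "center"-style normalisation
```

Without normalisation, both neighbours rated g3 a 4, so the prediction is 4. With centring, the prediction drops to about 3.35. One reason is that u4, who rates highly overall, rated g3 below their own average. Which variant predicts better is an empirical question, and on this Steam data the un-normalised models had the lower error.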
ubcf3_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = "Z-score", method = "Cosine"))
ubcf3_predict <- predict(object = ubcf3_model,
newdata = known_steam,
type = "ratings")
ubcf3_eval <- calcPredictionAccuracy(x = ubcf3_predict,
data = unknown_steam)
ubcf3_eval
RMSE MSE MAE
1.1843251 1.4026258 0.9230579
On average (MAE), the predicted ratings from UBCF model 3 (Cosine, normalize = "Z-score") are off by approximately 0.92 rating points.
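The `normalize` options differ in how each user's ratings are rescaled before similarities are computed. The sketch below shows row-wise centring and Z-scores on two hypothetical users. It illustrates the idea only, and recommenderlab's handling of edge cases may differ.

```r
# Hypothetical ratings from two users; NA = not rated
ratings <- rbind(
  steady = c(3, 3, 4, NA),
  spread = c(1, 5, 3, 5)
)

# "center": subtract each user's mean rating
center_rows <- function(m) sweep(m, 1, rowMeans(m, na.rm = TRUE))

# "Z-score": centre, then divide by each user's standard deviation
zscore_rows <- function(m) {
  sds <- apply(m, 1, sd, na.rm = TRUE)
  sweep(center_rows(m), 1, sds, "/")
}

round(center_rows(ratings), 2)
round(zscore_rows(ratings), 2)
```

Z-scores put the "spread" user's 1-to-5 range on the same scale as the "steady" user's narrow 3-to-4 range. A user whose ratings are all identical has a standard deviation of 0, so their Z-scores are undefined (NaN).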
ubcf4_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = "center", method = "Euclidean"))
ubcf4_predict <- predict(object = ubcf4_model,
newdata = known_steam,
type = "ratings")
ubcf4_eval <- calcPredictionAccuracy(x = ubcf4_predict,
data = unknown_steam)
ubcf4_eval
RMSE MSE MAE
1.1892017 1.4142006 0.9145427
On average (MAE), the predicted ratings from UBCF model 4 (Euclidean, normalize = "center") are off by approximately 0.91 rating points.
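The three similarity measures compared in these models react differently to rating scale. The sketch below computes them for two hypothetical users who rated the same four games. The 1 / (1 + d) conversion from Euclidean distance to similarity is one common choice and is an assumption here, not necessarily what recommenderlab uses.

```r
# Two hypothetical users' ratings for the same four games
a <- c(5, 3, 4, 1)
b <- c(4, 2, 5, 2)

cosine <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

cos_ab    <- cosine(a, b)
euclid_ab <- sqrt(sum((a - b)^2))
# Assumed distance-to-similarity conversion, giving values in (0, 1]
euclid_sim_ab <- 1 / (1 + euclid_ab)
pearson_ab <- cor(a, b)

# Pearson correlation equals the cosine of the mean-centred vectors
centred_cos_ab <- cosine(a - mean(a), b - mean(b))

# Adding 1 to every rating of b changes cosine and Euclidean, but not Pearson
shifted_b <- b + 1
round(c(cosine = cos_ab, euclid_sim = euclid_sim_ab, pearson = pearson_ab,
        cosine_shifted = cosine(a, shifted_b), pearson_shifted = cor(a, shifted_b)), 3)
```

Because Pearson already centres each vector, pairing it with `normalize = "center"` changes less than it does for cosine or Euclidean.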
ubcf5_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = NULL, method = "Euclidean"))
ubcf5_predict <- predict(object = ubcf5_model,
newdata = known_steam,
type = "ratings")
ubcf5_eval <- calcPredictionAccuracy(x = ubcf5_predict,
data = unknown_steam)
ubcf5_eval
RMSE MSE MAE
1.0990975 1.2080152 0.8294308
On average (MAE), the predicted ratings from UBCF model 5 (Euclidean, normalize = NULL) are off by approximately 0.83 rating points.
ubcf6_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = "Z-score", method = "Euclidean"))
ubcf6_predict <- predict(object = ubcf6_model,
newdata = known_steam,
type = "ratings")
ubcf6_eval <- calcPredictionAccuracy(x = ubcf6_predict,
data = unknown_steam)
ubcf6_eval
RMSE MSE MAE
1.2103043 1.4648366 0.9309755
On average (MAE), the predicted ratings from UBCF model 6 (Euclidean, normalize = "Z-score") are off by approximately 0.93 rating points.
ubcf7_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = "center", method = "pearson"))
ubcf7_predict <- predict(object = ubcf7_model,
newdata = known_steam,
type = "ratings")
ubcf7_eval <- calcPredictionAccuracy(x = ubcf7_predict,
data = unknown_steam)
ubcf7_eval
RMSE MSE MAE
1.1209660 1.2565649 0.8720308
On average (MAE), the predicted ratings from UBCF model 7 (Pearson, normalize = "center") are off by approximately 0.87 rating points.
ubcf8_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = NULL, method = "pearson"))
ubcf8_predict <- predict(object = ubcf8_model,
newdata = known_steam,
type = "ratings")
ubcf8_eval <- calcPredictionAccuracy(x = ubcf8_predict,
data = unknown_steam)
ubcf8_eval
RMSE MSE MAE
1.0949035 1.1988137 0.8284463
On average (MAE), the predicted ratings from UBCF model 8 (Pearson, normalize = NULL) are off by approximately 0.83 rating points.
ubcf9_model <- Recommender(data = train_steam,
method = "UBCF",
parameter = list(normalize = "Z-score", method = "pearson"))
ubcf9_predict <- predict(object = ubcf9_model,
newdata = known_steam,
type = "ratings")
ubcf9_eval <- calcPredictionAccuracy(x = ubcf9_predict,
data = unknown_steam)
ubcf9_eval
RMSE MSE MAE
1.1308570 1.2788376 0.8754739
On average (MAE), the predicted ratings from UBCF model 9 (Pearson, normalize = "Z-score") are off by approximately 0.88 rating points.
6.
ibcf1_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = "center", method = "Cosine"))
ibcf1_predict <- predict(object = ibcf1_model,
newdata = known_steam,
type = "ratings")
ibcf1_eval <- calcPredictionAccuracy(x = ibcf1_predict,
data = unknown_steam)
ibcf1_eval
RMSE MSE MAE
1.500975 2.252927 1.165031
On average (MAE), the predicted ratings from IBCF model 1 (Cosine, normalize = "center") are off by approximately 1.17 rating points.
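Item-based collaborative filtering flips the UBCF idea. It predicts a user's rating for a game from that user's own ratings of the most similar games. The sketch below is a simplified base-R version on a hypothetical 5 × 4 matrix, and `predict_ibcf` is an illustrative name. recommenderlab's IBCF has a comparable `k` parameter that limits how many similar items are kept for each item, but its exact prediction rule may differ from this sketch.

```r
# Hypothetical 5-user x 4-game rating matrix; u1 has not rated g3
ratings <- matrix(
  c(5, 4, NA, 1,
    4, 5,  4, 2,
    1, 2,  2, 5,
    5, 4,  5, 1,
    2, 1,  1, 4),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("u", 1:5), paste0("g", 1:4))
)

# Cosine similarity between two item columns over users who rated both
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(0)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Predict `user`'s rating of `item` from the k items most similar to it
# among the items this user has already rated
predict_ibcf <- function(ratings, user, item, k = 2) {
  rated <- colnames(ratings)[!is.na(ratings[user, ])]
  sims <- vapply(rated, function(j) cosine_sim(ratings[, item], ratings[, j]), numeric(1))
  top <- head(sims[order(sims, decreasing = TRUE)], k)
  sum(top * ratings[user, names(top)]) / sum(abs(top))
}

predict_ibcf(ratings, "u1", "g3", k = 2)
predict_ibcf(ratings, "u1", "g3", k = 3)
```

Widening the neighbourhood from 2 to 3 items pulls in the less similar g4, which u1 rated 1. This lowers the prediction from 4.5 to about 3.69.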
ibcf2_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = "Z-score", method = "Cosine"))
ibcf2_predict <- predict(object = ibcf2_model,
newdata = known_steam,
type = "ratings")
ibcf2_eval <- calcPredictionAccuracy(x = ibcf2_predict,
data = unknown_steam)
ibcf2_eval
RMSE MSE MAE
1.500865 2.252596 1.166651
On average (MAE), the predicted ratings from IBCF model 2 (Cosine, normalize = "Z-score") are off by approximately 1.17 rating points.
ibcf3_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = NULL, method = "Cosine"))
ibcf3_predict <- predict(object = ibcf3_model,
newdata = known_steam,
type = "ratings")
ibcf3_eval <- calcPredictionAccuracy(x = ibcf3_predict,
data = unknown_steam)
ibcf3_eval
RMSE MSE MAE
1.587257 2.519385 1.239649
On average (MAE), the predicted ratings from IBCF model 3 (Cosine, normalize = NULL) are off by approximately 1.24 rating points.
ibcf4_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = "center", method = "Euclidean"))
ibcf4_predict <- predict(object = ibcf4_model,
newdata = known_steam,
type = "ratings")
ibcf4_eval <- calcPredictionAccuracy(x = ibcf4_predict,
data = unknown_steam)
ibcf4_eval
RMSE MSE MAE
1.476175 2.179092 1.140654
On average (MAE), the predicted ratings from IBCF model 4 (Euclidean, normalize = "center") are off by approximately 1.14 rating points.
ibcf5_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = "Z-score", method = "Euclidean"))
ibcf5_predict <- predict(object = ibcf5_model,
newdata = known_steam,
type = "ratings")
ibcf5_eval <- calcPredictionAccuracy(x = ibcf5_predict,
data = unknown_steam)
ibcf5_eval
RMSE MSE MAE
1.474962 2.175512 1.140897
On average (MAE), the predicted ratings from IBCF model 5 (Euclidean, normalize = "Z-score") are off by approximately 1.14 rating points.
ibcf6_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = NULL, method = "Euclidean"))
ibcf6_predict <- predict(object = ibcf6_model,
newdata = known_steam,
type = "ratings")
ibcf6_eval <- calcPredictionAccuracy(x = ibcf6_predict,
data = unknown_steam)
ibcf6_eval
RMSE MSE MAE
1.476175 2.179092 1.140654
On average (MAE), the predicted ratings from IBCF model 6 (Euclidean, normalize = NULL) are off by approximately 1.14 rating points.
ibcf7_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = "center", method = "pearson"))
ibcf7_predict <- predict(object = ibcf7_model,
newdata = known_steam,
type = "ratings")
ibcf7_eval <- calcPredictionAccuracy(x = ibcf7_predict,
data = unknown_steam)
ibcf7_eval
RMSE MSE MAE
1.473200 2.170317 1.162027
On average (MAE), the predicted ratings from IBCF model 7 (Pearson, normalize = "center") are off by approximately 1.16 rating points.
ibcf8_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = "Z-score", method = "pearson"))
ibcf8_predict <- predict(object = ibcf8_model,
newdata = known_steam,
type = "ratings")
ibcf8_eval <- calcPredictionAccuracy(x = ibcf8_predict,
data = unknown_steam)
ibcf8_eval
RMSE MSE MAE
1.473052 2.169883 1.158043
On average (MAE), the predicted ratings from IBCF model 8 (Pearson, normalize = "Z-score") are off by approximately 1.16 rating points.
ibcf9_model <- Recommender(data = train_steam,
method = "IBCF",
parameter = list(normalize = NULL, method = "pearson"))
ibcf9_predict <- predict(object = ibcf9_model,
newdata = known_steam,
type = "ratings")
ibcf9_eval <- calcPredictionAccuracy(x = ibcf9_predict,
data = unknown_steam)
ibcf9_eval
RMSE MSE MAE
1.465543 2.147817 1.154197
On average (MAE), the predicted ratings from IBCF model 9 (Pearson, normalize = NULL) are off by approximately 1.15 rating points.
7
The best model for generating recommendations is UBCF model 2 (Cosine similarity, no normalisation). It has the lowest MAE (0.82) and also the lowest RMSE (1.08) of all 18 models.
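Collecting the 18 results in one table makes the comparison explicit. The values below are copied from the outputs above. With the live objects, `rbind(ubcf1 = ubcf1_eval, ubcf2 = ubcf2_eval, ...)` should give a similar matrix, since each `*_eval` holds named RMSE, MSE and MAE values.

```r
# RMSE and MAE copied from the calcPredictionAccuracy() outputs above
results <- data.frame(
  model = c(paste0("ubcf", 1:9), paste0("ibcf", 1:9)),
  RMSE = c(1.1691627, 1.0793268, 1.1843251, 1.1892017, 1.0990975, 1.2103043,
           1.1209660, 1.0949035, 1.1308570,
           1.500975, 1.500865, 1.587257, 1.476175, 1.474962, 1.476175,
           1.473200, 1.473052, 1.465543),
  MAE = c(0.9175286, 0.8189319, 0.9230579, 0.9145427, 0.8294308, 0.9309755,
          0.8720308, 0.8284463, 0.8754739,
          1.165031, 1.166651, 1.239649, 1.140654, 1.140897, 1.140654,
          1.162027, 1.158043, 1.154197),
  stringsAsFactors = FALSE
)

# Rank by MAE, breaking ties on RMSE
ranked <- results[order(results$MAE, results$RMSE), ]
head(ranked, 3)
```

Every UBCF variant has a lower MAE than every IBCF variant here, so the choice of UBCF does not hinge on a single configuration.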
ubcf2_recs <- predict(object = ubcf2_model,
newdata = known_steam,
type = "topNList",
n = 3)
head(as(ubcf2_recs, "list"), 5)
$`0`
[1] "Bridge Constructor" "Car Mechanic Simulator 2014"
[3] "Democracy 3"
$`1`
[1] "8BitMMO" "Airline Tycoon 2"
[3] "Alan Wake's American Nightmare"
$`2`
[1] "Cogs" "FINAL FANTASY VII" "Frozen Hearth"
$`3`
[1] "12 Labours of Hercules"
[2] "12 Labours of Hercules II The Cretan Bull"
[3] "Age of Empires Online"
$`4`
[1] "Airline Tycoon 2" "BattleBlock Theater" "Bridge Constructor"
Top 3 recommendations for the first five test users (list elements `0` to `4`):
- User `0`: “Bridge Constructor”, “Car Mechanic Simulator 2014”, “Democracy 3”.
- User `1`: “8BitMMO”, “Airline Tycoon 2”, “Alan Wake’s American Nightmare”.
- User `2`: “Cogs”, “FINAL FANTASY VII”, “Frozen Hearth”.
- User `3`: “12 Labours of Hercules”, “12 Labours of Hercules II The Cretan Bull”, “Age of Empires Online”.
- User `4`: “Airline Tycoon 2”, “BattleBlock Theater”, “Bridge Constructor”.
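`predict(..., type = "topNList")` turns predicted ratings into ranked lists. Conceptually this is roughly what the sketch below does: drop games the user has already rated, then keep the `n` highest predictions. The game titles and scores are hypothetical placeholders, and `top_n` is an illustrative name, not a recommenderlab function.

```r
# Hypothetical predicted ratings for one user; names are placeholder game titles
predicted <- c("Game A" = 3.9, "Game B" = 4.6, "Game C" = 2.1,
               "Game D" = 4.2, "Game E" = 4.8)
already_rated <- "Game E"

# Top-N: drop games the user has already rated, then keep the n highest predictions
top_n <- function(predicted, already_rated, n = 3) {
  candidates <- predicted[!(names(predicted) %in% already_rated)]
  names(head(sort(candidates, decreasing = TRUE), n))
}

top_n(predicted, already_rated)
```

Each of the five lists above happens to be in alphabetical order. If that pattern holds more widely, it may mean that many games received tied predicted scores. This is worth checking with `type = "ratings"` before relying on the order within a list.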
8
Most of the rating matrix is empty (about 98% of user-game pairs are unrated), so encouraging users to rate more games could lead to more accurate recommendations. The collaborative filtering models show that Steam can use past user rating behaviour to predict which games a user is most likely to enjoy next. The best-performing model was User-Based Collaborative Filtering (UBCF) model 2, which uses Cosine similarity with no normalisation. It had the lowest MAE of 0.82, meaning that on average its predicted ratings were off by about 0.82 rating points.
The model could be useful for cross-selling and upselling as well as for user retention.