Recommendation Engines
0.1 Part A - Market Basket Analysis
1 1
library(arules)
2 2
retail <- read.transactions("retail_transactions_3.csv", sep = ",")
3 (a)
There are 10,000 transactions in the dataset.
4 (b)
There are 5,479 distinct items available to be purchased.
5 (c)
The density of the sparse matrix is 0.002744552, so about 0.27% of its cells contain a non-zero value. The matrix has 10,000 × 5,479 = 54,790,000 cells, and 54,790,000 × 0.002744552 ≈ 150,374 of them are non-zero, which means 150,374 item purchases (transaction-item pairs) were recorded on the online retail store.
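The arithmetic behind this answer can be checked with a few lines of base R. This is only a sketch that re-uses the three figures reported in the text; the variable names are my own.

```r
# Figures reported for the retail transactions in the text
n_transactions <- 10000
n_items <- 5479
density <- 0.002744552

total_cells <- n_transactions * n_items        # size of the sparse matrix
nonzero_cells <- total_cells * density         # cells that record a purchase
items_per_transaction <- nonzero_cells / n_transactions

round(nonzero_cells)               # number of item purchases
round(items_per_transaction, 2)    # mean basket size
round(100 * density, 2)            # percentage of non-zero cells
```

Dividing the non-zero cells by the number of transactions gives the same mean basket size that is reported in part (e).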
6 (d)
The most frequently purchased item is WHITE HANGING HEART T-LIGHT HOLDER. It appeared in 822 of the 10,000 transactions, which is 822/10,000 = 8.22% of all transactions.
7 (e)
The mean number of items purchased in a single transaction is 15.04.
8 3
itemFrequencyPlot(retail, topN = 20, horiz = TRUE)
9 4
10 (a)
There were 86 rules discovered in the dataset.
11 (b)
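The support threshold of 0.01 means that the items in a rule must appear together in at least 1% of the 10,000 transactions, i.e. in at least 100 transactions. It is the smallest share of transactions a rule may cover and still be reported.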
12 (c)
The confidence threshold of 0.5 means that, for a rule to be included in the results, it must be correct in at least 50% of the transactions that contain the items on its left-hand side.
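To make the two thresholds concrete, here is a minimal base R sketch that computes support and confidence by hand. The five baskets and the helper functions `support()` and `confidence()` are hypothetical and are not part of arules or of the retail dataset.

```r
# Hypothetical toy data: each list element is one transaction
transactions <- list(
  c("teacup", "saucer", "cakestand"),
  c("teacup", "saucer"),
  c("teacup", "lunch bag"),
  c("saucer"),
  c("lunch bag", "cakestand")
)

# Share of transactions that contain every item in `items`
support <- function(items, transactions) {
  mean(vapply(transactions, function(basket) all(items %in% basket), logical(1)))
}

# Confidence of lhs => rhs: support(lhs and rhs) / support(lhs)
confidence <- function(lhs, rhs, transactions) {
  support(c(lhs, rhs), transactions) / support(lhs, transactions)
}

sup <- support(c("teacup", "saucer"), transactions)     # 2 of the 5 baskets
conf <- confidence("teacup", "saucer", transactions)    # 2 of the 3 teacup baskets

# A rule is reported only if it clears both thresholds used in the text
keep_rule <- sup >= 0.01 && conf >= 0.5
```

In this toy example the rule {teacup} => {saucer} passes both thresholds, so it would be kept.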
13 5
14 (a)
47 rules have 2 items while 39 rules have 3 items.
15 (b)
The minimum lift value for a rule is 6.58 while the maximum lift value is 44.05.
16 6
17 (a)
18 (i)
In plain terms, the rule says: if a customer buys a Wooden Star Christmas Scandinavian, they are likely to also buy a Wooden Heart Christmas Scandinavian.
19 (ii)
The support of this rule is 0.0113, which means the rule covers 1.13% of transactions (113 of 10,000). The confidence of this rule is 0.7533333, which means it is correct in about 75% of the transactions that include a Wooden Star Christmas Scandinavian.
20 (iii)
The rule has a lift of 44.054581, which means that a transaction that includes a Wooden Star Christmas Scandinavian is about 44.05 times more likely to also include a Wooden Heart Christmas Scandinavian than a typical transaction is.
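The three measures are linked: lift is the confidence divided by the support of the right-hand-side item. The sketch below reproduces the reported values in base R. Only the support, confidence and lift come from the rule output; the counts of 150 and 171 transactions are my own back-calculation from those values and should be read as assumptions.

```r
n <- 10000       # transactions in the dataset
n_both <- 113    # support 0.0113 * 10,000: star and heart bought together
n_lhs <- 150     # assumed: transactions with the star (113 / 0.7533333)
n_rhs <- 171     # assumed: transactions with the heart (implied by the lift)

support_rule <- n_both / n                    # 0.0113
confidence_rule <- n_both / n_lhs             # about 0.7533
lift_rule <- confidence_rule / (n_rhs / n)    # about 44.05
```

Under these assumed counts, the heart appears in only 1.71% of all transactions but in about 75% of the transactions that contain the star, which is where the large lift comes from.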
21 (b)
One rule I consider trivial is that customers who buy a Green Regency Teacup And Saucer also buy a Pink Regency Teacup And Saucer. Going by their names, the two items differ only in colour, so it is obvious that customers buy them together and the rule is not worth acting on.
Another example is that customers who buy a Wooden Star Christmas Scandinavian also buy a Wooden Heart Christmas Scandinavian. It is equally obvious that customers buy these matching decorations together, so this rule is not worth acting on either.
22 (c)
One rule I consider actionable is that customers who buy a Lunch Bag Cars Blue and a Lunch Bag Retrospot also buy a Lunch Bag Pink Polkadot. Management could offer a bundle such as "buy 2, get 1 free" to encourage customers to buy more items in one transaction rather than just one.
Another example is that customers who buy a Charlotte Bag Pink Polkadot also buy a Red Retrospot Charlotte Bag. Management could use the website to recommend similar designs or colours that customers may like to add to their cart.
23 7
If a customer purchases this item, then they are most likely to purchase the Green or Pink Regency Teacup And Saucer and a Regency Cakestand 3 Tier.
23.1 Part B - Collaborative Filtering
24 1
library(recommenderlab)
library(tidyverse)
25 2
steam_ratings <- read_csv("steam_ratings.csv")
steam_ratings <- as(steam_ratings, "matrix")
steam_ratings <- as(steam_ratings, "realRatingMatrix")
26 3
vector_ratings <- as.vector(steam_ratings@data)
table(vector_ratings)
vector_ratings
      0       1       2       3       4       5
3236066    4773   12500   19762   10655    4724
27 (a)
The rating of 1 was awarded 4773 times. The rating of 2 was awarded 12500 times. The rating of 3 was awarded 19762 times. The rating of 4 was awarded 10655 times. The rating of 5 was awarded 4724 times.
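The counts can also be summarised as shares and a mean rating. The sketch below uses only the numbers from the table above and assumes that the 0 entries stand for user-game pairs with no rating (as opposed to a rating of zero); the variable names are my own.

```r
# Counts copied from the table(vector_ratings) output
rating_counts <- c(`0` = 3236066, `1` = 4773, `2` = 12500,
                   `3` = 19762, `4` = 10655, `5` = 4724)

observed <- rating_counts[-1]                    # drop the 0 (assumed unrated) cells
n_ratings <- sum(observed)                       # number of actual ratings
share_rated <- n_ratings / sum(rating_counts)    # fraction of cells that hold a rating
rating_share <- observed / n_ratings             # distribution over ratings 1 to 5
mean_rating <- sum(as.numeric(names(observed)) * observed) / n_ratings

round(100 * share_rated, 2)
round(rating_share, 3)
round(mean_rating, 2)
```

Under that assumption, only a small fraction of the user-game matrix is filled in, and 3 is the most common rating.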
28 (b)
colMeans(steam_ratings) %>%
  tibble::enframe(name = "game", value = "game_rating") %>%
  ggplot() +
  geom_histogram(mapping = aes(x = game_rating), color = "white")
29 (c)
# rowCounts() returns the number of ratings given by each user (one row per user)
rowCounts(steam_ratings) %>%
  tibble::enframe(name = "user", value = "n_ratings") %>%
  ggplot() +
  geom_histogram(mapping = aes(x = n_ratings), color = "white")
30 4
31 (a)/(b)
set.seed(101)
eval_games <- evaluationScheme(data = steam_ratings,
                               method = "split",
                               train = 0.8,
                               given = 6,
                               goodRating = 3)
32 (c)
train_games <- getData(eval_games, "train")
known_games <- getData(eval_games, "known")
unknown_games <- getData(eval_games, "unknown")
32.1 UBCF MODELS: Cosine
33 5
34 (a) - Center
ubcf_model1 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = "center", method = "Cosine"))
ubcf_predict1 <- predict(object = ubcf_model1,
                         newdata = known_games,
                         type = "ratings")
35 (b)
ubcf_eval1 <- calcPredictionAccuracy(x = ubcf_predict1,
                                     data = unknown_games)
ubcf_eval1
     RMSE       MSE       MAE
1.1691627 1.3669415 0.9175286
This UBCF model produces an RMSE of 1.17 and an MAE of 0.92. The MAE is easier to interpret than the RMSE: an MAE of 0.92 means that, on average, the predicted ratings from this model are off by about 0.92 rating points.
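For reference, the three accuracy measures can be computed by hand. This is a minimal base R sketch on made-up numbers: the `actual` and `predicted` vectors are hypothetical and are not taken from the Steam data.

```r
# Hypothetical held-out ratings and model predictions for five user-game pairs
actual    <- c(4, 2, 5, 3, 1)
predicted <- c(3.5, 2.5, 4, 3, 2)

errors <- predicted - actual
mae  <- mean(abs(errors))    # average size of a miss, in rating points
mse  <- mean(errors^2)       # squaring penalises large misses more heavily
rmse <- sqrt(mse)            # square root puts the error back on the rating scale

# The same relationship holds for the reported values: RMSE squared is the MSE
reported_gap <- abs(1.1691627^2 - 1.3669415)
```

Because the RMSE squares each error before averaging, it is never smaller than the MAE, which is consistent with every model in this report.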
36 (a) - non-normalised
ubcf_model2 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = NULL, method = "Cosine"))
ubcf_predict2 <- predict(object = ubcf_model2,
                         newdata = known_games,
                         type = "ratings")
37 (b)
ubcf_eval2 <- calcPredictionAccuracy(x = ubcf_predict2,
                                     data = unknown_games)
ubcf_eval2
     RMSE       MSE       MAE
1.0793268 1.1649463 0.8189319
This UBCF model produces an RMSE of 1.08 and an MAE of 0.82. The MAE is easier to interpret than the RMSE: an MAE of 0.82 means that, on average, the predicted ratings from this model are off by about 0.82 rating points. We will use this UBCF model to generate game recommendations.
38 (a) - Z-score normalisation
ubcf_model3 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = "Z-score", method = "Cosine"))
ubcf_predict3 <- predict(object = ubcf_model3,
                         newdata = known_games,
                         type = "ratings")
39 (b)
ubcf_eval3 <- calcPredictionAccuracy(x = ubcf_predict3,
                                     data = unknown_games)
ubcf_eval3
     RMSE       MSE       MAE
1.1845437 1.4031437 0.9238983
This UBCF model produces an RMSE of 1.18 and an MAE of 0.92. The MAE is easier to interpret than the RMSE: an MAE of 0.92 means that, on average, the predicted ratings from this model are off by about 0.92 rating points.
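The three normalisation settings differ only in how each user's ratings are rescaled before similarities are computed. The base R sketch below shows the idea on one hypothetical user; it illustrates centering and Z-scoring in general and is not a copy of what recommenderlab does internally.

```r
# Hypothetical ratings from one user; NA marks games the user has not rated
user_ratings <- c(5, 3, NA, 4, NA, 2)

user_mean <- mean(user_ratings, na.rm = TRUE)
user_sd   <- sd(user_ratings, na.rm = TRUE)

centered <- user_ratings - user_mean               # idea behind normalize = "center"
z_scored <- (user_ratings - user_mean) / user_sd   # idea behind normalize = "Z-score"

round(centered, 2)
round(z_scored, 2)
```

Centering removes the difference between generous and harsh raters, and Z-scoring additionally removes differences in how widely each user spreads their ratings.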
39.1 Euclidean Distance
40 (a) - non-normalised
ubcf_model4 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = NULL, method = "Euclidean"))
ubcf_predict4 <- predict(object = ubcf_model4,
                         newdata = known_games,
                         type = "ratings")
41 (b)
ubcf_eval4 <- calcPredictionAccuracy(x = ubcf_predict4,
                                     data = unknown_games)
ubcf_eval4
     RMSE       MSE       MAE
1.0990975 1.2080152 0.8294308
This UBCF model produces an RMSE of 1.10 and an MAE of 0.83. The MAE is easier to interpret than the RMSE: an MAE of 0.83 means that, on average, the predicted ratings from this model are off by about 0.83 rating points.
42 (a) - center
ubcf_model5 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = "center", method = "Euclidean"))
ubcf_predict5 <- predict(object = ubcf_model5,
                         newdata = known_games,
                         type = "ratings")
43 (b)
ubcf_eval5 <- calcPredictionAccuracy(x = ubcf_predict5,
                                     data = unknown_games)
ubcf_eval5
     RMSE       MSE       MAE
1.1892017 1.4142006 0.9145427
This UBCF model produces an RMSE of 1.19 and an MAE of 0.91. The MAE is easier to interpret than the RMSE: an MAE of 0.91 means that, on average, the predicted ratings from this model are off by about 0.91 rating points.
44 (a) - Z-score normalisation
ubcf_model6 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = "Z-score", method = "Euclidean"))
ubcf_predict6 <- predict(object = ubcf_model6,
                         newdata = known_games,
                         type = "ratings")
45 (b)
ubcf_eval6 <- calcPredictionAccuracy(x = ubcf_predict6,
                                     data = unknown_games)
ubcf_eval6
     RMSE       MSE       MAE
1.2103032 1.4648339 0.9309623
This UBCF model produces an RMSE of 1.21 and an MAE of 0.93. The MAE is easier to interpret than the RMSE: an MAE of 0.93 means that, on average, the predicted ratings from this model are off by about 0.93 rating points.
45.1 Pearson Correlation
46 (a) - non-normalised
ubcf_model7 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = NULL, method = "pearson"))
ubcf_predict7 <- predict(object = ubcf_model7,
                         newdata = known_games,
                         type = "ratings")
47 (b)
ubcf_eval7 <- calcPredictionAccuracy(x = ubcf_predict7,
                                     data = unknown_games)
ubcf_eval7
     RMSE       MSE       MAE
1.1086429 1.2290892 0.8349371
This UBCF model produces an RMSE of 1.11 and an MAE of 0.83. The MAE is easier to interpret than the RMSE: an MAE of 0.83 means that, on average, the predicted ratings from this model are off by about 0.83 rating points.
48 (a) - centered
ubcf_model8 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = "center", method = "pearson"))
ubcf_predict8 <- predict(object = ubcf_model8,
                         newdata = known_games,
                         type = "ratings")
49 (b)
ubcf_eval8 <- calcPredictionAccuracy(x = ubcf_predict8,
                                     data = unknown_games)
ubcf_eval8
    RMSE      MSE      MAE
1.121214 1.257121 0.870192
This UBCF model produces an RMSE of 1.12 and an MAE of 0.87. The MAE is easier to interpret than the RMSE: an MAE of 0.87 means that, on average, the predicted ratings from this model are off by about 0.87 rating points.
50 (a) - Z-score normalisation
ubcf_model9 <- Recommender(data = train_games,
                           method = "UBCF",
                           parameter = list(normalize = "Z-score", method = "pearson"))
ubcf_predict9 <- predict(object = ubcf_model9,
                         newdata = known_games,
                         type = "ratings")
51 (b)
ubcf_eval9 <- calcPredictionAccuracy(x = ubcf_predict9,
                                     data = unknown_games)
ubcf_eval9
    RMSE      MSE      MAE
1.133585 1.285015 0.877884
This UBCF model produces an RMSE of 1.13 and an MAE of 0.88. The MAE is easier to interpret than the RMSE: an MAE of 0.88 means that, on average, the predicted ratings from this model are off by about 0.88 rating points.
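The three measures used in the UBCF models above compare two users in different ways. As a rough illustration, the base R sketch below computes each of them for two hypothetical users who rated the same five games. In the real data users share only some games, so the library has to handle missing ratings, which this sketch ignores.

```r
# Hypothetical ratings by two users for the same five games
user_a <- c(5, 4, 1, 2, 3)
user_b <- c(4, 5, 2, 1, 3)

# Cosine similarity: angle between the two rating vectors
cosine_sim <- sum(user_a * user_b) /
  (sqrt(sum(user_a^2)) * sqrt(sum(user_b^2)))

# Euclidean distance: straight-line distance (smaller means more alike)
euclid_dist <- sqrt(sum((user_a - user_b)^2))

# Pearson correlation: cosine similarity of the mean-centred ratings
pearson_sim <- cor(user_a, user_b, method = "pearson")
```

Cosine and Pearson are similarities (higher means more alike), while Euclidean is a distance (lower means more alike), so the raw numbers are not directly comparable across the three.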
51.1 IBCF MODELS: Cosine
52 6
53 (a) - non-normalised
ibcf_model1 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = NULL, method = "Cosine"))
ibcf_predict1 <- predict(object = ibcf_model1,
                         newdata = known_games,
                         type = "ratings")
54 (b)
ibcf_eval1 <- calcPredictionAccuracy(x = ibcf_predict1,
                                     data = unknown_games)
ibcf_eval1
    RMSE      MSE      MAE
1.587257 2.519385 1.239649
This IBCF model produces an MAE of 1.24, which means that, on average, its predicted ratings are off by about 1.24 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
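The difference between the two approaches is which dimension of the rating matrix is compared: UBCF looks for similar users (rows), while IBCF looks for similar games (columns). The base R sketch below shows this on a small hypothetical matrix; `cosine_matrix()` is an illustrative helper of my own, not a recommenderlab function.

```r
# Hypothetical user-by-game rating matrix (rows = users, columns = games)
ratings <- matrix(
  c(5, 4, 1,
    4, 5, 2,
    1, 2, 5,
    2, 1, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("user", 1:4), c("game_a", "game_b", "game_c"))
)

# Cosine similarity between every pair of columns of m
cosine_matrix <- function(m) {
  norms <- sqrt(colSums(m^2))
  crossprod(m) / outer(norms, norms)
}

item_sim <- cosine_matrix(ratings)       # IBCF view: game-to-game similarities
user_sim <- cosine_matrix(t(ratings))    # UBCF view: user-to-user similarities
```

Here `item_sim` is a 3 x 3 matrix over games and `user_sim` is a 4 x 4 matrix over users. With only a handful of ratings per game, item-to-item similarities can be less reliable, which may be one reason the IBCF models score worse here.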
55 (a) - center
ibcf_model2 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = "center", method = "Cosine"))
ibcf_predict2 <- predict(object = ibcf_model2,
                         newdata = known_games,
                         type = "ratings")
56 (b)
ibcf_eval2 <- calcPredictionAccuracy(x = ibcf_predict2,
                                     data = unknown_games)
ibcf_eval2
    RMSE      MSE      MAE
1.506323 2.269009 1.171511
This IBCF model produces an MAE of 1.17, which means that, on average, its predicted ratings are off by about 1.17 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
57 (a) - Z-score normalisation
ibcf_model3 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = "Z-score", method = "Cosine"))
ibcf_predict3 <- predict(object = ibcf_model3,
                         newdata = known_games,
                         type = "ratings")
58 (b)
ibcf_eval3 <- calcPredictionAccuracy(x = ibcf_predict3,
                                     data = unknown_games)
ibcf_eval3
    RMSE      MSE      MAE
1.501380 2.254142 1.166580
This IBCF model produces an MAE of 1.17, which means that, on average, its predicted ratings are off by about 1.17 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
58.1 Euclidean Distance
59 (a) - non-normalised
ibcf_model4 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = NULL, method = "Euclidean"))
ibcf_predict4 <- predict(object = ibcf_model4,
                         newdata = known_games,
                         type = "ratings")
60 (b)
ibcf_eval4 <- calcPredictionAccuracy(x = ibcf_predict4,
                                     data = unknown_games)
ibcf_eval4
    RMSE      MSE      MAE
1.476175 2.179092 1.140654
This IBCF model produces an MAE of 1.14, which means that, on average, its predicted ratings are off by about 1.14 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
61 (a) - center
ibcf_model5 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = "center", method = "Euclidean"))
ibcf_predict5 <- predict(object = ibcf_model5,
                         newdata = known_games,
                         type = "ratings")
62 (b)
ibcf_eval5 <- calcPredictionAccuracy(x = ibcf_predict5,
                                     data = unknown_games)
ibcf_eval5
    RMSE      MSE      MAE
1.476175 2.179092 1.140654
This IBCF model produces an MAE of 1.14, which means that, on average, its predicted ratings are off by about 1.14 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
63 (a) - Z-score normalisation
ibcf_model6 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = "Z-score", method = "Euclidean"))
ibcf_predict6 <- predict(object = ibcf_model6,
                         newdata = known_games,
                         type = "ratings")
64 (b)
ibcf_eval6 <- calcPredictionAccuracy(x = ibcf_predict6,
                                     data = unknown_games)
ibcf_eval6
    RMSE      MSE      MAE
1.474962 2.175512 1.140897
This IBCF model produces an MAE of 1.14, which means that, on average, its predicted ratings are off by about 1.14 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
64.1 Pearson Correlation
65 (a) - non-normalised
ibcf_model7 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = NULL, method = "pearson"))
ibcf_predict7 <- predict(object = ibcf_model7,
                         newdata = known_games,
                         type = "ratings")
66 (b)
ibcf_eval7 <- calcPredictionAccuracy(x = ibcf_predict7,
                                     data = unknown_games)
ibcf_eval7
    RMSE      MSE      MAE
1.456788 2.122230 1.152312
This IBCF model produces an MAE of 1.15, which means that, on average, its predicted ratings are off by about 1.15 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
67 (a) - center
ibcf_model8 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = "center", method = "pearson"))
ibcf_predict8 <- predict(object = ibcf_model8,
                         newdata = known_games,
                         type = "ratings")
68 (b)
ibcf_eval8 <- calcPredictionAccuracy(x = ibcf_predict8,
                                     data = unknown_games)
ibcf_eval8
    RMSE      MSE      MAE
1.470169 2.161397 1.158908
This IBCF model produces an MAE of 1.16, which means that, on average, its predicted ratings are off by about 1.16 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
69 (a) - Z-score normalisation
ibcf_model9 <- Recommender(data = train_games,
                           method = "IBCF",
                           parameter = list(normalize = "Z-score", method = "pearson"))
ibcf_predict9 <- predict(object = ibcf_model9,
                         newdata = known_games,
                         type = "ratings")
70 (b)
ibcf_eval9 <- calcPredictionAccuracy(x = ibcf_predict9,
                                     data = unknown_games)
ibcf_eval9
    RMSE      MSE      MAE
1.467355 2.153130 1.158796
This IBCF model produces an MAE of 1.16, which means that, on average, its predicted ratings are off by about 1.16 rating points. It performs worse than every UBCF model above, so we will not use it to generate game recommendations.
71 7
The model with the best MAE is the non-normalised UBCF model with cosine similarity (ubcf_model2), which produced an MAE of 0.8189319. This means that, on average, its predicted ratings are off by about 0.82 rating points. Every UBCF model has a lower MAE than every IBCF model, so we use this UBCF model rather than an IBCF model to generate game recommendations.
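The comparison across all 18 models can be reproduced by collecting the MAE values in one named vector. The sketch below copies the values from the outputs above; the vector and variable names are my own.

```r
# MAE values copied from the calcPredictionAccuracy() outputs in this report
mae <- c(
  ubcf_model1 = 0.9175286, ubcf_model2 = 0.8189319, ubcf_model3 = 0.9238983,
  ubcf_model4 = 0.8294308, ubcf_model5 = 0.9145427, ubcf_model6 = 0.9309623,
  ubcf_model7 = 0.8349371, ubcf_model8 = 0.870192,  ubcf_model9 = 0.877884,
  ibcf_model1 = 1.239649,  ibcf_model2 = 1.171511,  ibcf_model3 = 1.166580,
  ibcf_model4 = 1.140654,  ibcf_model5 = 1.140654,  ibcf_model6 = 1.140897,
  ibcf_model7 = 1.152312,  ibcf_model8 = 1.158908,  ibcf_model9 = 1.158796
)

best_model <- names(which.min(mae))    # model with the lowest MAE
ranking <- sort(mae)                   # all models from best to worst
head(ranking, 3)

# Worst UBCF model versus best IBCF model
worst_ubcf <- max(mae[grepl("^ubcf", names(mae))])
best_ibcf  <- min(mae[grepl("^ibcf", names(mae))])
```

The three lowest MAE values all belong to the non-normalised UBCF models (cosine, Euclidean, then Pearson), which suggests that normalisation did not help on this dataset.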
The top 3 game recommendations for User 1 are “Bridge Constructor”, “Car Mechanic Simulator 2014” and “Democracy 3”.
The top 3 game recommendations for User 2 are “8BitMMO”, “Airline Tycoon 2” and “Alan Wake’s American Nightmare”.
The top 3 game recommendations for User 3 are “Cogs”, “FINAL FANTASY VII” and “Frozen Hearth”.
The top 3 game recommendations for User 4 are “12 Labours of Hercules”, “12 Labours of Hercules II The Cretan Bull” and “Age of Empires Online”.
The top 3 game recommendations for User 5 are “Airline Tycoon 2”, “BattleBlock Theater” and “Bridge Constructor”.
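Conceptually, a top 3 list is the three unrated games with the highest predicted ratings for that user. The base R sketch below shows the selection step on made-up predictions; the game names and values are hypothetical, and this is not the code that produced the lists above.

```r
# Hypothetical predicted ratings for one user's unrated games
predicted <- c(game_a = 3.9, game_b = 4.6, game_c = 2.8,
               game_d = 4.1, game_e = 3.2)

# Sort from highest to lowest predicted rating and keep the first three names
top_3 <- names(sort(predicted, decreasing = TRUE))[1:3]
top_3
```

If several games share the same predicted rating, the order among them depends on how ties are broken, so the exact top 3 can vary between implementations.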
72 8
Steam can use the collaborative filtering model to display recommended games on the homepage or user dashboard. For example, if a user has previously enjoyed certain types of games, the system can suggest similar games that other users with similar preferences have rated highly. This can increase user engagement by helping users discover new games they are likely to enjoy.
Steam can also use these recommendations in targeted marketing campaigns, such as personalised emails or notifications. By recommending games that match a user’s interests, Steam can increase the likelihood of users returning to the platform and making purchases again.
Although the model has some prediction error shown by the RMSE, MSE, and MAE values, it still provides useful insights into user preferences and can significantly improve recommendation quality.
Overall, collaborative filtering allows Steam to deliver personalised recommendations, which can increase user satisfaction, engagement, and ultimately drive higher sales.