Recommendation Engines

Author

Rosemary Francis

0.1 Part A - Market Basket Analysis

1 1

library(arules)
retail <- read.transactions("retail_transactions_3.csv", sep = ",")

2 2

3 (a)

There are 10,000 transactions in the dataset.

4 (b)

There are 5,479 possible items available to be purchased.

5 (c)

The density of the sparse matrix is 0.002744552, so about 0.27% of its cells contain a non-zero value. The matrix has 10,000 × 5,479 = 54,790,000 cells, of which 54,790,000 × 0.002744552 ≈ 150,374 are non-zero. This means 150,374 item purchases were recorded on the online retail store, counting each distinct item once per transaction.
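The arithmetic can be checked with a short base R sketch. The three inputs are the figures reported in this section; the variable names are illustrative only.

```r
# Figures reported for the retail transactions in this report
n_transactions <- 10000
n_items <- 5479
density <- 0.002744552

# Every transaction-item pair is one cell of the sparse matrix
total_cells <- n_transactions * n_items

# Cells where the item was actually part of the transaction
nonzero_cells <- round(total_cells * density)

# Average number of distinct items per transaction
items_per_transaction <- nonzero_cells / n_transactions

print(total_cells)
print(nonzero_cells)
print(items_per_transaction)
```

The last value should agree, up to rounding, with the mean transaction size reported in part (e).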

6 (d)

The most frequently purchased item is WHITE HANGING HEART T-LIGHT HOLDER. It appeared in 822 of the 10,000 transactions, which is 822/10,000 = 8.22% of all transactions.

7 (e)

The mean number of items purchased in a single transaction is 15.04.

8 3

# Plot the 20 most frequently purchased items as horizontal bars
itemFrequencyPlot(retail, topN = 20, horiz = TRUE)

9 4

10 (a)

There were 86 rules discovered in the dataset.

11 (b)

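The support threshold of 0.01 means that an itemset must appear in at least 1% of the 10,000 transactions, i.e. in at least 100 transactions, to be considered. It is the minimum share of transactions an itemset needs before a rule can be built from it.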

12 (c)

The confidence threshold of 0.5 means that for a rule to be included in the results, its right-hand side must appear in at least 50% of the transactions that contain its left-hand side.
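As a minimal sketch of what the two thresholds measure, the base R code below computes support and confidence by hand. The baskets and the helper functions `support()` and `confidence()` are hypothetical and are not taken from the retail data or from arules.

```r
# Hypothetical toy baskets, not taken from the retail data
baskets <- list(
  c("teacup", "saucer"),
  c("teacup", "saucer", "cakestand"),
  c("teacup"),
  c("lunch bag"),
  c("saucer", "cakestand")
)

# Support: fraction of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of the rule {lhs} => {rhs}:
# support of both sides together divided by support of the left-hand side
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

sup_rule <- support(c("teacup", "saucer"), baskets)
conf_rule <- confidence("teacup", "saucer", baskets)

# A rule is kept only if it clears both thresholds
keep_rule <- sup_rule >= 0.01 && conf_rule >= 0.5

print(c(support = sup_rule, confidence = conf_rule))
print(keep_rule)
```

In this toy data the rule {teacup} => {saucer} holds in 2 of 5 baskets, and in 2 of the 3 baskets that contain a teacup, so it passes both thresholds.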

13 5

14 (a)

47 rules have 2 items while 39 rules have 3 items.

15 (b)

The minimum lift value for a rule is 6.58 while the maximum lift value is 44.05.

16 6

17 (a)

18 (i)

A simple sentence to explain the rule is: if a customer buys a Wooden Star Christmas Scandinavian, they are likely to also buy a Wooden Heart Christmas Scandinavian.

19 (ii)

The support of this rule is 0.0113, which means the rule covers 1.13% of transactions. Its confidence is 0.7533333, which means it is correct in about 75% of purchases involving Wooden Star Christmas Scandinavian.

20 (iii)

The rule has a lift of 44.054581, which means a transaction that includes Wooden Star Christmas Scandinavian is about 44.05 times as likely to include a Wooden Heart Christmas Scandinavian as a transaction chosen at random.
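The three measures are linked, so the reported values can be used to recover how often each item appears on its own. The sketch below assumes the standard definitions (confidence = support of the rule divided by support of the left-hand side, lift = confidence divided by support of the right-hand side); the variable names are illustrative.

```r
# Values reported for the rule {Wooden Star} => {Wooden Heart}
support_rule <- 0.0113
confidence_rule <- 0.7533333
lift_rule <- 44.054581
n_transactions <- 10000

# confidence = support(star and heart) / support(star)
support_star <- support_rule / confidence_rule

# lift = confidence / support(heart)
support_heart <- confidence_rule / lift_rule

# Convert the proportions back to approximate transaction counts
count_both <- round(support_rule * n_transactions)
count_star <- round(support_star * n_transactions)
count_heart <- round(support_heart * n_transactions)

print(c(both = count_both, star = count_star, heart = count_heart))
```

Because the Wooden Heart appears in under 2% of all transactions but in about 75% of those containing the Wooden Star, the lift is large.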

21 (b)

One rule I consider trivial is that customers who buy a Green Regency Teacup And Saucer will also buy a Pink Regency Teacup And Saucer. It is obvious that customers buy these together, so the rule is not worth mentioning.

Another example is that customers who buy Wooden Star Christmas Scandinavian will also buy Wooden Heart Christmas Scandinavian. It is quite obvious that customers will buy these decorations together, so this rule is not worth mentioning either.

22 (c)

One rule I consider actionable is that customers who buy a Lunch Bag Cars Blue and a Lunch Bag Retrospot will also buy a Lunch Bag Pink Polkadot. Management can offer a bundle deal such as Buy 2 Get 1 Free to encourage customers to buy several of these items in one transaction rather than just one.

Another example is that customers who buy a Charlotte Bag Pink Polkadot also buy a Red Retrospot Charlotte Bag. Management can recommend products on the website, suggesting similar designs or colours that customers may like to add to their cart.

23 7

If a customer purchases this item, then they are most likely to purchase the Green or Pink Regency Teacup And Saucer and a Regency Cakestand 3 Tier.

23.1 Part B - Collaborative Filtering

24 1

library(recommenderlab)
library(tidyverse)

25 2

steam_ratings <- read_csv("steam_ratings.csv")
steam_ratings <- as(steam_ratings, "matrix")
steam_ratings <- as(steam_ratings, "realRatingMatrix")

26 3

vector_ratings <- as.vector(steam_ratings@data)
table(vector_ratings)

vector_ratings
      0       1       2       3       4       5 
3236066    4773   12500   19762   10655    4724

27 (a)

The rating of 1 was awarded 4773 times. The rating of 2 was awarded 12500 times. The rating of 3 was awarded 19762 times. The rating of 4 was awarded 10655 times. The rating of 5 was awarded 4724 times.
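The counts above can be turned into shares with a few lines of base R. This is a small sketch using the figures from the table; the zeros are treated as unrated user-game pairs, which is how the sparse rating matrix stores missing ratings, and the variable names are illustrative.

```r
# Rating counts copied from the table above
rating_counts <- c(`1` = 4773, `2` = 12500, `3` = 19762, `4` = 10655, `5` = 4724)

# Cells stored as 0, i.e. user-game pairs with no rating
n_missing <- 3236066

# Total number of ratings actually given
n_rated <- sum(rating_counts)

# Percentage share of each rating value among the given ratings
rating_share <- round(100 * rating_counts / n_rated, 1)

# Fraction of all user-game pairs that carry a rating
rated_fraction <- n_rated / (n_rated + n_missing)

# Most commonly awarded rating
most_common <- names(which.max(rating_counts))

print(rating_share)
print(rated_fraction)
print(most_common)
```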

28 (b)

colMeans(steam_ratings) %>% 
  tibble::enframe(name = "game", value = "game_rating") %>% 
  ggplot() +
  geom_histogram(mapping = aes(x = game_rating), color = "white")

29 (c)

# Rows of the rating matrix are users, so rowCounts() gives ratings per user
rowCounts(steam_ratings) %>% 
  tibble::enframe(name = "user", value = "n_ratings") %>% 
  ggplot() +
  geom_histogram(mapping = aes(x = n_ratings), color = "white")

30 4

31 (a)/(b)

set.seed(101)
eval_games <- evaluationScheme(data = steam_ratings, 
                                method = "split", 
                                train = 0.8,       
                                given = 6,        
                                goodRating = 3)

32 (c)

train_games <- getData(eval_games, "train")
known_games <- getData(eval_games, "known")
unknown_games <- getData(eval_games, "unknown")

32.1 UBCF MODELS: Cosine

33 5

34 (a) - Center

ubcf_model1 <- Recommender(data = train_games,
                          method = "UBCF", 
                          parameter = list(normalize = "center", method = "Cosine"))

ubcf_predict1 <- predict(object = ubcf_model1,
                        newdata = known_games, 
                        type = "ratings")

35 (b)

ubcf_eval1 <- calcPredictionAccuracy(x = ubcf_predict1,
                                    data = unknown_games)
ubcf_eval1

     RMSE       MSE       MAE 
1.1691627 1.3669415 0.9175286

This UBCF model produces an RMSE of 1.17 and an MAE of 0.92. The MAE is more intuitive than the RMSE because an MAE of 0.92 means that on average, the predicted ratings from the UBCF model are off by approximately 0.92 of a rating.
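To make the three error measures concrete, the sketch below computes them by hand in base R. The `actual` and `predicted` vectors are hypothetical and are not taken from the Steam data.

```r
# Hypothetical actual and predicted ratings for five user-game pairs
actual <- c(4, 2, 5, 3, 1)
predicted <- c(3, 2, 4, 5, 1)

errors <- predicted - actual

# MAE: average size of the error, in rating points
mae <- mean(abs(errors))

# MSE: average squared error, which penalises large misses more heavily
mse <- mean(errors^2)

# RMSE: square root of the MSE, back on the rating scale
rmse <- sqrt(mse)

print(c(RMSE = rmse, MSE = mse, MAE = mae))
```

The RMSE is never smaller than the MAE, and the gap widens when a few predictions are far off, as with the single two-point miss here.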

36 (a) - non-normalised

ubcf_model2 <- Recommender(data = train_games,
                           method = "UBCF", 
                           parameter= list(normalize = NULL, method="Cosine"))

ubcf_predict2 <- predict(object = ubcf_model2,
                        newdata = known_games, 
                        type = "ratings")

37 (b)

ubcf_eval2 <- calcPredictionAccuracy(x = ubcf_predict2,
                                    data = unknown_games)
ubcf_eval2

     RMSE       MSE       MAE 
1.0793268 1.1649463 0.8189319

This UBCF model produces an RMSE of 1.08 and an MAE of 0.82. The MAE is more intuitive than the RMSE because an MAE of 0.82 means that on average, the predicted ratings from the UBCF model are off by approximately 0.82 of a rating. We will use this UBCF model to generate game recommendations.

38 (a) - Z-score normalisation

ubcf_model3 <- Recommender(data = train_games,
                           method = "UBCF", 
                           parameter=list(normalize = "Z-score",method="Cosine"))

ubcf_predict3 <- predict(object = ubcf_model3,
                        newdata = known_games, 
                        type = "ratings")

39 (b)

ubcf_eval3 <- calcPredictionAccuracy(x = ubcf_predict3,
                                    data = unknown_games)
ubcf_eval3

     RMSE       MSE       MAE 
1.1845437 1.4031437 0.9238983

This UBCF model produces an RMSE of 1.18 and MAE of 0.92. The MAE is more intuitive than RMSE because an MAE of 0.92 means that on average, the predicted ratings from the UBCF model are off by approximately 0.92 of a rating.

39.1 Euclidean Distance

40 (a) - non-normalised

ubcf_model4 <- Recommender(data = train_games,
                           method = "UBCF", 
                           parameter= list(normalize = NULL, method="Euclidean"))

ubcf_predict4 <- predict(object = ubcf_model4,
                        newdata = known_games, 
                        type = "ratings")

41 (b)

ubcf_eval4 <- calcPredictionAccuracy(x = ubcf_predict4,
                                    data = unknown_games)
ubcf_eval4

     RMSE       MSE       MAE 
1.0990975 1.2080152 0.8294308

This UBCF model produces an RMSE of 1.10 and an MAE of 0.83. The MAE is more intuitive than the RMSE because an MAE of 0.83 means that on average, the predicted ratings from the UBCF model are off by approximately 0.83 of a rating.

42 (a) - center

ubcf_model5 <- Recommender(data = train_games,
                           method = "UBCF", 
                           parameter= list(normalize = "center", method="Euclidean"))

ubcf_predict5 <- predict(object = ubcf_model5,
                        newdata = known_games, 
                        type = "ratings")

43 (b)

ubcf_eval5 <- calcPredictionAccuracy(x = ubcf_predict5,
                                    data = unknown_games)
ubcf_eval5

     RMSE       MSE       MAE 
1.1892017 1.4142006 0.9145427

This UBCF model produces an RMSE of 1.19 and an MAE of 0.91. The MAE is more intuitive than the RMSE because an MAE of 0.91 means that on average, the predicted ratings from the UBCF model are off by approximately 0.91 of a rating.

44 (a) - Z-score normalisation

ubcf_model6 <- Recommender(data = train_games,
                           method = "UBCF", 
                           parameter= list(normalize = "Z-score", method="Euclidean"))

ubcf_predict6 <- predict(object = ubcf_model6,
                        newdata = known_games, 
                        type = "ratings")

45 (b)

ubcf_eval6 <- calcPredictionAccuracy(x = ubcf_predict6,
                                    data = unknown_games)
ubcf_eval6

     RMSE       MSE       MAE 
1.2103032 1.4648339 0.9309623

This UBCF model produces an RMSE of 1.21 and MAE of 0.93. The MAE is more intuitive than RMSE because an MAE of 0.93 means that on average, the predicted ratings from the UBCF model are off by approximately 0.93 of a rating.

45.1 Pearson Correlation

46 (a) - non-normalised

ubcf_model7 <- Recommender(data = train_games,
                           method = "UBCF", 
                           parameter= list(normalize = NULL, method="pearson"))

ubcf_predict7 <- predict(object = ubcf_model7,
                        newdata = known_games, 
                        type = "ratings")

47 (b)

ubcf_eval7 <- calcPredictionAccuracy(x = ubcf_predict7,
                                    data = unknown_games)
ubcf_eval7

     RMSE       MSE       MAE 
1.1086429 1.2290892 0.8349371

This UBCF model produces an RMSE of 1.11 and an MAE of 0.83. The MAE is more intuitive than the RMSE because an MAE of 0.83 means that on average, the predicted ratings from the UBCF model are off by approximately 0.83 of a rating.

48 (a) - centered

ubcf_model8 <- Recommender(data = train_games,
                           method = "UBCF", 
                           parameter= list(normalize = "center", method="pearson"))

ubcf_predict8 <- predict(object = ubcf_model8,
                        newdata = known_games, 
                        type = "ratings")

49 (b)

ubcf_eval8 <- calcPredictionAccuracy(x = ubcf_predict8,
                                    data = unknown_games)
ubcf_eval8

    RMSE      MSE      MAE 
1.121214 1.257121 0.870192

This UBCF model produces an RMSE of 1.12 and MAE of 0.87. The MAE is more intuitive than RMSE because an MAE of 0.87 means that on average, the predicted ratings from the UBCF model are off by approximately 0.87 of a rating.

50 (a) - Z-score normalisation

ubcf_model9 <- Recommender(data = train_games,
                           method = "UBCF", 
                           parameter= list(normalize = "Z-score", method="pearson"))

ubcf_predict9 <- predict(object = ubcf_model9,
                        newdata = known_games, 
                        type = "ratings")

51 (b)

ubcf_eval9 <- calcPredictionAccuracy(x = ubcf_predict9,
                                    data = unknown_games)
ubcf_eval9

    RMSE      MSE      MAE 
1.133585 1.285015 0.877884

This UBCF model produces an RMSE of 1.13 and an MAE of 0.88. The MAE is more intuitive than the RMSE because an MAE of 0.88 means that on average, the predicted ratings from the UBCF model are off by approximately 0.88 of a rating.

51.1 IBCF MODELS: Cosine

52 6

53 (a) - non-normalised

ibcf_model1 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = NULL, method="Cosine"))

ibcf_predict1 <- predict(object = ibcf_model1,
                        newdata = known_games, 
                        type = "ratings")

54 (b)

ibcf_eval1 <- calcPredictionAccuracy(x = ibcf_predict1,
                                    data = unknown_games)
ibcf_eval1

    RMSE      MSE      MAE 
1.587257 2.519385 1.239649

The IBCF model produces an MAE value of 1.24, which means that on average, the predicted ratings from the IBCF model are off by approximately 1.24 of a rating. Therefore, the IBCF model performs worse than the best UBCF model (MAE of 0.82) and so we won’t use this IBCF model to generate game recommendations.

55 (a) - center

ibcf_model2 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = "center", method="Cosine"))

ibcf_predict2 <- predict(object = ibcf_model2,
                        newdata = known_games, 
                        type = "ratings")

56 (b)

ibcf_eval2 <- calcPredictionAccuracy(x = ibcf_predict2,
                                    data = unknown_games)
ibcf_eval2

    RMSE      MSE      MAE 
1.506323 2.269009 1.171511

The IBCF model produces an MAE value of 1.17, which means that on average, the predicted ratings from the IBCF model are off by approximately 1.17 of a rating. Therefore, the IBCF model performs worse than the best UBCF model (MAE of 0.82) and so we won’t use this IBCF model to generate game recommendations.

57 (a) - Z-score normalisation

ibcf_model3 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = "Z-score", method="Cosine"))

ibcf_predict3 <- predict(object = ibcf_model3,
                        newdata = known_games, 
                        type = "ratings")

58 (b)

ibcf_eval3 <- calcPredictionAccuracy(x = ibcf_predict3,
                                    data = unknown_games)
ibcf_eval3

    RMSE      MSE      MAE 
1.501380 2.254142 1.166580

The IBCF model produces an MAE value of 1.17, which means that on average, the predicted ratings from the IBCF model are off by approximately 1.17 of a rating. Therefore, the IBCF model performs worse than the best UBCF model (MAE of 0.82) and so we won’t use this IBCF model to generate game recommendations.

58.1 Euclidean Distance

59 (a) - non-normalised

ibcf_model4 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = NULL, method="Euclidean"))

ibcf_predict4 <- predict(object = ibcf_model4,
                        newdata = known_games, 
                        type = "ratings")

60 (b)

ibcf_eval4 <- calcPredictionAccuracy(x = ibcf_predict4,
                                    data = unknown_games)
ibcf_eval4

    RMSE      MSE      MAE 
1.476175 2.179092 1.140654

The IBCF model produces an MAE value of 1.14, which means that on average, the predicted ratings from the IBCF model are off by approximately 1.14 of a rating. Therefore, the IBCF model performs worse than the best UBCF model (MAE of 0.82) and so we won’t use this IBCF model to generate game recommendations.

61 (a) - center

ibcf_model5 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = "center", method="Euclidean"))

ibcf_predict5 <- predict(object = ibcf_model5,
                        newdata = known_games, 
                        type = "ratings")

62 (b)

ibcf_eval5 <- calcPredictionAccuracy(x = ibcf_predict5,
                                    data = unknown_games)
ibcf_eval5

    RMSE      MSE      MAE 
1.476175 2.179092 1.140654

63 (a) Z-score normalisation

ibcf_model6 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = "Z-score", method="Euclidean"))

ibcf_predict6 <- predict(object = ibcf_model6,
                        newdata = known_games, 
                        type = "ratings")

64 (b)

ibcf_eval6 <- calcPredictionAccuracy(x = ibcf_predict6,
                                    data = unknown_games)
ibcf_eval6

    RMSE      MSE      MAE 
1.474962 2.175512 1.140897

64.1 Pearson Correlation

65 (a) - non-normalised

ibcf_model7 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = NULL, method="pearson"))

ibcf_predict7 <- predict(object = ibcf_model7,
                        newdata = known_games, 
                        type = "ratings")

66 (b)

ibcf_eval7 <- calcPredictionAccuracy(x = ibcf_predict7,
                                    data = unknown_games)
ibcf_eval7

    RMSE      MSE      MAE 
1.456788 2.122230 1.152312

The IBCF model produces an MAE value of 1.15, which means that on average, the predicted ratings from the IBCF model are off by approximately 1.15 of a rating. Therefore, the IBCF model performs worse than the best UBCF model (MAE of 0.82) and so we won’t use this IBCF model to generate game recommendations.

67 (a) - center

ibcf_model8 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = "center", method="pearson"))

ibcf_predict8 <- predict(object = ibcf_model8,
                        newdata = known_games, 
                        type = "ratings")

68 (b)

ibcf_eval8 <- calcPredictionAccuracy(x = ibcf_predict8,
                                    data = unknown_games)
ibcf_eval8

    RMSE      MSE      MAE 
1.470169 2.161397 1.158908

69 (a) - Z-score normalisation

ibcf_model9 <- Recommender(data = train_games,
                           method = "IBCF", 
                           parameter= list(normalize = "Z-score", method="pearson"))

ibcf_predict9 <- predict(object = ibcf_model9,
                        newdata = known_games, 
                        type = "ratings")

70 (b)

ibcf_eval9 <- calcPredictionAccuracy(x = ibcf_predict9,
                                    data = unknown_games)
ibcf_eval9

    RMSE      MSE      MAE 
1.467355 2.153130 1.158796

71 7

The model with the best (lowest) MAE score is the non-normalised UBCF model with Cosine similarity, which produced an MAE of 0.8189319. This means that on average, the predicted ratings from this UBCF model are off by approximately 0.82 of a rating. Every UBCF model also achieved a lower MAE than every IBCF model, so this UBCF model is used to generate the game recommendations.
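The comparison can be reproduced from the MAE values printed in sections 5 and 6. The sketch below collects them in a named vector and picks the smallest; the labels are illustrative shorthand for model type, similarity measure and normalisation.

```r
# MAE values copied from the evaluation output of each model
mae_scores <- c(
  ubcf_cosine_none = 0.8189319, ubcf_cosine_center = 0.9175286,
  ubcf_cosine_zscore = 0.9238983, ubcf_euclidean_none = 0.8294308,
  ubcf_euclidean_center = 0.9145427, ubcf_euclidean_zscore = 0.9309623,
  ubcf_pearson_none = 0.8349371, ubcf_pearson_center = 0.870192,
  ubcf_pearson_zscore = 0.877884,
  ibcf_cosine_none = 1.239649, ibcf_cosine_center = 1.171511,
  ibcf_cosine_zscore = 1.166580, ibcf_euclidean_none = 1.140654,
  ibcf_euclidean_center = 1.140654, ibcf_euclidean_zscore = 1.140897,
  ibcf_pearson_none = 1.152312, ibcf_pearson_center = 1.158908,
  ibcf_pearson_zscore = 1.158796
)

# Model with the lowest MAE
best_model <- names(which.min(mae_scores))

# Split by model family to compare UBCF against IBCF
is_ubcf <- startsWith(names(mae_scores), "ubcf")
worst_ubcf <- max(mae_scores[is_ubcf])
best_ibcf <- min(mae_scores[!is_ubcf])

print(sort(mae_scores))
print(best_model)
```

Sorting the vector also shows that, within UBCF, the three non-normalised models have the lowest MAE values.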

The top 3 game recommendations for User 1 are “Bridge Constructor”, “Car Mechanic Simulator 2014” & “Democracy 3”.

The top 3 game recommendations for User 2 are “8BitMMO”, “Airline Tycoon 2” & “Alan Wake’s American Nightmare”.

The top 3 game recommendations for User 3 are “Cogs”, “FINAL FANTASY VII” & “Frozen Hearth”.

The top 3 game recommendations for User 4 are “12 Labours of Hercules”, “12 Labours of Hercules II The Cretan Bull” & “Age of Empires Online”.

The top 3 game recommendations for User 5 are “Airline Tycoon 2”, “BattleBlock Theater” & “Bridge Constructor”.

72 8

Steam can use the collaborative filtering model to display recommended games on the homepage or user dashboard. For example, if a user has previously enjoyed certain types of games, the system can suggest similar games that other users with similar preferences have rated highly. This can increase user engagement by helping users discover new games they are likely to enjoy.
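The idea of borrowing ratings from similar users can be illustrated with a small base R sketch. It is a simplified, hand-rolled version of user-based collaborative filtering and is not how recommenderlab implements UBCF internally; the rating matrix, the user and game labels, and the `cosine_sim()` helper are all hypothetical.

```r
# Hypothetical ratings: rows are users, columns are games, NA = not rated
ratings <- matrix(
  c(5, 4, NA, 1,
    4, 5, 5, NA,
    1, NA, 2, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"), c("g1", "g2", "g3", "g4"))
)

# Cosine similarity computed over the games both users have rated
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

target <- "u1"
others <- setdiff(rownames(ratings), target)

# Similarity of the target user to every other user
sims <- vapply(others,
               function(u) cosine_sim(ratings[target, ], ratings[u, ]),
               numeric(1))

# Predict the target's rating for an unrated game as the
# similarity-weighted average of the other users' ratings
game <- "g3"
predicted <- sum(sims * ratings[others, game]) / sum(sims)

print(sims)
print(predicted)
```

Here u2 rates the shared games much like u1 does, so u2's high rating for g3 dominates the prediction and g3 would be a reasonable recommendation for u1.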

Steam can also use these recommendations in targeted marketing campaigns, such as personalised emails or notifications. By recommending games that match a user’s interests, Steam can increase the likelihood of users returning to the platform and making purchases again.

Although the model has some prediction error shown by the RMSE, MSE, and MAE values, it still provides useful insights into user preferences and can significantly improve recommendation quality.

Overall, collaborative filtering allows Steam to deliver personalised recommendations, which can increase user satisfaction, engagement, and ultimately drive higher sales.