Introduction
This paper examines which cosmetic ingredients or attributes are associated with product quality and price. The method is association rule mining with the Apriori algorithm, and the data was collected from Kaggle: https://www.kaggle.com/kingabzpro/cosmetics-datasets.
Preparing the data
Two variants of the data are used: one containing only the ingredients as transactions, and one that adds other features (skin type, quality, price) as items alongside the ingredients.
library(arules)
library(tidyverse)
library(data.table)
library(gridExtra)
library(arulesViz)
# `data` is assumed to already hold the Kaggle cosmetics table (the loading step is not shown)
# Turn the 0/1 skin-type flags into item labels
data$type_c <- ifelse(data$Combination == 1, "Combination", "")
data$type_d <- ifelse(data$Dry == 1, "Dry", "")
data$type_o <- ifelse(data$Oily == 1, "Oily", "")
data$type_s <- ifelse(data$Sensitive == 1, "Sensitive", "")
# Bin rating and price; values exactly on a boundary (Rank 4; Price 20, 50, 80) fall into the last category
data$ranking <- ifelse(data$Rank > 4, "Good", ifelse(data$Rank < 4 & data$Rank > 3, "OK", "Bad"))
data$pcat <- ifelse(data$Price > 80, "Luxury", ifelse(data$Price < 80 & data$Price > 50, "Expensive",
                    ifelse(data$Price < 50 & data$Price > 20, "Affordable", "Cheap")))
# Combine labels and ingredients into one comma-separated basket per product
data$Transactions <- paste(data$type_c, data$type_d, data$type_o, data$type_s,
                           data$ranking, data$pcat, data$Ingredients, sep = ", ")
transactions <- read.transactions("C:\\Users\\serei\\Desktop\\pls.csv", sep = ",", rm.duplicates = TRUE, format = "basket", cols = 1)
## distribution of transactions with duplicates:
## items
## 1 2 3 4 6 7 8 9 11 15 18 21 23 27 35 38 42 46 49 66
## 21 7 3 4 2 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1
## 71 151
## 1 1
transactions2 <- read.transactions("C:\\Users\\serei\\Desktop\\Transactions.csv", sep = ",", rm.duplicates = TRUE, format = "basket", cols = 1)
## distribution of transactions with duplicates:
## items
## 1 2 3 4 6 7 8 9 11 15 18 21 23 27 35 38 42 46 49 66
## 22 7 2 5 2 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1
## 71 151
## 1 1
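The nested ifelse() calls used to build the ranking and pcat labels leave values that sit exactly on a boundary in the fallback category. As a minimal sketch of an alternative, assuming right-closed bins are acceptable, base R's cut() assigns every value to exactly one interval. The toy vectors below are hypothetical and are not taken from the dataset.

```r
# Hypothetical toy values, not taken from the cosmetics data
rank  <- c(2.5, 3.0, 3.5, 4.0, 4.6)
price <- c(10, 20, 35, 50, 65, 80, 120)

# Nested ifelse() with the same conditions as in the text:
# a rank of exactly 4 matches neither "> 4" nor "< 4 & > 3", so it ends up "Bad"
ranking_nested <- ifelse(rank > 4, "Good",
                         ifelse(rank < 4 & rank > 3, "OK", "Bad"))

# cut() with right-closed intervals: (-Inf, 3], (3, 4], (4, Inf)
ranking_cut <- cut(rank, breaks = c(-Inf, 3, 4, Inf),
                   labels = c("Bad", "OK", "Good"))

# Price bins: (-Inf, 20], (20, 50], (50, 80], (80, Inf)
pcat_cut <- cut(price, breaks = c(-Inf, 20, 50, 80, Inf),
                labels = c("Cheap", "Affordable", "Expensive", "Luxury"))

print(data.frame(rank, ranking_nested, ranking_cut))
print(data.frame(price, pcat_cut))
```

Whether a rank of exactly 4 should count as "OK" or "Good" is a modelling choice; the sketch only makes that choice explicit. Switching to these bins would change the category counts and therefore the rules reported in this paper.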
The transactions object includes only the ingredients, while transactions2 includes the other features as well. In both, the header row of the file is read as an empty first transaction. Let’s take a look at the data:
inspect(transactions[1:3])
## items transactionID
## [1] {} Name Ingredients
## [2] {Alcohol Denat.,
## Aluminum Distearate,
## Benzyl Salicylate,
## Beta-Carotene,
## Calcium Gluconate,
## Citral,
## Citric Acid,
## Citronellol,
## Citrus Aurantifolia (Lime) Extract,
## Copper Gluconate,
## Cyanocobalamin,
## Decyl Oleate,
## Eucalyptus Globulus (Eucalyptus) Leaf Oil,
## Fragrance.,
## Geraniol,
## Glycerin,
## Helianthus Annuus (Sunflower) Seedcake,
## Hydroxycitronellal,
## Isohexadecane,
## Lanolin Alcohol,
## Limonene,
## Linalool,
## Magnesium Gluconate,
## Magnesium Stearate,
## Magnesium Sulfate,
## Medicago Sativa (Alfalfa) Seed Powder,
## Microcrystalline Wax,
## Mineral Oil,
## Niacin,
## Octyldodecanol,
## Panthenol,
## Paraffin,
## Petrolatum,
## Prunus Amygdalus Dulcis (Sweet Almond) Seed Meal,
## Sesamum Indicum (Sesame) Seed Oil,
## Sesamum Indicum (Sesame) Seed Powder,
## Sodium Benzoate,
## Sodium Gluconate,
## Tocopheryl Succinate,
## Water,
## Zinc Gluconate} Crème de la Mer Algae (Seaweed) Extract
## [3] {Butylene Glycol,
## Methylparaben,
## Pentylene Glycol,
## Sodium Benzoate,
## Sorbic Acid.,
## Water} Facial Treatment Essence Galactomyces Ferment Filtrate (Pitera)
summary(transactions)
## transactions as itemMatrix in sparse format with
## 1473 rows (elements/itemsets/transactions) and
## 6189 columns (items) and a density of 0.004471942
##
## most frequent items:
## Glycerin Butylene Glycol Phenoxyethanol Dimethicone
## 896 739 707 418
## Sodium Hyaluronate (Other)
## 412 37596
##
## element (itemset/transaction) length distribution:
## sizes
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## 200 13 8 6 6 4 11 10 20 9 19 11 17 13 21 16 22 30 20 38
## 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
## 32 32 47 25 25 27 36 36 38 26 30 27 27 35 34 34 24 32 28 27
## 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
## 17 36 24 24 20 26 12 14 7 11 14 6 14 12 11 7 9 4 7 6
## 60 61 62 63 64 65 66 67 69 70 71 73 74 75 76 77 78 79 80 82
## 7 5 7 3 1 4 3 4 3 1 3 4 2 1 1 4 1 1 2 2
## 85 86 87 90 94 96 98 100 101 109 112 116 123 138 148
## 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 15.00 27.00 27.68 39.00 148.00
##
## includes extended item information - examples:
## labels
## 1 - Acrylates/C10-30 Alkyl Acrylate Crosspolymer
## 2 (-)-Alpha-Bisabolol
## 3 (+/-):Titanium Dioxide (Ci 77891)
##
## includes extended transaction information - examples:
## transactionID
## 1 Name\tIngredients
## 2 Crème de la Mer\tAlgae (Seaweed) Extract
## 3 Facial Treatment Essence\tGalactomyces Ferment Filtrate (Pitera)
length(transactions)
## [1] 1473
inspect(transactions2[1:3])
## items transactionID
## [1] {} Name Transactions
## [2] {Alcohol Denat.,
## Algae (Seaweed) Extract,
## Aluminum Distearate,
## Benzyl Salicylate,
## Beta-Carotene,
## Calcium Gluconate,
## Citral,
## Citric Acid,
## Citronellol,
## Citrus Aurantifolia (Lime) Extract,
## Copper Gluconate,
## Cyanocobalamin,
## Decyl Oleate,
## Dry,
## Eucalyptus Globulus (Eucalyptus) Leaf Oil,
## Fragrance.,
## Geraniol,
## Glycerin,
## Good,
## Helianthus Annuus (Sunflower) Seedcake,
## Hydroxycitronellal,
## Isohexadecane,
## Lanolin Alcohol,
## Limonene,
## Linalool,
## Luxury,
## Magnesium Gluconate,
## Magnesium Stearate,
## Magnesium Sulfate,
## Medicago Sativa (Alfalfa) Seed Powder,
## Microcrystalline Wax,
## Mineral Oil,
## Niacin,
## Octyldodecanol,
## Oily,
## Panthenol,
## Paraffin,
## Petrolatum,
## Prunus Amygdalus Dulcis (Sweet Almond) Seed Meal,
## Sensitive,
## Sesamum Indicum (Sesame) Seed Oil,
## Sesamum Indicum (Sesame) Seed Powder,
## Sodium Benzoate,
## Sodium Gluconate,
## Tocopheryl Succinate,
## Water,
## Zinc Gluconate} Crème de la Mer Combination
## [3] {Butylene Glycol,
## Dry,
## Galactomyces Ferment Filtrate (Pitera),
## Good,
## Luxury,
## Methylparaben,
## Oily,
## Pentylene Glycol,
## Sensitive,
## Sodium Benzoate,
## Sorbic Acid.,
## Water} Facial Treatment Essence Combination
summary(transactions2)
## transactions as itemMatrix in sparse format with
## 1473 rows (elements/itemsets/transactions) and
## 6437 columns (items) and a density of 0.004945737
##
## most frequent items:
## Water Glycerin Dry Butylene Glycol Phenoxyethanol
## 987 910 843 740 707
## (Other)
## 42707
##
## element (itemset/transaction) length distribution:
## sizes
## 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 6 98 5 30 6 76 9 8 6 11 6 15 9 19 12 20 14 13 17 29
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 26 21 32 34 30 32 28 38 35 28 31 27 29 45 29 23 25 25 39 42
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 31 25 28 22 28 17 28 15 24 17 20 8 15 8 10 18 6 13 7 8
## 61 62 63 64 65 66 67 68 69 71 72 73 75 76 77 78 79 80 81 82
## 8 10 5 4 2 6 6 4 1 4 3 4 1 1 6 1 3 3 1 1
## 83 84 86 88 91 94 100 103 106 107 115 118 122 127 144 152
## 3 1 1 2 3 1 3 1 1 1 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 19.00 31.00 31.84 43.00 152.00
##
## includes extended item information - examples:
## labels
## 1 - Acrylates/C10-30 Alkyl Acrylate Crosspolymer
## 2 -100 Percent Pure Argan Oil: Nourishes and protects skin with essential fatty acids
## 3 -100 Percent Sugarcane-Derived Squalane.
##
## includes extended transaction information - examples:
## transactionID
## 1 Name\tTransactions
## 2 Crème de la Mer\tCombination
## 3 Facial Treatment Essence\tCombination
size(transactions2)
## [1] 0 47 12 62 83 88 32 30 4 40 43 41 16 56 23 58 152 28
## [19] 16 40 50 47 31 34 56 62 0 4 51 14 46 29 19 6 6 91
## [37] 22 44 59 12 2 34 45 12 53 40 28 40 20 31 25 8 34 44
## [55] 10 18 2 24 24 56 37 51 47 35 14 39 9 43 58 49 8 91
## [73] 35 54 51 66 12 17 51 61 49 60 50 10 106 55 21 49 115 43
## [91] 11 2 6 38 6 40 2 27 44 6 49 43 53 41 39 29 11 41
## [109] 39 2 51 50 12 35 53 0 29 8 40 40 29 44 4 35 2 67
## [127] 22 5 48 18 26 67 21 59 72 40 21 43 62 2 13 6 32 2
## [145] 26 68 21 58 42 71 33 58 40 6 20 32 36 6 28 44 4 4
## [163] 77 37 35 48 39 2 50 41 6 55 42 2 3 38 52 44 42 42
## [181] 45 25 53 23 52 79 25 27 41 36 46 44 4 73 26 12 44 61
## [199] 21 38 6 19 82 4 60 2 41 6 25 28 46 21 35 58 6 49
## [217] 50 23 47 15 55 43 23 24 71 34 14 58 50 2 43 56 25 22
## [235] 76 6 2 60 10 40 4 6 21 34 42 27 46 34 30 50 24 2
## [253] 66 6 39 39 43 34 2 2 27 6 4 37 6 30 43 2 53 25
## [271] 38 61 3 6 78 49 44 14 6 37 14 6 23 39 6 84 41 24
## [289] 53 2 29 45 47 68 39 14 42 38 19 42 35 39 36 34 24 21
## [307] 25 14 13 15 20 4 32 32 2 41 55 36 20 28 31 27 15 21
## [325] 23 18 29 21 32 24 40 2 22 25 28 28 4 39 26 31 26 17
## [343] 33 14 25 37 22 2 2 32 26 27 4 36 2 28 40 28 29 40
## [361] 80 38 45 51 30 16 40 15 22 22 30 32 23 53 60 20 16 36
## [379] 29 25 25 34 49 122 53 62 17 24 20 27 21 18 75 18 28 6
## [397] 7 20 47 44 25 45 38 18 2 17 55 2 34 6 29 37 37 2
## [415] 41 18 29 39 50 9 2 33 2 6 25 26 25 2 2 31 27 14
## [433] 37 29 27 7 32 13 8 52 44 29 24 36 33 18 2 9 7 40
## [451] 39 2 9 26 29 22 6 32 6 28 47 34 50 24 25 29 16 35
## [469] 2 36 34 3 30 31 31 26 23 20 51 49 28 28 30 35 27 20
## [487] 40 47 6 39 61 20 54 2 11 41 144 4 47 2 32 2 26 52
## [505] 32 34 17 21 28 2 23 25 18 40 16 34 31 34 2 6 2 39
## [523] 3 46 2 19 35 22 22 2 21 13 0 14 29 40 31 24 32 39
## [541] 26 16 33 51 48 31 42 22 21 21 2 45 23 20 16 28 35 29
## [559] 42 51 9 15 17 34 26 3 20 28 22 35 2 43 23 2 22 2
## [577] 12 26 34 30 39 6 45 32 25 51 6 29 44 11 39 31 49 30
## [595] 37 4 37 28 41 22 5 54 18 12 44 6 34 25 33 43 39 27
## [613] 23 10 41 67 13 34 6 12 66 31 45 51 22 40 2 13 26 30
## [631] 49 19 61 4 38 52 40 71 50 25 27 41 34 40 36 18 58 27
## [649] 44 30 53 79 42 28 4 39 14 2 15 11 86 33 35 49 6 26
## [667] 14 45 4 26 46 41 2 10 58 33 43 40 100 26 52 8 38 26
## [685] 52 23 47 38 47 53 44 61 33 34 41 16 29 36 10 6 29 40
## [703] 6 42 6 10 28 6 25 57 13 43 17 52 118 10 58 37 33 6
## [721] 55 19 20 2 23 4 30 4 91 49 47 17 14 33 45 43 45 15
## [739] 45 53 37 45 65 50 31 28 62 15 44 49 2 10 71 34 44 36
## [757] 0 31 27 48 41 38 38 20 50 37 49 4 49 33 41 32 48 15
## [775] 51 46 2 6 45 12 10 29 22 17 29 39 55 35 54 32 6 60
## [793] 54 17 24 38 51 6 31 43 6 36 40 31 34 34 9 51 53 39
## [811] 21 40 41 37 2 47 56 45 28 17 12 2 2 38 26 33 2 56
## [829] 47 12 6 69 39 22 60 29 20 41 7 24 127 100 24 23 27 0
## [847] 34 6 28 94 12 6 29 29 28 49 44 56 43 5 80 4 34 19
## [865] 37 107 4 24 40 2 30 20 88 21 20 41 36 38 31 23 28 16
## [883] 41 24 47 32 23 51 11 19 24 17 2 37 4 23 42 26 8 39
## [901] 27 39 23 47 40 39 6 45 27 17 28 23 54 47 41 49 68 63
## [919] 27 48 31 24 32 38 27 2 14 4 33 68 20 2 19 29 44 21
## [937] 22 4 40 34 47 77 6 46 7 37 34 36 33 12 35 34 60 27
## [955] 24 6 35 26 103 28 73 34 34 29 26 31 7 25 28 30 20 35
## [973] 2 2 46 20 22 29 39 54 39 24 36 56 39 21 29 32 20 34
## [991] 25 30 34 23 38 7 26 20 25 41 27 33 57 28 6 79 47 44
## [1009] 48 6 44 6 24 24 36 6 12 6 39 30 6 34 27 26 6 4
## [1027] 40 27 23 42 6 39 2 27 41 73 6 28 40 40 40 40 59 31
## [1045] 46 38 30 16 34 33 34 26 43 32 28 33 63 24 48 28 14 47
## [1063] 36 30 31 23 6 25 32 29 40 45 35 23 40 38 25 31 43 8
## [1081] 47 32 40 13 27 47 33 32 30 32 36 35 28 39 46 83 45 35
## [1099] 64 31 58 63 37 56 15 33 50 62 55 100 66 41 33 33 67 56
## [1117] 50 51 32 42 40 33 24 40 62 57 39 43 47 49 67 31 46 38
## [1135] 39 66 50 48 19 73 24 28 56 56 29 56 55 47 23 21 34 33
## [1153] 48 58 43 47 6 37 56 42 2 6 23 2 61 32 6 49 43 2
## [1171] 35 57 35 38 6 24 41 46 45 26 30 2 64 2 23 39 25 2
## [1189] 66 45 6 27 6 37 63 2 41 45 43 39 26 2 62 36 47 2
## [1207] 37 72 33 39 43 2 80 36 23 48 25 16 23 30 50 33 27 59
## [1225] 7 34 55 43 2 6 6 57 35 30 31 24 28 16 62 64 53 49
## [1243] 50 23 63 58 48 6 49 2 29 20 45 56 2 28 42 64 6 45
## [1261] 29 5 45 77 58 51 59 36 21 38 18 4 2 77 22 49 53 59
## [1279] 2 41 31 31 35 34 42 2 35 56 2 54 40 46 6 40 44 16
## [1297] 14 51 35 24 4 31 19 2 2 35 2 24 35 45 42 59 30 24
## [1315] 51 46 45 48 2 2 29 39 2 38 2 61 31 38 51 36 19 32
## [1333] 39 45 24 41 53 40 81 40 37 43 77 48 25 6 17 2 6 16
## [1351] 2 34 2 37 21 26 2 19 42 30 20 37 62 6 21 8 42 20
## [1369] 19 34 2 48 20 41 6 33 46 46 33 28 25 21 43 2 34 43
## [1387] 30 26 21 26 47 42 56 20 49 67 60 2 16 34 2 31 2 4
## [1405] 15 2 19 56 21 77 18 47 38 34 24 22 65 39 42 42 27 16
## [1423] 16 14 28 2 10 24 20 29 57 42 36 15 29 23 5 6 26 37
## [1441] 23 30 14 26 28 24 19 45 72 25 43 14 30 13 35 49 6 30
## [1459] 16 16 34 7 33 2 5 83 4 42 31 41 20 19 6
length(transactions2)
## [1] 1473
Simple statistics for transactions (transactions2 is left out because its price and quality category labels would be among the most common items):
relative_frequency <- itemFrequency(transactions, type = "relative")
absolute_frequency <- itemFrequency(transactions, type = "absolute")
frq_plot <- itemFrequencyPlot(transactions, support = 0.2)
top_15_plot <- itemFrequencyPlot(transactions, topN = 15)
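To make the two frequency types concrete, here is a small base R sketch on a hypothetical set of four baskets (the item names are borrowed from the data, the baskets themselves are invented). Absolute frequency is the number of transactions containing an item; relative frequency divides that by the number of transactions, which is the item's support.

```r
# Hypothetical baskets for illustration only
baskets <- list(
  c("Water", "Glycerin", "Phenoxyethanol"),
  c("Water", "Glycerin"),
  c("Glycerin", "Dimethicone"),
  c("Water", "Butylene Glycol")
)

# Count each item at most once per basket, mirroring rm.duplicates = TRUE
items <- unlist(lapply(baskets, unique))

absolute <- table(items)                 # number of baskets containing each item
relative <- absolute / length(baskets)   # share of baskets, i.e. item support

print(sort(absolute, decreasing = TRUE))
print(round(relative, 2))
```

The support = 0.2 argument in the plot call above keeps only items whose relative frequency is at least 0.2.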
Apriori rules
Apriori rules (ingredients only)
These rules are not very informative: common ingredients such as Glycerin and Water appear in most products, so they co-occur with almost everything and dominate the rules.
ingredients_rules <- apriori(transactions, parameter = list(support = 0.01, confidence = 0.7, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 14
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6189 item(s), 1473 transaction(s)] done [0.02s].
## sorting and recoding items ... [470 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.18s].
## writing ... [872663 rule(s)] done [0.17s].
## creating S4 object ... done [0.58s].
inspect(sort(ingredients_rules, by = "lift")[1:3])
## lhs rhs support confidence coverage lift count
## [1] {Glycerin,
## Polyisobutene,
## Polysorbate 20} => {Polyacrylate-13} 0.01018330 1.0000000 0.01018330 86.64706 15
## [2] {Polyisobutene,
## Polysorbate 20} => {Polyacrylate-13} 0.01086219 0.9411765 0.01154107 81.55017 16
## [3] {Glycerin,
## Polyisobutene} => {Polyacrylate-13} 0.01018330 0.9375000 0.01086219 81.23162 15
inspect(sort(ingredients_rules, by = "confidence")[1:3])
## lhs rhs support confidence coverage lift count
## [1] {Simmondsia Chinensis (Jojoba) Seed Extract} => {Butylene Glycol} 0.01154107 1 0.01154107 1.993234 17
## [2] {Lonicera Caprifolium (Honeysuckle) Flower Extract} => {Glycerin} 0.01018330 1 0.01018330 1.643973 15
## [3] {Hydroxypropyl Methylcellulose} => {Glycerin} 0.01018330 1 0.01018330 1.643973 15
inspect(sort(ingredients_rules, by = "support")[1:3])
## lhs rhs support confidence
## [1] {Butylene Glycol} => {Glycerin} 0.4032587 0.8037889
## [2] {Phenoxyethanol} => {Glycerin} 0.4012220 0.8359264
## [3] {Butylene Glycol, Phenoxyethanol} => {Glycerin} 0.2729124 0.8589744
## coverage lift count
## [1] 0.5016972 1.321407 594
## [2] 0.4799728 1.374241 591
## [3] 0.3177189 1.412131 402
inspect(sort(ingredients_rules, by = "count")[1:3])
## lhs rhs support confidence
## [1] {Butylene Glycol} => {Glycerin} 0.4032587 0.8037889
## [2] {Phenoxyethanol} => {Glycerin} 0.4012220 0.8359264
## [3] {Butylene Glycol, Phenoxyethanol} => {Glycerin} 0.2729124 0.8589744
## coverage lift count
## [1] 0.5016972 1.321407 594
## [2] 0.4799728 1.374241 591
## [3] 0.3177189 1.412131 402
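The metrics in these tables can be reproduced from counts reported earlier. As a rough check, the sketch below recomputes support, confidence, coverage, and lift for {Butylene Glycol} => {Glycerin} from the rule count (594), the item counts shown in summary(transactions) (739 and 896), and the number of transactions (1473). The variable names are illustrative.

```r
n         <- 1473  # number of transactions
count_ab  <- 594   # transactions containing both Butylene Glycol and Glycerin
count_lhs <- 739   # transactions containing Butylene Glycol
count_rhs <- 896   # transactions containing Glycerin

support    <- count_ab / n                 # share of all transactions matching the rule
coverage   <- count_lhs / n                # share of transactions matching the lhs
confidence <- count_ab / count_lhs         # P(rhs | lhs)
lift       <- confidence / (count_rhs / n) # confidence relative to the rhs base rate

print(round(c(support = support, confidence = confidence,
              coverage = coverage, lift = lift), 6))
```

A lift of about 1.32 means Glycerin is only about a third more likely in products with Butylene Glycol than in products overall, which is why the high-support rules say little.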
The rules concerning the quality of the products will be much more telling.
Apriori rules (ingredients and other features)
ingredients_rules2 <- apriori(transactions2, parameter = list(support = 0.01, confidence = 0.7, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 14
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6437 item(s), 1473 transaction(s)] done [0.02s].
## sorting and recoding items ... [487 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.65s].
## writing ... [2879107 rule(s)] done [0.59s].
## creating S4 object ... done [1.34s].
Luxury products
Let’s examine the rules for Luxury skincare products. Products containing the following combinations were about 6 times as likely to be Luxury as an average product (lift of about 5.96):
* Sesame seed powder & water
* Eucalyptus leaf oil & water
* Alfalfa seed powder & water
rules.Luxury <- apriori(data = transactions2, parameter = list(supp = 0.01, conf = 0.005),
                        appearance = list(default = "lhs", rhs = "Luxury"), control = list(verbose = FALSE))
rules.Luxury.byconf <- sort(rules.Luxury, by = "confidence", decreasing = TRUE)
inspect(head(rules.Luxury.byconf))
## lhs rhs support confidence coverage lift count
## [1] {Sesamum Indicum (Sesame) Seed Powder,
## Water} => {Luxury} 0.01018330 1 0.01018330 5.963563 15
## [2] {Eucalyptus Globulus (Eucalyptus) Leaf Oil,
## Water} => {Luxury} 0.01086219 1 0.01086219 5.963563 16
## [3] {Medicago Sativa (Alfalfa) Seed Powder,
## Water} => {Luxury} 0.01086219 1 0.01086219 5.963563 16
## [4] {Tocopheryl Succinate,
## Water} => {Luxury} 0.01086219 1 0.01086219 5.963563 16
## [5] {Prunus Amygdalus Dulcis (Sweet Almond) Seed Meal,
## Water} => {Luxury} 0.01154107 1 0.01154107 5.963563 17
## [6] {Eucalyptus Globulus (Eucalyptus) Leaf Oil,
## Sesamum Indicum (Sesame) Seed Powder,
## Water} => {Luxury} 0.01018330 1 0.01018330 5.963563 15
plot(rules.Luxury, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).
plot(rules.Luxury, measure=c("support","lift"), shading="confidence", main="Luxury pricing")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
Highly rated products
Surprisingly, the rules for highly rated products differed from those for luxury products: the top rules had more items and lower lift values.
rules.Good <- apriori(data = transactions2, parameter = list(supp = 0.01, conf = 0.005),
                      appearance = list(default = "lhs", rhs = "Good"), control = list(verbose = FALSE))
rules.Good.byconf <- sort(rules.Good, by = "confidence", decreasing = TRUE)
inspect(head(rules.Good.byconf))
## lhs rhs support confidence coverage lift count
## [1] {Limonene,
## Panthenol,
## Sensitive} => {Good} 0.01086219 1 0.01086219 2.658845 16
## [2] {Cyclohexasiloxane,
## Glycerin,
## Oily,
## Propanediol} => {Good} 0.01018330 1 0.01018330 2.658845 15
## [3] {Cyclohexasiloxane,
## Glycerin,
## Propanediol,
## Sensitive} => {Good} 0.01018330 1 0.01018330 2.658845 15
## [4] {Affordable,
## Cyclohexasiloxane,
## Disodium EDTA,
## Sensitive} => {Good} 0.01018330 1 0.01018330 2.658845 15
## [5] {Cyclohexasiloxane,
## Disodium EDTA,
## Oily,
## Phenoxyethanol} => {Good} 0.01086219 1 0.01086219 2.658845 16
## [6] {Cyclohexasiloxane,
## Disodium EDTA,
## Phenoxyethanol,
## Sensitive} => {Good} 0.01221996 1 0.01221996 2.658845 18
plot(rules.Good, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).
plot(rules.Good, measure=c("support","lift"), shading="confidence", main="High Ratings")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
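The lower lift for highly rated products follows from the definition of lift rather than from weaker rules. For a rule with confidence 1, lift equals 1 / support(rhs), so the value reflects only how rare the right-hand side is. The sketch below back-calculates the implied base rates from the lift values printed for the Luxury and Good rules; the resulting product counts are derived here and are not reported in the output above.

```r
n <- 1473  # number of transactions

# Lift of the top rules, all of which have confidence 1
lift_at_conf1 <- c(Luxury = 5.963563, Good = 2.658845)

# With confidence = 1: lift = 1 / support(rhs), so support(rhs) = 1 / lift
rhs_support   <- 1 / lift_at_conf1
implied_count <- round(rhs_support * n)  # implied number of products per label

print(round(rhs_support, 4))
print(implied_count)
```

Under this reading, "Good" is simply a more common label than "Luxury", which caps the lift any rule predicting it can reach.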
Cheap and Bad products
The association rules for cheap products and for badly rated products were very homogeneous within each category. Rules for cheap products had very high lift values, while the confidence of rules for badly rated products was much lower than in the other examples.
rules.Cheap <- apriori(data = transactions2, parameter = list(supp = 0.01, conf = 0.005),
                       appearance = list(default = "lhs", rhs = "Cheap"), control = list(verbose = FALSE))
rules.Cheap.byconf <- sort(rules.Cheap, by = "confidence", decreasing = TRUE)
inspect(head(rules.Cheap.byconf))
## lhs rhs support confidence coverage lift count
## [1] {2-Hexanediol,
## Dipotassium Glycyrrhizate,
## Xanthan Gum} => {Cheap} 0.01086219 1 0.01086219 7.220588 16
## [2] {1,
## Dipotassium Glycyrrhizate,
## Xanthan Gum} => {Cheap} 0.01086219 1 0.01086219 7.220588 16
## [3] {2-Hexanediol,
## Allantoin,
## Panthenol} => {Cheap} 0.01018330 1 0.01018330 7.220588 15
## [4] {1,
## 2-Hexanediol,
## Dipotassium Glycyrrhizate,
## Xanthan Gum} => {Cheap} 0.01086219 1 0.01086219 7.220588 16
## [5] {2-Hexanediol,
## Dipotassium Glycyrrhizate,
## Glycerin,
## Xanthan Gum} => {Cheap} 0.01086219 1 0.01086219 7.220588 16
## [6] {2-Hexanediol,
## Dipotassium Glycyrrhizate,
## Water,
## Xanthan Gum} => {Cheap} 0.01018330 1 0.01018330 7.220588 15
plot(rules.Cheap, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).
plot(rules.Cheap, measure=c("support","lift"), shading="confidence", main="Cheap pricing")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
rules.Bad <- apriori(data = transactions2, parameter = list(supp = 0.01, conf = 0.005),
                     appearance = list(default = "lhs", rhs = "Bad"), control = list(verbose = FALSE))
rules.Bad.byconf <- sort(rules.Bad, by = "confidence", decreasing = TRUE)
inspect(head(rules.Bad.byconf))
## lhs rhs support confidence coverage lift count
## [1] {Butylene Glycol,
## Oily,
## Tocopheryl Acetate} => {Bad} 0.01018330 0.1612903 0.06313646 3.210549 15
## [2] {Butylene Glycol,
## Oily,
## Sensitive,
## Tocopheryl Acetate} => {Bad} 0.01018330 0.1612903 0.06313646 3.210549 15
## [3] {Butylene Glycol,
## Dry,
## Oily,
## Tocopheryl Acetate} => {Bad} 0.01018330 0.1612903 0.06313646 3.210549 15
## [4] {Butylene Glycol,
## Dry,
## Sensitive,
## Tocopheryl Acetate} => {Bad} 0.01018330 0.1612903 0.06313646 3.210549 15
## [5] {Butylene Glycol,
## Dry,
## Oily,
## Sensitive,
## Tocopheryl Acetate} => {Bad} 0.01018330 0.1612903 0.06313646 3.210549 15
## [6] {Butylene Glycol,
## Dimethicone,
## Glycerin,
## Oily} => {Bad} 0.01221996 0.1565217 0.07807196 3.115629 18
plot(rules.Bad, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).
plot(rules.Bad, measure=c("support","lift"), shading="confidence", main="Low Ratings")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
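The low confidence of the "Bad" rules can be unpacked with the same arithmetic. The sketch below takes the first rule in the table (count 15, coverage 0.06313646, lift 3.210549) and recovers how many products match its left-hand side and what share of all products is rated "Bad". Both derived figures are back-calculated here, not reported in the output.

```r
n          <- 1473        # number of transactions
count_rule <- 15          # products matching lhs and rated "Bad"
coverage   <- 0.06313646  # share of products matching the lhs
lift       <- 3.210549

lhs_count  <- round(coverage * n)      # products matching the lhs
confidence <- count_rule / lhs_count   # share of those rated "Bad"
bad_share  <- confidence / lift        # implied base rate of "Bad"

print(c(lhs_count = lhs_count,
        confidence = round(confidence, 7),
        bad_share = round(bad_share, 4)))
```

So even the strongest rule describes products of which only about 16% are badly rated; the lift above 3 arises because "Bad" is rare overall.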
Skin types
Lastly, let’s examine the association rules for skin types. Although these rules have reasonable metrics, they are hard to interpret because their antecedents consist mostly of labels for other skin types.
rules.Oily <- apriori(data = transactions2, parameter = list(supp = 0.01, conf = 0.005),
                      appearance = list(default = "lhs", rhs = "Oily"), control = list(verbose = FALSE))
rules.Oily.byconf <- sort(rules.Oily, by = "confidence", decreasing = TRUE)
inspect(head(rules.Oily.byconf))
## lhs rhs support confidence coverage lift count
## [1] {Dry,
## Glycerin*} => {Oily} 0.01154107 1 0.01154107 2.221719 17
## [2] {Chrysanthemum Parthenium (Feverfew) Extract,
## Dry} => {Oily} 0.01221996 1 0.01221996 2.221719 18
## [3] {Decyl Glucoside,
## Sensitive} => {Oily} 0.01018330 1 0.01018330 2.221719 15
## [4] {Arnica Montana Flower Extract,
## Sensitive} => {Oily} 0.01086219 1 0.01086219 2.221719 16
## [5] {Arnica Montana Flower Extract,
## Dry} => {Oily} 0.01086219 1 0.01086219 2.221719 16
## [6] {Dry,
## Rosmarinus Officinalis (Rosemary) Leaf Oil} => {Oily} 0.01289885 1 0.01289885 2.221719 19
rules.Dry <- apriori(data = transactions2, parameter = list(supp = 0.01, conf = 0.005),
                     appearance = list(default = "lhs", rhs = "Dry"), control = list(verbose = FALSE))
rules.Dry.byconf <- sort(rules.Dry, by = "confidence", decreasing = TRUE)
inspect(head(rules.Dry.byconf))
## lhs rhs support confidence coverage lift count
## [1] {OilyOK} => {Dry} 0.02308215 1 0.02308215 1.747331 34
## [2] {OilyGood} => {Dry} 0.06788866 1 0.06788866 1.747331 100
## [3] {Oily} => {Dry} 0.45010183 1 0.45010183 1.747331 663
## [4] {Glycerin*,
## Oily} => {Dry} 0.01154107 1 0.01154107 1.747331 17
## [5] {Chrysanthemum Parthenium (Feverfew) Extract,
## Oily} => {Dry} 0.01221996 1 0.01221996 1.747331 18
## [6] {OilyOK,
## Phenoxyethanol} => {Dry} 0.01086219 1 0.01086219 1.747331 16
rules.Sensitive <- apriori(data = transactions2, parameter = list(supp = 0.01, conf = 0.005),
                           appearance = list(default = "lhs", rhs = "Sensitive"), control = list(verbose = FALSE))
rules.Sensitive.byconf <- sort(rules.Sensitive, by = "confidence", decreasing = TRUE)
inspect(head(rules.Sensitive.byconf))
## lhs rhs support confidence coverage lift count
## [1] {Oily} => {Sensitive} 0.45010183 1 0.45010183 2.116379 663
## [2] {Glycerin*,
## Oily} => {Sensitive} 0.01154107 1 0.01154107 2.116379 17
## [3] {Dry,
## Glycerin*} => {Sensitive} 0.01154107 1 0.01154107 2.116379 17
## [4] {Chrysanthemum Parthenium (Feverfew) Extract,
## Oily} => {Sensitive} 0.01221996 1 0.01221996 2.116379 18
## [5] {Chrysanthemum Parthenium (Feverfew) Extract,
## Dry} => {Sensitive} 0.01221996 1 0.01221996 2.116379 18
## [6] {Decyl Glucoside,
## Oily} => {Sensitive} 0.01018330 1 0.01018330 2.116379 15
Conclusion
The results of this study can help gauge the likely quality or price range of a new skincare product, and can serve as guidance when evaluating an ingredient list.