Cosmetics Association Rules

Paulina Sereikyte

2022

Introduction

This paper examines which cosmetics ingredients and attributes are associated with product quality and price. The analysis uses association rules mined with the Apriori algorithm, and the data was collected from Kaggle: https://www.kaggle.com/kingabzpro/cosmetics-datasets.

Preparing the data

Two variants of the data are used: one containing only the ingredients as transactions, and one that also includes the other features (skin type, quality, price) as items alongside the ingredients.

library(arules)
library(tidyverse)
library(data.table)
library(gridExtra)
library(arulesViz)
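# `data` is the Kaggle cosmetics table; the chunk that loads it is not shown.
# Skin-type flags (0/1) become item labels; "" leaves an empty field in the basket string.
data$type_c <- ifelse(data$Combination == 1, "Combination", "")
data$type_d <- ifelse(data$Dry == 1, "Dry", "")
data$type_o <- ifelse(data$Oily == 1, "Oily", "")
data$type_s <- ifelse(data$Sensitive == 1, "Sensitive", "")
# Rating category. The inequalities are strict, so a Rank of exactly 4 falls through to "Bad".
data$ranking <- ifelse(data$Rank > 4, "Good", ifelse(data$Rank < 4 & data$Rank > 3, "OK", "Bad"))
# Price category. Likewise, a Price of exactly 50 or 80 falls through to "Cheap".
data$pcat <- ifelse(data$Price > 80, "Luxury", ifelse(data$Price < 80 & data$Price > 50, "Expensive",
                                                      ifelse(data$Price < 50 & data$Price > 20, "Affordable", "Cheap")))
# One comma-separated basket string per product: skin types, rating, price category, ingredients.
data$Transactions <- paste(data$type_c, data$type_d, data$type_o, data$type_s,
                           data$ranking, data$pcat, data$Ingredients, sep = ", ")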
transactions <- read.transactions("C:\\Users\\serei\\Desktop\\pls.csv", sep = ",", rm.duplicates = TRUE, format="basket",cols=1)
## distribution of transactions with duplicates:
## items
##   1   2   3   4   6   7   8   9  11  15  18  21  23  27  35  38  42  46  49  66 
##  21   7   3   4   2   1   3   2   1   1   1   1   1   1   1   1   1   1   1   1 
##  71 151 
##   1   1
transactions2<- read.transactions("C:\\Users\\serei\\Desktop\\Transactions.csv", sep = ",", rm.duplicates = TRUE, format="basket",cols=1)
## distribution of transactions with duplicates:
## items
##   1   2   3   4   6   7   8   9  11  15  18  21  23  27  35  38  42  46  49  66 
##  22   7   2   5   2   1   3   2   1   1   1   1   1   1   1   1   1   1   1   1 
##  71 151 
##   1   1
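
The effect of the strict inequalities in the nested ifelse() calls above can be seen on a few made-up prices. The sketch below is an illustration only: the price vector is invented, and cut() with right-closed intervals is shown as one possible alternative, not as the binning used to produce the results in this paper.

```r
# Toy prices chosen to sit on and around the category boundaries (not from the dataset)
price <- c(12, 20, 35, 50, 65, 80, 120)

# Nested ifelse() with strict inequalities, mirroring the chunk above
pcat_ifelse <- ifelse(price > 80, "Luxury",
               ifelse(price < 80 & price > 50, "Expensive",
               ifelse(price < 50 & price > 20, "Affordable", "Cheap")))

# cut() with right-closed intervals: (-Inf,20], (20,50], (50,80], (80,Inf)
pcat_cut <- as.character(cut(price,
                             breaks = c(-Inf, 20, 50, 80, Inf),
                             labels = c("Cheap", "Affordable", "Expensive", "Luxury")))

# Side-by-side comparison of the two labelings
print(data.frame(price, pcat_ifelse, pcat_cut))
```

With these inputs, the ifelse() version labels prices of exactly 50 and 80 as Cheap, while cut() assigns them to Affordable and Expensive.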

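The transactions data includes only the ingredients, while transactions2 includes the other features as well. Note that the first transaction in each set is empty because the header row of the file was read as a transaction. Let’s take a look at the data: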

inspect(transactions[1:3])
##     items                                                                                                 transactionID
## [1] {}                                                  Name Ingredients                                               
## [2] {Alcohol Denat.,                                                                                                   
##      Aluminum Distearate,                                                                                              
##      Benzyl Salicylate,                                                                                                
##      Beta-Carotene,                                                                                                    
##      Calcium Gluconate,                                                                                                
##      Citral,                                                                                                           
##      Citric Acid,                                                                                                      
##      Citronellol,                                                                                                      
##      Citrus Aurantifolia (Lime) Extract,                                                                               
##      Copper Gluconate,                                                                                                 
##      Cyanocobalamin,                                                                                                   
##      Decyl Oleate,                                                                                                     
##      Eucalyptus Globulus (Eucalyptus) Leaf Oil,                                                                        
##      Fragrance.,                                                                                                       
##      Geraniol,                                                                                                         
##      Glycerin,                                                                                                         
##      Helianthus Annuus (Sunflower) Seedcake,                                                                           
##      Hydroxycitronellal,                                                                                               
##      Isohexadecane,                                                                                                    
##      Lanolin Alcohol,                                                                                                  
##      Limonene,                                                                                                         
##      Linalool,                                                                                                         
##      Magnesium Gluconate,                                                                                              
##      Magnesium Stearate,                                                                                               
##      Magnesium Sulfate,                                                                                                
##      Medicago Sativa (Alfalfa) Seed Powder,                                                                            
##      Microcrystalline Wax,                                                                                             
##      Mineral Oil,                                                                                                      
##      Niacin,                                                                                                           
##      Octyldodecanol,                                                                                                   
##      Panthenol,                                                                                                        
##      Paraffin,                                                                                                         
##      Petrolatum,                                                                                                       
##      Prunus Amygdalus Dulcis (Sweet Almond) Seed Meal,                                                                 
##      Sesamum Indicum (Sesame) Seed Oil,                                                                                
##      Sesamum Indicum (Sesame) Seed Powder,                                                                             
##      Sodium Benzoate,                                                                                                  
##      Sodium Gluconate,                                                                                                 
##      Tocopheryl Succinate,                                                                                             
##      Water,                                                                                                            
##      Zinc Gluconate}                                    Crème de la Mer  Algae (Seaweed) Extract                        
## [3] {Butylene Glycol,                                                                                                  
##      Methylparaben,                                                                                                    
##      Pentylene Glycol,                                                                                                 
##      Sodium Benzoate,                                                                                                  
##      Sorbic Acid.,                                                                                                     
##      Water}                                             Facial Treatment Essence Galactomyces Ferment Filtrate (Pitera)
summary(transactions)
## transactions as itemMatrix in sparse format with
##  1473 rows (elements/itemsets/transactions) and
##  6189 columns (items) and a density of 0.004471942 
## 
## most frequent items:
##           Glycerin    Butylene Glycol     Phenoxyethanol        Dimethicone 
##                896                739                707                418 
## Sodium Hyaluronate            (Other) 
##                412              37596 
## 
## element (itemset/transaction) length distribution:
## sizes
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19 
## 200  13   8   6   6   4  11  10  20   9  19  11  17  13  21  16  22  30  20  38 
##  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39 
##  32  32  47  25  25  27  36  36  38  26  30  27  27  35  34  34  24  32  28  27 
##  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59 
##  17  36  24  24  20  26  12  14   7  11  14   6  14  12  11   7   9   4   7   6 
##  60  61  62  63  64  65  66  67  69  70  71  73  74  75  76  77  78  79  80  82 
##   7   5   7   3   1   4   3   4   3   1   3   4   2   1   1   4   1   1   2   2 
##  85  86  87  90  94  96  98 100 101 109 112 116 123 138 148 
##   2   1   1   1   1   1   1   1   2   1   1   1   1   1   1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   15.00   27.00   27.68   39.00  148.00 
## 
## includes extended item information - examples:
##                                           labels
## 1 - Acrylates/C10-30 Alkyl Acrylate Crosspolymer
## 2                            (-)-Alpha-Bisabolol
## 3              (+/-):Titanium Dioxide (Ci 77891)
## 
## includes extended transaction information - examples:
##                                                     transactionID
## 1                                                Name\tIngredients
## 2                         Crème de la Mer\tAlgae (Seaweed) Extract
## 3 Facial Treatment Essence\tGalactomyces Ferment Filtrate (Pitera)
length(transactions)
## [1] 1473
inspect(transactions2[1:3])
##     items                                                                      transactionID
## [1] {}                                                  Name Transactions                   
## [2] {Alcohol Denat.,                                                                        
##      Algae (Seaweed) Extract,                                                               
##      Aluminum Distearate,                                                                   
##      Benzyl Salicylate,                                                                     
##      Beta-Carotene,                                                                         
##      Calcium Gluconate,                                                                     
##      Citral,                                                                                
##      Citric Acid,                                                                           
##      Citronellol,                                                                           
##      Citrus Aurantifolia (Lime) Extract,                                                    
##      Copper Gluconate,                                                                      
##      Cyanocobalamin,                                                                        
##      Decyl Oleate,                                                                          
##      Dry,                                                                                   
##      Eucalyptus Globulus (Eucalyptus) Leaf Oil,                                             
##      Fragrance.,                                                                            
##      Geraniol,                                                                              
##      Glycerin,                                                                              
##      Good,                                                                                  
##      Helianthus Annuus (Sunflower) Seedcake,                                                
##      Hydroxycitronellal,                                                                    
##      Isohexadecane,                                                                         
##      Lanolin Alcohol,                                                                       
##      Limonene,                                                                              
##      Linalool,                                                                              
##      Luxury,                                                                                
##      Magnesium Gluconate,                                                                   
##      Magnesium Stearate,                                                                    
##      Magnesium Sulfate,                                                                     
##      Medicago Sativa (Alfalfa) Seed Powder,                                                 
##      Microcrystalline Wax,                                                                  
##      Mineral Oil,                                                                           
##      Niacin,                                                                                
##      Octyldodecanol,                                                                        
##      Oily,                                                                                  
##      Panthenol,                                                                             
##      Paraffin,                                                                              
##      Petrolatum,                                                                            
##      Prunus Amygdalus Dulcis (Sweet Almond) Seed Meal,                                      
##      Sensitive,                                                                             
##      Sesamum Indicum (Sesame) Seed Oil,                                                     
##      Sesamum Indicum (Sesame) Seed Powder,                                                  
##      Sodium Benzoate,                                                                       
##      Sodium Gluconate,                                                                      
##      Tocopheryl Succinate,                                                                  
##      Water,                                                                                 
##      Zinc Gluconate}                                    Crème de la Mer  Combination         
## [3] {Butylene Glycol,                                                                       
##      Dry,                                                                                   
##      Galactomyces Ferment Filtrate (Pitera),                                                
##      Good,                                                                                  
##      Luxury,                                                                                
##      Methylparaben,                                                                         
##      Oily,                                                                                  
##      Pentylene Glycol,                                                                      
##      Sensitive,                                                                             
##      Sodium Benzoate,                                                                       
##      Sorbic Acid.,                                                                          
##      Water}                                             Facial Treatment Essence Combination
summary(transactions2)
## transactions as itemMatrix in sparse format with
##  1473 rows (elements/itemsets/transactions) and
##  6437 columns (items) and a density of 0.004945737 
## 
## most frequent items:
##           Water        Glycerin             Dry Butylene Glycol  Phenoxyethanol 
##             987             910             843             740             707 
##         (Other) 
##           42707 
## 
## element (itemset/transaction) length distribution:
## sizes
##   0   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
##   6  98   5  30   6  76   9   8   6  11   6  15   9  19  12  20  14  13  17  29 
##  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40 
##  26  21  32  34  30  32  28  38  35  28  31  27  29  45  29  23  25  25  39  42 
##  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60 
##  31  25  28  22  28  17  28  15  24  17  20   8  15   8  10  18   6  13   7   8 
##  61  62  63  64  65  66  67  68  69  71  72  73  75  76  77  78  79  80  81  82 
##   8  10   5   4   2   6   6   4   1   4   3   4   1   1   6   1   3   3   1   1 
##  83  84  86  88  91  94 100 103 106 107 115 118 122 127 144 152 
##   3   1   1   2   3   1   3   1   1   1   1   1   1   1   1   1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   19.00   31.00   31.84   43.00  152.00 
## 
## includes extended item information - examples:
##                                                                                labels
## 1                                      - Acrylates/C10-30 Alkyl Acrylate Crosspolymer
## 2 -100 Percent Pure Argan Oil: Nourishes and protects skin with essential fatty acids
## 3                                            -100 Percent Sugarcane-Derived Squalane.
## 
## includes extended transaction information - examples:
##                          transactionID
## 1                    Name\tTransactions
## 2          Crème de la Mer\tCombination
## 3 Facial Treatment Essence\tCombination
size(transactions2)
##    [1]   0  47  12  62  83  88  32  30   4  40  43  41  16  56  23  58 152  28
##   [19]  16  40  50  47  31  34  56  62   0   4  51  14  46  29  19   6   6  91
##   [37]  22  44  59  12   2  34  45  12  53  40  28  40  20  31  25   8  34  44
##   [55]  10  18   2  24  24  56  37  51  47  35  14  39   9  43  58  49   8  91
##   [73]  35  54  51  66  12  17  51  61  49  60  50  10 106  55  21  49 115  43
##   [91]  11   2   6  38   6  40   2  27  44   6  49  43  53  41  39  29  11  41
##  [109]  39   2  51  50  12  35  53   0  29   8  40  40  29  44   4  35   2  67
##  [127]  22   5  48  18  26  67  21  59  72  40  21  43  62   2  13   6  32   2
##  [145]  26  68  21  58  42  71  33  58  40   6  20  32  36   6  28  44   4   4
##  [163]  77  37  35  48  39   2  50  41   6  55  42   2   3  38  52  44  42  42
##  [181]  45  25  53  23  52  79  25  27  41  36  46  44   4  73  26  12  44  61
##  [199]  21  38   6  19  82   4  60   2  41   6  25  28  46  21  35  58   6  49
##  [217]  50  23  47  15  55  43  23  24  71  34  14  58  50   2  43  56  25  22
##  [235]  76   6   2  60  10  40   4   6  21  34  42  27  46  34  30  50  24   2
##  [253]  66   6  39  39  43  34   2   2  27   6   4  37   6  30  43   2  53  25
##  [271]  38  61   3   6  78  49  44  14   6  37  14   6  23  39   6  84  41  24
##  [289]  53   2  29  45  47  68  39  14  42  38  19  42  35  39  36  34  24  21
##  [307]  25  14  13  15  20   4  32  32   2  41  55  36  20  28  31  27  15  21
##  [325]  23  18  29  21  32  24  40   2  22  25  28  28   4  39  26  31  26  17
##  [343]  33  14  25  37  22   2   2  32  26  27   4  36   2  28  40  28  29  40
##  [361]  80  38  45  51  30  16  40  15  22  22  30  32  23  53  60  20  16  36
##  [379]  29  25  25  34  49 122  53  62  17  24  20  27  21  18  75  18  28   6
##  [397]   7  20  47  44  25  45  38  18   2  17  55   2  34   6  29  37  37   2
##  [415]  41  18  29  39  50   9   2  33   2   6  25  26  25   2   2  31  27  14
##  [433]  37  29  27   7  32  13   8  52  44  29  24  36  33  18   2   9   7  40
##  [451]  39   2   9  26  29  22   6  32   6  28  47  34  50  24  25  29  16  35
##  [469]   2  36  34   3  30  31  31  26  23  20  51  49  28  28  30  35  27  20
##  [487]  40  47   6  39  61  20  54   2  11  41 144   4  47   2  32   2  26  52
##  [505]  32  34  17  21  28   2  23  25  18  40  16  34  31  34   2   6   2  39
##  [523]   3  46   2  19  35  22  22   2  21  13   0  14  29  40  31  24  32  39
##  [541]  26  16  33  51  48  31  42  22  21  21   2  45  23  20  16  28  35  29
##  [559]  42  51   9  15  17  34  26   3  20  28  22  35   2  43  23   2  22   2
##  [577]  12  26  34  30  39   6  45  32  25  51   6  29  44  11  39  31  49  30
##  [595]  37   4  37  28  41  22   5  54  18  12  44   6  34  25  33  43  39  27
##  [613]  23  10  41  67  13  34   6  12  66  31  45  51  22  40   2  13  26  30
##  [631]  49  19  61   4  38  52  40  71  50  25  27  41  34  40  36  18  58  27
##  [649]  44  30  53  79  42  28   4  39  14   2  15  11  86  33  35  49   6  26
##  [667]  14  45   4  26  46  41   2  10  58  33  43  40 100  26  52   8  38  26
##  [685]  52  23  47  38  47  53  44  61  33  34  41  16  29  36  10   6  29  40
##  [703]   6  42   6  10  28   6  25  57  13  43  17  52 118  10  58  37  33   6
##  [721]  55  19  20   2  23   4  30   4  91  49  47  17  14  33  45  43  45  15
##  [739]  45  53  37  45  65  50  31  28  62  15  44  49   2  10  71  34  44  36
##  [757]   0  31  27  48  41  38  38  20  50  37  49   4  49  33  41  32  48  15
##  [775]  51  46   2   6  45  12  10  29  22  17  29  39  55  35  54  32   6  60
##  [793]  54  17  24  38  51   6  31  43   6  36  40  31  34  34   9  51  53  39
##  [811]  21  40  41  37   2  47  56  45  28  17  12   2   2  38  26  33   2  56
##  [829]  47  12   6  69  39  22  60  29  20  41   7  24 127 100  24  23  27   0
##  [847]  34   6  28  94  12   6  29  29  28  49  44  56  43   5  80   4  34  19
##  [865]  37 107   4  24  40   2  30  20  88  21  20  41  36  38  31  23  28  16
##  [883]  41  24  47  32  23  51  11  19  24  17   2  37   4  23  42  26   8  39
##  [901]  27  39  23  47  40  39   6  45  27  17  28  23  54  47  41  49  68  63
##  [919]  27  48  31  24  32  38  27   2  14   4  33  68  20   2  19  29  44  21
##  [937]  22   4  40  34  47  77   6  46   7  37  34  36  33  12  35  34  60  27
##  [955]  24   6  35  26 103  28  73  34  34  29  26  31   7  25  28  30  20  35
##  [973]   2   2  46  20  22  29  39  54  39  24  36  56  39  21  29  32  20  34
##  [991]  25  30  34  23  38   7  26  20  25  41  27  33  57  28   6  79  47  44
## [1009]  48   6  44   6  24  24  36   6  12   6  39  30   6  34  27  26   6   4
## [1027]  40  27  23  42   6  39   2  27  41  73   6  28  40  40  40  40  59  31
## [1045]  46  38  30  16  34  33  34  26  43  32  28  33  63  24  48  28  14  47
## [1063]  36  30  31  23   6  25  32  29  40  45  35  23  40  38  25  31  43   8
## [1081]  47  32  40  13  27  47  33  32  30  32  36  35  28  39  46  83  45  35
## [1099]  64  31  58  63  37  56  15  33  50  62  55 100  66  41  33  33  67  56
## [1117]  50  51  32  42  40  33  24  40  62  57  39  43  47  49  67  31  46  38
## [1135]  39  66  50  48  19  73  24  28  56  56  29  56  55  47  23  21  34  33
## [1153]  48  58  43  47   6  37  56  42   2   6  23   2  61  32   6  49  43   2
## [1171]  35  57  35  38   6  24  41  46  45  26  30   2  64   2  23  39  25   2
## [1189]  66  45   6  27   6  37  63   2  41  45  43  39  26   2  62  36  47   2
## [1207]  37  72  33  39  43   2  80  36  23  48  25  16  23  30  50  33  27  59
## [1225]   7  34  55  43   2   6   6  57  35  30  31  24  28  16  62  64  53  49
## [1243]  50  23  63  58  48   6  49   2  29  20  45  56   2  28  42  64   6  45
## [1261]  29   5  45  77  58  51  59  36  21  38  18   4   2  77  22  49  53  59
## [1279]   2  41  31  31  35  34  42   2  35  56   2  54  40  46   6  40  44  16
## [1297]  14  51  35  24   4  31  19   2   2  35   2  24  35  45  42  59  30  24
## [1315]  51  46  45  48   2   2  29  39   2  38   2  61  31  38  51  36  19  32
## [1333]  39  45  24  41  53  40  81  40  37  43  77  48  25   6  17   2   6  16
## [1351]   2  34   2  37  21  26   2  19  42  30  20  37  62   6  21   8  42  20
## [1369]  19  34   2  48  20  41   6  33  46  46  33  28  25  21  43   2  34  43
## [1387]  30  26  21  26  47  42  56  20  49  67  60   2  16  34   2  31   2   4
## [1405]  15   2  19  56  21  77  18  47  38  34  24  22  65  39  42  42  27  16
## [1423]  16  14  28   2  10  24  20  29  57  42  36  15  29  23   5   6  26  37
## [1441]  23  30  14  26  28  24  19  45  72  25  43  14  30  13  35  49   6  30
## [1459]  16  16  34   7  33   2   5  83   4  42  31  41  20  19   6
length(transactions2)
## [1] 1473
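
The density values in the two summaries can be cross-checked by hand. Density is the share of filled cells in the transaction-by-item matrix, which equals the mean transaction length divided by the number of distinct items. The sketch below plugs in the rounded figures printed above, so small differences in the last digits are expected.

```r
# Figures copied from the two summary() outputs above
n_items   <- c(transactions = 6189, transactions2 = 6437)    # columns (items)
mean_size <- c(transactions = 27.68, transactions2 = 31.84)  # mean transaction length

# Share of non-empty cells = mean items per transaction / number of distinct items
density <- mean_size / n_items
print(round(density, 5))
```

Both values are below 0.5%, which reflects how many distinct ingredient labels occur in only a handful of products.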

Simple statistics for transactions (transactions2 is left out because the added skin-type, quality and price categories are among its most frequent items):

relative_frequency<- itemFrequency(transactions, type="relative")
absolute_frequency<- itemFrequency(transactions, type="absolute")
frq_plot<- itemFrequencyPlot(transactions, support = 0.2)

top_15_plot<- itemFrequencyPlot(transactions, topN = 15)
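
For reference, the two frequency types can be reproduced by hand on a toy example. The sketch below uses four made-up baskets (not rows from the dataset) and base R only: absolute frequency is the number of baskets containing an item, and relative frequency divides that count by the number of baskets. The last step mimics the support filter used in the first plot, with a higher threshold chosen to suit the toy data.

```r
# Four invented baskets, used only to illustrate the calculation
baskets <- list(
  c("Water", "Glycerin", "Phenoxyethanol"),
  c("Water", "Glycerin"),
  c("Glycerin", "Dimethicone"),
  c("Water", "Butylene Glycol")
)

items <- sort(unique(unlist(baskets)))

# Absolute frequency: number of baskets that contain each item
absolute <- vapply(items, function(item) {
  sum(vapply(baskets, function(b) item %in% b, logical(1)))
}, integer(1))

# Relative frequency: share of baskets that contain each item
relative <- absolute / length(baskets)
print(relative)

# Items that pass a minimum support of 0.5
frequent <- names(relative)[relative >= 0.5]
print(frequent)
```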

Apriori rules

Apriori rules (ingredients only)

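These rules do not tell us much: they mostly link ingredients that are typically formulated together, or lead to very common ingredients such as Glycerin and Water, which appear in a large share of products (Glycerin alone is in 896 of the 1473 transactions).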

ingredients_rules <- apriori(transactions, parameter = list(support = 0.01, confidence = 0.7, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 14 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6189 item(s), 1473 transaction(s)] done [0.02s].
## sorting and recoding items ... [470 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.18s].
## writing ... [872663 rule(s)] done [0.17s].
## creating S4 object  ... done [0.58s].
inspect(sort(ingredients_rules, by = "lift")[1:3])
##     lhs                 rhs                  support confidence   coverage     lift count
## [1] {Glycerin,                                                                           
##      Polyisobutene,                                                                      
##      Polysorbate 20} => {Polyacrylate-13} 0.01018330  1.0000000 0.01018330 86.64706    15
## [2] {Polyisobutene,                                                                      
##      Polysorbate 20} => {Polyacrylate-13} 0.01086219  0.9411765 0.01154107 81.55017    16
## [3] {Glycerin,                                                                           
##      Polyisobutene}  => {Polyacrylate-13} 0.01018330  0.9375000 0.01086219 81.23162    15
inspect(sort(ingredients_rules, by = "confidence")[1:3])
##     lhs                                                    rhs                  support confidence   coverage     lift count
## [1] {Simmondsia Chinensis (Jojoba) Seed Extract}        => {Butylene Glycol} 0.01154107          1 0.01154107 1.993234    17
## [2] {Lonicera Caprifolium (Honeysuckle) Flower Extract} => {Glycerin}        0.01018330          1 0.01018330 1.643973    15
## [3] {Hydroxypropyl Methylcellulose}                     => {Glycerin}        0.01018330          1 0.01018330 1.643973    15
inspect(sort(ingredients_rules, by = "support")[1:3])
##     lhs                                  rhs        support   confidence
## [1] {Butylene Glycol}                 => {Glycerin} 0.4032587 0.8037889 
## [2] {Phenoxyethanol}                  => {Glycerin} 0.4012220 0.8359264 
## [3] {Butylene Glycol, Phenoxyethanol} => {Glycerin} 0.2729124 0.8589744 
##     coverage  lift     count
## [1] 0.5016972 1.321407 594  
## [2] 0.4799728 1.374241 591  
## [3] 0.3177189 1.412131 402
inspect(sort(ingredients_rules, by = "count")[1:3])
##     lhs                                  rhs        support   confidence
## [1] {Butylene Glycol}                 => {Glycerin} 0.4032587 0.8037889 
## [2] {Phenoxyethanol}                  => {Glycerin} 0.4012220 0.8359264 
## [3] {Butylene Glycol, Phenoxyethanol} => {Glycerin} 0.2729124 0.8589744 
##     coverage  lift     count
## [1] 0.5016972 1.321407 594  
## [2] 0.4799728 1.374241 591  
## [3] 0.3177189 1.412131 402
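
The quality measures in these tables follow directly from item counts. As a worked check, the sketch below recomputes them for the rule {Butylene Glycol} => {Glycerin} from numbers printed earlier in this section: 1473 transactions, 739 containing Butylene Glycol, 896 containing Glycerin, and 594 containing both (the count column above).

```r
n      <- 1473  # total transactions
n_lhs  <- 739   # transactions containing Butylene Glycol
n_rhs  <- 896   # transactions containing Glycerin
n_both <- 594   # transactions containing both items

support    <- n_both / n                # share of transactions with both items
coverage   <- n_lhs / n                 # share of transactions with the left-hand side
confidence <- n_both / n_lhs            # share of LHS transactions that also contain the RHS
lift       <- confidence / (n_rhs / n)  # confidence relative to the RHS base rate

print(round(c(support = support, coverage = coverage,
              confidence = confidence, lift = lift), 4))
```

A lift of about 1.32 means that Glycerin is only about a third more frequent among products with Butylene Glycol than it is overall, which is why the highest-support rules carry little information.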

The rules concerning the quality of the products will be much more telling.

Apriori rules (ingredients and product attributes)

ingredients_rules2 <- apriori(transactions2, parameter = list(support = 0.01, confidence = 0.7, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 14 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6437 item(s), 1473 transaction(s)] done [0.02s].
## sorting and recoding items ... [487 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.65s].
## writing ... [2879107 rule(s)] done [0.59s].
## creating S4 object  ... done [1.34s].

Luxury products

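Let’s examine the rules for Luxury skincare products. Every product containing one of the following combinations was in the Luxury category (confidence 1), which makes such products around 6 times more likely to be Luxury than a product chosen at random (lift of about 5.96):
* Sesame seed powder & water
* Eucalyptus leaf oil & water
* Alfalfa seed powder & water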

rules.Luxury <- apriori(data = transactions2, parameter = list(supp = 0.01, conf = 0.005),
                        appearance = list(default = "lhs", rhs = "Luxury"), control = list(verbose = FALSE))
rules.Luxury.byconf<-sort(rules.Luxury, by="confidence", decreasing=TRUE)
inspect(head(rules.Luxury.byconf))
##     lhs                                                    rhs         support confidence   coverage     lift count
## [1] {Sesamum Indicum (Sesame) Seed Powder,                                                                         
##      Water}                                             => {Luxury} 0.01018330          1 0.01018330 5.963563    15
## [2] {Eucalyptus Globulus (Eucalyptus) Leaf Oil,                                                                    
##      Water}                                             => {Luxury} 0.01086219          1 0.01086219 5.963563    16
## [3] {Medicago Sativa (Alfalfa) Seed Powder,                                                                        
##      Water}                                             => {Luxury} 0.01086219          1 0.01086219 5.963563    16
## [4] {Tocopheryl Succinate,                                                                                         
##      Water}                                             => {Luxury} 0.01086219          1 0.01086219 5.963563    16
## [5] {Prunus Amygdalus Dulcis (Sweet Almond) Seed Meal,                                                             
##      Water}                                             => {Luxury} 0.01154107          1 0.01154107 5.963563    17
## [6] {Eucalyptus Globulus (Eucalyptus) Leaf Oil,                                                                    
##      Sesamum Indicum (Sesame) Seed Powder,                                                                         
##      Water}                                             => {Luxury} 0.01018330          1 0.01018330 5.963563    15
plot(rules.Luxury, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).

plot(rules.Luxury, measure=c("support","lift"), shading="confidence", main="Luxury pricing")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
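
The "around 6 times" figure is the lift column of the output above. As a minimal sketch in base R, the numbers reported for the first Luxury rule can be used to recover the baseline share of Luxury products and the size of the data set. The variable names are illustrative, and the values are copied from the printed output rather than recomputed from the data.

```r
# Values copied from rule [1] of the Luxury output above.
support    <- 0.01018330
confidence <- 1
lift       <- 5.963563
count      <- 15

# lift = confidence / support(rhs), so the baseline share of the rhs item is:
rhs_support <- confidence / lift

# support = count / n, so the number of transactions is:
n_transactions <- round(count / support)

# Approximate number of products labelled "Luxury".
n_luxury <- round(rhs_support * n_transactions)

cat("Baseline share of Luxury:", round(rhs_support, 3), "\n")
cat("Transactions:", n_transactions, "| Luxury products:", n_luxury, "\n")
```

Read this way, a confidence of 1 with a lift near 6 means that all 15 products containing the combination are Luxury, against a baseline of roughly one product in six.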

Highly rated products

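Surprisingly, the rules for highly rated products differed from those for luxury products: the top rules had more elements (three or four items on the left-hand side rather than two or three) and lower lift values (about 2.7 compared with about 6).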

rules.Good<-apriori(data=transactions2, parameter=list(supp=0.01,conf = 0.005), 
                      appearance=list(default="lhs", rhs="Good"), control=list(verbose=F)) 
rules.Good.byconf<-sort(rules.Good, by="confidence", decreasing=TRUE)
inspect(head(rules.Good.byconf))
##     lhs                     rhs       support confidence   coverage     lift count
## [1] {Limonene,                                                                    
##      Panthenol,                                                                   
##      Sensitive}          => {Good} 0.01086219          1 0.01086219 2.658845    16
## [2] {Cyclohexasiloxane,                                                           
##      Glycerin,                                                                    
##      Oily,                                                                        
##      Propanediol}        => {Good} 0.01018330          1 0.01018330 2.658845    15
## [3] {Cyclohexasiloxane,                                                           
##      Glycerin,                                                                    
##      Propanediol,                                                                 
##      Sensitive}          => {Good} 0.01018330          1 0.01018330 2.658845    15
## [4] {Affordable,                                                                  
##      Cyclohexasiloxane,                                                           
##      Disodium EDTA,                                                               
##      Sensitive}          => {Good} 0.01018330          1 0.01018330 2.658845    15
## [5] {Cyclohexasiloxane,                                                           
##      Disodium EDTA,                                                               
##      Oily,                                                                        
##      Phenoxyethanol}     => {Good} 0.01086219          1 0.01086219 2.658845    16
## [6] {Cyclohexasiloxane,                                                           
##      Disodium EDTA,                                                               
##      Phenoxyethanol,                                                              
##      Sensitive}          => {Good} 0.01221996          1 0.01221996 2.658845    18
plot(rules.Good, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).

plot(rules.Good, measure=c("support","lift"), shading="confidence", main="High Ratings")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

Cheap and Bad products

The association rules for cheap products and for badly rated products were very homogeneous within each category: the top rules share most of their items. Cheap product rules had very high lift values (about 7.2), while the confidence of the badly rated product rules (about 0.16) was much lower than in the other examples, where the top rules all had a confidence of 1.

rules.Cheap<-apriori(data=transactions2, parameter=list(supp=0.01,conf = 0.005), 
                      appearance=list(default="lhs", rhs="Cheap"), control=list(verbose=F)) 
rules.Cheap.byconf<-sort(rules.Cheap, by="confidence", decreasing=TRUE)
inspect(head(rules.Cheap.byconf))
##     lhs                             rhs        support confidence   coverage     lift count
## [1] {2-Hexanediol,                                                                         
##      Dipotassium Glycyrrhizate,                                                            
##      Xanthan Gum}                => {Cheap} 0.01086219          1 0.01086219 7.220588    16
## [2] {1,                                                                                    
##      Dipotassium Glycyrrhizate,                                                            
##      Xanthan Gum}                => {Cheap} 0.01086219          1 0.01086219 7.220588    16
## [3] {2-Hexanediol,                                                                         
##      Allantoin,                                                                            
##      Panthenol}                  => {Cheap} 0.01018330          1 0.01018330 7.220588    15
## [4] {1,                                                                                    
##      2-Hexanediol,                                                                         
##      Dipotassium Glycyrrhizate,                                                            
##      Xanthan Gum}                => {Cheap} 0.01086219          1 0.01086219 7.220588    16
## [5] {2-Hexanediol,                                                                         
##      Dipotassium Glycyrrhizate,                                                            
##      Glycerin,                                                                             
##      Xanthan Gum}                => {Cheap} 0.01086219          1 0.01086219 7.220588    16
## [6] {2-Hexanediol,                                                                         
##      Dipotassium Glycyrrhizate,                                                            
##      Water,                                                                                
##      Xanthan Gum}                => {Cheap} 0.01018330          1 0.01018330 7.220588    15
plot(rules.Cheap, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).

plot(rules.Cheap, measure=c("support","lift"), shading="confidence", main="Cheap pricing")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
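
The items `1` and `2-Hexanediol` in the Cheap rules above appear to be two halves of the single ingredient name 1,2-Hexanediol, split because the transactions file is read with `sep = ","`. The sketch below, in base R, illustrates the effect on a hypothetical ingredient string and one possible workaround. It assumes that ingredients are separated by a comma followed by a space, which may not hold for every row of the data.

```r
# Hypothetical ingredient string; the real rows come from data$Ingredients.
ingredients <- "Water, 1,2-Hexanediol, Xanthan Gum"

# Splitting on every comma breaks the chemical name into "1" and "2-Hexanediol".
naive <- trimws(strsplit(ingredients, ",", fixed = TRUE)[[1]])

# Splitting only on a comma followed by whitespace keeps the name whole
# (assumption: separators always include a space, names never do after a comma).
safer <- strsplit(ingredients, ",\\s+")[[1]]

print(naive)
print(safer)
```

If the ingredient lists were split this way before building the transactions, rules [1], [2] and [4] above would presumably collapse into a single rule.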

rules.Bad<-apriori(data=transactions2, parameter=list(supp=0.01,conf = 0.005), 
                     appearance=list(default="lhs", rhs="Bad"), control=list(verbose=F)) 
rules.Bad.byconf<-sort(rules.Bad, by="confidence", decreasing=TRUE)
inspect(head(rules.Bad.byconf))
##     lhs                     rhs      support confidence   coverage     lift count
## [1] {Butylene Glycol,                                                            
##      Oily,                                                                       
##      Tocopheryl Acetate} => {Bad} 0.01018330  0.1612903 0.06313646 3.210549    15
## [2] {Butylene Glycol,                                                            
##      Oily,                                                                       
##      Sensitive,                                                                  
##      Tocopheryl Acetate} => {Bad} 0.01018330  0.1612903 0.06313646 3.210549    15
## [3] {Butylene Glycol,                                                            
##      Dry,                                                                        
##      Oily,                                                                       
##      Tocopheryl Acetate} => {Bad} 0.01018330  0.1612903 0.06313646 3.210549    15
## [4] {Butylene Glycol,                                                            
##      Dry,                                                                        
##      Sensitive,                                                                  
##      Tocopheryl Acetate} => {Bad} 0.01018330  0.1612903 0.06313646 3.210549    15
## [5] {Butylene Glycol,                                                            
##      Dry,                                                                        
##      Oily,                                                                       
##      Sensitive,                                                                  
##      Tocopheryl Acetate} => {Bad} 0.01018330  0.1612903 0.06313646 3.210549    15
## [6] {Butylene Glycol,                                                            
##      Dimethicone,                                                                
##      Glycerin,                                                                   
##      Oily}               => {Bad} 0.01221996  0.1565217 0.07807196 3.115629    18
plot(rules.Bad, method="graph")
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).

plot(rules.Bad, measure=c("support","lift"), shading="confidence", main="Low Ratings")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
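
Several of the top rules above only add skin type labels to a shorter rule without changing its confidence. The following base R sketch flags such rules among the six Bad rules printed above, treating a rule as redundant when another listed rule has a strictly smaller left-hand side and the same or higher confidence. It is an illustration on the printed rules only. The `arules` package provides `is.redundant()` for whole rule sets, which is not used here.

```r
# Left-hand sides and confidences copied from the Bad output above.
lhs <- list(
  c("Butylene Glycol", "Oily", "Tocopheryl Acetate"),
  c("Butylene Glycol", "Oily", "Sensitive", "Tocopheryl Acetate"),
  c("Butylene Glycol", "Dry", "Oily", "Tocopheryl Acetate"),
  c("Butylene Glycol", "Dry", "Sensitive", "Tocopheryl Acetate"),
  c("Butylene Glycol", "Dry", "Oily", "Sensitive", "Tocopheryl Acetate"),
  c("Butylene Glycol", "Dimethicone", "Glycerin", "Oily")
)
conf <- c(rep(0.1612903, 5), 0.1565217)

# TRUE when every item of a is in b and a is strictly shorter than b.
is_proper_subset <- function(a, b) all(a %in% b) && length(a) < length(b)

# A rule is flagged when some other rule is more general and at least as confident.
redundant <- vapply(seq_along(lhs), function(i) {
  any(vapply(seq_along(lhs), function(j) {
    j != i && is_proper_subset(lhs[[j]], lhs[[i]]) && conf[j] >= conf[i]
  }, logical(1)))
}, logical(1))

kept <- which(!redundant)
print(kept)
```

Under this definition only rules [1], [4] and [6] carry information that is not already contained in a shorter listed rule.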

Skin types

Lastly, let’s examine the association rules for skin types. Even though these rules had reasonable metrics, they are hard to interpret because their left-hand sides mostly contain labels for other skin types.

rules.Oily<-apriori(data=transactions2, parameter=list(supp=0.01,conf = 0.005), 
                   appearance=list(default="lhs", rhs="Oily"), control=list(verbose=F)) 
rules.Oily.byconf<-sort(rules.Oily, by="confidence", decreasing=TRUE)
inspect(head(rules.Oily.byconf))
##     lhs                                               rhs       support confidence   coverage     lift count
## [1] {Dry,                                                                                                   
##      Glycerin*}                                    => {Oily} 0.01154107          1 0.01154107 2.221719    17
## [2] {Chrysanthemum Parthenium (Feverfew) Extract,                                                           
##      Dry}                                          => {Oily} 0.01221996          1 0.01221996 2.221719    18
## [3] {Decyl Glucoside,                                                                                       
##      Sensitive}                                    => {Oily} 0.01018330          1 0.01018330 2.221719    15
## [4] {Arnica Montana Flower Extract,                                                                         
##      Sensitive}                                    => {Oily} 0.01086219          1 0.01086219 2.221719    16
## [5] {Arnica Montana Flower Extract,                                                                         
##      Dry}                                          => {Oily} 0.01086219          1 0.01086219 2.221719    16
## [6] {Dry,                                                                                                   
##      Rosmarinus Officinalis (Rosemary) Leaf Oil}   => {Oily} 0.01289885          1 0.01289885 2.221719    19
rules.Dry<-apriori(data=transactions2, parameter=list(supp=0.01,conf = 0.005), 
                   appearance=list(default="lhs", rhs="Dry"), control=list(verbose=F)) 
rules.Dry.byconf<-sort(rules.Dry, by="confidence", decreasing=TRUE)
inspect(head(rules.Dry.byconf))
##     lhs                                               rhs      support confidence   coverage     lift count
## [1] {OilyOK}                                       => {Dry} 0.02308215          1 0.02308215 1.747331    34
## [2] {OilyGood}                                     => {Dry} 0.06788866          1 0.06788866 1.747331   100
## [3] {Oily}                                         => {Dry} 0.45010183          1 0.45010183 1.747331   663
## [4] {Glycerin*,                                                                                            
##      Oily}                                         => {Dry} 0.01154107          1 0.01154107 1.747331    17
## [5] {Chrysanthemum Parthenium (Feverfew) Extract,                                                          
##      Oily}                                         => {Dry} 0.01221996          1 0.01221996 1.747331    18
## [6] {OilyOK,                                                                                               
##      Phenoxyethanol}                               => {Dry} 0.01086219          1 0.01086219 1.747331    16
rules.Sensitive<-apriori(data=transactions2, parameter=list(supp=0.01,conf = 0.005), 
                   appearance=list(default="lhs", rhs="Sensitive"), control=list(verbose=F)) 
rules.Sensitive.byconf<-sort(rules.Sensitive, by="confidence", decreasing=TRUE)
inspect(head(rules.Sensitive.byconf))
##     lhs                                               rhs            support confidence   coverage     lift count
## [1] {Oily}                                         => {Sensitive} 0.45010183          1 0.45010183 2.116379   663
## [2] {Glycerin*,                                                                                                  
##      Oily}                                         => {Sensitive} 0.01154107          1 0.01154107 2.116379    17
## [3] {Dry,                                                                                                        
##      Glycerin*}                                    => {Sensitive} 0.01154107          1 0.01154107 2.116379    17
## [4] {Chrysanthemum Parthenium (Feverfew) Extract,                                                                
##      Oily}                                         => {Sensitive} 0.01221996          1 0.01221996 2.116379    18
## [5] {Chrysanthemum Parthenium (Feverfew) Extract,                                                                
##      Dry}                                          => {Sensitive} 0.01221996          1 0.01221996 2.116379    18
## [6] {Decyl Glucoside,                                                                                            
##      Oily}                                         => {Sensitive} 0.01018330          1 0.01018330 2.116379    15
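
To make the interpretation problem concrete, the sketch below checks in base R how many of the six top Oily rules printed above would remain if rules with another skin type label on the left-hand side were excluded. The left-hand sides are copied from the output, and the variable names are illustrative.

```r
# Left-hand sides of the six top Oily rules, copied from the output above.
lhs <- list(
  c("Dry", "Glycerin*"),
  c("Chrysanthemum Parthenium (Feverfew) Extract", "Dry"),
  c("Decyl Glucoside", "Sensitive"),
  c("Arnica Montana Flower Extract", "Sensitive"),
  c("Arnica Montana Flower Extract", "Dry"),
  c("Dry", "Rosmarinus Officinalis (Rosemary) Leaf Oil")
)

# Skin type labels added to the transactions during data preparation.
skin_types <- c("Combination", "Dry", "Oily", "Sensitive")

# Flag rules whose left-hand side contains any skin type label.
has_skin_type <- vapply(lhs, function(items) any(items %in% skin_types), logical(1))

# Rules that rely on ingredients alone.
ingredient_only <- lhs[!has_skin_type]
cat("Rules without a skin type label:", length(ingredient_only), "\n")
```

None of the six top rules survives this filter. On the actual rule objects, a similar filter can likely be written as `subset(rules.Oily, subset = !(lhs %in% skin_types))`, although that call has not been run against the data here.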

Conclusion

The results of this study can be used to evaluate the quality or the potential price range of a new skincare product, as well as for guidance when evaluating the ingredient list.
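
As a rough illustration of that use, the base R sketch below matches the ingredient list of a hypothetical new product against three rules taken from the outputs above and reports the labels of every rule whose left-hand side is fully contained in the product. The product is invented for the example, and a real application would use the full rule sets rather than this hand-picked subset.

```r
# Three rules copied from the Luxury, Cheap and Good outputs above.
rules <- list(
  list(lhs = c("Sesamum Indicum (Sesame) Seed Powder", "Water"), rhs = "Luxury"),
  list(lhs = c("2-Hexanediol", "Allantoin", "Panthenol"),        rhs = "Cheap"),
  list(lhs = c("Limonene", "Panthenol", "Sensitive"),            rhs = "Good")
)

# Hypothetical new product (not taken from the data set).
new_product <- c("Water", "Glycerin", "Allantoin", "Panthenol", "2-Hexanediol")

# A rule applies when all items on its left-hand side occur in the product.
matches <- Filter(function(r) all(r$lhs %in% new_product), rules)

# Labels suggested by the matching rules.
predicted <- vapply(matches, function(r) r$rhs, character(1))
print(predicted)
```

Since each of these rules is supported by only 15 to 18 products, such a match is better read as a hint than as a prediction.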