Predictive Analytics - Homework #10

Oluwakemi Omotunde

2019-05-05

Problem 1

Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer’s basket - and therefore ‘Market Basket Analysis’. That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item. The data set is attached. Your assignment is to use R to mine the data for association rules. You should report support, confidence and lift and your top 10 rules by lift.

Extra credit: do a simple cluster analysis on the data as well. Use whichever packages you like. Due May 5 before midnight.

I’ll be using the arules package, which provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). It also provides C implementations of the association mining algorithms Apriori and Eclat. We’ll begin by loading our data and using the read.transactions function to convert our CSV file into a format that can be analyzed with the arules package.

For the format parameter we can choose either ‘basket’, where each line of the file is one transaction and the item labels are separated by the characters given in sep, or ‘single’, where each line holds a single item together with at least a transaction ID and an item ID. From the instructions, we know that each line represents a transaction and each column is an item, so ‘basket’ is the format that fits our file.
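As a minimal sketch of the ‘basket’ format, the snippet below writes three made-up receipts to a temporary file and reads them back with read.transactions. The receipts and the names tmp and toy are placeholders for illustration only, not the actual Groceries file.

```r
library(arules)

# Hypothetical stand-in for the real file: one receipt per line,
# items separated by commas (the 'basket' layout).
tmp <- tempfile(fileext = ".csv")
writeLines(c("whole milk,yogurt,rolls/buns",
             "whole milk,other vegetables",
             "soda"),
           tmp)

# format = "basket": every line is one transaction, sep splits the item labels.
toy <- read.transactions(tmp, format = "basket", sep = ",")

# dim() reports (number of transactions, number of distinct items).
toy_dim <- dim(toy)
print(toy_dim)
```

With format = "single" the same receipts would instead be stored one item per line, each line carrying a transaction ID next to the item label.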

Now that we have our data loaded, we’ll take a look at the frequency of each item in the dataset using the itemFrequencyPlot() function.
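A rough sketch of this step is shown here. It assumes the Groceries data bundled with arules is an acceptable stand-in for the CSV used in this report (it has the same 9835 transactions and 169 items); topN = 20 and the object names freq and top5 are illustrative choices.

```r
library(arules)

# Assumption: the bundled Groceries data stands in for the homework CSV.
data("Groceries")

# Relative frequency (support) of every single item, most frequent first.
freq <- sort(itemFrequency(Groceries, type = "relative"), decreasing = TRUE)
top5 <- head(freq, 5)
print(round(top5, 3))

# Bar chart of the 20 most frequent items.
itemFrequencyPlot(Groceries, topN = 20, type = "relative")
```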

A few things I noticed: whole milk appeared the most often. Other vegetables, rolls/buns, soda and yogurt round out the top 5 most frequent items. I wondered why whipped and sour cream are grouped together as a single item. Now I’ll use the apriori function to mine association rules from our dataset.
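The call below is a sketch of how a run like the one logged underneath could be produced. The parameter values (support 0.001, confidence 0.8, maxlen 3) are read off the printed parameter specification; using the bundled Groceries data in place of the CSV, and the object name rules, are assumptions.

```r
library(arules)

# Assumption: bundled Groceries data in place of the homework CSV.
data("Groceries")

# Thresholds taken from the parameter specification in the log:
#   support    >= 0.001  (rule must cover at least 0.1% of baskets)
#   confidence >= 0.8
#   maxlen      = 3      (at most three items per rule, lhs + rhs)
rules <- apriori(Groceries,
                 parameter = list(support = 0.001,
                                  confidence = 0.8,
                                  maxlen = 3))

# Overview of rule lengths and quality measures.
summary(rules)
```

Lowering the support threshold admits rarer itemsets, and lowering the confidence threshold admits weaker rules; both typically increase the number of rules returned.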

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##       3  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [29 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
## set of 29 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3 
## 29 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       3       3       3       3       3       3 
## 
## summary of quality measures:
##     support           confidence          lift            count      
##  Min.   :0.001017   Min.   :0.8000   Min.   : 3.131   Min.   :10.00  
##  1st Qu.:0.001118   1st Qu.:0.8125   1st Qu.: 3.261   1st Qu.:11.00  
##  Median :0.001220   Median :0.8462   Median : 3.613   Median :12.00  
##  Mean   :0.001473   Mean   :0.8613   Mean   : 4.000   Mean   :14.48  
##  3rd Qu.:0.001729   3rd Qu.:0.9091   3rd Qu.: 4.199   3rd Qu.:17.00  
##  Max.   :0.002542   Max.   :1.0000   Max.   :11.235   Max.   :25.00  
## 
## mining info:
##          data ntransactions support confidence
##  grocery.data          9835   0.001        0.8

Using the is.redundant function, we will remove redundant rules and then use the inspect function to see how everything ranks. We will look at the output ranked by confidence (the share of transactions containing the left-hand side that also contain the right-hand side), lift (how much more often the two sides occur together than we would expect if they were independent) and support (the frequency of the full itemset in the dataset).
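One way this pruning and ranking could look is sketched here. The object names pruned and top10_lift are hypothetical, and the bundled Groceries data is again assumed as a stand-in for the CSV.

```r
library(arules)

# Assumption: bundled Groceries data and the thresholds from the mining log.
data("Groceries")
rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.8, maxlen = 3),
                 control = list(verbose = FALSE))

# Drop rules for which a more general rule with at least the same
# confidence already exists in the set.
pruned <- rules[!is.redundant(rules)]

# sort() orders from highest to lowest by default; keep the ten strongest by lift.
top10_lift <- head(sort(pruned, by = "lift"), 10)
inspect(top10_lift)

# The same pattern gives the other two rankings.
inspect(head(sort(pruned, by = "confidence"), 5))
inspect(head(sort(pruned, by = "support"), 5))
```

Note that sort() returns a new object; calling inspect() on the original rules afterwards would print them in their unsorted order.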

## set of 29 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3 
## 29 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       3       3       3       3       3       3 
## 
## summary of quality measures:
##     support           confidence          lift            count      
##  Min.   :0.001017   Min.   :0.8000   Min.   : 3.131   Min.   :10.00  
##  1st Qu.:0.001118   1st Qu.:0.8125   1st Qu.: 3.261   1st Qu.:11.00  
##  Median :0.001220   Median :0.8462   Median : 3.613   Median :12.00  
##  Mean   :0.001473   Mean   :0.8613   Mean   : 4.000   Mean   :14.48  
##  3rd Qu.:0.001729   3rd Qu.:0.9091   3rd Qu.: 4.199   3rd Qu.:17.00  
##  Max.   :0.002542   Max.   :1.0000   Max.   :11.235   Max.   :25.00  
## 
## mining info:
##          data ntransactions support confidence
##  grocery.data          9835   0.001        0.8
##      lhs                         rhs                    support confidence      lift count
## [1]  {liquor,                                                                             
##       red/blush wine}         => {bottled beer}     0.001931876  0.9047619 11.235269    19
## [2]  {cereals,                                                                            
##       curd}                   => {whole milk}       0.001016777  0.9090909  3.557863    10
## [3]  {cereals,                                                                            
##       yogurt}                 => {whole milk}       0.001728521  0.8095238  3.168192    17
## [4]  {butter,                                                                             
##       jam}                    => {whole milk}       0.001016777  0.8333333  3.261374    10
## [5]  {bottled beer,                                                                       
##       soups}                  => {whole milk}       0.001118454  0.9166667  3.587512    11
## [6]  {house keeping products,                                                             
##       napkins}                => {whole milk}       0.001321810  0.8125000  3.179840    13
## [7]  {house keeping products,                                                             
##       whipped/sour cream}     => {whole milk}       0.001220132  0.9230769  3.612599    12
## [8]  {pastry,                                                                             
##       sweet spreads}          => {whole milk}       0.001016777  0.9090909  3.557863    10
## [9]  {curd,                                                                               
##       turkey}                 => {other vegetables} 0.001220132  0.8000000  4.134524    12
## [10] {rice,                                                                               
##       sugar}                  => {whole milk}       0.001220132  1.0000000  3.913649    12
## [11] {butter,                                                                             
##       rice}                   => {whole milk}       0.001525165  0.8333333  3.261374    15
## [12] {domestic eggs,                                                                      
##       rice}                   => {whole milk}       0.001118454  0.8461538  3.311549    11
## [13] {bottled water,                                                                      
##       rice}                   => {whole milk}       0.001220132  0.9230769  3.612599    12
## [14] {rice,                                                                               
##       yogurt}                 => {other vegetables} 0.001931876  0.8260870  4.269346    19
## [15] {mustard,                                                                            
##       oil}                    => {whole milk}       0.001220132  0.8571429  3.354556    12
## [16] {canned fish,                                                                        
##       hygiene articles}       => {whole milk}       0.001118454  1.0000000  3.913649    11
## [17] {fruit/vegetable juice,                                                              
##       herbs}                  => {other vegetables} 0.001220132  0.8000000  4.134524    12
## [18] {herbs,                                                                              
##       shopping bags}          => {other vegetables} 0.001931876  0.8260870  4.269346    19
## [19] {herbs,                                                                              
##       tropical fruit}         => {whole milk}       0.002338587  0.8214286  3.214783    23
## [20] {herbs,                                                                              
##       rolls/buns}             => {whole milk}       0.002440264  0.8000000  3.130919    24
## [21] {chocolate,                                                                          
##       pickled vegetables}     => {whole milk}       0.001220132  0.8571429  3.354556    12
## [22] {grapes,                                                                             
##       onions}                 => {other vegetables} 0.001118454  0.9166667  4.737476    11
## [23] {margarine,                                                                          
##       meat}                   => {other vegetables} 0.001728521  0.8500000  4.392932    17
## [24] {hard cheese,                                                                        
##       oil}                    => {other vegetables} 0.001118454  0.9166667  4.737476    11
## [25] {butter milk,                                                                        
##       onions}                 => {other vegetables} 0.001321810  0.8125000  4.199126    13
## [26] {butter milk,                                                                        
##       pork}                   => {other vegetables} 0.001830198  0.8571429  4.429848    18
## [27] {onions,                                                                             
##       waffles}                => {other vegetables} 0.001220132  0.8000000  4.134524    12
## [28] {curd,                                                                               
##       hamburger meat}         => {whole milk}       0.002541942  0.8064516  3.156169    25
## [29] {bottled beer,                                                                       
##       hamburger meat}         => {whole milk}       0.001728521  0.8095238  3.168192    17
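To make the three measures concrete, the arithmetic for rule [1] in the table above ({liquor, red/blush wine} => {bottled beer}) can be retraced by hand. The sketch uses only numbers printed in the output; the variable names are illustrative, and the left-hand-side count and right-hand-side support are derived from those numbers rather than read from the data.

```r
# Values printed above for rule [1] and in the mining log.
n_transactions <- 9835        # total number of receipts
rule_count     <- 19          # receipts containing all three items
confidence     <- 0.9047619
lift           <- 11.235269

# support = share of all receipts that contain the full itemset.
support <- rule_count / n_transactions

# confidence = rule_count / lhs_count, so the lhs count can be backed out.
lhs_count <- round(rule_count / confidence)

# lift = confidence / support(rhs), so the rhs support can be backed out too.
rhs_support <- confidence / lift

# Recompute the lift from the derived pieces as a consistency check.
lift_check <- (rule_count / lhs_count) / rhs_support

print(c(support = support, lhs_count = lhs_count,
        rhs_support = rhs_support, lift_check = lift_check))
```

Read this way, bottled beer on its own shows up in roughly 8% of receipts but in about 90% of the receipts that contain both liquor and red/blush wine, which is where the lift of about 11 comes from.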

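When comparing the summaries of the two rule sets, I noticed that there was no difference. I believe this is because we had set the maxlen parameter to 3: every mined rule has exactly three items, so no rule in the set has a more general version that could make it redundant.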
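A quick way to check this explanation is to tabulate the rule sizes and count the rules flagged as redundant. This is only a sketch under the same assumptions as before (bundled Groceries data, thresholds from the log); rule_sizes and n_redundant are hypothetical names.

```r
library(arules)

# Assumption: bundled Groceries data and the thresholds from the mining log.
data("Groceries")
rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.8, maxlen = 3),
                 control = list(verbose = FALSE))

# Number of items (lhs + rhs) in each rule. If every rule has the same size,
# no rule's left-hand side can be a proper subset of another's.
rule_sizes <- table(size(rules))
print(rule_sizes)

# Count of rules flagged as redundant within this set.
n_redundant <- sum(is.redundant(rules))
print(n_redundant)
```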

##       support confidence      lift count
## 1 0.001931876  0.9047619 11.235269    19
## 2 0.001016777  0.9090909  3.557863    10
## 3 0.001728521  0.8095238  3.168192    17
## 4 0.001016777  0.8333333  3.261374    10
## 5 0.001118454  0.9166667  3.587512    11
## 6 0.001321810  0.8125000  3.179840    13

I’m going to go ahead and plot to see the associations.
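The report does not say which plotting package is used, so the sketch below assumes arulesViz, a companion package to arules. It restricts the graph to the ten rules with the highest lift to keep it readable; top_rules is a hypothetical name and the bundled Groceries data is assumed as before.

```r
library(arules)

# Assumption: bundled Groceries data and the thresholds from the mining log.
data("Groceries")
rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.8, maxlen = 3),
                 control = list(verbose = FALSE))

# Keep the ten rules with the highest lift.
top_rules <- head(sort(rules, by = "lift"), 10)

# Assumption: arulesViz is installed; skip the plot quietly if it is not.
if (requireNamespace("arulesViz", quietly = TRUE)) {
  library(arulesViz)
  # Graph view: items and rules are drawn as nodes, connected by arrows.
  plot(top_rules, method = "graph")
}
```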