Introduction

The aim of this project is to analyze customer behavior using association rules, unraveling the intricate patterns and connections within Instacart’s vast dataset. By employing association rules mining techniques, we seek to uncover meaningful insights into the composition of customer baskets, identifying frequently co-occurring products and shedding light on the dynamic nature of online grocery shopping habits.

Instacart, in essence, is a popular online grocery delivery and pick-up service that connects customers with personal shoppers who fulfill their orders from local grocery stores. The platform provides a convenient solution for users to shop for their everyday essentials, offering a diverse range of products from various retailers. Instacart’s seamless interface and efficient delivery system have made it a go-to choice for consumers seeking a hassle-free and time-saving approach to grocery shopping.

Loading crucial packages

library(dplyr)
library(tidyr)
library(arules)
library(arulesViz)
library(methods)

Loading the data

The data for this analysis comes from Kaggle (https://www.kaggle.com/competitions/instacart-market-basket-analysis/data), where Instacart hosted a competition in which participants had to predict a user's next order based on their previous purchases. The competition data contains over 3 million baskets covering around 50,000 different products.

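# Order lines (one row per product in an order) and the product catalogue
orders <- read.csv('order_products__prior.csv')
products <- read.csv('products.csv')

# Attach product names and collect them into one basket per order
order_baskets <- orders %>%
  inner_join(products, by = "product_id") %>%
  group_by(order_id) %>%
  summarise(basket = list(product_name))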

transactions <- as(order_baskets$basket, "transactions")

Frequency plots

These plots show the 15 most frequently purchased products, first as absolute counts and then as relative frequencies (the share of baskets containing each product).

itemFrequencyPlot(transactions, topN=15, type="absolute", main="Item Frequency") 

itemFrequencyPlot(transactions, topN=15, type="relative", main="Item Frequency")

Apriori algorithm

The Apriori algorithm is a classic technique for discovering frequent itemsets and deriving association rules from them. It relies on the Apriori property, which states that if an itemset is frequent, then all of its subsets must also be frequent; equivalently, no superset of an infrequent itemset can be frequent. The algorithm scans the dataset iteratively: the frequent itemsets of size k are used to generate candidate itemsets of size k+1, and any candidate below the minimum support threshold is discarded together with all of its supersets. Rules are then formed from the frequent itemsets and kept only if they reach the minimum confidence. The algorithm is widely used for market basket analysis, where it helps businesses understand customer purchasing behavior.
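The Apriori property can be illustrated with a minimal sketch on a toy dataset. The five baskets and the 0.3 support threshold below are made up for illustration and are not part of the Instacart data; with five baskets, a threshold of 0.3 means an itemset has to appear in at least two of them.

```r
library(arules)

# Hypothetical toy baskets, not taken from the Instacart data
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("milk", "eggs"),
  c("bread", "butter", "milk", "eggs")
)
toy <- as(baskets, "transactions")

# Mine frequent itemsets only (no rules) with a minimum support of 0.3
itemsets <- apriori(
  toy,
  parameter = list(supp = 0.3, target = "frequent itemsets"),
  control = list(verbose = FALSE)
)

# List the frequent itemsets, most frequent first
inspect(sort(itemsets, by = "support"))
```

Every itemset in the result has all of its subsets in the result as well: {bread, butter, milk} is listed together with its three two-item subsets, while no itemset containing the infrequent pair {bread, eggs} appears.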

Rules with high confidence

For the first run we choose a very low minimum support and a high minimum confidence.

rules1 <- apriori(transactions, parameter = list(supp = 0.0001, conf = 0.65, maxlen=3), control = list(verbose = FALSE))
inspect(sort(rules1, by = "lift")[1:5])
##     lhs                                                                                rhs                                                                       support confidence     coverage      lift count
## [1] {Oh My Yog! Gingered Pear Trilayer Yogurt,                                                                                                                                                                  
##      Oh My Yog! Pacific Coast Strawberry Trilayer Yogurt}                           => {Oh My Yog! Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit}  0.0001838330  0.7224939 0.0002544423 1022.3269   591
## [2] {Oh My Yog! Madagascar Vanilla Trilayer Yogyurt,                                                                                                                                                            
##      Oh My Yog! Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit}             => {Oh My Yog! Pacific Coast Strawberry Trilayer Yogurt}                0.0002385786  0.7794715 0.0003060773  877.1098   767
## [3] {Mighty 4 Essential Tots Spinach, Kiwi, Barley & Greek Yogurt Nutrition Blend,                                                                                                                              
##      Mighty 4 Kale, Strawberry, Amaranth & Greek Yogurt Tots Snack}                 => {Mighty 4 Sweet Potato, Blueberry, Millet & Greek Yogurt Tots Snack} 0.0001290875  0.6704362 0.0001925425  850.2437   415
## [4] {Oh My Yog! Gingered Pear Trilayer Yogurt,                                                                                                                                                                  
##      Oh My Yog! Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit}             => {Oh My Yog! Pacific Coast Strawberry Trilayer Yogurt}                0.0001838330  0.7509530 0.0002447996  845.0190   591
## [5] {Mighty 4 Kale, Strawberry, Amaranth & Greek Yogurt Tots Snack,                                                                                                                                             
##      Mighty 4 Purple Carrot Blackberry Quinoa & Greek Yogurt Tots Snack}            => {Mighty 4 Sweet Potato, Blueberry, Millet & Greek Yogurt Tots Snack} 0.0001499281  0.6513514 0.0002301801  826.0404   482
inspect(sort(rules1, by = "confidence")[1:5])
##     lhs                                                                    rhs                                                                                 support confidence     coverage       lift count
## [1] {Oh My Yog! Madagascar Vanilla Trilayer Yogyurt,                                                                                                                                                           
##      Oh My Yog! Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit} => {Oh My Yog! Pacific Coast Strawberry Trilayer Yogurt}                          0.0002385786  0.7794715 0.0003060773  877.10984   767
## [2] {Organic Apples, Carrots and Parsnips Puree,                                                                                                                                                               
##      Stage 1 Apples & Strawberries Organic Pureed Baby Food}            => {Stage 1 Apples Sweet Potatoes Pumpkin & Blueberries Organic Pureed Baby Food} 0.0001377970  0.7559727 0.0001822778  435.46980   443
## [3] {Oh My Yog! Gingered Pear Trilayer Yogurt,                                                                                                                                                                 
##      Oh My Yog! Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit} => {Oh My Yog! Pacific Coast Strawberry Trilayer Yogurt}                          0.0001838330  0.7509530 0.0002447996  845.01898   591
## [4] {All Natural Apricot Sparkling Water,                                                                                                                                                                      
##      Sparkling Water Berry}                                             => {Sparkling Water Grapefruit}                                                   0.0001359307  0.7344538 0.0001850772   31.11478   437
## [5] {Oh My Yog! Gingered Pear Trilayer Yogurt,                                                                                                                                                                 
##      Oh My Yog! Pacific Coast Strawberry Trilayer Yogurt}               => {Oh My Yog! Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit}            0.0001838330  0.7224939 0.0002544423 1022.32694   591
inspect(sort(rules1, by = "support")[1:5])
##     lhs                                                                    rhs                                                        support confidence     coverage      lift count
## [1] {Almond Milk Blueberry Yogurt,                                                                                                                                                   
##      Almond Milk Peach Yogurt}                                          => {Almond Milk Strawberry Yogurt}                       0.0004852445  0.6902655 0.0007029824 388.02528  1560
## [2] {Oh My Yog! Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit} => {Oh My Yog! Pacific Coast Strawberry Trilayer Yogurt} 0.0004672034  0.6610915 0.0007067151 743.90131  1502
## [3] {Pure Sparkling Water,                                                                                                                                                           
##      Sparkling Water Berry}                                             => {Sparkling Water Grapefruit}                          0.0004373422  0.6573165 0.0006653449  27.84690  1406
## [4] {Peach Pear Flavored Sparkling Water,                                                                                                                                            
##      Pure Sparkling Water}                                              => {Sparkling Water Grapefruit}                          0.0004015709  0.6661507 0.0006028230  28.22115  1291
## [5] {Oh My Yog! Madagascar Vanilla Trilayer Yogyurt,                                                                                                                                 
##      Oh My Yog! Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit} => {Oh My Yog! Pacific Coast Strawberry Trilayer Yogurt} 0.0002385786  0.7794715 0.0003060773 877.10984   767

The rules from this run describe narrow but very consistent purchasing patterns. In the confidence-sorted output, rules 1 and 3 show that baskets containing certain pairs of Oh My Yog! yogurts, such as Madagascar Vanilla Trilayer Yogyurt together with Organic Wild Quebec Blueberry Cream Top Yogurt & Fruit, also contain the Pacific Coast Strawberry Trilayer Yogurt in 75 to 78% of cases. The lift values above 800 mean that this co-occurrence is several hundred times more frequent than it would be if the products were bought independently, although each rule covers only a few hundred of the more than 3 million baskets. This is useful for targeted product placement or promotions aimed at cross-selling flavors within a brand. Rule 4 in the confidence-sorted output and rules 3 and 4 in the support-sorted output reveal a similar pattern in the sparkling water category: customers who select two particular flavors tend to add Sparkling Water Grapefruit as well, with a confidence between 0.66 and 0.73 and a lift of about 28 to 31. These findings can guide product recommendations and bundles for Instacart and its retailers.
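To put the 0.0001 support threshold in perspective, the total number of baskets can be backed out from any row of the output, since support is the rule's count divided by the number of transactions. The sketch below uses the first row of the support-sorted output above; the variable names are only illustrative, and the minimum count is an approximation rather than the exact cutoff applied internally by apriori().

```r
# Values copied from the first row of the support-sorted output for rules1
rule_count   <- 1560          # baskets containing all items of the rule
rule_support <- 0.0004852445  # the same quantity as a share of all baskets

# support = count / number of baskets, so the basket total can be recovered
n_baskets <- round(rule_count / rule_support)

# Approximate number of baskets a rule must appear in to reach supp = 0.0001
min_count <- ceiling(0.0001 * n_baskets)

cat("baskets:", n_baskets, " minimum count:", min_count, "\n")
```

The total comes out at about 3.2 million baskets, so a rule in this run only needs to appear in a little over 320 of them.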

plot(rules1, method="paracoord", control=list(reorder=TRUE))
plot(rules1)

Rules with high support

The second run uses a high minimum support (for this data) and a lower minimum confidence than before.

rules2 <- apriori(transactions, parameter = list(supp = 0.01, conf = 0.15, maxlen=3), control = list(verbose = FALSE))
inspect(sort(rules2, by = "lift")[1:5])
##     lhs                         rhs                      support    confidence
## [1] {Organic Raspberries}    => {Organic Strawberries}   0.01053323 0.2470724 
## [2] {Organic Fuji Apple}     => {Banana}                 0.01055811 0.3786929 
## [3] {Organic Raspberries}    => {Bag of Organic Bananas} 0.01259863 0.2955194 
## [4] {Organic Hass Avocado}   => {Bag of Organic Bananas} 0.01939143 0.2918805 
## [5] {Bag of Organic Bananas} => {Organic Hass Avocado}   0.01939143 0.1642931 
##     coverage   lift     count
## [1] 0.04263215 3.000973 33863
## [2] 0.02788041 2.576259 33943
## [3] 0.04263215 2.503775 40503
## [4] 0.06643620 2.472945 62341
## [5] 0.11802951 2.472945 62341
inspect(sort(rules2, by = "confidence")[1:5])
##     lhs                       rhs                      support    confidence
## [1] {Organic Fuji Apple}   => {Banana}                 0.01055811 0.3786929 
## [2] {Organic Avocado}      => {Banana}                 0.01660874 0.3019823 
## [3] {Organic Raspberries}  => {Bag of Organic Bananas} 0.01259863 0.2955194 
## [4] {Organic Hass Avocado} => {Bag of Organic Bananas} 0.01939143 0.2918805 
## [5] {Strawberries}         => {Banana}                 0.01282539 0.2884345 
##     coverage   lift     count
## [1] 0.02788041 2.576259 33943
## [2] 0.05499905 2.054395 53395
## [3] 0.04263215 2.503775 40503
## [4] 0.06643620 2.472945 62341
## [5] 0.04446551 1.962229 41232
inspect(sort(rules2, by = "support")[1:5])
##     lhs                         rhs                      support    confidence
## [1] {Organic Hass Avocado}   => {Bag of Organic Bananas} 0.01939143 0.2918805 
## [2] {Bag of Organic Bananas} => {Organic Hass Avocado}   0.01939143 0.1642931 
## [3] {Organic Strawberries}   => {Bag of Organic Bananas} 0.01916965 0.2328370 
## [4] {Bag of Organic Bananas} => {Organic Strawberries}   0.01916965 0.1624140 
## [5] {Organic Strawberries}   => {Banana}                 0.01746756 0.2121632 
##     coverage   lift     count
## [1] 0.06643620 2.472945 62341
## [2] 0.11802951 2.472945 62341
## [3] 0.08233075 1.972702 61628
## [4] 0.11802951 1.972702 61628
## [5] 0.08233075 1.443353 56156

The rules from this run describe the most common purchasing patterns, all of which involve fresh produce. Unless stated otherwise, the rule numbers below refer to the lift-sorted output. Rule 1 shows that about a quarter of the baskets containing Organic Raspberries (confidence 0.25) also contain Organic Strawberries, three times the rate expected under independence (lift 3.0), which presents an opportunity for targeted promotions or bundling. Rule 2 shows that about 38% of the baskets with Organic Fuji Apple also contain Banana. Rules 3 and 4 link Organic Raspberries and Organic Hass Avocado to Bag of Organic Bananas, each with a confidence of about 0.29 and a lift of about 2.5, which suggests cross-selling opportunities. Rules 4 and 5 are the same itemset read in both directions, Organic Hass Avocado and Bag of Organic Bananas, so they share support and lift and differ only in confidence (0.29 versus 0.16). The support-sorted output shows the same bidirectional pattern for Bag of Organic Bananas and Organic Strawberries (rules 3 and 4). Lift in this run never exceeds about 3, far lower than in the first run: these items are popular on their own, so their co-occurrence is frequent but only moderately stronger than chance. Even so, the rules give Instacart and its retailers a basis for product recommendations on the items that appear in the largest number of baskets.
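The measures in the output are linked by simple arithmetic, which the following sketch checks for the rule {Organic Raspberries} => {Organic Strawberries}. It assumes the standard definitions: confidence is the rule's support divided by the support of the left-hand side (reported as coverage), and lift is the confidence divided by the support of the right-hand side. The support of Organic Strawberries is read from the coverage column of the rules that have Organic Strawberries on the left-hand side; the variable names are only illustrative.

```r
# Values copied from the rules2 output above
support_rule <- 0.01053323  # share of baskets with both items
coverage_lhs <- 0.04263215  # share of baskets with Organic Raspberries
support_rhs  <- 0.08233075  # share of baskets with Organic Strawberries

# confidence: how often the right-hand side appears given the left-hand side
confidence <- support_rule / coverage_lhs

# lift: confidence relative to the baseline frequency of the right-hand side
lift <- confidence / support_rhs

cat("confidence:", round(confidence, 4), " lift:", round(lift, 3), "\n")
```

The results agree with the printed confidence (about 0.247) and lift (about 3.0) up to rounding of the inputs.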

plot(rules2, method="graph")

plot(rules2, method="paracoord", control=list(reorder=TRUE))

plot(rules2)

Conclusion

In summary, association rule mining on Instacart baskets reveals two kinds of patterns. With a very low support threshold and a high confidence threshold, the rules capture niche but highly consistent combinations, mostly flavors of the same product line such as Oh My Yog! yogurts or sparkling water. With a high support threshold, the rules capture mainstream combinations of fresh produce, such as Organic Raspberries with Organic Strawberries or Organic Fuji Apple with Banana, which occur often but with only moderate lift. Both kinds of rules can inform targeted marketing, product recommendations, bundling, and inventory management, helping Instacart and its retail partners align their offerings with customer preferences.