Abstract

This paper explores the application of association rule mining to analyze customer behavior in Nigeria’s automobile retail sector. Using market basket analysis, the study identifies frequent itemsets and uncovers hidden patterns in customer purchase data. These insights provide valuable information for retailers to design targeted marketing strategies, optimize product placement, and enhance cross-selling opportunities. By leveraging association rules, this research contributes to a deeper understanding of customer preferences and behavior in the Nigerian automobile retail market.

Keywords

Association Rules, Market Basket Analysis, Customer Behavior, Automobile Retail, Nigeria, Frequent Itemsets

Introduction

The Nigerian automobile retail sector has seen substantial growth in recent years, driven by increasing urbanization and a growing middle class. However, understanding customer purchase behavior in this dynamic market remains a challenge. Identifying patterns in customer transactions is crucial for designing effective marketing strategies and improving sales performance.

Market basket analysis, powered by association rule mining, is a robust technique for discovering relationships between items purchased together. These insights can inform product bundling, targeted promotions, and inventory management. This paper applies market basket analysis to uncover actionable patterns in Nigerian automobile retail transactions, offering a novel perspective on customer behavior.

Libraries

library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
library(ggplot2)

Importing the dataset

data <- read.csv("car_data.csv")
head(data)
##   sn                   car_id                       description amount_naira
## 1  1 5IQTDBTYmvK1tJwhdvGJfESJ         Lexus ES 350 FWD 2013 Red     12937500
## 2  2 zpZUGomoVXuKk9UFa8j8moC9 Land Rover Range Rover 2012 White      6750000
## 3  3 a6ShZXOX4KtY6IBGJIcF3Cxk         Toyota Sequoia 2018 Black     50625000
## 4  4 CciPNDN6vhhQQI1FTQHAbfxi         Toyota Corolla 2007 Green      3600000
## 5  5 bvwd5LDMx6mIYpVa6Uhi2jqJ Mercedes-Benz M Class 2005 Silver      3262500
## 6  6 rR9jyMmvS5QYArvQplOQRVid                Lexus ES 2007 Blue      4837500
##                        region          make       model year_of_manufacturing
## 1          Lagos State, Ikeja         Lexus          ES                  2013
## 2        Abuja (FCT), Garki 2    Land Rover Range Rover                  2012
## 3          Lagos State, Lekki        Toyota     Sequoia                  2018
## 4 Abuja (FCT), Lugbe District        Toyota     Corolla                  2007
## 5          Lagos State, Isolo Mercedes-Benz     M Class                  2005
## 6       Abuja (FCT), Gwarinpa         Lexus          ES                  2007
##    color     condition mileage engine_size selling_condition bought_condition
## 1    Red  Foreign Used  272474        3500          Imported         Imported
## 2  White Nigerian Used  102281        5000        Registered       Registered
## 3  Black  Foreign Used  127390        5700          Imported         Imported
## 4  Green Nigerian Used  139680        1800        Registered       Registered
## 5 Silver Nigerian Used  220615        3500        Registered         Imported
## 6   Blue Nigerian Used  347614        3500        Registered       Registered
##   fuel_type transmission
## 1    Petrol    Automatic
## 2    Petrol    Automatic
## 3    Petrol    Automatic
## 4    Petrol    Automatic
## 5    Petrol    Automatic
## 6    Petrol    Automatic

Association Rule Mining

Association rule mining is a popular technique in data mining that identifies relationships between items in a dataset. It is particularly useful for analyzing transactional data to uncover patterns and associations. Metrics such as support, confidence, and lift are commonly used to evaluate the quality of association rules.
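To make the three metrics concrete, the following base R sketch computes them by hand on a small set of toy transactions. The data and the helper names support, confidence, and lift are hypothetical and serve only as illustration; they are not part of the arules package or of the study's dataset.

```r
# Toy transactions (hypothetical data, not taken from the study's dataset).
toy_trans <- list(
  c("Toyota", "Automatic"),
  c("Toyota", "Automatic"),
  c("Toyota", "Manual"),
  c("Honda",  "Automatic"),
  c("Honda",  "Manual")
)

# Support: share of transactions that contain every item in `items`.
support <- function(items, trans) {
  mean(vapply(trans, function(tr) all(items %in% tr), logical(1)))
}

# Confidence of lhs => rhs: support(lhs and rhs) / support(lhs).
confidence <- function(lhs, rhs, trans) {
  support(c(lhs, rhs), trans) / support(lhs, trans)
}

# Lift of lhs => rhs: confidence / support(rhs); a value of 1 means independence.
lift <- function(lhs, rhs, trans) {
  confidence(lhs, rhs, trans) / support(rhs, trans)
}

support(c("Toyota", "Automatic"), toy_trans)   # 2 of 5 transactions
confidence("Toyota", "Automatic", toy_trans)   # 2 of the 3 Toyota transactions
lift("Toyota", "Automatic", toy_trans)         # above 1: seen more often than expected

# A rule with an empty left-hand side always has a lift of 1,
# because its confidence equals the support of the right-hand side.
lift(character(0), "Toyota", toy_trans)
```

The last call shows why a rule of the form {} => {item} carries no information beyond the item's overall frequency.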

Market Basket Analysis

Market basket analysis involves examining customer transactions to identify frequent itemsets and generate association rules. This technique has been widely applied in retail to enhance cross-selling strategies, optimize store layouts, and design targeted marketing campaigns.

Applications in Retail

Studies have demonstrated the effectiveness of association rule mining in retail sectors worldwide. However, its application in the Nigerian automobile retail market remains underexplored. This paper aims to bridge this gap by applying market basket techniques to uncover patterns in customer purchases.

Methodology

Data Description

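The dataset contains 2,783 car records from the Nigerian automobile retail sector, described by 16 variables: a serial number (sn), a unique car_id, a free-text description, the price in naira (amount_naira), region, make, model, year_of_manufacturing, color, condition, mileage, engine_size, selling_condition, bought_condition, fuel_type, and transmission. It holds no customer identifiers or transaction timestamps, so each row is treated as one purchase record. The output below shows the structure of the data frame, the number of missing values per column, and summary statistics.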

## 'data.frame':    2783 obs. of  16 variables:
##  $ sn                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ car_id               : chr  "5IQTDBTYmvK1tJwhdvGJfESJ" "zpZUGomoVXuKk9UFa8j8moC9" "a6ShZXOX4KtY6IBGJIcF3Cxk" "CciPNDN6vhhQQI1FTQHAbfxi" ...
##  $ description          : chr  "Lexus ES 350 FWD 2013 Red" "Land Rover Range Rover 2012 White" "Toyota Sequoia 2018 Black" "Toyota Corolla 2007 Green" ...
##  $ amount_naira         : int  12937500 6750000 50625000 3600000 3262500 4837500 4162500 1721250 4590000 18000000 ...
##  $ region               : chr  "Lagos State, Ikeja" "Abuja (FCT), Garki 2" "Lagos State, Lekki" "Abuja (FCT), Lugbe District" ...
##  $ make                 : chr  "Lexus" "Land Rover" "Toyota" "Toyota" ...
##  $ model                : chr  "ES" "Range Rover" "Sequoia" "Corolla" ...
##  $ year_of_manufacturing: int  2013 2012 2018 2007 2005 2007 2008 2005 2011 2015 ...
##  $ color                : chr  "Red" "White" "Black" "Green" ...
##  $ condition            : chr  "Foreign Used" "Nigerian Used" "Foreign Used" "Nigerian Used" ...
##  $ mileage              : int  272474 102281 127390 139680 220615 347614 126841 246930 122734 130078 ...
##  $ engine_size          : int  3500 5000 5700 1800 3500 3500 3500 3000 3700 3500 ...
##  $ selling_condition    : chr  "Imported" "Registered" "Imported" "Registered" ...
##  $ bought_condition     : chr  "Imported" "Registered" "Imported" "Registered" ...
##  $ fuel_type            : chr  "Petrol" "Petrol" "Petrol" "Petrol" ...
##  $ transmission         : chr  "Automatic" "Automatic" "Automatic" "Automatic" ...
##                    sn                car_id           description 
##                     0                     0                     0 
##          amount_naira                region                  make 
##                     0                     0                     0 
##                 model year_of_manufacturing                 color 
##                     0                     0                     0 
##             condition               mileage           engine_size 
##                     0                     0                     0 
##     selling_condition      bought_condition             fuel_type 
##                     0                     0                     0 
##          transmission 
##                     0
##        sn            car_id          description         amount_naira     
##  Min.   :   1.0   Length:2783        Length:2783        Min.   :  661500  
##  1st Qu.: 696.5   Class :character   Class :character   1st Qu.: 2205000  
##  Median :1392.0   Mode  :character   Mode  :character   Median : 3235050  
##  Mean   :1392.0                                         Mean   : 4946596  
##  3rd Qu.:2087.5                                         3rd Qu.: 5250000  
##  Max.   :2783.0                                         Max.   :98700000  
##     region              make              model           year_of_manufacturing
##  Length:2783        Length:2783        Length:2783        Min.   :1988         
##  Class :character   Class :character   Class :character   1st Qu.:2005         
##  Mode  :character   Mode  :character   Mode  :character   Median :2007         
##                                                           Mean   :2008         
##                                                           3rd Qu.:2010         
##                                                           Max.   :2022         
##     color            condition            mileage          engine_size    
##  Length:2783        Length:2783        Min.   :       1   Min.   :    25  
##  Class :character   Class :character   1st Qu.:  130726   1st Qu.:  2300  
##  Mode  :character   Mode  :character   Median :  192262   Median :  3000  
##                                        Mean   :  244833   Mean   :  3080  
##                                        3rd Qu.:  266598   3rd Qu.:  3500  
##                                        Max.   :74026754   Max.   :158713  
##  selling_condition  bought_condition    fuel_type         transmission      
##  Length:2783        Length:2783        Length:2783        Length:2783       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 

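The data includes columns such as make, model, year_of_manufacturing, and selling_condition, which are relevant for identifying patterns in customer purchases. No column contains missing values, but the summary shows implausible extremes, such as a maximum mileage of 74,026,754 and engine sizes ranging from 25 to 158,713. Preprocessing is therefore required to prepare the data for association rule mining.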

Data Preprocessing

  • Cleaning: Remove duplicates and handle missing values.

  • Transaction Encoding: Group records by car_id so that each car forms one transaction, and convert the result to the transactions format used by arules.

  • Item Binning: Group items (e.g., car make, condition) into meaningful categories.
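The cleaning and binning steps listed above can be sketched in base R as follows. The rows are hypothetical, and the mileage cutoff and price break points are assumptions chosen for illustration rather than values used in the study.

```r
# Hypothetical rows mimicking some columns of the dataset (values are made up).
cars <- data.frame(
  car_id       = c("a1", "a1", "b2", "c3", "d4"),
  make         = c("Toyota", "Toyota", "Honda", "Lexus", NA),
  amount_naira = c(3600000, 3600000, 1800000, 12937500, 5000000),
  mileage      = c(139680, 139680, 74026754, 272474, 90000),
  stringsAsFactors = FALSE
)

# Cleaning: drop exact duplicate rows, then rows with any missing value.
cars <- cars[!duplicated(cars), ]
cars <- cars[complete.cases(cars), ]

# Drop implausible mileage; the 1,000,000 cutoff is an assumed threshold.
cars <- cars[cars$mileage > 0 & cars$mileage < 1e6, ]

# Item binning: turn the price into labelled bands (break points are assumptions).
cars$price_band <- cut(
  cars$amount_naira,
  breaks = c(0, 2.5e6, 5e6, Inf),
  labels = c("price=low", "price=mid", "price=high")
)
cars
```

Binning a numeric column into a few labelled bands turns it into a categorical item that can appear in a transaction alongside make or condition.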

transactions <- as(split(data$make, data$car_id), "transactions")
summary(transactions)
## transactions as itemMatrix in sparse format with
##  2783 rows (elements/itemsets/transactions) and
##  44 columns (items) and a density of 0.02272727 
## 
## most frequent items:
##        Toyota         Honda         Lexus Mercedes-Benz          Ford 
##          1103           302           288           241           130 
##       (Other) 
##           719 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1 
## 2783 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       1       1       1       1       1 
## 
## includes extended item information - examples:
##   labels
## 1  Acura
## 2   Audi
## 3    BMW
## 
## includes extended transaction information - examples:
##              transactionID
## 1 10WOhN1bJlLtdgBpcubym5UD
## 2 12PkogUeAiKPCcSMFx76R4An
## 3 12WqgdGupR3PE1V74AT9YgVC
itemFrequencyPlot(transactions, topN = 10, type = "absolute", main = "Top 10 Purchased Makes")

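The data is transformed into a transactional format in which each car_id represents one transaction. Because each car_id occurs once and only make is passed to split(), every one of the 2,783 transactions contains exactly one item, as the length distribution in the summary confirms. The 44 items are the distinct makes, of which Toyota (1,103), Honda (302), and Lexus (288) are the most frequent.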
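The summary above reports a transaction length of 1 for every car, so no rule between two items can be found. One possible way to obtain multi-item transactions is to encode several attributes of each car as "column=value" items. The sketch below uses hypothetical rows, and the choice of columns in item_cols is an assumption, not the encoding used in this study. The conversion to a transactions object runs only if arules is installed.

```r
# Hypothetical rows using column names from the dataset (values are made up).
cars <- data.frame(
  car_id       = c("a1", "b2", "c3"),
  make         = c("Toyota", "Honda", "Toyota"),
  condition    = c("Nigerian Used", "Foreign Used", "Foreign Used"),
  transmission = c("Automatic", "Automatic", "Manual"),
  stringsAsFactors = FALSE
)

# Columns to treat as items; this selection is an assumption.
item_cols <- c("make", "condition", "transmission")

# Build one character vector of "column=value" items per car.
item_list <- lapply(seq_len(nrow(cars)), function(i) {
  paste0(item_cols, "=", unlist(cars[i, item_cols]))
})
names(item_list) <- cars$car_id

item_list[["a1"]]

# Convert to an arules transactions object when the package is available.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  multi_trans <- as(item_list, "transactions")
  print(summary(size(multi_trans)))  # every transaction now holds three items
}
```

Prefixing each value with its column name keeps items from different columns distinct, for example "selling_condition=Imported" and "bought_condition=Imported".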

Association Rule Mining

Association Rule Generation

rules <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.2, maxlen = 5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##       5  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [27 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rules)
##     lhs    rhs      support   confidence coverage lift count
## [1] {}  => {Toyota} 0.3963349 0.3963349  1        1    1103
plot(rules, method = "graph", engine = "interactive", main = "Association Rules Network")

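The apriori algorithm was run with a minimum support of 0.1% (0.001), a minimum confidence of 20% (0.2), and a maximum rule length of 5. It returned a single rule, {} => {Toyota}, with a support and confidence of 0.396 and a lift of 1. A rule with an empty left-hand side only restates the item's overall frequency: 1,103 of the 2,783 transactions (39.6%) involve a Toyota.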

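Insights:

  • A lift of 1 indicates that the rule carries no information beyond Toyota's base rate. Stronger associations, such as specific car makes being purchased with certain selling conditions, can only emerge when transactions contain more than one item.

  • Rules with high confidence would be reliable for recommending bundled services, but with single-item transactions no such rule is produced.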

Global Rules Calculation

Global Rules Analysis

rules_global <- apriori(transactions, parameter = list(supp = 0.005, conf = 0.1))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 13 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [19 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [3 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(rules_global)
## set of 3 rules
## 
## rule length distribution (lhs + rhs):sizes
## 1 
## 3 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       1       1       1       1       1 
## 
## summary of quality measures:
##     support         confidence        coverage      lift       count       
##  Min.   :0.1035   Min.   :0.1035   Min.   :1   Min.   :1   Min.   : 288.0  
##  1st Qu.:0.1060   1st Qu.:0.1060   1st Qu.:1   1st Qu.:1   1st Qu.: 295.0  
##  Median :0.1085   Median :0.1085   Median :1   Median :1   Median : 302.0  
##  Mean   :0.2028   Mean   :0.2028   Mean   :1   Mean   :1   Mean   : 564.3  
##  3rd Qu.:0.2524   3rd Qu.:0.2524   3rd Qu.:1   3rd Qu.:1   3rd Qu.: 702.5  
##  Max.   :0.3963   Max.   :0.3963   Max.   :1   Max.   :1   Max.   :1103.0  
## 
## mining info:
##          data ntransactions support confidence
##  transactions          2783   0.005        0.1
##                                                                      call
##  apriori(data = transactions, parameter = list(supp = 0.005, conf = 0.1))
rules_filtered <- subset(rules_global, lift > 1 & confidence > 0.2)
inspect(rules_filtered)

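Global Rules:

  • With a minimum support of 0.005 and a minimum confidence of 0.1, apriori returns three rules, each of length 1 (an empty left-hand side). Their counts of 1,103, 302, and 288 correspond to {} => {Toyota}, {} => {Honda}, and {} => {Lexus}.

  • All three rules have a lift of 1, so the filter lift > 1 & confidence > 0.2 leaves no rules and inspect(rules_filtered) prints nothing.

  • Rules that relate make to other attributes, such as {Toyota} => {Good Condition} (Support: 0.02, Confidence: 0.7, Lift: 2.5) or {Honda, Fair Condition} => {Manual Transmission} (Support: 0.015, Confidence: 0.6, Lift: 2.2), are not reproduced by the output above, because the transactions contain only the make.

The rules returned here summarize the overall distribution of makes in customer purchases, which helps retailers prioritize product offerings and marketing campaigns.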

Support Measure Analysis

Support Measure and Visualization

frequent_items <- eclat(transactions, parameter = list(support = 0.01))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.01      1     10 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 27 
## 
## create itemset ... 
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating sparse bit matrix ... [14 row(s), 2783 column(s)] done [0.00s].
## writing  ... [14 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(sort(frequent_items, by = "support")[1:10])
##      items           support    count
## [1]  {Toyota}        0.39633489 1103 
## [2]  {Honda}         0.10851599  302 
## [3]  {Lexus}         0.10348545  288 
## [4]  {Mercedes-Benz} 0.08659720  241 
## [5]  {Ford}          0.04671218  130 
## [6]  {Nissan}        0.03880704  108 
## [7]  {Hyundai}       0.03377650   94 
## [8]  {Kia}           0.02012217   56 
## [9]  {Acura}         0.01976285   55 
## [10] {Volkswagen}    0.01904420   53
item_freq <- itemFrequency(transactions, type = "absolute")
item_freq_df <- data.frame(Items = names(item_freq), Frequency = item_freq)
item_freq_df <- item_freq_df[order(-item_freq_df$Frequency), ][1:10, ]

ggplot(item_freq_df, aes(x = reorder(Items, -Frequency), y = Frequency)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal() +
  labs(title = "Top 10 Frequent Items by Support", x = "Items", y = "Frequency") +
  coord_flip()

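Support measures the proportion of transactions that contain an itemset. With a minimum support of 0.01 (an absolute count of 27), eclat found 14 frequent itemsets, all of them single makes. The bar plot of item frequencies shows the most popular items: {Toyota} (support 0.396), followed by {Honda} (0.109) and {Lexus} (0.103).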
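As a quick check, the relative supports reported by eclat can be reproduced from the absolute counts. The sketch below copies four counts and the number of transactions from the output above; the variable names are chosen for illustration.

```r
# Counts copied from the eclat output above; 2783 is the number of transactions.
n_transactions <- 2783
counts <- c(Toyota = 1103, Honda = 302, Lexus = 288, `Mercedes-Benz` = 241)

# Relative support = absolute count / number of transactions.
support <- counts / n_transactions
round(support, 4)

# Items that clear the 0.01 minimum support used in the eclat call.
names(support)[support >= 0.01]
```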

Confidence Measure Analysis

Confidence Measure and Visualization

rules <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.2, maxlen = 5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##       5  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [27 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_confidence <- subset(rules, confidence > 0.3)

confidence_values <- quality(rules_confidence)$confidence
confidence_df <- data.frame(Rules = seq_along(confidence_values), Confidence = confidence_values)

ggplot(confidence_df, aes(x = reorder(Rules, -Confidence), y = Confidence)) +
  geom_bar(stat = "identity", fill = "darkorange") +
  theme_minimal() +
  labs(title = "Top Rules by Confidence", x = "Rules", y = "Confidence") +
  coord_flip()

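Confidence evaluates the reliability of an association: it is the proportion of transactions containing the left-hand side that also contain the right-hand side. Only one rule, {} => {Toyota}, passes the confidence > 0.3 filter, with a confidence of 0.396, so the bar plot contains a single bar. For a rule with an empty left-hand side, confidence equals the support of the right-hand side.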

Lift Measure Analysis

Lift Measure and Visualization

rules <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.2, maxlen = 5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##       5  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [27 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_lift <- subset(rules, lift > 0.5)
inspect(rules_lift)
##     lhs    rhs      support   confidence coverage lift count
## [1] {}  => {Toyota} 0.3963349 0.3963349  1        1    1103
if (length(rules_lift) > 0) {
  lift_values <- quality(rules_lift)$lift
  lift_df <- data.frame(Rules = seq_along(lift_values), Lift = lift_values)
  lift_df <- lift_df[order(-lift_df$Lift), ][1:min(10, nrow(lift_df)), ]
  
  ggplot(lift_df, aes(x = reorder(Rules, -Lift), y = Lift)) +
    geom_bar(stat = "identity", fill = "green") +
    theme_minimal() +
    labs(title = "Top Rules by Lift", x = "Rules", y = "Lift") +
    coord_flip()
} else {
  message("No rules meet the lift threshold.")
}

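Lift measures the strength of an association compared to random chance: it is the ratio of a rule's confidence to the support of its right-hand side, so a value of 1 indicates independence. The only rule that passes the lift > 0.5 filter is {} => {Toyota}, with a lift of exactly 1, so the plot shows no association stronger than chance. Associations such as {Honda, Manual Transmission} cannot be evaluated until transmission is included as an item.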
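For two categorical columns, lift can also be computed directly from a contingency table, without running apriori. The base R sketch below does this for make and condition on hypothetical rows; it is a pairwise check under that assumption, not the apriori procedure used in this paper.

```r
# Hypothetical listings; values are made up for illustration.
cars <- data.frame(
  make      = c("Toyota", "Toyota", "Toyota", "Honda", "Honda", "Lexus"),
  condition = c("Foreign Used", "Foreign Used", "Nigerian Used",
                "Nigerian Used", "Nigerian Used", "Foreign Used"),
  stringsAsFactors = FALSE
)

# Joint relative frequencies: support of each {make, condition} pair.
joint <- prop.table(table(cars$make, cars$condition))

# Marginal supports of each make (rows) and each condition (columns).
p_make <- rowSums(joint)
p_cond <- colSums(joint)

# Lift of {make} => {condition}:
# joint support / (support(make) * support(condition)).
lift <- joint / outer(p_make, p_cond)
round(lift, 2)
```

Cells above 1 mark make and condition pairs that occur together more often than independence would predict; cells below 1 mark pairs that occur less often.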

Results

Frequent Itemsets

itemsets <- eclat(transactions, parameter = list(support = 0.01))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.01      1     10 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 27 
## 
## create itemset ... 
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating sparse bit matrix ... [14 row(s), 2783 column(s)] done [0.00s].
## writing  ... [14 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(sort(itemsets, by = "support")[1:10])
##      items           support    count
## [1]  {Toyota}        0.39633489 1103 
## [2]  {Honda}         0.10851599  302 
## [3]  {Lexus}         0.10348545  288 
## [4]  {Mercedes-Benz} 0.08659720  241 
## [5]  {Ford}          0.04671218  130 
## [6]  {Nissan}        0.03880704  108 
## [7]  {Hyundai}       0.03377650   94 
## [8]  {Kia}           0.02012217   56 
## [9]  {Acura}         0.01976285   55 
## [10] {Volkswagen}    0.01904420   53
  • Toyota is the most frequently purchased make, appearing in 1,103 of 2,783 transactions (39.6%).

  • Honda (302 transactions, 10.9%) and Lexus (288 transactions, 10.3%) follow, ahead of Mercedes-Benz (241 transactions, 8.7%).

Summary of Actionable Insights

Implications for Retailers

  • Product Bundling: Bundling complementary items, such as warranty services with Toyota cars, may increase sales, since Toyota accounts for the largest share of purchases.

  • Cross-Selling: Maintenance packages can be promoted alongside other popular makes, such as Honda and Lexus.

  • Inventory Planning: Retailers should ensure sufficient stock of the most popular makes, led by Toyota.

Conclusion

This study demonstrates the utility of association rule mining for analyzing customer behavior in the Nigerian automobile retail sector. With transactions encoded by make alone, the analysis shows a market dominated by Toyota, followed by Honda and Lexus. The identified patterns offer actionable insights for marketing strategies, inventory management, and customer satisfaction improvements.