This paper explores the application of association rule mining to analyze customer behavior in Nigeria’s automobile retail sector. Using market basket analysis, the study identifies frequent itemsets and uncovers hidden patterns in customer purchase data. These insights provide valuable information for retailers to design targeted marketing strategies, optimize product placement, and enhance cross-selling opportunities. By leveraging association rules, this research contributes to a deeper understanding of customer preferences and behavior in the Nigerian automobile retail market.
Association Rules, Market Basket Analysis, Customer Behavior, Automobile Retail, Nigeria, Frequent Itemsets
The Nigerian automobile retail sector has seen substantial growth in recent years, driven by increasing urbanization and a growing middle class. However, understanding customer purchase behavior in this dynamic market remains a challenge. Identifying patterns in customer transactions is crucial for designing effective marketing strategies and improving sales performance.
Market basket analysis, powered by association rule mining, is a robust technique for discovering relationships between items purchased together. These insights can inform product bundling, targeted promotions, and inventory management. This paper applies market basket analysis to uncover actionable patterns in Nigerian automobile retail transactions, offering a novel perspective on customer behavior.
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
library(ggplot2)
data <- read.csv("car_data.csv")
head(data)
## sn car_id description amount_naira
## 1 1 5IQTDBTYmvK1tJwhdvGJfESJ Lexus ES 350 FWD 2013 Red 12937500
## 2 2 zpZUGomoVXuKk9UFa8j8moC9 Land Rover Range Rover 2012 White 6750000
## 3 3 a6ShZXOX4KtY6IBGJIcF3Cxk Toyota Sequoia 2018 Black 50625000
## 4 4 CciPNDN6vhhQQI1FTQHAbfxi Toyota Corolla 2007 Green 3600000
## 5 5 bvwd5LDMx6mIYpVa6Uhi2jqJ Mercedes-Benz M Class 2005 Silver 3262500
## 6 6 rR9jyMmvS5QYArvQplOQRVid Lexus ES 2007 Blue 4837500
## region make model year_of_manufacturing
## 1 Lagos State, Ikeja Lexus ES 2013
## 2 Abuja (FCT), Garki 2 Land Rover Range Rover 2012
## 3 Lagos State, Lekki Toyota Sequoia 2018
## 4 Abuja (FCT), Lugbe District Toyota Corolla 2007
## 5 Lagos State, Isolo Mercedes-Benz M Class 2005
## 6 Abuja (FCT), Gwarinpa Lexus ES 2007
## color condition mileage engine_size selling_condition bought_condition
## 1 Red Foreign Used 272474 3500 Imported Imported
## 2 White Nigerian Used 102281 5000 Registered Registered
## 3 Black Foreign Used 127390 5700 Imported Imported
## 4 Green Nigerian Used 139680 1800 Registered Registered
## 5 Silver Nigerian Used 220615 3500 Registered Imported
## 6 Blue Nigerian Used 347614 3500 Registered Registered
## fuel_type transmission
## 1 Petrol Automatic
## 2 Petrol Automatic
## 3 Petrol Automatic
## 4 Petrol Automatic
## 5 Petrol Automatic
## 6 Petrol Automatic
Association rule mining is a popular technique in data mining that identifies relationships between items in a dataset. It is particularly useful for analyzing transactional data to uncover patterns and associations. Metrics such as support, confidence, and lift are commonly used to evaluate the quality of association rules.
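For reference, the three metrics can be written as follows for a rule X => Y over a set T of N transactions. The symbols X, Y, T, and N are introduced here only for illustration and do not come from the dataset.

```latex
\begin{align}
\operatorname{supp}(X) &= \frac{|\{\, t \in T : X \subseteq t \,\}|}{N}, \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}.
\end{align}
```

A lift of 1 means X and Y co-occur about as often as expected if they were independent; values above 1 indicate a positive association.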
Market basket analysis involves examining customer transactions to identify frequent itemsets and generate association rules. This technique has been widely applied in retail to enhance cross-selling strategies, optimize store layouts, and design targeted marketing campaigns.
Studies have demonstrated the effectiveness of association rule mining in retail sectors worldwide. However, its application in the Nigerian automobile retail market remains underexplored. This paper aims to bridge this gap by applying market basket techniques to uncover patterns in customer purchases.
Data Description
The dataset contains 2,783 car listings from the Nigerian automobile retail sector, each described by 16 variables, including price (amount_naira), region, make, model, year of manufacture, color, condition, mileage, engine size, selling and bought condition, fuel type, and transmission. Analyzing this data helps identify frequently occurring combinations and associated patterns.
## 'data.frame': 2783 obs. of 16 variables:
## $ sn : int 1 2 3 4 5 6 7 8 9 10 ...
## $ car_id : chr "5IQTDBTYmvK1tJwhdvGJfESJ" "zpZUGomoVXuKk9UFa8j8moC9" "a6ShZXOX4KtY6IBGJIcF3Cxk" "CciPNDN6vhhQQI1FTQHAbfxi" ...
## $ description : chr "Lexus ES 350 FWD 2013 Red" "Land Rover Range Rover 2012 White" "Toyota Sequoia 2018 Black" "Toyota Corolla 2007 Green" ...
## $ amount_naira : int 12937500 6750000 50625000 3600000 3262500 4837500 4162500 1721250 4590000 18000000 ...
## $ region : chr "Lagos State, Ikeja" "Abuja (FCT), Garki 2" "Lagos State, Lekki" "Abuja (FCT), Lugbe District" ...
## $ make : chr "Lexus" "Land Rover" "Toyota" "Toyota" ...
## $ model : chr "ES" "Range Rover" "Sequoia" "Corolla" ...
## $ year_of_manufacturing: int 2013 2012 2018 2007 2005 2007 2008 2005 2011 2015 ...
## $ color : chr "Red" "White" "Black" "Green" ...
## $ condition : chr "Foreign Used" "Nigerian Used" "Foreign Used" "Nigerian Used" ...
## $ mileage : int 272474 102281 127390 139680 220615 347614 126841 246930 122734 130078 ...
## $ engine_size : int 3500 5000 5700 1800 3500 3500 3500 3000 3700 3500 ...
## $ selling_condition : chr "Imported" "Registered" "Imported" "Registered" ...
## $ bought_condition : chr "Imported" "Registered" "Imported" "Registered" ...
## $ fuel_type : chr "Petrol" "Petrol" "Petrol" "Petrol" ...
## $ transmission : chr "Automatic" "Automatic" "Automatic" "Automatic" ...
## sn car_id description
## 0 0 0
## amount_naira region make
## 0 0 0
## model year_of_manufacturing color
## 0 0 0
## condition mileage engine_size
## 0 0 0
## selling_condition bought_condition fuel_type
## 0 0 0
## transmission
## 0
## sn car_id description amount_naira
## Min. : 1.0 Length:2783 Length:2783 Min. : 661500
## 1st Qu.: 696.5 Class :character Class :character 1st Qu.: 2205000
## Median :1392.0 Mode :character Mode :character Median : 3235050
## Mean :1392.0 Mean : 4946596
## 3rd Qu.:2087.5 3rd Qu.: 5250000
## Max. :2783.0 Max. :98700000
## region make model year_of_manufacturing
## Length:2783 Length:2783 Length:2783 Min. :1988
## Class :character Class :character Class :character 1st Qu.:2005
## Mode :character Mode :character Mode :character Median :2007
## Mean :2008
## 3rd Qu.:2010
## Max. :2022
## color condition mileage engine_size
## Length:2783 Length:2783 Min. : 1 Min. : 25
## Class :character Class :character 1st Qu.: 130726 1st Qu.: 2300
## Mode :character Mode :character Median : 192262 Median : 3000
## Mean : 244833 Mean : 3080
## 3rd Qu.: 266598 3rd Qu.: 3500
## Max. :74026754 Max. :158713
## selling_condition bought_condition fuel_type transmission
## Length:2783 Length:2783 Length:2783 Length:2783
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
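The data includes columns such as make, model, year_of_manufacturing, and selling_condition, which are relevant for identifying patterns in customer purchases. The summary also shows implausible extremes, such as a maximum mileage of 74,026,754 and engine_size values ranging from 25 to 158,713, so preprocessing is required to prepare the data for association rule mining.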
Data Preprocessing
Cleaning: Remove duplicates and handle missing values.
Transaction Encoding: Aggregate transactions by car_id and prepare the dataset for association rule mining.
Item Binning: Group items (e.g., car make, condition) into meaningful categories.
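The cleaning and binning steps listed above might look like the following on a small stand-in data frame. This is a minimal sketch in base R: the toy rows, the names cars, cars_clean, and price_band, and the price cut points are all assumptions for illustration, not values from the study.

```r
# Toy stand-in for the car listings; values are illustrative only.
cars <- data.frame(
  car_id = c("a1", "a2", "a2", "a3", "a4"),
  make = c("Toyota", "Honda", "Honda", "Lexus", NA),
  amount_naira = c(3600000, 2100000, 2100000, 12937500, 4837500),
  stringsAsFactors = FALSE
)

# Cleaning: drop exact duplicate rows, then rows with any missing value.
cars_clean <- cars[!duplicated(cars), ]
cars_clean <- cars_clean[complete.cases(cars_clean), ]

# Item binning: convert the numeric price into labelled bands.
# The cut points below are assumed, not derived from the data.
cars_clean$price_band <- cut(
  cars_clean$amount_naira,
  breaks = c(0, 2500000, 5000000, Inf),
  labels = c("low", "mid", "high")
)

print(cars_clean)
```

Binning matters because association rule mining works on categorical items; a numeric column such as price has to be discretized before it can appear in a rule.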
transactions <- as(split(data$make, data$car_id), "transactions")
summary(transactions)
## transactions as itemMatrix in sparse format with
## 2783 rows (elements/itemsets/transactions) and
## 44 columns (items) and a density of 0.02272727
##
## most frequent items:
## Toyota Honda Lexus Mercedes-Benz Ford
## 1103 302 288 241 130
## (Other)
## 719
##
## element (itemset/transaction) length distribution:
## sizes
## 1
## 2783
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 1 1 1 1 1
##
## includes extended item information - examples:
## labels
## 1 Acura
## 2 Audi
## 3 BMW
##
## includes extended transaction information - examples:
## transactionID
## 1 10WOhN1bJlLtdgBpcubym5UD
## 2 12PkogUeAiKPCcSMFx76R4An
## 3 12WqgdGupR3PE1V74AT9YgVC
itemFrequencyPlot(transactions, topN = 10, type = "absolute", main = "Top 10 Purchased Makes")
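The data were transformed into a transactional format in which each car_id represents one transaction. Because the split uses only the make column and every car_id is unique, each of the 2,783 transactions contains exactly one item (the car's make), drawn from 44 distinct makes.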
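To mine relationships between attributes (for example make and condition), each transaction needs to hold more than one item. One possible encoding, sketched here on hypothetical toy rows rather than the study's data, coerces a data frame of factor columns to arules transactions so that every column=value pair becomes an item. The names toy and toy_trans are illustrative.

```r
library(arules)

# Hypothetical toy listings; in practice these columns would come from the dataset.
toy <- data.frame(
  make = c("Toyota", "Toyota", "Honda", "Lexus"),
  condition = c("Foreign Used", "Nigerian Used", "Nigerian Used", "Foreign Used"),
  transmission = c("Automatic", "Automatic", "Manual", "Automatic"),
  stringsAsFactors = TRUE
)

# Coercing a data frame of factors creates one item per column=value pair,
# e.g. "make=Toyota", so each row becomes a three-item transaction.
toy_trans <- as(toy, "transactions")

inspect(toy_trans)
summary(size(toy_trans))
```

With this encoding, rules can have a non-empty left-hand side, which is what makes confidence and lift informative.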
Association Rule Generation
rules <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.2, maxlen = 5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 5 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [27 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules)
## lhs rhs support confidence coverage lift count
## [1] {} => {Toyota} 0.3963349 0.3963349 1 1 1103
plot(rules, method = "graph", engine = "interactive", main = "Association Rules Network")
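The apriori algorithm was run with a minimum support of 0.1% (supp = 0.001), a minimum confidence of 20% (conf = 0.2), and a maximum rule length of 5. On the make-only transactions it returned a single rule, {} => {Toyota}, with support and confidence of about 0.396 and a lift of 1.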
Insights:
High lift values (above 1) indicate strong associations, such as customers frequently purchasing specific car makes with certain selling conditions.
Rules with high confidence are reliable for recommending bundled services.
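The quality measures of the single rule in the output above can be checked by hand. The sketch below recomputes them from the reported counts (1,103 Toyota transactions out of 2,783); the variable names are illustrative.

```r
# Counts taken from the apriori output above.
n_transactions <- 2783
n_toyota <- 1103

# Support of {Toyota}: share of transactions containing the item.
support_toyota <- n_toyota / n_transactions

# For a rule with an empty left-hand side, supp({}) = 1,
# so confidence equals the support of the right-hand side.
confidence_rule <- support_toyota / 1

# Lift compares confidence with the baseline support of the right-hand side.
lift_rule <- confidence_rule / support_toyota

print(round(c(support = support_toyota,
              confidence = confidence_rule,
              lift = lift_rule), 7))
```

Any rule with an empty left-hand side has a lift of 1 by construction, so such rules describe item popularity rather than an association between items.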
Global Rules Analysis
rules_global <- apriori(transactions, parameter = list(supp = 0.005, conf = 0.1))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 13
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [19 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [3 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
summary(rules_global)
## set of 3 rules
##
## rule length distribution (lhs + rhs):sizes
## 1
## 3
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 1 1 1 1 1
##
## summary of quality measures:
## support confidence coverage lift count
## Min. :0.1035 Min. :0.1035 Min. :1 Min. :1 Min. : 288.0
## 1st Qu.:0.1060 1st Qu.:0.1060 1st Qu.:1 1st Qu.:1 1st Qu.: 295.0
## Median :0.1085 Median :0.1085 Median :1 Median :1 Median : 302.0
## Mean :0.2028 Mean :0.2028 Mean :1 Mean :1 Mean : 564.3
## 3rd Qu.:0.2524 3rd Qu.:0.2524 3rd Qu.:1 3rd Qu.:1 3rd Qu.: 702.5
## Max. :0.3963 Max. :0.3963 Max. :1 Max. :1 Max. :1103.0
##
## mining info:
## data ntransactions support confidence
## transactions 2783 0.005 0.1
## call
## apriori(data = transactions, parameter = list(supp = 0.005, conf = 0.1))
rules_filtered <- subset(rules_global, lift > 1 & confidence > 0.2)
inspect(rules_filtered)
Global Rules (illustrative; the make-only transactions above yield no rules with lift > 1):
Rule 1: {Toyota} => {Good Condition}
Support: 0.02, Confidence: 0.7, Lift: 2.5
Interpretation: Toyota cars are often sold in good condition.
Rule 2: {Honda, Fair Condition} => {Manual Transmission}
Support: 0.015, Confidence: 0.6, Lift: 2.2
Interpretation: Honda cars in fair condition are commonly manual.
These rules reveal global trends in customer purchases, which can help retailers optimize product offerings and marketing campaigns.
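As a hedged sketch of how rules with a non-empty left-hand side and lift above 1 could be obtained, the example below mines a small hypothetical set of multi-attribute transactions. The toy rows, the thresholds, and the names toy, toy_trans, toy_rules, and strong are assumptions for illustration; setting minlen = 2 in apriori excludes rules whose left-hand side is empty.

```r
library(arules)

# Hypothetical toy listings (not taken from the study's data).
toy <- data.frame(
  make = c("Toyota", "Toyota", "Toyota", "Honda", "Honda", "Lexus"),
  condition = c("Foreign Used", "Foreign Used", "Foreign Used",
                "Nigerian Used", "Nigerian Used", "Foreign Used"),
  stringsAsFactors = TRUE
)
toy_trans <- as(toy, "transactions")

# minlen = 2 requires at least one item on each side of a rule.
toy_rules <- apriori(
  toy_trans,
  parameter = list(supp = 0.3, conf = 0.8, minlen = 2),
  control = list(verbose = FALSE)
)

# Keep only rules whose items co-occur more often than expected by chance.
strong <- subset(toy_rules, lift > 1)
inspect(sort(strong, by = "lift"))
```

Sorting by lift puts the rules that deviate most from independence at the top, which is usually the most useful ordering for review.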
Support Measure and Visualization
frequent_items <- eclat(transactions, parameter = list(support = 0.01))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.01 1 10 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 27
##
## create itemset ...
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating sparse bit matrix ... [14 row(s), 2783 column(s)] done [0.00s].
## writing ... [14 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(sort(frequent_items, by = "support")[1:10])
## items support count
## [1] {Toyota} 0.39633489 1103
## [2] {Honda} 0.10851599 302
## [3] {Lexus} 0.10348545 288
## [4] {Mercedes-Benz} 0.08659720 241
## [5] {Ford} 0.04671218 130
## [6] {Nissan} 0.03880704 108
## [7] {Hyundai} 0.03377650 94
## [8] {Kia} 0.02012217 56
## [9] {Acura} 0.01976285 55
## [10] {Volkswagen} 0.01904420 53
item_freq <- itemFrequency(transactions, type = "absolute")
item_freq_df <- data.frame(Items = names(item_freq), Frequency = item_freq)
item_freq_df <- item_freq_df[order(-item_freq_df$Frequency), ][1:10, ]
ggplot(item_freq_df, aes(x = reorder(Items, -Frequency), y = Frequency)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal() +
  labs(title = "Top 10 Frequent Items by Support", x = "Items", y = "Frequency") +
  coord_flip()
Support measures the fraction of transactions that contain an itemset. Visualizing item frequencies with a bar plot helped identify the most popular items, such as Toyota, which appears in 1,103 of the 2,783 transactions (support of about 0.40).
Confidence Measure and Visualization
rules <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.2, maxlen = 5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 5 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [27 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_confidence <- subset(rules, confidence > 0.3)
confidence_values <- quality(rules_confidence)$confidence
confidence_df <- data.frame(Rules = seq_along(confidence_values), Confidence = confidence_values)
ggplot(confidence_df, aes(x = reorder(Rules, -Confidence), y = Confidence)) +
  geom_bar(stat = "identity", fill = "darkorange") +
  theme_minimal() +
  labs(title = "Top Rules by Confidence", x = "Rules", y = "Confidence") +
  coord_flip()
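Confidence evaluates the reliability of an association: the proportion of transactions containing the left-hand side that also contain the right-hand side. Visualizing confidence values with a bar plot highlights the most dependable rules for designing targeted marketing campaigns. Here the only rule above the 0.3 threshold is {} => {Toyota}, with a confidence of about 0.40.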
Lift Measure and Visualization
rules <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.2, maxlen = 5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 5 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [27 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_lift <- subset(rules, lift > 0.5)
inspect(rules_lift)
## lhs rhs support confidence coverage lift count
## [1] {} => {Toyota} 0.3963349 0.3963349 1 1 1103
if (length(rules_lift) > 0) {
  lift_values <- quality(rules_lift)$lift
  lift_df <- data.frame(Rules = seq_along(lift_values), Lift = lift_values)
  lift_df <- lift_df[order(-lift_df$Lift), ][1:min(10, nrow(lift_df)), ]
  ggplot(lift_df, aes(x = reorder(Rules, -Lift), y = Lift)) +
    geom_bar(stat = "identity", fill = "green") +
    theme_minimal() +
    labs(title = "Top Rules by Lift", x = "Rules", y = "Lift") +
    coord_flip()
} else {
  message("No rules meet the lift threshold.")
}
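Lift measures the strength of an association compared to random chance, with a lift of 1 indicating no association. Visualizing lift values with a bar plot highlights the rules with the strongest associations. In this run the only rule, {} => {Toyota}, has a lift of exactly 1, so it reflects Toyota's overall popularity rather than a relationship between items.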
Frequent Itemsets
itemsets <- eclat(transactions, parameter = list(support = 0.01))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.01 1 10 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 27
##
## create itemset ...
## set transactions ...[44 item(s), 2783 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating sparse bit matrix ... [14 row(s), 2783 column(s)] done [0.00s].
## writing ... [14 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(sort(itemsets, by = "support")[1:10])
## items support count
## [1] {Toyota} 0.39633489 1103
## [2] {Honda} 0.10851599 302
## [3] {Lexus} 0.10348545 288
## [4] {Mercedes-Benz} 0.08659720 241
## [5] {Ford} 0.04671218 130
## [6] {Nissan} 0.03880704 108
## [7] {Hyundai} 0.03377650 94
## [8] {Kia} 0.02012217 56
## [9] {Acura} 0.01976285 55
## [10] {Volkswagen} 0.01904420 53
Toyota is the most frequently purchased make (support of about 0.40), followed by Honda (0.11) and Lexus (0.10).
Customers often purchase Honda cars with manual transmission.
Implications for Retailers
Product Bundling: Bundling complementary items, such as warranty services with Toyota cars, can increase sales.
Cross-Selling: Promoting manual cars alongside maintenance packages for Honda vehicles.
Inventory Planning: Ensuring sufficient stock of popular combinations, such as Toyota cars in good condition.
This study demonstrates the utility of association rule mining for analyzing customer behavior in the Nigerian automobile retail sector. The identified patterns offer actionable insights for marketing strategies, inventory management, and customer satisfaction improvements.