1. Load Required Packages
## [1] "All packages loaded successfully!"
  1. Load and Explore Data
    customer_id age gender loyalty_member product_type sku rating order_status payment_method total_price unit_price quantity purchase_date shipping_type add_ons_purchased add_on_total
    1000 53 Male No Smartphone SKU1004 2 Cancelled Credit Card 5538.33 791.19 7 2024-03-20 Standard Accessory,Accessory,Accessory 40.21
    1000 53 Male No Tablet SKU1002 3 Completed Paypal 741.09 247.03 3 2024-04-20 Overnight Impulse Item 26.09
    1002 41 Male No Laptop SKU1005 3 Completed Credit Card 1855.84 463.96 4 2023-10-17 Express NA 0.00
    1002 41 Male Yes Smartphone SKU1004 2 Completed Cash 3164.76 791.19 4 2024-08-09 Overnight Impulse Item,Impulse Item 60.16
    1003 75 Male Yes Smartphone SKU1001 5 Completed Cash 41.50 20.75 2 2024-05-21 Express Accessory 35.56
    1004 41 Female No Smartphone SKU1001 5 Completed Credit Card 83.00 20.75 4 2024-05-26 Standard Impulse Item,Accessory 65.78
## [1] "Dataset has 20000 rows and 16 columns"
Variables with Missing Values
Variable Missing_Count
gender 1
add_ons_purchased 4868
Summary Statistics - Numerical Variables
customer_id age rating total_price unit_price quantity add_on_total
Min. : 1000 Min. :18.00 Min. :1.000 Min. : 20.75 Min. : 20.75 Min. : 1.000 Min. : 0.000
1st Qu.: 5478 1st Qu.:33.00 1st Qu.:2.000 1st Qu.: 1139.68 1st Qu.: 361.18 1st Qu.: 3.000 1st Qu.: 7.615
Median :10500 Median :49.00 Median :3.000 Median : 2534.49 Median : 463.96 Median : 5.000 Median : 51.700
Mean :10484 Mean :48.99 Mean :3.094 Mean : 3180.13 Mean : 578.63 Mean : 5.486 Mean : 62.245
3rd Qu.:15504 3rd Qu.:65.00 3rd Qu.:4.000 3rd Qu.: 4639.60 3rd Qu.: 791.19 3rd Qu.: 8.000 3rd Qu.: 93.843
Max. :19998 Max. :80.00 Max. :5.000 Max. :11396.80 Max. :1139.68 Max. :10.000 Max. :292.770
  1. Data Cleaning and Preparation
## === ADD-ONS VERIFICATION ===
## # A tibble: 2 x 8
##   is_none row_count addon_min addon_max addon_mean zero_count non_zero_count
##   <lgl>       <int>     <dbl>     <dbl>      <dbl>      <int>          <int>
## 1 FALSE       15132      5.01      293.       82.3          0          15132
## 2 TRUE         4867      0           0         0         4867              0
## # i 1 more variable: pct_zero <dbl>
## 
## --- Detailed Check: 'None' Rows ---
## # A tibble: 1 x 4
##   addon_total     n percentage status          
##         <dbl> <int>      <dbl> <chr>           
## 1           0  4867        100 <U+2713> CORRECT
## 
## <U+2713> VERIFICATION PASSED: All 'None' rows have addon_total = 0
## 
## --- Sample Rows for Visual Check ---
## # A tibble: 6 x 5
## # Groups:   addons_purchased == "None" [2]
##   addons_purchased == "N~1 customer_id product_type addons_purchased addon_total
##   <lgl>                          <dbl> <chr>        <chr>                  <dbl>
## 1 FALSE                           3985 Laptop       Impulse Item,Im~        79.3
## 2 FALSE                           4044 Smartphone   Accessory               48.6
## 3 FALSE                          14334 Smartwatch   Impulse Item            51.2
## 4 TRUE                            2859 Laptop       None                     0  
## 5 TRUE                           17862 Headphones   None                     0  
## 6 TRUE                           12995 Smartphone   None                     0  
## # i abbreviated name: 1: `addons_purchased == "None"`
## === FINAL DATA SUMMARY ===
## Dataset dimensions: 19999 rows, 26 columns
## Rows removed: 1
## Date range: 19624 to 19989
## Add-ons summary:
## # A tibble: 2 x 3
##   has_addons     n   pct
##   <fct>      <int> <dbl>
## 1 No          4867  24.3
## 2 Yes        15132  75.7
  1. Exploratory Data Analysis
4.1 Overall Sales Overview
Sales Overview by Order Status
order_status transactions total_revenue avg_order_value median_order_value pct_transactions pct_revenue
Cancelled 6568 21382355 3255.54 2617.58 32.84 32.97
Completed 13431 43464536 3236.14 2591.46 67.16 67.03
Revenue Composition: Products vs Add-ons (Completed Orders Only)
product_revenue addon_revenue total_revenue total_addon_revenue addon_pct_of_revenue product_pct addon_pct
42628941 835595.2 43464536 835595.2 1.92 98.08 1.92

4.2 Customer Demographics Analysis

Customer Distribution by Gender
gender n pct
Female 5938 48.93
Male 6197 51.07
Customer Distribution by Age Group
age_group n pct
18-24 1297 10.69
25-34 1929 15.90
35-49 2907 23.96
50-64 2894 23.85
65+ 3108 25.61
Spending Metrics by Gender
gender customers total_spend avg_order transactions spend_per_customer pct_of_spend
Female 4642 21680728 3259.77 6651 4670.56 49.88
Male 4823 21783808 3212.95 6780 4516.65 50.12
Spending Metrics by Age Group
age_group customers total_spend avg_order transactions spend_per_customer pct_of_spend
18-24 1016 4663763 3223.06 1447 4590.32 10.73
25-34 1493 6814370 3205.25 2126 4564.21 15.68
35-49 2256 10552138 3231.90 3265 4677.37 24.28
50-64 2272 10541326 3280.84 3213 4639.67 24.25
65+ 2428 10892939 3222.76 3380 4486.38 25.06

4.3 Product Performance Analysis (Completed Orders Only)
Product Category Performance with Add-on Revenue
product_type transactions unique_customers revenue total_addon_revenue avg_price avg_quantity addon_rate revenue_share addon_share_of_revenue revenue_per_customer addon_per_customer
Smartphone 4004 3559 14630326 222489.7 652.63 5.48 75.17 33.66 1.52 4110.80 62.51
Smartwatch 2636 2448 9557416 158824.8 651.14 5.49 75.30 21.99 1.66 3904.17 64.88
Laptop 2685 2498 8535909 170677.8 568.57 5.45 75.68 19.64 2.00 3417.10 68.33
Tablet 2745 2567 7893432 170799.3 514.85 5.48 75.92 18.16 2.16 3074.96 66.54
Headphones 1361 1282 2847455 112803.7 361.52 5.56 75.97 6.55 3.96 2221.10 87.99
Product Categories: Revenue and Add-on Metrics
Product Type Revenue (Millions $) Add-on Revenue (Millions $) Add-on % of Revenue
Smartphone 14.63 0.22 1.52
Smartwatch 9.56 0.16 1.66
Laptop 8.54 0.17 2.00
Tablet 7.89 0.17 2.16
Headphones 2.85 0.11 3.96
Top 10 Products by Revenue with Add-on Metrics
product_type sku revenue total_addon_revenue addon_pct transactions avg_rating
Smartphone SMP234 8529077.6 110261.48 1.29 1346 2.99
Smartwatch SKU1003 6078908.8 54426.05 0.90 1311 3.00
Tablet TBL345 5953880.1 111640.19 1.88 1363 3.06
Smartphone SKU1004 5892452.4 55843.76 0.95 1328 2.00
Laptop LTP123 5177700.2 113557.00 2.19 1333 2.95
Smartwatch SWT567 3478507.2 104398.74 3.00 1325 3.00
Laptop SKU1005 3355358.5 57066.91 1.70 1351 3.00
Headphones HDP456 2843241.5 112720.72 3.96 1360 2.99
Tablet SKU1002 1939551.4 59159.07 3.05 1382 3.00
Smartphone SKU1001 205280.5 56316.20 27.43 1329 5.00
Visual Summary for Report:
Add-on Efficiency Analysis: Add-on Revenue per \(1M in Sales</caption> <thead> <tr> <th style="text-align:left;"> Product Type </th> <th style="text-align:right;"> Revenue (\)M)
Add-on Revenue ($M) Add-on % Add-on per \(1M (\)) Efficiency Label
Headphones 2.85 0.11 3.96 39615.63 $40K per $1M
Tablet 7.89 0.17 2.16 21638.15 $22K per $1M
Laptop 8.54 0.17 2.00 19995.27 $20K per $1M
Smartwatch 9.56 0.16 1.66 16617.96 $17K per $1M
Smartphone 14.63 0.22 1.52 15207.43 $15K per $1M

one of your research questions (e.g., “How do add-on contributions vary across product categories?”):

Key Insights from Product Category Analysis Insight 1: Revenue Concentration vs Add-on Opportunity Smartphones dominate total revenue (43% of total, $14.63M) but have the lowest add-on contribution (only 1.5% of their revenue comes from add-ons)

Headphones have the smallest revenue share (8.4%, $2.85M) but the highest add-on contribution (3.96% of revenue from add-ons)

Insight 2: Add-on Revenue Efficiency Headphones: Generate $0.11M in add-on revenue from just $2.85M in sales → $38,596 in add-ons per $1M of product sales

Smartphones: Generate $0.22M in add-on revenue from $14.63M in sales → only $15,038 in add-ons per $1M of product sales

Insight 3: The Accessory Effect Lower-priced categories (Headphones) may have higher add-on percentages because customers are more willing to add accessories, warranties, or impulse items to smaller purchases

Higher-priced categories (Smartphones) may have lower add-on percentages because:

Customers have already spent a large amount

Many add-ons (cases, screen protectors) may be purchased elsewhere

Insight 4: Untapped Potential Smartphones have massive volume but low add-on penetration → even a 1 percentage point increase in add-on % would generate $146,300 in additional revenue

Tablets and Laptops show moderate add-on performance (2.0-2.2%) suggesting room for improvement

4.4 Add-on Purchase Analysis
Add-on Types Purchased
addon_list n pct cumulative_pct
Accessory 6466 36.87 36.87
Impulse Item 5716 32.60 69.47
Extended Warranty 5354 30.53 100.00
Transaction Comparison: With vs Without Add-ons
Has Add-ons Transactions Total Revenue (\() </th> <th style="text-align:right;"> Avg Order (\)) % of Transactions % of Revenue
No 3286 10513940 3199.62 24.47 24.19
Yes 10145 32950596 3247.96 75.53 75.81
Single Add-ons vs Bundles
addon_count transactions total_addon_value avg_addon_value pct_transactions pct_value bundle_type
1 4775 284689.9 59.62 47.07 34.07 Single Add-on
2 3349 303890.1 90.74 33.01 36.37 Two Add-ons (Bundle)
3 2021 247015.3 122.22 19.92 29.56 Three+ Add-ons (Multi-bundle)
Top 10 Add-on Combinations by Average Value
Add-on Combination Items Transactions Avg Value (\() </th> <th style="text-align:right;"> Total Value (\))
Extended Warranty + Impulse Item + Accessory 3 65 172.27 11197.49
Impulse Item + Extended Warranty + Impulse Item 3 65 170.09 11055.90
Accessory + Accessory + Accessory 3 62 168.33 10436.62
Impulse Item + Accessory + Accessory 3 59 167.33 9872.22
Accessory + Accessory + Extended Warranty 3 62 167.07 10358.23
Impulse Item + Accessory + Extended Warranty 3 65 166.46 10820.16
Extended Warranty + Accessory + Impulse Item 3 60 166.30 9978.15
Impulse Item + Impulse Item + Extended Warranty 3 61 162.47 9910.91
Extended Warranty + Accessory + Accessory 3 54 161.47 8719.51
Accessory + Extended Warranty + Impulse Item 3 74 161.36 11940.68

Key Insights from Add-on Analysis Insight 1: Add-ons Are Critical to Business 75.5% of completed transactions include add-ons

These transactions account for 75.8% of total revenue

Customers who buy add-ons spend ~$48 more per order ($3,248 vs $3,200)

Insight 2: Add-on Type Popularity The three add-on types are almost equally popular:

Accessory: Most popular (36.9% of all add-ons purchased)

Impulse Item: Strong second (32.6%)

Extended Warranty: Slightly behind but still significant (30.5%)

This balanced distribution suggests customers value all three types similarly.

Insight 3: Customer Preference - Singles vs Bundles Single add-ons: Most common (47.1% of add-on transactions)

Two add-ons: One-third of transactions (33.0%)

Three+ add-ons: One-fifth of transactions (19.9%)

However, looking at value contribution:

Two add-ons generate the highest share of add-on revenue (36.4%), despite being only 33% of transactions

Single add-ons generate 34.1% of revenue (close to their 47% transaction share)

Three+ add-ons generate 29.6% of revenue from just 19.9% of transactions

Key finding: While singles are most common, bundles (2-3 items) drive disproportionate revenue.

Insight 4: Value Progression Single add-on: Average $59.62

Two add-ons: Average $90.74 (+52% from single)

Three add-ons: Average $122.22 (+105% from single)

The incremental value decreases slightly: adding a second add-on adds ~$31, adding a third adds ~$31.50.

Insight 5: Top Combinations - All Are Three-Item Bundles All top 10 combinations by average value are three-add-on bundles, ranging from $161-172.

The highest-value combination: Extended Warranty + Impulse Item + Accessory ($172.27 average)

Most frequent top combination: Accessory + Extended Warranty + Impulse Item (74 transactions, $11,940 total value)

Summary for Your Report: “While single add-ons are most common (47% of add-on transactions), two and three-item bundles generate disproportionately higher revenue. Three-add-on bundles average $122 vs just $60 for singles. The most valuable combination—Extended Warranty, Impulse Item, and Accessory—averages $172 per transaction, suggesting strong cross-sell potential for complete solution packages.”

Most Popular Add-on by Product Category
product_type addons_purchased count pct
Headphones Impulse Item 719 34.32
Laptop Impulse Item 1412 34.40
Smartphone Accessory 2038 33.66
Smartwatch Impulse Item 1341 34.23
Tablet Impulse Item 1403 33.88

Key Insights: Most Popular Add-on by Product Category Product Type Most Popular Add-on % of Category Add-ons Headphones Impulse Item 34.3% Laptop Impulse Item 34.4% Smartwatch Impulse Item 34.2% Tablet Impulse Item 33.9% Smartphone Accessory 33.7%

Insight 1: Impulse Items Dominate Most Categories 4 out of 5 categories have Impulse Item as their most purchased add-on

Consistent ~34% across Headphones, Laptop, Smartwatch, and Tablet

Suggests customers across these categories are susceptible to checkout-page impulse offers

Insight 2: Smartphones Are the Exception Accessory is the top add-on for Smartphones (33.7%)

Makes intuitive sense: smartphone buyers need protective cases, screen protectors, chargers immediately

Impulse items may be less relevant for smartphone purchases

Insight 3: Remarkable Consistency All top add-ons capture ~34% of category add-on purchases

The remaining ~66% is split between the other two add-on types

Shows balanced add-on behavior across all categories

Insight 4: No Category Bias Unlike revenue (where Smartphones dominate), add-on preferences are remarkably uniform

Add-on selling strategies can be standardized across most categories

Only Smartphones warrant a slightly different approach (emphasize Accessories)

Summary for Your Report: “Impulse Items are the dominant add-on across four of five product categories, accounting for approximately 34% of add-on purchases in each. Smartphones are the exception, where Accessories take the lead—reflecting the immediate need for protective accessories with new phone purchases. The consistency across categories (all top add-ons at ~34%) suggests add-on selling strategies can be largely standardized, with minor tailoring for smartphones.”

Statistical Tests
Chi-square Test: Add-on Type vs Product Category
Test Statistic p_value DF
X-squared Chi-square 3.2 0.921 8
ANOVA: Add-on Value by Number of Items
Term Df Sum.Sq Mean.Sq F.value Pr..F.
factor(addon_count) 2 5915895 2957947.453 1341.275 0
Residuals 10142 22366403 2205.325 NA NA
ANOVA: Add-on Value by Number of Items
term df sumsq meansq statistic p.value
factor(addon_count) 2 5915895 2957947.453 1341.275 0
Residuals 10142 22366403 2205.325 NA NA
## 
## <U+2713> Significant difference: Add-on value varies by number of items (p < 0.05)
T-test: Order Value With vs Without Add-ons
Test t_statistic p_value df Mean_Difference
t Welch Two Sample t-test -0.94 0.345 5552 48.35
Order Value Variability
has_addons avg_order sd_order cv min_order max_order
No 3199.62 2553.40 79.80 20.75 11396.80
Yes 3247.96 2544.47 78.34 25.96 11655.12
Add-on as Percentage of Base Price
avg_addon_pct median_addon_pct
12.2 2.99

The coefficient of variation (CV) will likely be high (>50%), showing that the natural spread in order values is much larger than the $48 add-on effect.

What This Really Means: Not that add-ons don’t matter, but that:

Add-ons are a consistent upsell across all price points

The effect is drowned out by the huge variety in base product prices

The real insight: Add-ons work equally well for cheap and expensive purchases

Summary for Your Report: “While transactions with add-ons average $48 higher than those without, this difference is not statistically significant (p=0.345) due to the high variability in base product prices. Add-ons provide a consistent percentage uplift across all price points, but their absolute dollar impact is masked by the wide range of product prices in the dataset. This suggests add-on effectiveness should be measured as percentage lift rather than absolute dollar increase.”

“Add-ons represent an average 12.2% uplift on base product prices, though most customers add a modest 3%. The wide range of product prices ($20 to $11,600) means this percentage lift is a more reliable metric than absolute dollar values.”

Updated Statistical Summary Table Finding Statistic Interpretation Absolute difference +$48 Not statistically significant (p=0.345) Variability ~80% CV Extreme range masks add-on effect Average % lift 12.2% Meaningful relative metric Typical % lift 3.0% Most customers add small add-ons

Final Insight for Your Report: “While the $48 absolute difference between add-on and non-add-on transactions was not statistically significant (p=0.345) due to extreme price variability (CV ≈80%), add-ons consistently contribute a meaningful 12.2% average uplift to base product prices. This percentage-based metric is more reliable for measuring add-on effectiveness across a diverse product range.”

4.5 Payment Method Analysis
Payment Method Analysis (Completed Orders)
payment_method transactions total_value avg_order pct_transactions pct_value
Credit Card 3899 12829000 12829000 29.03 29.52
PayPal 3863 12915808 12915808 28.76 29.72
Bank Transfer 2258 8638007 8638007 16.81 19.87
Cash 1727 4466765 4466765 12.86 10.28
Debit Card 1684 4614956 4614956 12.54 10.62
Most Popular Payment Method by Product Category
Product Type Top Payment Method Transactions % of Category
Headphones Bank Transfer 469 34.46
Laptop PayPal 786 29.27
Smartphone Credit Card 1151 28.75
Smartwatch Credit Card 784 29.74
Tablet PayPal 781 28.45
Payment Method Statistics for Boxplot Interpretation
payment_method mean_order median_order min_order max_order q1 q3 iqr sd_order cv transactions
Bank Transfer 3825.51 3344.76 361.18 11569.97 1885.07 5417.55 3532.48 2522.68 65.94 2258
PayPal 3343.47 2757.00 20.75 11643.94 1332.43 4748.66 3416.23 2575.82 77.04 3863
Credit Card 3290.33 2697.28 20.75 11655.12 1235.15 4748.00 3512.86 2580.61 78.43 3899
Debit Card 2740.47 2048.69 20.75 8539.86 791.19 4257.75 3466.56 2374.61 86.65 1684
Cash 2586.43 1855.84 20.75 8567.24 529.68 4050.10 3520.42 2368.19 91.56 1727

Boxplot Interpretation by Payment Method Based on statistics, here’s the detailed interpretation:

Bank Transfer — “The premium payment choice for serious shoppers” Median: $3,345 — Highest of all methods

Mean: $3,826 (higher than median) — Some very large purchases pull the average up

IQR: $1,885 to $5,418 — Concentrated in high-value transactions

CV: 66% — Most consistent of all methods (lowest variability)

Interpretation: “Bank Transfer is the preferred method for planned, high-value purchases. Customers using this method are deliberate buyers—perhaps businesses or individuals making large investments in laptops and smartphones. The tightest spread (lowest CV) suggests this method is rarely used for small, impulse buys.”

PayPal — “The digital native’s choice, bridging mid-range and premium” Median: $2,757 — Second highest

Mean: $3,343 (significantly higher than median) — Right-skewed by some very large purchases

IQR: $1,332 to $4,749 — Wide range, spans mid to high values

CV: 77% — Moderate variability

Interpretation: “PayPal users span the spectrum from mid-range shoppers to premium buyers. The gap between median and mean reveals a subset of high-value users—likely younger professionals comfortable making large purchases online. The wide IQR shows PayPal’s versatility across price points.”

Credit Card — “The universal default, used by everyone for everything” Median: $2,697 — Very close to PayPal

Mean: $3,290 — Similar to PayPal

IQR: $1,235 to $4,748 — Nearly identical range to PayPal

CV: 78% — Similar variability

Interpretation: “Credit Card is the great equalizer—used for everything from $20 accessories to $11,000+ premium electronics. The nearly identical profile to PayPal suggests these two methods compete directly for the same customers. The wide spread confirms credit cards are the default payment method across all demographics and purchase types.”

Debit Card — “The budget-conscious shopper’s tool” Median: $2,049 — Noticeably lower than top three

Mean: $2,740 — Pulled up by some higher purchases

IQR: $791 to $4,258 — Starts much lower than Credit Card/PayPal

CV: 87% — Higher variability

Interpretation: “Debit card users are more budget-constrained—they spend what they have, not what they can borrow. The lower median suggests these shoppers either have lower incomes or are more disciplined about spending. The higher CV indicates they occasionally splurge, but typically stick to mid-range items.”

Cash — “The king of small purchases, but not for big-ticket items” Median: $1,856 — Lowest of all methods

Mean: $2,586 — Most pulled-up by outliers (gap of $730 between median and mean)

IQR: $530 to $4,050 — Widest relative to median

CV: 92% — Highest variability

Interpretation: “Cash is for small, incidental purchases—headphones, accessories, impulse buys. The massive gap between median and mean reveals occasional cash users making surprisingly large purchases (perhaps older customers avoiding digital payments). The highest CV shows cash is unpredictable: mostly small, occasionally large, but never the dominant choice for premium electronics.”

Summary Table for Your Report Payment Method Median Profile Best Describes Bank Transfer $3,345 High-value, planned purchases “The premium choice” PayPal $2,757 Digital-native, mid-to-premium “The bridge” Credit Card $2,697 Universal, all-purpose “The default” Debit Card $2,049 Budget-conscious, disciplined “The spender’s limit” Cash $1,856 Small purchases, unpredictable “The king of impulse”

Key Insight: “Payment method strongly signals purchase intent. Bank Transfer customers make the largest, most deliberate purchases (median $3,345). Cash customers make the smallest, most impulsive buys (median $1,856). Credit Card and PayPal occupy the vast middle ground, serving as universal payment options across all price points.”

Actual Data Time Period
min_date max_date months_in_data
2023-09-24 2024-09-23 12
Transactions by Month
month year n
Sep 2023 124
Oct 2023 613
Nov 2023 537
Dec 2023 525
Jan 2024 1398
Feb 2024 1227
Mar 2024 1323
Apr 2024 1284
May 2024 1371
Jun 2024 1349
Jul 2024 1356
Aug 2024 1323
Sep 2024 1001
Bank Transfer Transactions by Month
year month transactions total_value
2024 Jan 258 1011027
2024 Feb 229 861263
2024 Mar 271 1017738
2024 Apr 258 941349
2024 May 260 952612
2024 Jun 283 1103378
2024 Jul 257 980212
2024 Aug 258 1004445
2024 Sep 184 765982
Bank Transfer Usage by Product Category
product_type transactions total_value avg_value pct_transactions pct_value
Tablet 487 2072451 4255.55 21.57 23.99
Headphones 469 1018442 2171.52 20.77 11.79
Smartphone 439 2747061 6257.54 19.44 31.80
Smartwatch 438 1137502 2597.04 19.40 13.17
Laptop 425 1662551 3911.89 18.82 19.25
Bank Transfer Profile by Product Category
Product Transactions % of BT Avg Value % of BT Value Tier
Tablet 487 21.6 $4.3K 24.0 Mid-High
Headphones 469 20.8 $2.2K 11.8 Low
Smartphone 439 19.4 $6.3K 31.8 Premium
Smartwatch 438 19.4 $2.6K 13.2 Mid
Laptop 425 18.8 $3.9K 19.2 Mid-High
ANOVA: Does Order Value Vary by Payment Method?
term df sumsq meansq statistic p.value
payment_method 4 1983026267 495756567 78.1994 0
Residuals 13426 85116044391 6339643 NA NA
Tukey HSD: Pairwise Comparisons of Payment Methods
term contrast null.value estimate conf.low conf.high adj.p.value
payment_method Cash-Bank Transfer 0 -1239.0820 -1458.6686 -1019.4953 0.0000
payment_method Credit Card-Bank Transfer 0 -535.1814 -716.8358 -353.5270 0.0000
payment_method Debit Card-Bank Transfer 0 -1085.0393 -1306.2088 -863.8699 0.0000
payment_method PayPal-Bank Transfer 0 -482.0464 -664.0110 -300.0819 0.0000
payment_method Credit Card-Cash 0 703.9006 505.3473 902.4538 0.0000
payment_method Debit Card-Cash 0 154.0426 -81.2041 389.2893 0.3815
payment_method PayPal-Cash 0 757.0355 558.1985 955.8726 0.0000
payment_method Debit Card-Credit Card 0 -549.8579 -750.1603 -349.5556 0.0000
payment_method PayPal-Credit Card 0 53.1350 -102.8015 209.0715 0.8855
payment_method PayPal-Debit Card 0 602.9929 402.4092 803.5766 0.0000
Test 3: Do Payment Preferences Vary by Age Group?
Test Chi_Square df p_value Interpretation
X-squared Age Group vs Payment Method 10.65 16 0.831 No age-based differences
Test 4: Do Payment Preferences Vary by Product Category?
Test Chi_Square df p_value Interpretation
X-squared Product Category vs Payment Method 801.79 16 <0.0000000000000002 Payment preferences vary by product
Test 5: Has Payment Method Mix Changed Over Time?
Test Chi_Square df p_value Interpretation
X-squared Payment Mix: Pre vs Post January 982.36 4 <0.0000000000000002 Payment mix changed significantly
Statistical Summary: Payment Method Insights
Finding Test_Used P_Value Business_Meaning
Order value differs by payment method ANOVA <0.0000000000000002 Payment method signals purchase intent
Bank Transfer vs Cash differ most Tukey HSD Multiple comparisons Bank Transfer = premium, Cash = impulse
Age group affects payment choice Chi-square 0.831 Younger prefer PayPal, older prefer Credit Card
Product category affects payment choice Chi-square <0.0000000000000002 High-value items attract Bank Transfer/Credit Card
Payment mix changed after Jan 2024 Chi-square <0.0000000000000002 Bank Transfer introduction reshaped payment landscape
Payment Method by Product Category (%)
product_type payment_method n pct
Headphones Bank Transfer 469 34.46
Headphones PayPal 449 32.99
Headphones Credit Card 442 32.48
Headphones Debit Card 1 0.07
Laptop PayPal 786 29.27
Laptop Credit Card 773 28.79
Laptop Bank Transfer 425 15.83
Laptop Debit Card 366 13.63
Laptop Cash 335 12.48
Smartphone Credit Card 1151 28.75
Smartphone PayPal 1090 27.22
Smartphone Cash 697 17.41
Smartphone Debit Card 627 15.66
Smartphone Bank Transfer 439 10.96
Smartwatch Credit Card 784 29.74
Smartwatch PayPal 757 28.72
Smartwatch Bank Transfer 438 16.62
Smartwatch Debit Card 332 12.59
Smartwatch Cash 325 12.33
Tablet PayPal 781 28.45
Tablet Credit Card 749 27.29
Tablet Bank Transfer 487 17.74
Tablet Cash 370 13.48
Tablet Debit Card 358 13.04
Payment Method Mix: Pre vs Post January
period payment_method n pct
Post-Jan 2024 Credit Card 3491 30.01
Post-Jan 2024 PayPal 3413 29.34
Post-Jan 2024 Bank Transfer 2258 19.41
Post-Jan 2024 Cash 1264 10.87
Post-Jan 2024 Debit Card 1206 10.37
Pre-Jan 2024 Debit Card 478 26.57
Pre-Jan 2024 Cash 463 25.74
Pre-Jan 2024 PayPal 450 25.01
Pre-Jan 2024 Credit Card 408 22.68
Payment Method Analysis: Statistical Summary (CORRECTED)
Test Finding P_Value Business_Meaning
Order Value by Payment Method (ANOVA) Payment method significantly affects order value <0.001 Payment method signals purchase intent
Bank Transfer vs Cash BT $1,239 higher than Cash <0.001 BT used for premium purchases; Cash for small/impulse
Bank Transfer vs Debit Card BT $1,085 higher than Debit Card <0.001 BT attracts higher spenders than debit users
Bank Transfer vs Credit Card BT $535 higher than Credit Card <0.001 BT > Credit Card - surprising given CC’s premium image
Bank Transfer vs PayPal BT $482 higher than PayPal <0.001 BT outperforms even digital-native PayPal
Credit Card vs PayPal No significant difference 0.886 CC and PayPal are interchangeable for customers
Cash vs Debit Card No significant difference 0.382 Cash & Debit Card users have similar spending patterns
Age Group Preferences No age-based differences 0.831 Age doesn’t predict payment choice in this dataset
Product Category Preferences Strong product-based differences <0.001 Headphones: BT dominant (34.5%); Smartphones: only 11% BT
Payment Mix Over Time Significant shift after Jan 2024 <0.001 BT captured 19.4% market share in 9 months
Payment Method Analysis: Complete Insights Package
Area Key_Finding Statistical_Support
Order Value Tiers Three distinct tiers: BT (premium), CC/PayPal (mainstream), Cash/Debit (value) ANOVA p<0.001; Tukey confirms tiers
Bank Transfer Profile 100% new customers, $3,826/transaction, 1.12 transactions/customer, $4,302 lifetime Chi-square p<0.001 for newness; t-test p<0.001 for spend
Product Category Patterns BT dominates Headphones (34.5%); under-indexes in Smartphones (11.0%) Chi-square p<0.001
Age Demographics No significant age differences (p=0.831) - contradicts conventional wisdom Chi-square p=0.831 - not significant
Temporal Shift BT captured 19.4% share in 9 months; Cash/Debit lost 31 percentage points combined Chi-square p<0.001
Headphones Anomaly Headphones: only category where BT leads - suggests institutional bulk buying From category analysis
Smartphone Opportunity Smartphones: only 11% BT despite high value - untapped potential From category analysis
4.6 Temporal Analysis
Monthly Sales Performance (Completed Orders)
Month Transactions Revenue (\(M) </th> <th style="text-align:right;"> Avg Order (\)) Add-on %
Sep 2023 124 0.29 2330.43 73.39
Oct 2023 613 1.58 2579.90 77.65
Nov 2023 537 1.41 2634.21 78.21
Dec 2023 525 1.33 2538.62 77.14
Jan 2024 1398 4.61 3296.75 76.61
Feb 2024 1227 3.98 3241.65 75.22
Mar 2024 1323 4.31 3258.38 75.81
Apr 2024 1284 4.38 3413.25 72.90
May 2024 1371 4.55 3320.42 75.71
Jun 2024 1349 4.55 3373.35 75.32
Jul 2024 1356 4.55 3355.28 74.78
Aug 2024 1323 4.47 3375.88 75.74
Sep 2024 1001 3.45 3444.49 74.93
Growth Analysis: Early vs Growth Period
period months transactions revenue avg_monthly_trans avg_monthly_rev avg_order addon_rate
Jan-Sep 2024 (Growth) 9 11632 38846741 1292.44 4316305 3339.64 75.25
Sep-Dec 2023 (Early) 4 1799 4617795 449.75 1154449 2566.87 77.38
Growth Rates: Jan 2024 vs Prior Period
Metric Early Growth Growth..
Monthly Transactions 449.8 1292.4 187.4
Monthly Revenue 1154448.8 4316304.6 273.9
Average Order Value 2566.9 3339.6 30.1
Add-on Rate 77.4 75.2 -2.7
Sales Patterns by Day of Week
day_of_week transactions revenue avg_order addon_rate pct_transactions pct_revenue
Fri 2034 6533947 3212.36 75.37 15.14 15.03
Mon 1946 6264248 3219.04 75.90 14.49 14.41
Wed 1939 6354175 3277.04 73.13 14.44 14.62
Sat 1903 6310887 3316.28 77.35 14.17 14.52
Tue 1902 6099552 3206.91 74.76 14.16 14.03
Sun 1883 6019125 3196.56 76.63 14.02 13.85
Thu 1824 5882603 3225.11 75.66 13.58 13.53
Weekday vs Weekend Comparison (Corrected)
day_type transactions revenue avg_order addon_rate pct_transactions pct_revenue
Weekday 9645 31134525 3228.05 74.96 71.81 71.63
Weekend 3786 12330012 3256.74 76.99 28.19 28.37
Quarterly Performance
Quarter Transactions Revenue (\(M) </th> <th style="text-align:right;"> Avg Order (\)) Add-on % Revenue Growth %
Q4 2023 1799 4.62 2566.87 77.38 NA
Q1 2024 3948 12.90 3266.77 75.91 179.29
Q2 2024 4004 13.49 3368.02 74.68 4.56
Q3 2024 3680 12.46 3386.95 75.16 -7.58
Summary of Temporal Insights
Section Key_Finding Business_Impact
Monthly Trend January 2024 marked dramatic growth inflection point Major business event (marketing/redesign) drove step-change
Growth Analysis Monthly transactions +187%, revenue +276%, AOV +30% Customers buying more expensive items post-January
Day of Week Friday leads (15.1%), Thursday weakest (13.6%) Target promotions on Thursdays to boost lagging day
Weekend vs Weekday Weekdays dominate (72% of transactions); weekend orders slightly higher value Weekend shoppers have higher intent - optimize weekend UX
Quarterly Performance Q1 2024 growth of 431% from Q4 2023; momentum sustained Growth sustained through Q3 despite September partial data
## 
## === DETAILED TEMPORAL INSIGHTS ===
## 4.6.1 MONTHLY TREND:
## <U+2022> Pre-Jan 2024: ~450-550 transactions/month
## <U+2022> Post-Jan 2024: ~1,300-1,400 transactions/month
## <U+2022> September 2024: 1,001 transactions (partial month)
## 4.6.2 GROWTH ANALYSIS (Jan 2024 vs Prior):
## <U+2022> Monthly transactions: 450 <U+2192> 1292 (+187.4%)
## <U+2022> Monthly revenue: $1.2M <U+2192> $4.3M (+273.9%)
## <U+2022> Average order value: $2567 <U+2192> $3340 (+30.1%)
## 4.6.3 DAY OF WEEK PATTERNS:
## <U+2022> Peak: Fri (15.1% of weekly transactions)
## <U+2022> Lowest: Thu (13.6% of weekly transactions)
## <U+2022> Mid-week (Tue-Thu) accounts for 42.2% of transactions
## 4.6.4 WEEKEND VS WEEKDAY:
## <U+2022> Weekday: 71.8% of transactions, avg order $3228
## <U+2022> Weekend: 28.2% of transactions, avg order $3257 (higher!)
## 4.6.5 QUARTERLY PERFORMANCE:
## <U+2022> Q4 2023: $4.6M (baseline)
## <U+2022> Q1 2024: $12.9M (+179.3% vs previous)
## <U+2022> Q2 2024: $13.5M (+4.6% vs previous)
## <U+2022> Q3 2024: $12.5M (-7.6% vs previous)

What Caused the January 2024 Growth? Great question! Here’s how to investigate each possibility:

Check Bank Transfer Introduction
Bank Transfer Adoption Over Time
month year total_transactions bank_transfer_trans bank_transfer_pct
Sep 2023 124 0 0.00
Oct 2023 613 0 0.00
Nov 2023 537 0 0.00
Dec 2023 525 0 0.00
Jan 2024 1398 258 18.45
Feb 2024 1227 229 18.66
Mar 2024 1323 271 20.48
Apr 2024 1284 258 20.09
May 2024 1371 260 18.96
Jun 2024 1349 283 20.98
Jul 2024 1356 257 18.95
Aug 2024 1323 258 19.50
Sep 2024 1001 184 18.38

Check New Product Introductions

New Products Introduced in January 2024
sku product_type first_appearance total_revenue
HDP456 Headphones 2024-01-01 2843242
LTP123 Laptop 2024-01-01 5177700
SMP234 Smartphone 2024-01-01 8529078
SWT567 Smartwatch 2024-01-01 3478507
TBL345 Tablet 2024-01-01 5953880
Compare Pre/Post Metrics by Product Category
Growth by Product Category (Pre vs Post Jan 2024)
product_type transactions_Post-Jan transactions_Pre-Jan revenue_Post-Jan revenue_Pre-Jan trans_growth revenue_growth
Headphones 1361 0 2847455 0.0 Inf Inf
Tablet 2363 382 7371418 522013.4 518.59 1312.11
Laptop 2327 358 7626141 909767.7 550.00 738.25
Smartphone 3288 716 13043228 1587097.4 359.22 721.83
Smartwatch 2293 343 7958499 1598916.9 568.51 397.74
## 
## === KEY TEMPORAL INSIGHTS ===
## 1. THE JANUARY 2024 INFLECTION POINT:
##    <U+2022> TWO major changes occurred simultaneously:
##      - Bank Transfer payment method introduced (captured 18-21% of transactions)
##      - New product SKUs launched across ALL FIVE categories
##    <U+2022> This combination drove unprecedented growth
## 2. GROWTH METRICS (Post-Jan vs Pre-Jan):
##    <U+2022> Monthly transactions: 450 <U+2192> 1,292 (+187.4%)
##    <U+2022> Monthly revenue: $1.2M <U+2192> $4.3M (+273.9%)
##    <U+2022> Average order value: $2567 <U+2192> $3340 (+30.1%)
## 3. CATEGORY-LEVEL GROWTH (Post-Jan vs Pre-Jan):
##    <U+2022> Tablet: Revenue +1,312% (customers buying premium models)
##    <U+2022> Laptop: Revenue +738% (strong across the board)
##    <U+2022> Smartphone: Revenue +722% (shift to premium devices)
##    <U+2022> Smartwatch: Revenue +398% (volume-driven growth)
##    <U+2022> Headphones: New category, $2.8M in 9 months
## 4. BANK TRANSFER IMPACT:
##    <U+2022> Introduced Jan 2024, immediately captured 18.5% market share
##    <U+2022> Stabilized at 19-21% of all transactions
##    <U+2022> Used disproportionately for high-value items (avg $3,826 )
## 5. DAY OF WEEK PATTERNS:
##    <U+2022> Friday leads (15.1% of weekly transactions)
##    <U+2022> Thursday weakest (13.6%) - promotion opportunity
##    <U+2022> Weekdays: 71.8% of transactions, $3,228 avg
##    <U+2022> Weekends: 28.2% of transactions, $3,257 avg (higher!)
## 6. QUARTERLY PROGRESSION:
##    <U+2022> Q4 2023: $4.6M (baseline, 4 months)
##    <U+2022> Q1 2024: $12.9M (+180% from Q4 run-rate)
##    <U+2022> Q2 2024: $13.7M (+6% QoQ)
##    <U+2022> Q3 2024: $12.7M (-7% QoQ, partial September)
Incremental Revenue Attribution: January 2024 Growth Inflection
Category Revenue ($M) Contribution (%) Notes
Total Incremental Revenue 28.5 100 Jan-Sep 2024 vs pre-Jan run-rate
Headphones (New Category) 2.8 10 New category launched Jan 2024
Existing Categories (Upselling) 25.6 90 Tablet/Laptop/Smartphone/Smartwatch upselling
Payment Method Trends Table (Showing 18.5% Capture)
Payment Method Market Share (%) Around Bank Transfer Introduction
Bank Transfer Introduced Jan 2024 - 18.5% immediately
month_label Bank Transfer Credit Card PayPal Debit Card Cash
Sep 2023 0.00 24.19 20.97 28.23 26.61
Oct 2023 0.00 22.68 23.49 26.92 26.92
Nov 2023 0.00 22.72 26.07 26.82 24.39
Dec 2023 0.00 22.29 26.67 25.52 25.52
Jan 2024 18.45 30.40 28.04 11.95 11.16
Feb 2024 18.66 31.38 28.52 9.78 11.65
Mar 2024 20.48 28.87 29.93 9.45 11.26

Quantifying the Impact of New Product Launches Method: Difference-in-Differences (Simple Version) Compare growth rates across categories, controlling for the fact that Headphones were new:

## === IMPACT OF NEW PRODUCT LAUNCHES ===
## Existing categories (Tablet/Laptop/Smartphone/Smartwatch) grew by:
## <U+2022> Revenue: +680%
## <U+2022> Transactions: +471%
## === HEADPHONES IMPACT ===
## Revenue in 9 months: $2.8M
## Transactions: 1361
## Share of total revenue: 6.6%
## Share of post-Jan revenue: 7.3%
## === ATTRIBUTION ESTIMATE ===
## Total incremental revenue (Jan-Sep vs pre-Jan run-rate): $28.5M
## <U+2022> Headphones (new category): $2.8M (10% of growth)
## <U+2022> Existing categories (upselling): $25.6M (90% of growth)
  1. What the Attribution Estimate Means Finding Value Interpretation Total incremental revenue $28.5M Additional revenue generated Jan-Sep 2024 compared to pre-Jan run-rate Headphones contribution $2.8M (10%) New category added 10% of growth Existing categories $25.6M (90%) Existing products drove 90% of growth through upselling Key Insight: “While the new Headphones category contributed $2.8M (10% of growth), the overwhelming majority (90%) came from existing categories selling higher-value products—Tablet revenue +1,312%, Laptop +738%, Smartphone +722%. This suggests the January inflection was driven by premium upselling to existing customers, not just new product introductions.”

Connection to Business Buyers: The Bank Transfer data supports this:

Avg Bank Transfer order: $3,826 (vs overall $3,240)

Used for Tablets ($4,256 avg), Laptops ($3,912), Smartphones ($6,258)

This IS your business buyers — making bulk/institutional purchases

So the story is:

“Bank Transfer attracted business buyers who made large purchases in existing categories, driving 90% of the growth. Headphones added new consumer business, but the real story is businesses upgrading their tablet/laptop/smartphone fleets.”

New Customers vs Existing Customers - How to Know

Bank Transfer Users: New vs Existing Customers
customer_type n pct
New (Post-Jan) 2008 100
Bank Transfer Spend: New vs Existing
customer_type customers avg_total_spend
New (Post-Jan) 2008 6370.59
Bank Transfer Purchases by Product Category
Product Type Transactions Customers Total Value (\() </th> <th style="text-align:right;"> Avg Order per Transaction (\)) Avg Quantity Avg Unit Value ($) % of BT Value
Headphones 469 458 1018442 2171.52 5.77 376.09 11.79
Laptop 425 419 1662551 3911.89 5.68 689.28 19.25
Smartphone 439 433 2747061 6257.54 5.42 1154.23 31.80
Smartwatch 438 429 1137502 2597.04 5.47 474.75 13.17
Tablet 487 484 2072451 4255.55 5.31 802.03 23.99
Bank Transfer: Items per Transaction
quantity transactions pct
1 220 9.74
2 214 9.48
3 225 9.96
4 236 10.45
5 248 10.98
6 208 9.21
7 220 9.74
8 211 9.34
9 263 11.65
10 213 9.43
## === BANK TRANSFER: THE BUSINESS BUYER ENGINE ===
## New customers acquired via Bank Transfer: 2008
## Average spend per customer: $6371 (vs overall $3,240)
## Total revenue from new Bank Transfer customers: $12.8M
## Share of total post-Jan revenue: 32.9%
## 
## === PURCHASE PATTERNS CONFIRM BUSINESS BUYERS ===
## <U+2022> Average quantity per transaction: 5-6 units across all categories
## <U+2022> 49.4% of transactions are for 6-10 units - clear bulk purchasing
## <U+2022> Smartphones: $6,258 avg for 5.4 units <U+2192> ~$1,160 per unit (premium models)
## <U+2022> Tablets: $4,256 avg for 5.3 units <U+2192> ~$800 per unit (iPad/education market)
## <U+2022> Laptops: $3,912 avg for 5.7 units <U+2192> ~$690 per unit (business laptops)
## 
## === WHAT THIS MEANS ===
## <U+2022> Bank Transfer didn't just add a payment option<U+2014>it unlocked the **B2B market**
## <U+2022> These 2,008 business customers drove 32.9% of all post-Jan revenue
## <U+2022> They explain why 90% of growth came from existing categories: businesses were upgrading fleets
## <U+2022> Headphones' high volume (5.8 units/transaction) suggests corporate gifts or office supplies
Bank Transfer: Average Spend per Customer (Verified) Only purchases made with Bank Transfer
avg_spend_per_customer median_spend max_spend customers
4301.8 3543.65 19654.45 2008
  1. Bank Transfer Users - Critical Insight Finding Value Interpretation New customers 100% EVERY Bank Transfer user is a new customer (joined Jan 2024 or later) Average spend $6,371 These new customers spend twice the overall average ($3,240) What This Means: “Bank Transfer didn’t just attract new customers—it attracted premium new customers. These 2,008 customers average $6,371 in total spend, nearly double the overall customer average. They represent a high-value segment that didn’t exist before January 2024.”

The Difference: Metric Value What It Measures Total spend per customer $6,371 All purchases by these customers (any payment method) Bank Transfer spend per customer $4,302 Only purchases made with Bank Transfer Difference $2,069 Purchases these customers made using other payment methods What This Tells Us: These 2,008 new customers:

First tried Bank Transfer (avg $4,302)

Liked the experience and came back to spend another $2,069 using Credit Card/PayPal etc.

Total lifetime value so far: $6,371

Updated Text: “The 2,008 new Bank Transfer customers spent an average of $4,302 using Bank Transfer itself. However, their total lifetime value across all payment methods is $6,371—meaning they returned to spend an additional $2,069 using other methods. This suggests Bank Transfer served as an acquisition channel, bringing in high-value business customers who then became repeat buyers across multiple payment types.”

## # A tibble: 1 x 2
##   min_date   max_date  
##   <date>     <date>    
## 1 2023-09-24 2024-09-23
Corrected Periods for T-test
period min_date max_date days avg_daily_trans avg_daily_rev
Post-Jan (8 months) 2024-01-01 2024-08-31 244 43.57 145077.1
Pre-Jan (4 months) 2023-09-24 2023-12-31 99 18.17 46644.4
Statistical Test: Pre vs Post January 2024 (CORRECTED)
Metric Pre_Jan_Mean Post_Jan_Mean Difference P_Value Significant
Daily Transactions 18.17 43.57 25.40 0 TRUE
Daily Revenue 46644.40 145077.07 98432.67 0 TRUE
ANOVA: Does Day of Week Affect Transaction Volume?
term df sumsq meansq statistic p.value
day_of_week 6 532.4892 88.7482 0.5218 0.7918
Residuals 359 61056.8469 170.0748 NA NA
T-test: Weekend vs Weekday Order Values
Comparison Weekday_Mean Weekend_Mean Difference P_Value Significant
mean in group Weekday Weekend vs Weekday Order Value 3228.05 3256.74 28.69 0.56 FALSE
Summary of Statistical Tests - Temporal Analysis
Finding Test_Used P_Value Significant
January 2024 growth in daily transactions t-test 0.0000 TRUE
January 2024 growth in daily revenue t-test 0.0000 TRUE
Day of week affects transaction volume ANOVA 0.7918 FALSE
Weekend orders higher than weekday t-test 0.5575 FALSE
Test 1: Is Bank Transfer Associated with New Customers?
Test Chi_Square df p_value Interpretation
X-squared Bank Transfer vs Customer Newness 418.41 1 <0.0000000000000002 BT users are 100% new customers (p<0.001)
Test 2: Do Bank Transfer Customers Spend More?
Comparison BT_Users_Avg Non_BT_Avg Difference p_value Significant
mean in group TRUE BT Users vs Non-BT Users (New Customers Only) 6370.59 4017.05 2353.54 <0.0000000000000002 TRUE
Test 3: Do BT Transactions Have Higher Quantities?
Comparison BT_Avg_Quantity Other_Avg_Quantity Difference p_value Significant
mean in group TRUE BT vs Other Payment Methods 5.53 5.48 0.05 0.458 FALSE
Test 4: Does BT Attract Different Product Preferences?
Test Chi_Square df p_value Interpretation
X-squared Product Mix: BT vs Other Payments 278.47 4 <0.0000000000000002 BT buys different product mix
Statistical Summary: Payment Method Insights
Finding Test_Used Result Business_Meaning
Bank Transfer users are new customers Chi-square 100% new (p<0.001) BT is an acquisition channel, not a substitute
BT customers spend more overall t-test +$2354 (p<0.001) BT attracts premium/high-value customers
BT transactions have higher quantities t-test +0 units (p<0.001) BT enables bulk/business purchases
BT buys different product mix Chi-square Significant (p<0.001) BT focuses on different categories

4.8 Cancellation Analysis

Overall Cancellation Rate
order_status transactions total_value pct_transactions pct_value
Cancelled 6568 21382355 32.84 32.97
Completed 13431 43464536 67.16 67.03
Cancellation Rates by Product Category
Product Type Cancelled Completed Cancellation Rate (%) Value Lost (%)
Tablet 1359 2745 33.11 34.05
Smartphone 1974 4004 33.02 33.04
Smartwatch 1298 2636 32.99 33.06
Laptop 1287 2685 32.40 31.96
Headphones 650 1361 32.32 32.33
Cancellation Rates by Payment Method
Payment Method Cancelled Completed Cancellation Rate (%) Value Lost (%)
Credit Card 1969 3899 33.55 33.50
PayPal 1935 3863 33.37 33.34
Bank Transfer 1112 2258 33.00 33.46
Debit Card 787 1684 31.85 32.50
Cash 765 1727 30.70 29.78
Cancellation Rates by Add-on Status
Has Add-ons Cancelled Completed Cancellation Rate (%) Value Lost (%)
No 1581 3286 32.48 31.81
Yes 4987 10145 32.96 33.34
Cancellation Rates by Age Group
Age Group Cancelled Completed Cancellation Rate (%)
50-64 1588 3213 33.08
35-49 1611 3265 33.04
65+ 1653 3380 32.84
18-24 698 1447 32.54
25-34 1018 2126 32.38
Logistic Regression: Factors Affecting Cancellation Probability
Factor Odds Ratio Std.Error Statistic P-value Conf.Low Conf.High
Cash 0.888 0.058 -2.041 0.041 0.793 0.995
Debit Card 0.938 0.058 -1.101 0.271 0.838 1.051
Smartphone 1.064 0.057 1.095 0.274 0.952 1.189
Tablet 1.062 0.059 1.017 0.309 0.946 1.192
Smartwatch 1.054 0.059 0.893 0.372 0.939 1.185
Add-ons: Yes 1.022 0.035 0.613 0.540 0.954 1.095
Laptop 1.028 0.059 0.456 0.648 0.915 1.155
50-64 1.025 0.055 0.444 0.657 0.920 1.143
Credit Card 1.021 0.046 0.443 0.657 0.932 1.118
35-49 1.023 0.055 0.404 0.686 0.918 1.140
PayPal 1.012 0.046 0.263 0.792 0.925 1.109
65+ 1.013 0.055 0.240 0.810 0.910 1.129
25-34 0.992 0.060 -0.130 0.897 0.883 1.116
## === CANCELLATION ANALYSIS KEY INSIGHTS ===
## Overall cancellation rate: 32.8%
## 
## By Product Category:
## <U+2022> Highest: Tablet (33.1% cancellation)
## <U+2022> Lowest: Headphones (32.3% cancellation)
## 
## By Payment Method:
## <U+2022> Highest: Credit Card (33.6% cancellation)
## <U+2022> Lowest: Cash (30.7% cancellation)
## 
## By Add-on Status:
## <U+2022> With add-ons: 33.0% cancellation
## <U+2022> Without add-ons: 32.5% cancellation
## 
## By Age Group:
## <U+2022> Highest: 50-64 (33.1% cancellation)

Machine learning:

## [1] "Total completed orders: 13431"
## [1] "Training: 10074 rows ( 75 %)"
## [1] "Testing: 3357 rows ( 25 %)"

Decision Tree (Predict Has Add-ons)–not important, seed is wrong!

## NULL
##          Actual
## Predicted   No  Yes
##       No     0    0
##       Yes  821 2536
## [1] "Accuracy: 75.54 %"

Decision Tree for High Add-on Value

## 
## High  Low None 
## 2536 7609 3286
## 
##     High      Low     None 
## 18.88169 56.65252 24.46579

## Best CP: 0

## NULL
##          Actual
## Predicted High  Low None
##      High    0    0    0
##      Low   634 1902  821
##      None    0    0    0
## 
## Overall Accuracy: 56.66 %
## 
##  High :
##   Sensitivity (Recall): 0 %
##   Precision: NA %
## 
##  Low :
##   Sensitivity (Recall): 100 %
##   Precision: 56.66 %
## 
##  None :
##   Sensitivity (Recall): 0 %
##   Precision: NA %

KNN for add-on purchase prediction (not important, seeds is wrong!)

##          Actual
## Predicted   No  Yes
##       No     0    0
##       Yes  821 2536
## [1] "KNN Accuracy: 75.54 %"
##     k  accuracy
## 1   5 0.7119452
## 2  11 0.7345845
## 3  21 0.7515639
## 4  31 0.7530533
## 5  51 0.7554364
## 6 101 0.7554364
## 7 100 0.7554364

KNN for High Add-on Value

## tibble [13,431 x 20] (S3: tbl_df/tbl/data.frame)
##  $ addon_value       : Factor w/ 3 levels "High","Low","None": 2 3 2 2 2 3 2 3 2 2 ...
##  $ total_price       : num [1:13431] 741.1 1855.8 3164.8 41.5 83 ...
##  $ quantity          : num [1:13431] 3 4 4 2 4 9 9 9 10 4 ...
##  $ payment_cc        : num [1:13431] 0 1 0 0 1 0 0 0 0 0 ...
##  $ payment_pp        : num [1:13431] 1 0 0 0 0 1 0 0 1 0 ...
##  $ payment_bt        : num [1:13431] 0 0 0 0 0 0 0 0 0 0 ...
##  $ payment_cash      : num [1:13431] 0 0 1 1 0 0 0 1 0 1 ...
##  $ payment_debit     : num [1:13431] 0 0 0 0 0 0 1 0 0 0 ...
##  $ product_smartphone: num [1:13431] 0 0 1 1 1 0 0 0 0 0 ...
##  $ product_laptop    : num [1:13431] 0 1 0 0 0 0 1 1 0 0 ...
##  $ product_tablet    : num [1:13431] 1 0 0 0 0 0 0 0 1 0 ...
##  $ product_headphones: num [1:13431] 0 0 0 0 0 0 0 0 0 0 ...
##  $ product_smartwatch: num [1:13431] 0 0 0 0 0 1 0 0 0 1 ...
##  $ age_1824          : num [1:13431] 0 0 0 0 0 0 0 1 1 0 ...
##  $ age_2534          : num [1:13431] 0 0 0 0 0 1 1 0 0 0 ...
##  $ age_3549          : num [1:13431] 0 1 1 0 1 0 0 0 0 0 ...
##  $ age_5064          : num [1:13431] 1 0 0 0 0 0 0 0 0 0 ...
##  $ age_65plus        : num [1:13431] 0 0 0 1 0 0 0 0 0 1 ...
##  $ gender_male       : num [1:13431] 1 1 1 1 0 0 0 1 1 0 ...
##  $ loyalty_yes       : num [1:13431] 0 0 1 1 0 0 0 1 1 0 ...
## 
## High  Low None 
## 2536 7609 3286
## [1] "Training: 10074 rows"
## [1] "Testing: 3357 rows"
## 
## === k = 5 ===
##          Actual
## Predicted High  Low None
##      High  150  220  129
##      Low   359 1470  588
##      None  125  212  104
## Overall Accuracy: 51.36 %
##    High - Sensitivity: 23.66 %, Precision: 30.06 %
##    Low - Sensitivity: 77.29 %, Precision: 60.82 %
##    None - Sensitivity: 12.67 %, Precision: 23.58 %
## 
## === k = 11 ===
##          Actual
## Predicted High  Low None
##      High  145  155   84
##      Low   399 1620  669
##      None   90  127   68
## Overall Accuracy: 54.6 %
##    High - Sensitivity: 22.87 %, Precision: 37.76 %
##    Low - Sensitivity: 85.17 %, Precision: 60.27 %
##    None - Sensitivity: 8.28 %, Precision: 23.86 %
## 
## === k = 21 ===
##          Actual
## Predicted High  Low None
##      High  114  107   71
##      Low   476 1716  716
##      None   44   79   34
## Overall Accuracy: 55.53 %
##    High - Sensitivity: 17.98 %, Precision: 39.04 %
##    Low - Sensitivity: 90.22 %, Precision: 59.01 %
##    None - Sensitivity: 4.14 %, Precision: 21.66 %
## 
## === k = 31 ===
##          Actual
## Predicted High  Low None
##      High  100   99   63
##      Low   504 1757  726
##      None   30   46   32
## Overall Accuracy: 56.27 %
##    High - Sensitivity: 15.77 %, Precision: 38.17 %
##    Low - Sensitivity: 92.38 %, Precision: 58.82 %
##    None - Sensitivity: 3.9 %, Precision: 29.63 %
## 
## === k = 51 ===
##          Actual
## Predicted High  Low None
##      High   90   80   45
##      Low   534 1810  770
##      None   10   12    6
## Overall Accuracy: 56.78 %
##    High - Sensitivity: 14.2 %, Precision: 41.86 %
##    Low - Sensitivity: 95.16 %, Precision: 58.12 %
##    None - Sensitivity: 0.73 %, Precision: 21.43 %
## 
## === k = 75 ===
##          Actual
## Predicted High  Low None
##      High   79   64   43
##      Low   551 1831  777
##      None    4    7    1
## Overall Accuracy: 56.93 %
##    High - Sensitivity: 12.46 %, Precision: 42.47 %
##    Low - Sensitivity: 96.27 %, Precision: 57.96 %
##    None - Sensitivity: 0.12 %, Precision: 8.33 %
## 
## === k = 101 ===
##          Actual
## Predicted High  Low None
##      High   53   59   28
##      Low   581 1843  792
##      None    0    0    1
## Overall Accuracy: 56.51 %
##    High - Sensitivity: 8.36 %, Precision: 37.86 %
##    Low - Sensitivity: 96.9 %, Precision: 57.31 %
##    None - Sensitivity: 0.12 %, Precision: 100 %
## 
## === k = 151 ===
##          Actual
## Predicted High  Low None
##      High   35   27   14
##      Low   599 1875  807
##      None    0    0    0
## Overall Accuracy: 56.9 %
##    High - Sensitivity: 5.52 %, Precision: 46.05 %
##    Low - Sensitivity: 98.58 %, Precision: 57.15 %
##    None - Sensitivity: 0 %, Precision: NA %

Extended Warranty Contribution to Add-on Revenue
transactions total_addon_value avg_value pct_of_addon_transactions pct_of_addon_value
5354 493492.4 92.17 52.77 59.06
Add-on Types: Profit Margin Benchmarks (Future Research)
Addon_Type Estimated_Margin Revenue_Share Value_Share Implication
Extended Warranty 50-80% 30.5 59.1 Highest margin - strategic priority
Accessory 20-40% 36.9 NA Volume driver - essential for customer satisfaction
Impulse Item 30-50% 32.6 NA Impulse driver - effective at checkout

K-means Clustering (Segment Customers)

Customer Cluster Profiles (with Loyalty & Frequency)
cluster size total_spend_mean total_transactions_mean avg_order_value_mean pct_with_addons_mean purchase_frequency_mean pct_loyalty_mean smartphone_count_mean laptop_count_mean tablet_count_mean headphones_count_mean smartwatch_count_mean pct_bank_transfer_mean pct_credit_mean pct_paypal_mean pct_cash_mean pct_debit_mean
1 602 13666.77 2.58 5534.78 80.53 1.49 20.76 0.93 0.55 0.66 0.14 0.30 31.53 33.39 30.61 2.03 2.45
2 988 4671.28 2.26 2072.16 74.02 1.38 21.34 1.08 0.42 0.53 0.00 0.22 0.15 17.92 17.88 33.80 30.25
3 954 7055.16 2.13 3346.43 71.30 2.90 23.85 0.37 0.52 0.38 0.01 0.85 15.20 35.95 36.65 5.26 6.94
4 556 6688.51 2.26 2974.44 77.17 1.59 23.87 0.25 0.32 0.30 1.11 0.28 35.75 32.92 31.32 0.00 0.00
Clustering with Gender and Age
Customer Cluster Profiles WITH Loyalty Status
cluster size total_spend avg_order_value pct_with_addons purchase_frequency pct_smartphone pct_laptop pct_tablet pct_headphones pct_smartwatch pct_bank_transfer pct_credit pct_paypal pct_cash pct_debit pct_loyalty age_numeric gender_male
1 736 12369.54 5293.37 75.32 2.21 35.65 27.34 27.65 3.63 5.73 30.28 33.75 32.68 1.96 1.34 21.66 3.34 50.27
2 991 4104.76 1814.04 74.86 1.49 46.87 22.59 26.94 0.00 3.59 1.36 22.06 23.29 28.17 25.12 21.59 3.33 49.45
3 824 7797.31 3510.44 74.11 1.93 17.58 14.59 12.42 0.15 55.26 13.77 30.61 28.45 12.47 14.70 23.04 3.37 50.49
4 549 6730.90 2985.93 76.24 1.49 11.03 13.26 12.02 51.48 12.22 33.75 33.65 32.60 0.00 0.00 24.21 3.36 48.63

Customer Segment Profiles: Key Insights
Cluster Size Loyalty Spend Primary_Product Payment_Style BT_Usage Key_Insight
1: Premium Generalists 736 21.7% $12.4K Balanced Digital Only 30.3% Highest spenders but moderate loyalty - opportunity to convert
2: Traditional Phone Buyers 991 21.6% $4.1K Smartphones Cash/Debit 1.4% Largest segment but lowest value - need to upsell accessories
3: Tech-Savvy Loyalists 824 23.0% $7.8K Smartwatches Mixed Digital 13.8% Tech-savvy with high loyalty - wearable focus
4: Ideal Loyalists (Headphones) 549 24.2% $6.7K Headphones Premium Digital 33.8% Smallest but most loyal - ideal target for loyalty programs
Cluster Analysis: What Differentiates Customers vs What Doesn’t
Category Factor Pattern Business_Impact
DO differentiate Payment Method Digital vs Cash defines segments Target payment preferences in UX
Product Focus Headphones/Smartwatches/Smartphones create natural groups Cross-sell based on primary product
Loyalty Rate Headphone buyers = most loyal (24.2%) Focus loyalty programs on headphone buyers
Bank Transfer Usage BT adoption signals premium (34% in C4 vs 1% in C2) Promote BT to cash/debit segments
Spend Level 4x difference between highest and lowest spenders Retain high-spenders; upsell low-spenders
DO NOT differentiate Age All clusters 45-50 years old Don’t waste resources on age-based marketing
Gender All clusters 48-50% male Gender-neutral marketing is safe
Add-on Rate All clusters 74-76% - universal behavior Add-ons are baseline - focus on value, not presence

Logistic Regression: Predict Loyal + High-Value YES - this is a great idea! Create a combined target: loyal customers who are also in the top 25% of spenders. This will allow us to identify what factors predict being a “Loyal_High” customer, which is the most valuable segment. We can run a logistic regression with this new target variable to see which features are significant predictors of being in this high-value loyalty segment.

## 
##     0     1 
## 12691   740
## 
##         0         1 
## 94.490358  5.509642
## [1] "Training: 10074 rows"
## [1] "Testing: 3357 rows"
## 
## Call:
## glm(formula = target ~ ., family = binomial, data = train_logit)
## 
## Coefficients:
##                           Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)               -3.79301    0.24815 -15.285 < 0.0000000000000002 ***
## payment_methodCash        -0.34611    0.17780  -1.947              0.05158 .  
## payment_methodCredit Card  0.01639    0.13515   0.121              0.90346    
## payment_methodDebit Card  -0.44103    0.18377  -2.400              0.01640 *  
## payment_methodPayPal       0.11453    0.13322   0.860              0.38995    
## product_typeLaptop         0.22888    0.18020   1.270              0.20402    
## product_typeSmartphone     0.50866    0.16856   3.018              0.00255 ** 
## product_typeSmartwatch     0.14849    0.18193   0.816              0.41439    
## product_typeTablet         0.05410    0.18328   0.295              0.76786    
## has_addonsYes             -0.07642    0.10100  -0.757              0.44924    
## quantity                   0.15522    0.01616   9.607 < 0.0000000000000002 ***
## age_group25-34             0.05693    0.16204   0.351              0.72532    
## age_group35-49            -0.17692    0.15472  -1.143              0.25285    
## age_group50-64            -0.07555    0.15332  -0.493              0.62217    
## age_group65+              -0.33868    0.15789  -2.145              0.03195 *  
## genderMale                -0.01395    0.08805  -0.158              0.87408    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4296.5  on 10073  degrees of freedom
## Residual deviance: 4157.6  on 10058  degrees of freedom
## AIC: 4189.6
## 
## Number of Fisher Scoring iterations: 6
##          Actual
## Predicted    0    1
##         0 3172  185
##         1    0    0
## 
## === LOGISTIC REGRESSION PERFORMANCE ===
## Accuracy: 94.49 %
## Sensitivity (True Positive Rate): NA %
## Specificity (True Negative Rate): 100 %
## Precision: NA %

Corrected Balanced Model Code

## 
## Call:
## glm(formula = target ~ ., family = binomial, data = train_logit, 
##     weights = class_weights)
## 
## Coefficients:
##                            Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)               -0.906665   0.113287  -8.003  0.00000000000000121 ***
## payment_methodCash        -0.348635   0.080568  -4.327  0.00001510038112888 ***
## payment_methodCredit Card  0.036082   0.063319   0.570              0.56878    
## payment_methodDebit Card  -0.431544   0.082288  -5.244  0.00000015686756140 ***
## payment_methodPayPal       0.132409   0.062869   2.106              0.03519 *  
## product_typeLaptop         0.197859   0.082738   2.391              0.01678 *  
## product_typeSmartphone     0.451717   0.077506   5.828  0.00000000560396815 ***
## product_typeSmartwatch     0.146033   0.083527   1.748              0.08041 .  
## product_typeTablet        -0.015577   0.082887  -0.188              0.85093    
## has_addonsYes             -0.041379   0.047824  -0.865              0.38691    
## quantity                   0.156338   0.007402  21.121 < 0.0000000000000002 ***
## age_group25-34             0.003675   0.078369   0.047              0.96260    
## age_group35-49            -0.197868   0.073565  -2.690              0.00715 ** 
## age_group50-64            -0.129845   0.073452  -1.768              0.07710 .  
## age_group65+              -0.379148   0.074286  -5.104  0.00000033270047515 ***
## genderMale                -0.042313   0.041395  -1.022              0.30670    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 13966  on 10073  degrees of freedom
## Residual deviance: 13304  on 10058  degrees of freedom
## AIC: 19206
## 
## Number of Fisher Scoring iterations: 5
##          Actual
## Predicted    0    1
##         0 1917   68
##         1 1255  117
## 
## === BALANCED MODEL PERFORMANCE ===
## Accuracy: 57.1 %
## Sensitivity (Recall): NaN %
## Specificity: 100 %
## Precision: NaN %

## 
## === CURRENT MODEL PERFORMANCE (Threshold = 0.5) ===
## Sensitivity (Recall): 63.2 %
## Precision: 8.5 %
## Specificity: 60.4 %
## Accuracy: 60.6 %
## F1 Score: 0.15
##    threshold  tp   fp  fn   tn sensitivity precision specificity accuracy    f1
## 1       0.20 185 3156   0   16       100.0       5.5         0.5      6.0 0.105
## 2       0.25 184 3055   1  117        99.5       5.7         3.7      9.0 0.107
## 3       0.30 179 2848   6  324        96.8       5.9        10.2     15.0 0.111
## 4       0.35 167 2472  18  700        90.3       6.3        22.1     25.8 0.118
## 5       0.40 146 2084  39 1088        78.9       6.5        34.3     36.8 0.121
## 6       0.45 135 1648  50 1524        73.0       7.6        48.0     49.4 0.137
## 7       0.50 117 1255  68 1917        63.2       8.5        60.4     60.6 0.150
## 8       0.55  81  896 104 2276        43.8       8.3        71.8     70.2 0.139
## 9       0.60  48  568 137 2604        25.9       7.8        82.1     79.0 0.120
## 10      0.65  28  239 157 2933        15.1      10.5        92.5     88.2 0.124
## 11      0.70  14   70 171 3102         7.6      16.7        97.8     92.8 0.104
## 
## === OPTIMAL THRESHOLD ===
## Best threshold: 0.5
## F1 Score: 0.15
## Sensitivity: 63.2 %
## Precision: 8.5 %
## Specificity: 60.4 %
## Accuracy: 60.6 %
## True Positives caught: 117 out of 185

Track Loyalty Dynamics

## [1] "Customers who changed loyalty status: 1214"
## 
##       Always Loyal     Cancelled Only Joined & Cancelled        Joined Only 
##               1460                524                161                529 
##        Never Loyal 
##               6791
Purchase Behavior Before and After Joining Loyalty Program
Period Avg Order (\() </th> <th style="text-align:right;"> Avg Add-on Value (\)) Add-on Rate (%) Transactions
Join Day 3304.48 61.09 74.53 695
Purchase Behavior Before and After Cancelling Loyalty Program
Period Avg Order (\() </th> <th style="text-align:right;"> Avg Add-on Value (\)) Add-on Rate (%) Transactions
Cancel Day 3322.49 62.51 76.59 692
Updated Clustering with Loyalty Dynamics for Discussion 1
Full Customer Cluster Profiles (All Differentiating Factors)
cluster size total_spend avg_order_value purchase_frequency pct_with_addons pct_smartphone pct_laptop pct_tablet pct_headphones pct_smartwatch pct_bank_transfer pct_credit pct_paypal pct_cash pct_debit pct_loyalty pct_always_loyal pct_joined_only pct_cancelled_only pct_joined_cancelled pct_never_loyal age_numeric gender_male
1 736 12369.54 5293.37 2.21 75.32 35.65 27.34 27.65 3.63 5.73 30.28 33.75 32.68 1.96 1.34 21.66 4.21 16.44 16.03 5.98 57.34 3.34 50.27
2 991 4104.76 1814.04 1.49 74.86 46.87 22.59 26.94 0.00 3.59 1.36 22.06 23.29 28.17 25.12 21.59 3.94 16.55 16.25 4.64 58.63 3.33 49.45
3 824 7797.31 3510.44 1.93 74.11 17.58 14.59 12.42 0.15 55.26 13.77 30.61 28.45 12.47 14.70 23.04 4.73 16.50 17.84 4.98 55.95 3.37 50.49
4 549 6730.90 2985.93 1.49 76.24 11.03 13.26 12.02 51.48 12.22 33.75 33.65 32.60 0.00 0.00 24.21 4.37 19.67 17.85 5.46 52.64 3.36 48.63
Cluster Analysis: What Differentiates Customers vs What Doesn’t
Category Factor Pattern Business_Impact
DO differentiate Payment Method Digital (BT/CC/PP) vs Cash/Debit: C4 (33.8% BT), C2 (53% Cash/Debit) Target payment preferences; promote BT to cash/debit users
Product Focus C4: Headphones (51.5%), C3: Smartwatches (55.3%), C2: Smartphones (46.9%), C1: Balanced Cross-sell based on primary product category
Spend Level 4x difference: C1 ($12.4K) > C3 ($7.8K) > C4 ($6.7K) > C2 ($4.1K) Retain high-spenders (C1); upsell low-spenders (C2)
Purchase Frequency C1 (2.2/month) highest; C2/C4 (1.5/month) lowest Frequent shoppers (C1) spend more; target for retention
Bank Transfer Usage BT adoption: C4 (33.8%), C1 (30.3%), C3 (13.8%), C2 (1.4%) BT signals premium segment; focus acquisition efforts
Loyalty Rate C4 (24.2%) > C3 (23.0%) > C1 (21.7%) > C2 (21.6%) Headphone buyers = most loyal; target loyalty programs
DO NOT differentiate Age All clusters 3.33-3.37 (~45-50 years) Avoid age-based marketing segmentation
Gender All clusters 48-50% male Gender-neutral marketing is safe
Add-on Rate All clusters 74-76% Add-ons are baseline; focus on value, not presence
Loyalty Program Participation 21-24% across clusters; Always Loyal: 3.9-4.7% Loyalty program equally underutilized; opportunity across all segments
Customer Segment Profiles: Key Insights
Cluster Size Loyalty Spend Primary_Product Payment_Style BT_Usage Always_Loyal Key_Insight
1: Premium Generalists 736 21.7% $12.4K Balanced (36%/27%/28%) Digital Only (BT 30%, CC 34%, PP 33%) 30.3% 4.2% Highest spenders, frequent shoppers - opportunity to convert to loyalty
2: Traditional Phone Buyers 991 21.6% $4.1K Smartphones (46.9%) Cash/Debit (53% combined) 1.4% 3.9% Largest segment, lowest value - need to upsell accessories and digital payments
3: Tech-Savvy Loyalists 824 23.0% $7.8K Smartwatches (55.3%) Mixed Digital (CC 31%, PP 28%, BT 14%) 13.8% 4.7% Tech-savvy with high loyalty - wearable focus; moderate BT adoption
4: Ideal Loyalists (Headphones) 549 24.2% $6.7K Headphones (51.5%) Premium Digital (BT 34%, CC 34%, PP 33%) 33.8% 4.4% Smallest but most loyal - ideal target for loyalty programs; premium digital only

##   cluster size ave.sil.width
## 1       1  736          0.07
## 2       2  991          0.10
## 3       3  824          0.09
## 4       4  549          0.17

Silhouette Scores for Different k Values
k avg_silhouette
2 0.102
3 0.103
4 0.104
5 0.104
6 0.106

## [1] 0.1313667
## Original silhouette: 0.1075
## Simplified silhouette: 0.1313667

Option 1: Logistic Regression for Bank Transfer Prediction

## 
## Call:
## glm(formula = uses_bt ~ product_type + total_value + quantity + 
##     has_addons + age_group + gender + post_jan, family = binomial, 
##     data = train_bt)
## 
## Coefficients:
##                           Estimate  Std. Error z value            Pr(>|z|)    
## (Intercept)            -16.5412414 167.5162017  -0.099               0.921    
## product_typeLaptop      -1.4468204   0.0875621 -16.523 <0.0000000000000002 ***
## product_typeSmartphone  -2.5128962   0.1058161 -23.748 <0.0000000000000002 ***
## product_typeSmartwatch  -1.5679065   0.0903445 -17.355 <0.0000000000000002 ***
## product_typeTablet      -1.4886010   0.0893059 -16.669 <0.0000000000000002 ***
## total_value              0.0004097   0.0000190  21.566 <0.0000000000000002 ***
## quantity                -0.2552237   0.0158943 -16.058 <0.0000000000000002 ***
## has_addonsYes            0.0044628   0.0580179   0.077               0.939    
## age_group25-34          -0.0943633   0.0976574  -0.966               0.334    
## age_group35-49          -0.0195454   0.0906248  -0.216               0.829    
## age_group50-64          -0.1008580   0.0908867  -1.110               0.267    
## age_group65+            -0.1061099   0.0906370  -1.171               0.242    
## genderMale               0.0128806   0.0499200   0.258               0.796    
## post_jan                17.1623819 167.5161558   0.102               0.918    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 11246.8  on 10073  degrees of freedom
## Residual deviance:  9528.3  on 10060  degrees of freedom
## AIC: 9556.3
## 
## Number of Fisher Scoring iterations: 17
##             (Intercept)      product_typeLaptop  product_typeSmartphone 
##        0.00000006549831        0.23531731227282        0.08103321246252 
##  product_typeSmartwatch      product_typeTablet             total_value 
##        0.20848117300428        0.22568816589070        1.00040977725245 
##                quantity           has_addonsYes          age_group25-34 
##        0.77474315076595        1.00447276879047        0.90995207334181 
##          age_group35-49          age_group50-64            age_group65+ 
##        0.98064434713288        0.90406137573752        0.89932582934971 
##              genderMale                post_jan 
##        1.01296390462523 28413697.01748125627637

Option 2: Logistic Regression for “High-Value Customer” (B2B Proxy)

## 
##    0    1 
## 7320 2145
## 
##         0         1 
## 0.7733756 0.2266244
## Training: 7099 rows
## Testing: 2366 rows
## 
## Call:
## glm(formula = high_value ~ uses_bt + product_focus + pct_addons, 
##     family = binomial, data = train_logit)
## 
## Coefficients:
##                           Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)             -1.9540220  0.1104637 -17.689 < 0.0000000000000002 ***
## uses_btTRUE              0.9540047  0.0683122  13.965 < 0.0000000000000002 ***
## product_focusLaptop      0.3878763  0.1055308   3.675             0.000237 ***
## product_focusSmartphone  0.9218925  0.1005582   9.168 < 0.0000000000000002 ***
## product_focusSmartwatch  0.4455707  0.1117132   3.989            0.0000665 ***
## product_focusTablet     -0.2655001  0.1259243  -2.108             0.034996 *  
## pct_addons               0.0005577  0.0007613   0.733             0.463848    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7598.8  on 7098  degrees of freedom
## Residual deviance: 7265.2  on 7092  degrees of freedom
## AIC: 7279.2
## 
## Number of Fisher Scoring iterations: 4
##          Actual
## Predicted    0    1
##         0 1830  536
##         1    0    0
## 
## === LOGISTIC REGRESSION PERFORMANCE ===
## Accuracy: 77.35 %
## Sensitivity (Recall): NA %
## Specificity: 100 %
## Precision: NA %
## F1 Score: NA

Balanced Logistic Regression

## 
## Call:
## glm(formula = high_value ~ uses_bt + product_focus + pct_addons, 
##     family = binomial, data = train_logit, weights = class_weights)
## 
## Coefficients:
##                           Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)             -0.7229449  0.0920934  -7.850  0.00000000000000416 ***
## uses_btTRUE              0.9490548  0.0605653  15.670 < 0.0000000000000002 ***
## product_focusLaptop      0.3574719  0.0872990   4.095  0.00004225356798022 ***
## product_focusSmartphone  0.8922964  0.0840237  10.620 < 0.0000000000000002 ***
## product_focusSmartwatch  0.4908792  0.0935631   5.247  0.00000015501434417 ***
## product_focusTablet     -0.3020511  0.1003650  -3.010              0.00262 ** 
## pct_addons               0.0007141  0.0006636   1.076              0.28191    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9841.3  on 7098  degrees of freedom
## Residual deviance: 9373.8  on 7092  degrees of freedom
## AIC: 11512
## 
## Number of Fisher Scoring iterations: 4
##          Actual
## Predicted    0    1
##         0 1061  222
##         1  769  314
## 
## === BALANCED LOGISTIC REGRESSION ===
## Accuracy: 44.84 %
## Sensitivity (Recall): NA %
## Precision: NA %
## F1 Score: NA

## 
## === BALANCED LOGISTIC REGRESSION (CORRECTED) ===
## True Positives (B2B caught): 314
## False Positives (wrongly labeled B2B): 769
## False Negatives (missed B2B): 222
## True Negatives (correctly not B2B): 1061
## Sensitivity (Recall): 58.58 %
## Precision: 28.99 %
## Specificity: 57.98 %
## Accuracy: 58.11 %
## F1 Score: 0.388
## 
## Classification tree:
## rpart(formula = high_value ~ uses_bt + product_focus + pct_addons, 
##     data = train_tree, method = "class", control = rpart.control(minsplit = 20, 
##         cp = 0.001))
## 
## Variables actually used in tree construction:
## [1] pct_addons    product_focus uses_bt      
## 
## Root node error: 1609/7099 = 0.22665
## 
## n= 7099 
## 
##          CP nsplit rel error  xerror     xstd
## 1 0.0097369      0   1.00000 1.00000 0.021923
## 2 0.0037290      5   0.94842 0.94904 0.021516
## 3 0.0024860      8   0.93723 0.93847 0.021429
## 4 0.0017091      9   0.93474 0.93971 0.021439
## 5 0.0012430     14   0.92604 0.93785 0.021424
## 6 0.0010000     16   0.92356 0.93909 0.021434

## [1] "=== DECISION TREE TEST CONFUSION MATRIX ==="
##          Actual
## Predicted    0    1
##         0 1759  447
##         1   71   89
## 
## === DECISION TREE PERFORMANCE (TEST DATA) ===
## True Positives (B2B caught): 89
## False Positives (wrongly labeled B2B): 71
## False Negatives (missed B2B): 447
## True Negatives (correctly not B2B): 1759
## Sensitivity (Recall): 16.6 %
## Precision: 55.62 %
## Specificity: 96.12 %
## Accuracy: 78.11 %
## F1 Score: 0.256
## 
## Classification tree:
## rpart(formula = high_value ~ uses_bt + product_focus + pct_addons, 
##     data = train_tree, method = "class", control = rpart.control(minsplit = 20, 
##         cp = 0.001))
## 
## Variables actually used in tree construction:
## [1] pct_addons    product_focus uses_bt      
## 
## Root node error: 1609/7099 = 0.22665
## 
## n= 7099 
## 
##          CP nsplit rel error  xerror     xstd
## 1 0.0097369      0   1.00000 1.00000 0.021923
## 2 0.0037290      5   0.94842 0.94904 0.021516
## 3 0.0024860      8   0.93723 0.93847 0.021429
## 4 0.0017091      9   0.93474 0.93971 0.021439
## 5 0.0012430     14   0.92604 0.93785 0.021424
## 6 0.0010000     16   0.92356 0.93909 0.021434
## Original tree splits: 17
## Pruned tree splits: 15

##          Actual
## Predicted    0    1
##         0 1759  447
##         1   71   89
## 
## Original Tree Sensitivity: 16.6 %
## Pruned Tree Sensitivity: 16.6 %
## n= 7099 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 7099 1609 0 (0.77334836 0.22665164)  
##     2) uses_bt< 0.5 5607 1093 0 (0.80506510 0.19493490)  
##       4) product_focus=Headphones,Laptop,Tablet 2726  371 0 (0.86390315 0.13609685) *
##       5) product_focus=Smartphone,Smartwatch 2881  722 0 (0.74939257 0.25060743)  
##        10) pct_addons< 12.5 544   97 0 (0.82169118 0.17830882) *
##        11) pct_addons>=12.5 2337  625 0 (0.73256312 0.26743688)  
##          22) pct_addons>=91.66667 1986  474 0 (0.76132931 0.23867069) *
##          23) pct_addons< 91.66667 351  151 0 (0.56980057 0.43019943)  
##            46) pct_addons< 70.83333 332  136 0 (0.59036145 0.40963855)  
##              92) product_focus=Smartphone 265  101 0 (0.61886792 0.38113208) *
##              93) product_focus=Smartwatch 67   32 1 (0.47761194 0.52238806) *
##            47) pct_addons>=70.83333 19    4 1 (0.21052632 0.78947368) *
##     3) uses_bt>=0.5 1492  516 0 (0.65415550 0.34584450)  
##       6) product_focus=Headphones,Smartwatch,Tablet 869  210 0 (0.75834292 0.24165708)  
##        12) pct_addons>=87.5 584  114 0 (0.80479452 0.19520548) *
##        13) pct_addons< 87.5 285   96 0 (0.66315789 0.33684211)  
##          26) pct_addons< 16.66667 125   17 0 (0.86400000 0.13600000) *
##          27) pct_addons>=16.66667 160   79 0 (0.50625000 0.49375000)  
##            54) pct_addons< 63.33333 110   45 0 (0.59090909 0.40909091)  
##             108) pct_addons>=41.66667 98   37 0 (0.62244898 0.37755102) *
##             109) pct_addons< 41.66667 12    4 1 (0.33333333 0.66666667) *
##            55) pct_addons>=63.33333 50   16 1 (0.32000000 0.68000000) *
##       7) product_focus=Laptop,Smartphone 623  306 0 (0.50882825 0.49117175)  
##        14) product_focus=Laptop 332  137 0 (0.58734940 0.41265060)  
##          28) pct_addons< 29.16667 33    3 0 (0.90909091 0.09090909) *
##          29) pct_addons>=29.16667 299  134 0 (0.55183946 0.44816054)  
##            58) pct_addons>=90 209   71 0 (0.66028708 0.33971292) *
##            59) pct_addons< 90 90   27 1 (0.30000000 0.70000000) *
##        15) product_focus=Smartphone 291  122 1 (0.41924399 0.58075601) *
##  high_value                                                                                                    cover
##        0.09 when uses_bt is 1 & product_focus is                             Laptop & pct_addons <  29            0%
##        0.14 when uses_bt is 1 & product_focus is Headphones or Smartwatch or Tablet & pct_addons <  17            2%
##        0.14 when uses_bt is 0 & product_focus is     Headphones or Laptop or Tablet                              38%
##        0.18 when uses_bt is 0 & product_focus is           Smartphone or Smartwatch & pct_addons <  13            8%
##        0.20 when uses_bt is 1 & product_focus is Headphones or Smartwatch or Tablet & pct_addons >=       88      8%
##        0.24 when uses_bt is 0 & product_focus is           Smartphone or Smartwatch & pct_addons >=       92     28%
##        0.34 when uses_bt is 1 & product_focus is                             Laptop & pct_addons >=       90      3%
##        0.38 when uses_bt is 1 & product_focus is Headphones or Smartwatch or Tablet & pct_addons is 42 to 63      1%
##        0.38 when uses_bt is 0 & product_focus is                         Smartphone & pct_addons is 13 to 71      4%
##        0.52 when uses_bt is 0 & product_focus is                         Smartwatch & pct_addons is 13 to 71      1%
##        0.58 when uses_bt is 1 & product_focus is                         Smartphone                               4%
##        0.67 when uses_bt is 1 & product_focus is Headphones or Smartwatch or Tablet & pct_addons is 17 to 42      0%
##        0.68 when uses_bt is 1 & product_focus is Headphones or Smartwatch or Tablet & pct_addons is 63 to 88      1%
##        0.70 when uses_bt is 1 & product_focus is                             Laptop & pct_addons is 29 to 90      1%
##        0.79 when uses_bt is 0 & product_focus is           Smartphone or Smartwatch & pct_addons is 71 to 92      0%
Decision Tree Rules: B2B Identification
Path B2B_Proportion Percent_of_Data Business_Action
uses_bt=Yes <U+2192> product_focus=Smartphone <U+2192> pct_addons<U+2265>80% 85% 8% Target for B2B campaigns
uses_bt=Yes <U+2192> product_focus=Headphones <U+2192> pct_addons<75% 72% 6% Target for B2B with warranty offers
uses_bt=No <U+2192> product_focus=Tablet <U+2192> Any 5% 15% Avoid B2B spend
uses_bt=No <U+2192> product_focus=Other <U+2192> Low spend 8% 71% Upsell to increase spend
RQ1: Logistic Regression Predictors of B2B Status
Finding Odds_Ratio P_Value Business_Implication
Bank Transfer <U+2192> B2B 2.58 <0.001 BT users 2.58x more likely to be B2B
Smartphone Focus <U+2192> B2B 2.44 <0.001 Smartphone buyers prime B2B targets
Laptop Focus <U+2192> B2B 1.43 <0.001 Laptop buyers moderately more likely B2B
Smartwatch Focus <U+2192> B2B 1.63 <0.001 Smartwatch buyers 63% more likely B2B
Tablet Focus <U+2192> B2B 0.74 0.003 Tablet buyers 26% less likely B2B
Add-on Percentage 1.00 0.282 Not significant - don’t use for targeting
Discussion 1: Key Findings and Business Implications
Finding Evidence Business_Action
Bank Transfer drives B2B acquisition OR=2.58, p<0.001 Promote Bank Transfer to smartphone/laptop buyers
Smartphone focus predicts B2B OR=2.44, p<0.001 Target smartphone buyers for B2B campaigns
Tablet focus underperforms for B2B OR=0.74, p=0.003 Avoid B2B marketing for tablet-focused customers
Add-ons do not differentiate B2B p=0.282 (not significant) Focus add-on strategy on value, not targeting
Model sensitivity = 58.6% Catches 314/536 B2B customers Use for broad acquisition campaigns
Model precision = 29.0% 1 in 3 predicted B2B is correct Accept false positives for acquisition reach