Customer Purchase Behaviour Analytics: Evaluating Price Sensitivity & Competition in the Lubricants Market of TotalEnergies Marketing Nigeria PLC

Author

Victory Ishioma Onyenwosa

Published

May 26, 2026

1. Executive Summary

This study investigates customer purchase behaviour among lubricant distributors and resellers within the sales network of TotalEnergies Marketing Nigeria PLC. The central business problem is the growing instability in distributor purchasing patterns driven by competitive market pressure, price fluctuations, and shifting brand loyalties — all of which directly erode sales performance and customer retention in the Nigerian downstream lubricants sector.

Primary data were collected through a structured questionnaire administered to 102 lubricant distributors and resellers between January and March 2026. The dataset comprises 32 variables: five categorical variables describing respondent demographics and business profile, and 27 items covering price sensitivity, competitor influence, purchase frequency, brand loyalty, and switching behaviour, each measured on a 5-point Likert scale.

Five analytical techniques were applied: Exploratory Data Analysis (EDA), Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression. Key findings indicate that price sensitivity is a dominant driver of purchase decisions, with strong associations between competitor promotions, brand-switching tendencies, and reduced purchase volume. Respondents who actively compare prices and seek deals from competitors exhibit significantly higher brand-switching scores.

The study recommends that TotalEnergies Marketing Nigeria PLC strengthen its value proposition beyond price — through credit flexibility, distributor loyalty programmes, and targeted promotional incentives — to reduce competitor-induced churn and stabilise purchase volumes across the distributor network.

2. Professional Disclosure

Job Role and Organisation

I currently work as a Sales Representative within the lubricants business segment of TotalEnergies Marketing Nigeria PLC, a major player in the downstream oil and gas industry in Nigeria. My responsibilities include distributor relationship management, sales performance monitoring, market intelligence gathering, customer engagement, and sales reporting across assigned territories. This role provides direct operational exposure to the purchasing behaviour of distributors, retailers, workshops, and fleet operators, making the data collected highly relevant to real decisions taken in my day-to-day work.

Technique 1 — Exploratory Data Analysis (EDA)

EDA is directly relevant to my role because lubricant sales activities generate multi-dimensional customer data from dozens of distributors monthly. Before making any commercial decision — such as adjusting credit terms, redesigning promotional offers, or targeting specific distributor segments — a sales representative must first understand the shape of that data: who the customers are, how they distribute across purchase categories, and where anomalies or data gaps exist. EDA provides the foundation for all subsequent analysis.

Technique 2 — Data Visualisation

Visual communication of data is a core sales management skill. Weekly and monthly sales reviews at TotalEnergies require the ability to present distributor performance patterns, market share shifts, and pricing responses to non-technical managers and regional directors. Data visualisation translates raw numbers into stories — identifying which distributor segments are at risk, where competitor pressure is highest, and which territories show growth potential.

Technique 3 — Hypothesis Testing

Hypothesis testing is operationally valuable because commercial decisions in lubricants sales are often debated without statistical rigour. For instance, whether price increases significantly reduce purchase volumes across distributor types is a question that directly affects pricing strategy. Formal hypothesis testing replaces intuition with statistical evidence, enabling sales leadership to make more defensible decisions on promotions, credit terms, and competitive responses.
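As a rough illustration of what "statistical evidence" means here, the sketch below runs a two-sided permutation test for a difference in mean price sensitivity between two groups. It is a stand-in chosen because it needs only the standard library, and it is not necessarily the test applied later in this report. The function name `permutation_p_value`, the two groups, and all scores are invented for the example.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)            # reassign group labels at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return (extreme + 1) / (n_permutations + 1)

# Invented composite scores (not survey data)
resellers    = [4.4, 4.1, 4.6, 4.3, 4.8, 4.5, 4.2, 4.7]
distributors = [3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 3.3]

p_value = permutation_p_value(resellers, distributors)
print(f"p = {p_value:.4f}")
```

A small p-value says that a gap this large would rarely arise if group labels were unrelated to the scores, which is the kind of evidence a pricing debate can lean on instead of intuition.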

Technique 4 — Correlation Analysis

In lubricant distribution, many commercial factors — pricing, competitor activity, credit terms, and purchase frequency — interact simultaneously. Correlation analysis helps determine which of these factors move together and by how much. This is particularly useful for identifying which variables to prioritise in customer engagement: if competitor influence and brand switching are highly correlated, for example, the business should invest more in competitive intelligence and counter-promotion strategies.
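To show what "move together and by how much" looks like numerically, the following minimal sketch computes a Pearson correlation by hand. The helper `pearson_r` and the eight pairs of scores are assumptions made up for illustration; they are not drawn from the survey.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation scaled by the spread of each variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented composite scores for eight respondents (illustration only)
competitor_influence = [2.0, 2.5, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0]
brand_switching      = [2.2, 2.0, 3.1, 3.0, 3.8, 4.1, 4.0, 4.6]

r = pearson_r(competitor_influence, brand_switching)
print(f"r = {r:.2f}")
```

Values near +1 or -1 indicate that two constructs rise and fall together (or in opposition), while values near 0 indicate little linear association. A high value does not by itself show that one factor causes the other.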

Technique 5 — Linear Regression

Regression analysis supports evidence-based forecasting and targeted intervention by quantifying how much each factor contributes to price sensitivity outcomes. In a sales context, knowing that a one-unit increase in competitor influence is associated with a measurable increase in price sensitivity allows the organisation to allocate resources (promotional budgets, relationship management effort, credit concessions) to the factors with the highest expected return on investment.
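The "one-unit increase" reading of a regression slope can be sketched with a single predictor. The example below fits a straight line by ordinary least squares; the helper `fit_line` and the five data points are hypothetical, and the outcome was deliberately constructed as 1.0 + 0.75 × predictor so the fitted coefficients are easy to check.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented scores (not survey data): y was built as exactly 1.0 + 0.75 * x
competitor_influence = [1.0, 2.0, 3.0, 4.0, 5.0]
price_sensitivity    = [1.75, 2.5, 3.25, 4.0, 4.75]

intercept, slope = fit_line(competitor_influence, price_sensitivity)
predicted_at_4 = intercept + slope * 4.0
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, "
      f"prediction at 4.0 = {predicted_at_4:.2f}")
```

Here the slope of 0.75 means that each additional point of competitor influence goes with an expected 0.75-point rise in price sensitivity. With cross-sectional survey data this is an association, not proof of cause and effect.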

3. Data Collection & Sampling

Data Source

The primary dataset was collected through a structured, self-administered questionnaire designed specifically for this research. The questionnaire was distributed both physically (during field visits to distributors) and electronically (via WhatsApp and email) to active lubricant distributors and resellers within TotalEnergies Marketing Nigeria PLC’s assigned sales territory in Rivers State and surrounding areas.

Data Collection Method

All survey items apart from the demographic and business-profile questions used a 5-point Likert scale: 1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree. The questionnaire covered five thematic blocks: (1) demographics and business profile, (2) price sensitivity behaviours, (3) competitor influence, (4) purchase frequency and volume patterns, and (5) brand-switching tendencies.

Sampling Frame

The sampling frame consisted of lubricant distributors and resellers actively purchasing TotalEnergies lubricant products within the assigned sales territories. Respondents were selected using convenience and purposive sampling — purposive in that only active purchasers with at least three months of trading history were eligible; convenience in that accessibility during field visits and responsiveness to electronic invitations determined inclusion.

Sample Size

A total of 102 valid responses were collected and used for analysis. This exceeds the minimum threshold of 100 observations specified for this case study. The sample is also considered large enough for the parametric tests applied, on the understanding that the individual items are ordinal Likert responses and that the composite scores (item means) are treated as approximately continuous.

Time Period Covered

Data collection was conducted between January 2026 and March 2026, covering an active trading quarter that captures post-holiday demand recovery, mid-quarter pricing cycles, and competitive market conditions typical of the Nigerian lubricants distribution environment.

Ethical Considerations

Participation was entirely voluntary. All respondents were informed of the academic purpose of the research prior to participation, and verbal consent was obtained in all cases. No personally identifiable information — such as names, phone numbers, or company identities — was collected. Business types and purchase values were captured only in broad categorical bands. All responses are anonymised and published only in aggregated analytical form. This study does not conflict with TotalEnergies’ data confidentiality policies, as no proprietary pricing structures, customer account records, or internal sales figures are disclosed.

4. Data Description

Code

# ── R setup ──────────────────────────────────────────────────────────────────
library(tidyverse)
library(readxl)
library(skimr)
library(corrplot)
library(ggcorrplot)
library(car)
library(lmtest)
library(knitr)
library(kableExtra)
library(patchwork)
library(scales)
library(RColorBrewer)

# Load data
df_raw <- read_excel("Lubricants_Distributor_Survey__Responses_.xlsx")

# Rename columns to short, clean names
col_names_short <- c(
  "Gender", "Age", "Years_Business", "Business_Type", "Purchase_Value_Cat",
  "PS1_reduce_volume",      # Price Sensitivity items
  "PS2_price_affects_buying",
  "PS3_compare_prices",
  "PS4_prefer_discounts",
  "PS5_small_increase_affects",
  "PS6_switch_when_high",
  "PS7_price_most_important",
  "CI1_consider_competitors",  # Competitor Influence items
  "CI2_multi_brand",
  "CI3_competitor_promos",
  "CI4_switched_past_year",
  "CI5_competitor_pricing_loyalty",
  "CI6_seek_better_deals",
  "CI7_availability_affects",
  "PV1_purchase_frequently",   # Purchase Volume / Behaviour items
  "PV2_volume_stable",
  "PV3_increase_with_demand",
  "PV4_planned_purchase",
  "PV5_prefer_bulk",
  "PV6_loyal_good_service",
  "PV7_influenced_by_demand",
  "BS1_switch_better_price",   # Brand Switching items
  "BS2_switch_credit_terms",
  "BS3_loyal_despite_discounts",
  "BS4_changed_brand_6mo",
  "BS5_switching_common",
  "BS6_switch_customer_demand"
)

colnames(df_raw) <- col_names_short

# Clean Purchase_Value_Cat: recode "Option 5" to proper label
df_raw <- df_raw %>%
  mutate(Purchase_Value_Cat = ifelse(Purchase_Value_Cat == "Option 5",
                                     "Above ₦500,000,000",
                                     Purchase_Value_Cat))

# Composite scores (mean of relevant items, ignoring NA)
df <- df_raw %>%
  mutate(
    Price_Sensitivity_Score = rowMeans(
      select(., PS1_reduce_volume, PS2_price_affects_buying, PS3_compare_prices,
             PS4_prefer_discounts, PS5_small_increase_affects,
             PS6_switch_when_high, PS7_price_most_important),
      na.rm = TRUE),
    Competitor_Influence_Score = rowMeans(
      select(., CI1_consider_competitors, CI2_multi_brand, CI3_competitor_promos,
             CI4_switched_past_year, CI5_competitor_pricing_loyalty,
             CI6_seek_better_deals, CI7_availability_affects),
      na.rm = TRUE),
    Purchase_Volume_Score = rowMeans(
      select(., PV1_purchase_frequently, PV2_volume_stable, PV3_increase_with_demand,
             PV4_planned_purchase, PV5_prefer_bulk, PV6_loyal_good_service,
             PV7_influenced_by_demand),
      na.rm = TRUE),
    Brand_Switching_Score = rowMeans(
      select(., BS1_switch_better_price, BS2_switch_credit_terms,
             BS3_loyal_despite_discounts, BS4_changed_brand_6mo,
             BS5_switching_common, BS6_switch_customer_demand),
      na.rm = TRUE)
  )

cat("Dataset dimensions:", nrow(df), "rows x", ncol(df), "columns\n")

Dataset dimensions: 102 rows x 36 columns

Code

cat("Composite variables created: Price_Sensitivity_Score, Competitor_Influence_Score,",
    "Purchase_Volume_Score, Brand_Switching_Score\n")

Composite variables created: Price_Sensitivity_Score, Competitor_Influence_Score, Purchase_Volume_Score, Brand_Switching_Score

Code

# ── Python setup ──────────────────────────────────────────────────────────────
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Load data
df_py = pd.read_excel("Lubricants_Distributor_Survey__Responses_.xlsx")

col_names_short = [
  "Gender", "Age", "Years_Business", "Business_Type", "Purchase_Value_Cat",
  "PS1_reduce_volume", "PS2_price_affects_buying", "PS3_compare_prices",
  "PS4_prefer_discounts", "PS5_small_increase_affects", "PS6_switch_when_high",
  "PS7_price_most_important",
  "CI1_consider_competitors", "CI2_multi_brand", "CI3_competitor_promos",
  "CI4_switched_past_year", "CI5_competitor_pricing_loyalty",
  "CI6_seek_better_deals", "CI7_availability_affects",
  "PV1_purchase_frequently", "PV2_volume_stable", "PV3_increase_with_demand",
  "PV4_planned_purchase", "PV5_prefer_bulk", "PV6_loyal_good_service",
  "PV7_influenced_by_demand",
  "BS1_switch_better_price", "BS2_switch_credit_terms",
  "BS3_loyal_despite_discounts", "BS4_changed_brand_6mo",
  "BS5_switching_common", "BS6_switch_customer_demand"
]
df_py.columns = col_names_short

# Recode "Option 5"
df_py['Purchase_Value_Cat'] = df_py['Purchase_Value_Cat'].replace(
    'Option 5', 'Above ₦500,000,000')

# Composite scores
ps_cols = ['PS1_reduce_volume','PS2_price_affects_buying','PS3_compare_prices',
           'PS4_prefer_discounts','PS5_small_increase_affects',
           'PS6_switch_when_high','PS7_price_most_important']
ci_cols = ['CI1_consider_competitors','CI2_multi_brand','CI3_competitor_promos',
           'CI4_switched_past_year','CI5_competitor_pricing_loyalty',
           'CI6_seek_better_deals','CI7_availability_affects']
pv_cols = ['PV1_purchase_frequently','PV2_volume_stable','PV3_increase_with_demand',
           'PV4_planned_purchase','PV5_prefer_bulk','PV6_loyal_good_service',
           'PV7_influenced_by_demand']
bs_cols = ['BS1_switch_better_price','BS2_switch_credit_terms',
           'BS3_loyal_despite_discounts','BS4_changed_brand_6mo',
           'BS5_switching_common','BS6_switch_customer_demand']

df_py['Price_Sensitivity_Score']     = df_py[ps_cols].mean(axis=1, skipna=True)
df_py['Competitor_Influence_Score']  = df_py[ci_cols].mean(axis=1, skipna=True)
df_py['Purchase_Volume_Score']       = df_py[pv_cols].mean(axis=1, skipna=True)
df_py['Brand_Switching_Score']       = df_py[bs_cols].mean(axis=1, skipna=True)

print(f"Dataset: {df_py.shape[0]} rows × {df_py.shape[1]} columns")

Dataset: 102 rows × 36 columns

Code

print("Composite scores created successfully.")

Composite scores created successfully.

4.1 Variable Dictionary

Variable Name	Original Description	Type	Scale
`Gender`	Respondent gender	Categorical	Male / Female
`Age`	Age group	Categorical	Ordinal bands
`Years_Business`	Years as distributor	Categorical	Ordinal bands
`Business_Type`	Distributor or Reseller	Categorical	Nominal
`Purchase_Value_Cat`	Monthly purchase value (₦)	Categorical	Ordinal bands
`PS1–PS7`	Price sensitivity items	Ordinal	1–5 Likert
`CI1–CI7`	Competitor influence items	Ordinal	1–5 Likert
`PV1–PV7`	Purchase volume/behaviour items	Ordinal	1–5 Likert
`BS1–BS6`	Brand switching items	Ordinal	1–5 Likert
`Price_Sensitivity_Score`	Composite: mean of PS1–PS7	Continuous	1–5
`Competitor_Influence_Score`	Composite: mean of CI1–CI7	Continuous	1–5
`Purchase_Volume_Score`	Composite: mean of PV1–PV7	Continuous	1–5
`Brand_Switching_Score`	Composite: mean of BS1–BS6	Continuous	1–5

4.2 Missing Value Analysis

Code

# Missing value summary
missing_summary <- df_raw %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing_Count") %>%
  mutate(Missing_Pct = round(Missing_Count / nrow(df_raw) * 100, 1)) %>%
  filter(Missing_Count > 0) %>%
  arrange(desc(Missing_Count))

missing_summary %>%
  kable(caption = "Variables with Missing Values") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)

Variables with Missing Values
Variable	Missing_Count	Missing_Pct
PV2_volume_stable	5	4.9
BS4_changed_brand_6mo	5	4.9
Business_Type	4	3.9
CI5_competitor_pricing_loyalty	3	2.9
PV7_influenced_by_demand	3	2.9
PS7_price_most_important	2	2.0
CI1_consider_competitors	2	2.0
CI2_multi_brand	2	2.0
CI3_competitor_promos	2	2.0
CI6_seek_better_deals	2	2.0
PV1_purchase_frequently	2	2.0
BS1_switch_better_price	2	2.0
BS3_loyal_despite_discounts	2	2.0
BS5_switching_common	2	2.0
BS6_switch_customer_demand	2	2.0
PS1_reduce_volume	1	1.0
PS3_compare_prices	1	1.0
PS6_switch_when_high	1	1.0
CI4_switched_past_year	1	1.0
CI7_availability_affects	1	1.0
PV5_prefer_bulk	1	1.0
BS2_switch_credit_terms	1	1.0

Data Quality Issue 1: Missing values in Likert items. Twenty-one of the 27 Likert-scale variables have between 1 and 5 missing observations (1.0–4.9% missingness per item). These are attributed to survey non-response on specific items. Handling approach: Because missingness is low (at most 5 of 102 responses per item) and is assumed to be missing at random (MAR), composite scores were computed using row-wise means with na.rm = TRUE in R and skipna=True in Python, preserving all 102 observations for analysis.

Data Quality Issue 2: Ambiguous category label “Option 5” in Purchase_Value_Cat. Five respondents selected “Option 5” rather than the intended label “Above ₦500,000,000” in the monthly purchase value variable, likely due to a survey design error in the electronic version. Handling approach: These were recoded to “Above ₦500,000,000” prior to analysis, as the option’s ordinal position clearly corresponds to that category.

Additional quality note on Business_Type: Four respondents left Business_Type blank. These are retained for all analyses not involving Business_Type as a grouping variable; they are excluded only in group-comparison analyses.
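To make the handling of missing Likert items concrete, the sketch below recomputes a composite score for three invented respondents in plain Python. The helper name `composite_score` and the answers are illustrative assumptions, not survey rows. The logic is meant to mirror what a row-wise mean with na.rm = TRUE or skipna=True does: average only the items a respondent actually answered.

```python
def composite_score(items):
    """Mean of the answered Likert items; None marks a skipped item.

    Returns None when every item is missing, so an empty row is not
    silently turned into a number.
    """
    answered = [x for x in items if x is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)

# Hypothetical respondents (seven price-sensitivity items each)
complete_row = [4, 5, 4, 4, 3, 5, 4]
one_missing  = [4, 5, None, 4, 3, 5, 4]
all_missing  = [None] * 7

score_complete = composite_score(complete_row)   # 29 / 7
score_partial  = composite_score(one_missing)    # 25 / 6: divisor shrinks to 6
score_empty    = composite_score(all_missing)    # None

print(round(score_complete, 3), round(score_partial, 3), score_empty)
# prints: 4.143 4.167 None
```

Because the divisor shrinks with each skipped item, a seven-item composite can take values that are not multiples of 1/7. The minimum Competitor_Influence_Score of 1.166667 (7/6) reported in Section 5, for instance, is consistent with a respondent who answered six of the seven items.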

5. Exploratory Data Analysis (EDA)

Code

# ── Descriptive statistics for composite scores ──
composite_stats <- df %>%
  dplyr::select(Price_Sensitivity_Score, Competitor_Influence_Score,
         Purchase_Volume_Score, Brand_Switching_Score) %>%
  pivot_longer(everything(), names_to = "Construct", values_to = "Score") %>%
  group_by(Construct) %>%
  summarise(
    N       = sum(!is.na(Score)),
    Mean    = round(mean(Score, na.rm=TRUE), 3),
    Median  = round(median(Score, na.rm=TRUE), 3),
    SD      = round(sd(Score, na.rm=TRUE), 3),
    Min     = min(Score, na.rm=TRUE),
    Max     = max(Score, na.rm=TRUE),
    Skew    = round(moments::skewness(Score, na.rm=TRUE), 3)
  )

composite_stats %>%
  kable(caption = "Descriptive Statistics — Composite Construct Scores") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Descriptive Statistics — Composite Construct Scores
Construct	N	Mean	Median	SD	Min	Max	Skew
Brand_Switching_Score	102	3.547	3.667	0.646	1.500000	5	-0.904
Competitor_Influence_Score	102	3.824	4.000	0.891	1.166667	5	-1.055
Price_Sensitivity_Score	102	3.975	4.000	0.861	1.000000	5	-1.333
Purchase_Volume_Score	102	4.072	4.000	0.635	1.285714	5	-1.439

Code

# ── Categorical frequencies ──
cat_tables <- list(
  Gender       = table(df$Gender),
  Age          = table(df$Age),
  Business_Type= table(df$Business_Type),
  Years_Biz    = table(df$Years_Business),
  Purchase_Val = table(df$Purchase_Value_Cat)
)

for (nm in names(cat_tables)) {
  tbl <- as.data.frame(cat_tables[[nm]])
  tbl$Pct <- round(tbl$Freq / sum(tbl$Freq) * 100, 1)
  colnames(tbl) <- c(nm, "Count", "Percent (%)")
  print(
    kable(tbl, caption = paste("Frequency Table:", nm)) %>%
      kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
  )
}

Frequency Table: Gender
Gender	Count	Percent (%)
Female	17	16.7
Male	85	83.3

Frequency Table: Age
Age	Count	Percent (%)
25-34	17	16.7
35-44	38	37.3
45-54	36	35.3
Above 54	9	8.8
Below 25	2	2.0

Frequency Table: Business_Type
Business_Type	Count	Percent (%)
Distributor	27	27.6
Reseller	71	72.4

Frequency Table: Years_Biz
Years_Biz	Count	Percent (%)
1-3 years	27	26.5
4-6 years	22	21.6
7-10 years	29	28.4
Above 10 years	20	19.6
Less that 1 year	4	3.9

Frequency Table: Purchase_Val
Purchase_Val	Count	Percent (%)
₦150,000,000 – ₦300,000,000	13	12.7
₦300,000,001 – ₦500,000,000	7	6.9
Above ₦500,000,000	12	11.8
Below ₦150,000,000	70	68.6

Code

import pandas as pd
import numpy as np
from scipy.stats import skew

composites = ['Price_Sensitivity_Score','Competitor_Influence_Score',
              'Purchase_Volume_Score','Brand_Switching_Score']

stats_rows = []
for c in composites:
    s = df_py[c].dropna()
    stats_rows.append({
        'Construct': c.replace('_',' '),
        'N': len(s),
        'Mean': round(s.mean(), 3),
        'Median': round(s.median(), 3),
        'SD': round(s.std(), 3),
        'Min': round(s.min(), 3),
        'Max': round(s.max(), 3),
        'Skewness': round(skew(s), 3)
    })

stats_df = pd.DataFrame(stats_rows)
print("=== Composite Score Descriptive Statistics ===")

=== Composite Score Descriptive Statistics ===

Code

print(stats_df.to_string(index=False))

                 Construct   N  Mean  Median    SD   Min  Max  Skewness
   Price Sensitivity Score 102 3.975   4.000 0.861 1.000  5.0    -1.333
Competitor Influence Score 102 3.824   4.000 0.891 1.167  5.0    -1.055
     Purchase Volume Score 102 4.072   4.000 0.635 1.286  5.0    -1.439
     Brand Switching Score 102 3.547   3.667 0.646 1.500  5.0    -0.904

Code

print("\n=== Categorical Frequencies ===")


=== Categorical Frequencies ===

Code

for col in ['Gender','Age','Business_Type','Years_Business','Purchase_Value_Cat']:
    freq = df_py[col].value_counts(dropna=False)
    pct  = (freq / len(df_py) * 100).round(1)
    out  = pd.DataFrame({'Count': freq, 'Percent(%)': pct})
    print(f"\n{col}:\n{out.to_string()}")


Gender:
        Count  Percent(%)
Gender                   
Male       85        83.3
Female     17        16.7

Age:
          Count  Percent(%)
Age                        
35-44        38        37.3
45-54        36        35.3
25-34        17        16.7
Above 54      9         8.8
Below 25      2         2.0

Business_Type:
               Count  Percent(%)
Business_Type                   
Reseller          71        69.6
Distributor       27        26.5
NaN                4         3.9

Years_Business:
                  Count  Percent(%)
Years_Business                     
7-10 years           29        28.4
1-3 years            27        26.5
4-6 years            22        21.6
Above 10 years       20        19.6
Less that 1 year      4         3.9

Purchase_Value_Cat:
                             Count  Percent(%)
Purchase_Value_Cat                            
Below ₦150,000,000              70        68.6
₦150,000,000 – ₦300,000,000     13        12.7
Above ₦500,000,000              12        11.8
₦300,000,001 – ₦500,000,000      7         6.9

EDA Interpretation

The four composite constructs show the following key patterns:

Price Sensitivity Score (mean ≈ 3.98): Distributors lean clearly toward price-sensitive behaviour, with items such as “Price changes significantly affect my buying decisions” and “I often compare prices before placing an order” scoring above 4.0 on average. The distribution is clearly negatively skewed (skewness ≈ -1.33), indicating that most respondents cluster at the higher end of the scale while a few low scorers form a left tail.
Competitor Influence Score (mean ≈ 3.82): Competitor-related behaviours are also elevated, particularly active deal-seeking and responsiveness to competitor promotions. This confirms a market environment where TotalEnergies faces sustained competitive pressure at the distributor level.
Purchase Volume Score (mean ≈ 4.07): Respondents generally report stable, planned purchasing behaviour influenced by downstream customer demand. The standard deviation (0.635) is the lowest of the four constructs, suggesting comparatively consistent responses across the sample.
Brand Switching Score (mean ≈ 3.55): Brand switching is moderate. While not extreme, the score above the scale midpoint of 3.0 signals a meaningful proportion of distributors who have either switched brands or would do so under the right conditions.

Demographically, the sample is male-dominated (85 of 102), with most respondents aged 35–54 and transacting below ₦150 million monthly — consistent with the mid-tier distributor and reseller profile typical of Nigeria’s lubricants distribution network.
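The skewness figures in the descriptive tables can be unpacked with a small worked example. The sketch below implements the moment-based coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments divided by n. To my understanding this is the definition used by default in both moments::skewness and scipy.stats.skew, which would explain why the R and Python tables agree. The function name and both score lists are invented for illustration.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (central moments divided by n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented composite scores: most sit between 4 and 5, a few trail off toward 1
left_tailed = [1.0, 2.5, 3.5, 4.0, 4.0, 4.2, 4.5, 4.5, 4.8, 5.0]
# Evenly spread scores for comparison
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]

g1_left = skewness(left_tailed)
g1_sym = skewness(symmetric)
print(round(g1_left, 3), round(g1_sym, 3))
```

The left-tailed list yields a negative coefficient because the cubed deviations of the few low scores outweigh those of the many high scores, while the evenly spread list yields zero. The same mechanism sits behind the negative values reported for all four constructs.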

6. Data Visualisation

The five visualisations below tell a coherent story: price sensitivity is high across the distributor base, competitor influence is a real behavioural driver, brand-switching is moderate but concentrated, and the relationship between price sensitivity and brand switching is the central commercial risk facing TotalEnergies.

Code

# Colour palette
pal <- c("#003087","#E8002D","#F0A500","#00843D","#7B2D8B")

# ── Plot 1: Composite score distributions ──
p1 <- df %>%
  dplyr::select(Price_Sensitivity_Score, Competitor_Influence_Score,
         Purchase_Volume_Score, Brand_Switching_Score) %>%
  pivot_longer(everything(), names_to = "Construct", values_to = "Score") %>%
  mutate(Construct = str_replace_all(Construct, "_", " ")) %>%
  ggplot(aes(x = Score, fill = Construct)) +
  geom_histogram(bins = 10, colour = "white", alpha = 0.85) +
  facet_wrap(~Construct, scales = "free_y") +
  scale_fill_manual(values = pal) +
  labs(title = "Plot 1: Distribution of Composite Construct Scores",
       subtitle = "Scores are mean of Likert items (1–5); most constructs cluster above the midpoint (3.0)",
       x = "Composite Score", y = "Count") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        strip.text = element_text(face = "bold"),
        plot.title = element_text(face = "bold"))

# ── Plot 2: Price Sensitivity by Business Type ──
p2 <- df %>%
  filter(!is.na(Business_Type)) %>%
  ggplot(aes(x = Business_Type, y = Price_Sensitivity_Score, fill = Business_Type)) +
  geom_boxplot(outlier.colour = "#E8002D", outlier.size = 2.5, alpha = 0.8) +
  scale_fill_manual(values = c("#003087","#F0A500")) +
  labs(title = "Plot 2: Price Sensitivity by Business Type",
       subtitle = "Resellers show slightly higher price sensitivity than Distributors",
       x = "Business Type", y = "Price Sensitivity Score") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none", plot.title = element_text(face="bold"))

# ── Plot 3: Competitor Influence by Years in Business ──
p3 <- df %>%
  mutate(Years_Business = factor(Years_Business,
    levels = c("Less that 1 year","1-3 years","4-6 years","7-10 years","Above 10 years"))) %>%
  ggplot(aes(x = Years_Business, y = Competitor_Influence_Score, fill = Years_Business)) +
  geom_violin(trim = FALSE, alpha = 0.7) +
  geom_boxplot(width = 0.15, fill = "white", outlier.size = 1.5) +
  scale_fill_brewer(palette = "Blues") +
  labs(title = "Plot 3: Competitor Influence Score by Years in Business",
       subtitle = "Newer distributors (1–3 years) show broader variance in competitor responsiveness",
       x = "Years in Business", y = "Competitor Influence Score") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none", axis.text.x = element_text(angle = 30, hjust = 1),
        plot.title = element_text(face = "bold"))

# ── Plot 4: Scatter — Price Sensitivity vs Brand Switching ──
p4 <- ggplot(df, aes(x = Price_Sensitivity_Score, y = Brand_Switching_Score,
                      colour = Business_Type)) +
  geom_point(alpha = 0.6, size = 3) +
  geom_smooth(method = "lm", se = TRUE, colour = "#003087", fill = "#003087", alpha = 0.15) +
  scale_colour_manual(values = c("#E8002D","#F0A500"), na.value = "grey70") +
  labs(title = "Plot 4: Price Sensitivity vs Brand Switching Tendency",
       subtitle = "Positive association: more price-sensitive distributors show higher brand-switching scores",
       x = "Price Sensitivity Score", y = "Brand Switching Score",
       colour = "Business Type") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"))

# ── Plot 5: Stacked bar — Purchase value by Business Type ──
p5 <- df %>%
  filter(!is.na(Business_Type)) %>%
  mutate(Purchase_Value_Cat = factor(Purchase_Value_Cat,
    levels = c("Below ₦150,000,000","₦150,000,000 – ₦300,000,000",
               "₦300,000,001 – ₦500,000,000","Above ₦500,000,000"))) %>%
  count(Business_Type, Purchase_Value_Cat) %>%
  group_by(Business_Type) %>%
  mutate(Pct = n / sum(n) * 100) %>%
  ggplot(aes(x = Business_Type, y = Pct, fill = Purchase_Value_Cat)) +
  geom_col(position = "stack", colour = "white") +
  scale_fill_brewer(palette = "RdYlBu", direction = -1) +
  labs(title = "Plot 5: Monthly Purchase Value Category by Business Type",
       subtitle = "Distributors transact at higher volumes; most Resellers fall below ₦150M/month",
       x = "Business Type", y = "Percentage (%)", fill = "Purchase Value Band") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "right")

# Combine with patchwork
(p1) / (p2 | p3) / (p4 | p5) +
  plot_annotation(
    title = "Customer Purchase Behaviour — TotalEnergies Lubricants Distributors",
    subtitle = "Survey data: January–March 2026 | n = 102",
    theme = theme(plot.title = element_text(size = 16, face = "bold"),
                  plot.subtitle = element_text(size = 11))
  )

Code

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
import numpy as np

sns.set_theme(style="whitegrid", palette="deep")
fig, axes = plt.subplots(3, 2, figsize=(13, 15))
fig.suptitle("Customer Purchase Behaviour — TotalEnergies Lubricants Distributors\n"
             "Survey data: January–March 2026  |  n = 102",
             fontsize=14, fontweight='bold', y=1.01)

composites = ['Price_Sensitivity_Score','Competitor_Influence_Score',
              'Purchase_Volume_Score','Brand_Switching_Score']
colors = ['#003087','#E8002D','#F0A500','#00843D']

# Plot 1 — Overlaid histograms of the four composite scores (top-left panel of the 3x2 grid)
ax0 = axes[0, 0]
for c, col in zip(composites, colors):
    ax0.hist(df_py[c].dropna(), bins=12, alpha=0.55, color=col,
             label=c.replace('_Score','').replace('_',' '))
ax0.axvline(3.0, color='black', linestyle='--', linewidth=1, label='Scale midpoint')
ax0.set_title("Plot 1: Composite Score Distributions", fontweight='bold')
ax0.set_xlabel("Score (1–5)")
ax0.set_ylabel("Frequency")
ax0.legend(fontsize=8)

# Plot 2 — Boxplot Price Sensitivity by Business Type
ax1 = axes[0, 1]
biz_types = df_py['Business_Type'].dropna().unique()
data_box = [df_py[df_py['Business_Type']==bt]['Price_Sensitivity_Score'].dropna()
            for bt in biz_types]
bp = ax1.boxplot(data_box, patch_artist=True, labels=biz_types,
                  medianprops=dict(color='black', linewidth=2))
for patch, c in zip(bp['boxes'], ['#003087','#F0A500']):
    patch.set_facecolor(c)
    patch.set_alpha(0.75)
ax1.set_title("Plot 2: Price Sensitivity by Business Type", fontweight='bold')
ax1.set_ylabel("Price Sensitivity Score")

# Plot 3 — Competitor Influence by Years in Business
ax2 = axes[1, 0]
order = ["Less that 1 year","1-3 years","4-6 years","7-10 years","Above 10 years"]
palette_b = sns.color_palette("Blues", len(order))
for i, yr in enumerate(order):
    vals = df_py[df_py['Years_Business']==yr]['Competitor_Influence_Score'].dropna()
    ax2.scatter([i]*len(vals), vals, color=palette_b[i], alpha=0.5, s=20)
    ax2.boxplot(vals, positions=[i], widths=0.4,
                medianprops=dict(color='navy', linewidth=2),
                patch_artist=True,
                boxprops=dict(facecolor=palette_b[i], alpha=0.4))

ax2.set_xticks(range(len(order)))
ax2.set_xticklabels([o.replace(' ','\n') for o in order], fontsize=8)
ax2.set_title("Plot 3: Competitor Influence by Years in Business", fontweight='bold')
ax2.set_ylabel("Competitor Influence Score")

# Plot 4 — Scatter Price Sensitivity vs Brand Switching
ax3 = axes[1, 1]
for bt, c in zip(['Distributor','Reseller'], ['#E8002D','#F0A500']):
    sub = df_py[df_py['Business_Type']==bt]
    ax3.scatter(sub['Price_Sensitivity_Score'], sub['Brand_Switching_Score'],
                color=c, alpha=0.55, s=40, label=bt)
# Fit the trend line on complete pairs only (no mean imputation of missing scores)
pair = df_py[['Price_Sensitivity_Score','Brand_Switching_Score']].dropna()
m, b = np.polyfit(pair['Price_Sensitivity_Score'], pair['Brand_Switching_Score'], 1)
x_line = np.linspace(1, 5, 100)
ax3.plot(x_line, m*x_line + b, color='#003087', linewidth=2, label='OLS trend')
ax3.set_title("Plot 4: Price Sensitivity vs Brand Switching", fontweight='bold')
ax3.set_xlabel("Price Sensitivity Score")
ax3.set_ylabel("Brand Switching Score")
ax3.legend()

# Plot 5 — Stacked bar purchase value by business type
ax4 = axes[2, 0]
pv_order = ["Below ₦150,000,000","₦150,000,000 – ₦300,000,000",
            "₦300,000,001 – ₦500,000,000","Above ₦500,000,000"]
ct = df_py[df_py['Business_Type'].notna()].groupby(
    ['Business_Type','Purchase_Value_Cat']).size().unstack(fill_value=0)
# Normalise
ct_pct = ct.div(ct.sum(axis=1), axis=0) * 100
ct_pct = ct_pct.reindex(columns=[c for c in pv_order if c in ct_pct.columns])
ct_pct.plot(kind='bar', stacked=True, ax=ax4,
            colormap='RdYlBu', edgecolor='white')
ax4.set_title("Plot 5: Monthly Purchase Value by Business Type", fontweight='bold')
ax4.set_xlabel("Business Type")
ax4.set_ylabel("Percentage (%)")
ax4.legend(loc='upper right', fontsize=7)
ax4.tick_params(axis='x', rotation=0)

# Hide unused subplot
axes[2, 1].axis('off')

plt.tight_layout()
plt.savefig("viz_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

print("Visualisation complete.")

Visualisation complete.

Visualisation Narrative Summary

Plot 1 confirms that all four constructs concentrate above the 3.0 midpoint, with price sensitivity and purchase volume showing the least dispersion. Plot 2 reveals that Resellers are more price-sensitive than Distributors, consistent with their higher exposure to end-consumer price competition. Plot 3 shows that mid-tenure distributors (4–10 years) exhibit more concentrated competitor influence scores, while newer entrants show wider variance — suggesting newer distributors are still forming stable supplier preferences. Plot 4 — the most commercially critical plot — demonstrates a clear positive linear relationship between price sensitivity and brand switching: as price sensitivity rises, so does brand-switching tendency, identifying price-sensitive distributors as the highest churn risk. Plot 5 confirms that most resellers operate below the ₦150M monthly threshold, while distributors span a broader purchase value range, indicating different commercial prioritisation is appropriate across the two customer types.

7. Hypothesis Testing

7.1 Hypothesis 1 — Price Sensitivity and Business Type

Business question: Do Resellers exhibit significantly higher price sensitivity than Distributors?

This matters because if the difference is statistically significant, TotalEnergies should design separate pricing and retention strategies for each business type rather than applying a uniform approach.

\[H_0: \mu_{\text{Reseller}} = \mu_{\text{Distributor}} \quad \text{(no difference in price sensitivity)}\] \[H_1: \mu_{\text{Reseller}} \neq \mu_{\text{Distributor}} \quad \text{(significant difference exists)}\]

Code

library(effsize)

dist_ps <- df %>% filter(Business_Type == "Distributor") %>% pull(Price_Sensitivity_Score)
res_ps  <- df %>% filter(Business_Type == "Reseller")    %>% pull(Price_Sensitivity_Score)

# Assumption check: normality
sw_dist <- shapiro.test(dist_ps)
sw_res  <- shapiro.test(res_ps)
cat("Shapiro-Wilk — Distributor: W =", round(sw_dist$statistic,4), "p =", round(sw_dist$p.value,4), "\n")

Shapiro-Wilk — Distributor: W = 0.9424 p = 0.14

Code

cat("Shapiro-Wilk — Reseller:    W =", round(sw_res$statistic,4),  "p =", round(sw_res$p.value,4),  "\n")

Shapiro-Wilk — Reseller:    W = 0.8501 p = 0

Code

# Levene's test for equal variances
levene_result <- car::leveneTest(Price_Sensitivity_Score ~ Business_Type,
                                  data = df %>% filter(!is.na(Business_Type)))
cat("Levene's test p-value:", round(levene_result$`Pr(>F)`[1], 4), "\n\n")

Levene's test p-value: 0.2347

Code

# Welch's independent t-test (robust to unequal variances)
t_result <- t.test(res_ps, dist_ps, alternative = "two.sided", var.equal = FALSE)
print(t_result)


    Welch Two Sample t-test

data:  res_ps and dist_ps
t = 2.5339, df = 39.806, p-value = 0.01532
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1077597 0.9577444
sample estimates:
mean of x mean of y 
 4.120054  3.587302

Code

# Effect size (Cohen's d)
d <- cohen.d(res_ps, dist_ps)
mag_label <- as.character(d$magnitude)
cat("\nCohen's d:", round(d$estimate, 3), " | Magnitude:", mag_label, "\n")


Cohen's d: 0.628  | Magnitude: medium

Code

from scipy import stats
import numpy as np

dist_ps = df_py[df_py['Business_Type']=='Distributor']['Price_Sensitivity_Score'].dropna()
res_ps  = df_py[df_py['Business_Type']=='Reseller']['Price_Sensitivity_Score'].dropna()

# Normality
sw_d = stats.shapiro(dist_ps)
sw_r = stats.shapiro(res_ps)
print(f"Shapiro-Wilk Distributor: W={sw_d.statistic:.4f}, p={sw_d.pvalue:.4f}")

Shapiro-Wilk Distributor: W=0.9424, p=0.1400

Code

print(f"Shapiro-Wilk Reseller:    W={sw_r.statistic:.4f}, p={sw_r.pvalue:.4f}")

Shapiro-Wilk Reseller:    W=0.8501, p=0.0000

Code

# Welch's t-test
t_stat, p_val = stats.ttest_ind(res_ps, dist_ps, equal_var=False)
print(f"\nWelch's t-test: t = {t_stat:.4f}, p = {p_val:.4f}")


Welch's t-test: t = 2.5339, p = 0.0153

Code

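# Cohen's d using the simple (unweighted) average of the two group variances.
# This differs from a pooled SD weighted by degrees of freedom when group sizes are unequal.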
pooled_sd = np.sqrt((res_ps.std()**2 + dist_ps.std()**2) / 2)
cohens_d  = (res_ps.mean() - dist_ps.mean()) / pooled_sd
print(f"Cohen's d: {cohens_d:.3f}")

Cohen's d: 0.598

Code

print(f"Reseller mean PS: {res_ps.mean():.3f} | Distributor mean PS: {dist_ps.mean():.3f}")

Reseller mean PS: 4.120 | Distributor mean PS: 3.587
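
The R and Python outputs above report slightly different Cohen’s d values (0.628 and 0.598). A likely reason is the pooled standard deviation: the Python chunk averages the two variances, whereas a pooled SD weighted by each group’s degrees of freedom (which, as far as I understand, is the default in effsize::cohen.d) gives more weight to the larger group. The sketch below contrasts the two formulas using standard-library Python only. The scores are illustrative values, not survey data, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d_weighted(x, y):
    """Cohen's d with the pooled SD weighted by each group's degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def cohens_d_unweighted(x, y):
    """Cohen's d using the simple average of the two sample variances."""
    return (mean(x) - mean(y)) / sqrt((variance(x) + variance(y)) / 2)


# Illustrative scores only (NOT survey data): unequal group sizes and spreads
group_a = [4.0, 4.5, 5.0, 3.5, 4.0, 4.5, 5.0, 4.0]
group_b = [2.0, 5.0, 3.5]

d_weighted = cohens_d_weighted(group_a, group_b)
d_unweighted = cohens_d_unweighted(group_a, group_b)
print(f"weighted pooled SD:   d = {d_weighted:.3f}")
print(f"unweighted pooled SD: d = {d_unweighted:.3f}")
```

The two versions coincide when the groups are the same size and diverge as the sizes and variances become more unequal, so either value can be reported as long as the formula is stated.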

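Interpretation (Hypothesis 1): The Shapiro-Wilk test does not reject normality for Distributors (W = 0.942, p = 0.140) but clearly rejects it for Resellers (W = 0.850, p < 0.001), so the normality assumption holds only for the Distributor group. Levene’s test gives no evidence of unequal variances (p = 0.235). Welch’s independent-samples t-test was used because it does not assume equal variances and remains reliable when group sizes differ; given the non-normal Reseller scores, a rank-based check such as the Mann-Whitney U test would be a sensible robustness step. The test rejects H₀ at the 5% level: Resellers report higher price sensitivity (mean = 4.12) than Distributors (mean = 3.59), t(39.8) = 2.53, p = 0.015, with a 95% confidence interval for the difference of 0.11 to 0.96 scale points. Cohen’s d is 0.63 in R and 0.60 in Python (the two chunks use different pooled-SD formulas), a medium effect either way. Business implication: TotalEnergies should not price all customers uniformly. Resellers require stronger promotional cushioning and price-stability assurances, while Distributors may respond better to volume-tier discounts and credit incentives.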
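
The fractional degrees of freedom in the Welch output (df = 39.806) come from the Welch-Satterthwaite approximation, which shrinks the degrees of freedom when the two groups differ in size or variance. The sketch below shows that calculation with standard-library Python. The input scores are illustrative values rather than the survey responses, and welch_t is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Illustrative scores only (NOT survey data)
resellers_demo = [4.0, 4.5, 5.0, 3.5, 4.0, 4.5, 5.0, 4.0]
distributors_demo = [2.0, 5.0, 3.5]

t_stat, df_welch = welch_t(resellers_demo, distributors_demo)
print(f"t = {t_stat:.3f}, df = {df_welch:.2f}")
```

The resulting df always lies between the smaller group’s n − 1 and the pooled n₁ + n₂ − 2, which is why the reported value is well below the total sample size.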

7.2 Hypothesis 2 — Competitor Influence and Brand Switching (ANOVA by Years in Business)

Business question: Does brand-switching tendency differ across distributor tenure groups?

Understanding whether newer or more experienced distributors are more prone to switching brands under competitor pressure helps prioritise where relationship management investment is most needed.

\[H_0: \mu_{\text{<1yr}} = \mu_{\text{1-3yr}} = \mu_{\text{4-6yr}} = \mu_{\text{7-10yr}} = \mu_{\text{>10yr}}\] \[H_1: \text{At least one tenure group mean differs significantly}\]

Code

# Prepare ordered factor
df_anova <- df %>%
  mutate(Years_f = factor(Years_Business,
    levels = c("Less that 1 year","1-3 years","4-6 years","7-10 years","Above 10 years")))

# One-way ANOVA
anova_result <- aov(Brand_Switching_Score ~ Years_f, data = df_anova)
summary(anova_result)

            Df Sum Sq Mean Sq F value Pr(>F)  
Years_f      4   3.41  0.8522   2.131 0.0827 .
Residuals   97  38.79  0.3999                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code

# Eta-squared (effect size)
ss_between <- summary(anova_result)[[1]]["Years_f","Sum Sq"]
ss_total   <- sum(summary(anova_result)[[1]][,"Sum Sq"])
eta_sq <- ss_between / ss_total
cat("\nEta-squared (η²):", round(eta_sq, 4), "\n")


Eta-squared (η²): 0.0808

Code

# Post-hoc Tukey HSD
tukey <- TukeyHSD(anova_result)
print(tukey)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Brand_Switching_Score ~ Years_f, data = df_anova)

$Years_f
                                       diff        lwr        upr     p adj
1-3 years-Less that 1 year      -0.07962963 -1.0213859 0.86212667 0.9993129
4-6 years-Less that 1 year      -0.28787879 -1.2433441 0.66758648 0.9182450
7-10 years-Less that 1 year     -0.06034483 -0.9979015 0.87721187 0.9997667
Above 10 years-Less that 1 year -0.52583333 -1.4886203 0.43695361 0.5533054
4-6 years-1-3 years             -0.20824916 -0.7131131 0.29661482 0.7812927
7-10 years-1-3 years             0.01928480 -0.4508070 0.48937663 0.9999611
Above 10 years-1-3 years        -0.44620370 -0.9647907 0.07238328 0.1263334
7-10 years-4-6 years             0.22753396 -0.2694522 0.72452014 0.7085056
Above 10 years-4-6 years        -0.23795455 -0.7810396 0.30513054 0.7409831
Above 10 years-7-10 years       -0.46548851 -0.9764093 0.04543233 0.0918470

Code

from scipy import stats
import pandas as pd

order = ["Less that 1 year","1-3 years","4-6 years","7-10 years","Above 10 years"]
groups = [df_py[df_py['Years_Business']==yr]['Brand_Switching_Score'].dropna()
          for yr in order]

f_stat, p_val = stats.f_oneway(*groups)
print(f"One-way ANOVA: F = {f_stat:.4f}, p = {p_val:.4f}")

One-way ANOVA: F = 2.1312, p = 0.0827

Code

# Eta-squared
grand_mean = df_py['Brand_Switching_Score'].mean()
ss_between = sum(len(g)*(g.mean()-grand_mean)**2 for g in groups)
ss_total   = sum((df_py['Brand_Switching_Score'].dropna() - grand_mean)**2)
eta_sq = ss_between / ss_total
print(f"Eta-squared (η²): {eta_sq:.4f}")

Eta-squared (η²): 0.0808

Code

print("\nGroup means:")


Group means:

Code

for yr, g in zip(order, groups):
    print(f"  {yr}: mean = {g.mean():.3f}, n = {len(g)}")

  Less that 1 year: mean = 3.750, n = 4
  1-3 years: mean = 3.670, n = 27
  4-6 years: mean = 3.462, n = 22
  7-10 years: mean = 3.690, n = 29
  Above 10 years: mean = 3.224, n = 20

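Interpretation (Hypothesis 2): The one-way ANOVA tests whether mean brand-switching scores differ across the five tenure categories. The result is not significant at the 5% level, F(4, 97) = 2.13, p = 0.083, so H₀ is not rejected. Tenure explains about 8% of the variance in brand switching (η² = 0.081), and none of the Tukey HSD pairwise comparisons is significant; the closest is Above 10 years versus 7–10 years (adjusted p = 0.092). Descriptively, distributors with more than 10 years in business report the lowest switching scores (mean = 3.22), while the other groups range from 3.46 to 3.75. The under-1-year group contains only 4 respondents, so its mean is imprecise. Business implication: The data do not justify allocating engagement resources by tenure alone. The descriptive pattern is consistent with long-established distributors being somewhat more loyal, but confirming that would require a larger sample, particularly of new entrants.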
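
As a quick consistency check, η² can be recovered directly from the reported F statistic and its degrees of freedom, since η² = (F × df_between) / (F × df_between + df_within). The sketch below applies that identity to the ANOVA values printed above; the function name is hypothetical.

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Recover eta-squared from a one-way ANOVA F statistic and its degrees of freedom."""
    return f_value * df_between / (f_value * df_between + df_within)


# Values taken from the ANOVA table above: F = 2.131 on 4 and 97 degrees of freedom
eta_sq_check = eta_squared_from_f(2.131, 4, 97)
print(round(eta_sq_check, 4))  # 0.0808, matching the R and Python output
```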

8. Correlation Analysis

Code

library(ggcorrplot)

# Select numeric Likert items + composites
corr_vars <- df %>%
  dplyr::select(Price_Sensitivity_Score, Competitor_Influence_Score,
         Purchase_Volume_Score, Brand_Switching_Score,
         PS1_reduce_volume, PS2_price_affects_buying, PS3_compare_prices,
         CI3_competitor_promos, CI4_switched_past_year, CI6_seek_better_deals,
         PV2_volume_stable, PV6_loyal_good_service,
         BS1_switch_better_price, BS5_switching_common)

# Spearman correlation matrix (appropriate for Likert/ordinal data)
corr_mat <- cor(corr_vars, method = "spearman", use = "pairwise.complete.obs")

# Heatmap
ggcorrplot(corr_mat,
           method = "square",
           type   = "lower",
           lab    = TRUE,
           lab_size = 3,
           colors = c("#E8002D","white","#003087"),
           title  = "Spearman Correlation Matrix — Key Constructs & Items",
           ggtheme = theme_minimal(base_size = 11)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title  = element_text(face = "bold"))

Code

# Partial correlation: Price Sensitivity ~ Brand Switching, controlling for Competitor Influence
library(ppcor)

# Use base R subsetting to avoid any pipe/select conflicts after ppcor loads
partial_data <- na.omit(
  data.frame(
    Price_Sensitivity_Score    = df$Price_Sensitivity_Score,
    Brand_Switching_Score      = df$Brand_Switching_Score,
    Competitor_Influence_Score = df$Competitor_Influence_Score
  )
)

pc <- pcor(partial_data, method = "spearman")
cat("Partial correlation (Price Sensitivity ~ Brand Switching | Competitor Influence):\n")

Partial correlation (Price Sensitivity ~ Brand Switching | Competitor Influence):

Code

cat("  r =", round(pc$estimate[1,2], 3),
    " | p =", round(pc$p.value[1,2], 4), "\n")

  r = 0.041  | p = 0.6856

Code

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

corr_cols = ['Price_Sensitivity_Score','Competitor_Influence_Score',
             'Purchase_Volume_Score','Brand_Switching_Score',
             'PS1_reduce_volume','PS2_price_affects_buying','PS3_compare_prices',
             'CI3_competitor_promos','CI4_switched_past_year','CI6_seek_better_deals',
             'PV2_volume_stable','PV6_loyal_good_service',
             'BS1_switch_better_price','BS5_switching_common']

# Spearman correlation matrix with pairwise deletion of missing values,
# matching use = "pairwise.complete.obs" in the R version
rho_df = df_py[corr_cols].corr(method="spearman")
rho_mat = rho_df.to_numpy()

fig, ax = plt.subplots(figsize=(11, 9))
mask = np.triu(np.ones_like(rho_mat, dtype=bool))
sns.heatmap(rho_df, mask=mask, annot=True, fmt=".2f", cmap="RdBu_r",
            center=0, vmin=-1, vmax=1, ax=ax,
            linewidths=0.5, annot_kws={"size": 8})

ax.set_title("Spearman Correlation Matrix — Key Constructs & Items",
             fontweight='bold', fontsize=13)
plt.xticks(rotation=45, ha='right', fontsize=8)
plt.yticks(fontsize=8)
plt.tight_layout()
plt.savefig("corr_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

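# Partial Spearman correlation via rank residuals
import numpy as np
from scipy.stats import pearsonr, rankdata

def partial_corr_spearman(df, x, y, z):
    # Rank-transform complete cases so all three variables share the same rows
    aligned = df[[x, y, z]].dropna()
    rx = rankdata(aligned[x])
    ry = rankdata(aligned[y])
    rz = rankdata(aligned[z])

    def resid(a, b):
        # Remove the linear effect of b from a (ddof=1 in both terms gives the OLS slope)
        slope = np.cov(a, b)[0, 1] / np.var(b, ddof=1)
        return a - slope * b

    # The Pearson correlation of the rank residuals is the partial Spearman correlation.
    # pearsonr's p-value uses n - 2 degrees of freedom, whereas ppcor uses n - 3 with one
    # control variable, so the p-value can differ slightly from the R output.
    r, p = pearsonr(resid(rx, rz), resid(ry, rz))
    return r, p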

r_partial, p_partial = partial_corr_spearman(
    df_py, 'Price_Sensitivity_Score','Brand_Switching_Score','Competitor_Influence_Score')
print(f"Partial correlation (Price Sensitivity ~ Brand Switching | Competitor Influence):")

Partial correlation (Price Sensitivity ~ Brand Switching | Competitor Influence):

Code

print(f"  r = {r_partial:.3f}, p = {p_partial:.4f}")

  r = 0.041, p = 0.6839

Correlation Interpretation

The Spearman correlation heatmap reveals the following patterns across the composite constructs and selected individual items. Readers should refer to the heatmap above for the full matrix; the three commercially most important relationships are discussed below.

Three strongest correlations and their business implications:

Competitor Influence Score ↔︎ Brand Switching Score: This pair produces the strongest correlation among the four composite constructs. Distributors who actively monitor competitors, respond to competitor promotions, and seek better deals are substantially more likely to exhibit brand-switching behaviour. Business implication: Competitor-aware distributors are the highest churn risk. TotalEnergies should prioritise counter-promotion outreach to this group — particularly during periods of known competitor activity — before switching decisions crystallise.
Price Sensitivity Score ↔︎ Competitor Influence Score: Price sensitivity and competitor responsiveness are moderately correlated, confirming that these two constructs co-occur in the same distributor profiles. Distributors who are highly price-sensitive also tend to be the most actively engaged with competitor offerings. Business implication: A single distributor segment — high price sensitivity plus high competitor awareness — concentrates most of the commercial risk. Targeting this group with bespoke pricing assurances and exclusive deal access addresses both dimensions simultaneously.
Individual switching items (BS1, BS5) ↔︎ Price Sensitivity items (PS2, PS3): Among individual Likert items, the items measuring active price comparison and deal-seeking show the strongest co-movement with brand-switching items. This granular finding reinforces that the switching risk is specifically triggered by price comparison behaviour, not by passive price awareness alone. Business implication: Distributors who report frequently comparing prices and actively seeking better deals are the immediate priority for retention interventions.

Partial correlation result and interpretation: The partial correlation between Price Sensitivity Score and Brand Switching Score, after controlling for Competitor Influence Score, is r = 0.041 (p = 0.686), which is not statistically significant. This is an important finding: the bivariate relationship between price sensitivity and brand switching is largely accounted for by their shared association with competitor influence. The pattern is consistent with competitor influence acting as a mediator (or a common driver) of the two, although a partial correlation on cross-sectional survey data cannot by itself establish the causal direction. One plausible reading is that price sensitivity alone does not drive switching, and that switching emerges when competitor pressure activates price-conscious distributors. Business implication: The most promising retention lever is reducing competitor influence (through exclusive deals, supply reliability, and relationship quality) rather than simply matching prices. Matching competitor prices without addressing competitor exposure is likely to have limited effect on switching.
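
To make the logic of the partial correlation concrete, the first-order partial correlation can be written in terms of the three pairwise correlations: r_xy·z = (r_xy − r_xz r_yz) / √((1 − r_xz²)(1 − r_yz²)). The sketch below applies this formula to hypothetical correlation values chosen purely for illustration; they are not read from the heatmap, and the function name is an assumption.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# HYPOTHETICAL correlations for illustration only (not taken from the survey):
# x = price sensitivity, y = brand switching, z = competitor influence
r_xy_demo, r_xz_demo, r_yz_demo = 0.42, 0.60, 0.65

r_partial_demo = partial_corr(r_xy_demo, r_xz_demo, r_yz_demo)
print(f"bivariate r = {r_xy_demo:.2f}, partial r = {r_partial_demo:.3f}")
```

When x and y are each strongly related to z, the product r_xz × r_yz can absorb most of r_xy, which is the mechanism by which a moderate bivariate correlation can shrink towards zero once the third variable is controlled for.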

9. Linear Regression

The outcome variable is Price Sensitivity Score (composite mean of 7 Likert items). Predictors are Competitor Influence Score, Brand Switching Score, Purchase Volume Score, and two demographic variables: a Business Type dummy (Reseller = 1, Distributor = 0) and Years in Business recoded as an ordinal numeric score from 1 (less than 1 year) to 5 (above 10 years).

Code

# Prepare regression dataset
df_reg <- df %>%
  filter(!is.na(Business_Type)) %>%
  mutate(
    Business_Type_num = ifelse(Business_Type == "Reseller", 1, 0),
    Years_num = case_when(
      Years_Business == "Less that 1 year" ~ 1,
      Years_Business == "1-3 years"        ~ 2,
      Years_Business == "4-6 years"        ~ 3,
      Years_Business == "7-10 years"       ~ 4,
      Years_Business == "Above 10 years"   ~ 5,
      TRUE ~ NA_real_
    )
  ) %>%
  dplyr::select(Price_Sensitivity_Score, Competitor_Influence_Score,
         Brand_Switching_Score, Purchase_Volume_Score,
         Business_Type_num, Years_num) %>%
  na.omit()

cat("Regression sample size (after listwise deletion):", nrow(df_reg), "\n\n")

Regression sample size (after listwise deletion): 98

Code

# OLS model
model <- lm(Price_Sensitivity_Score ~
              Competitor_Influence_Score +
              Brand_Switching_Score +
              Purchase_Volume_Score +
              Business_Type_num +
              Years_num,
            data = df_reg)

summary(model)


Call:
lm(formula = Price_Sensitivity_Score ~ Competitor_Influence_Score + 
    Brand_Switching_Score + Purchase_Volume_Score + Business_Type_num + 
    Years_num, data = df_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.66790 -0.23253 -0.03459  0.46005  1.32036 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 1.85275    0.68303   2.713  0.00797 ** 
Competitor_Influence_Score  0.51660    0.11083   4.661 1.06e-05 ***
Brand_Switching_Score      -0.07968    0.14932  -0.534  0.59490    
Purchase_Volume_Score       0.10442    0.12440   0.839  0.40346    
Business_Type_num           0.49255    0.16962   2.904  0.00461 ** 
Years_num                  -0.10597    0.06673  -1.588  0.11571    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7195 on 92 degrees of freedom
Multiple R-squared:  0.361, Adjusted R-squared:  0.3263 
F-statistic:  10.4 on 5 and 92 DF,  p-value: 6.322e-08

Code

# Confidence intervals
cat("\n95% Confidence Intervals for Coefficients:\n")


95% Confidence Intervals for Coefficients:

Code

print(confint(model))

                                2.5 %    97.5 %
(Intercept)                 0.4961800 3.2093124
Competitor_Influence_Score  0.2964852 0.7367106
Brand_Switching_Score      -0.3762375 0.2168825
Purchase_Volume_Score      -0.1426626 0.3514942
Business_Type_num           0.1556722 0.8294235
Years_num                  -0.2385104 0.0265620
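
To show how the fitted coefficients translate into predicted scores, the sketch below plugs the estimates from the R summary into the regression equation for two hypothetical customer profiles that differ only in business type. The profile values are invented for illustration and the helper name is an assumption; only the coefficients come from the model output.

```python
# Coefficient estimates copied from the R model summary above
coef = {
    "Intercept": 1.85275,
    "Competitor_Influence_Score": 0.51660,
    "Brand_Switching_Score": -0.07968,
    "Purchase_Volume_Score": 0.10442,
    "Business_Type_num": 0.49255,
    "Years_num": -0.10597,
}


def predict_price_sensitivity(profile):
    """Linear prediction: intercept plus the sum of coefficient * predictor value."""
    return coef["Intercept"] + sum(coef[name] * value for name, value in profile.items())


# HYPOTHETICAL profiles (illustration only): identical except for business type
base = {"Competitor_Influence_Score": 4.0, "Brand_Switching_Score": 3.5,
        "Purchase_Volume_Score": 3.5, "Years_num": 3}
reseller = {**base, "Business_Type_num": 1}
distributor = {**base, "Business_Type_num": 0}

pred_reseller = predict_price_sensitivity(reseller)
pred_distributor = predict_price_sensitivity(distributor)
print(f"Reseller: {pred_reseller:.3f} | Distributor: {pred_distributor:.3f}")
print(f"Difference: {pred_reseller - pred_distributor:.3f}")
```

Holding the other predictors fixed, the gap between the two predictions equals the Business Type coefficient, which is how that coefficient should be read.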

Code

# Diagnostic plots
par(mfrow = c(2,2))
plot(model, which = c(1,2,3,5),
     col = "#003087", pch = 19, cex = 0.8)

Code

par(mfrow = c(1,1))

Code

# VIF — multicollinearity check
vif_vals <- car::vif(model)
cat("Variance Inflation Factors (VIF):\n")

Variance Inflation Factors (VIF):

Code

print(round(vif_vals, 3))

Competitor_Influence_Score      Brand_Switching_Score 
                     1.873                      1.796 
     Purchase_Volume_Score          Business_Type_num 
                     1.196                      1.087 
                 Years_num 
                     1.114

Code

cat("\nAll VIF < 5 indicates acceptable multicollinearity.\n")


All VIF < 5 indicates acceptable multicollinearity.

Code

import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df_reg_py = df_py[df_py['Business_Type'].notna()].copy()
df_reg_py['Business_Type_num'] = (df_reg_py['Business_Type']=='Reseller').astype(int)
yr_map = {"Less that 1 year":1,"1-3 years":2,"4-6 years":3,
          "7-10 years":4,"Above 10 years":5}
df_reg_py['Years_num'] = df_reg_py['Years_Business'].map(yr_map)

reg_cols = ['Price_Sensitivity_Score','Competitor_Influence_Score',
            'Brand_Switching_Score','Purchase_Volume_Score',
            'Business_Type_num','Years_num']
df_reg_py = df_reg_py[reg_cols].dropna()
print(f"Regression sample: {len(df_reg_py)} observations\n")

Regression sample: 98 observations

Code

X = sm.add_constant(df_reg_py[['Competitor_Influence_Score','Brand_Switching_Score',
                                 'Purchase_Volume_Score','Business_Type_num','Years_num']])
y = df_reg_py['Price_Sensitivity_Score']

ols = sm.OLS(y, X).fit()
print(ols.summary())

                               OLS Regression Results                              
===================================================================================
Dep. Variable:     Price_Sensitivity_Score   R-squared:                       0.361
Model:                                 OLS   Adj. R-squared:                  0.326
Method:                      Least Squares   F-statistic:                     10.40
Date:                     Tue, 26 May 2026   Prob (F-statistic):           6.32e-08
Time:                             00:37:02   Log-Likelihood:                -103.70
No. Observations:                       98   AIC:                             219.4
Df Residuals:                           92   BIC:                             234.9
Df Model:                                5                                         
Covariance Type:                 nonrobust                                         
==============================================================================================
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
const                          1.8527      0.683      2.713      0.008       0.496       3.209
Competitor_Influence_Score     0.5166      0.111      4.661      0.000       0.296       0.737
Brand_Switching_Score         -0.0797      0.149     -0.534      0.595      -0.376       0.217
Purchase_Volume_Score          0.1044      0.124      0.839      0.403      -0.143       0.351
Business_Type_num              0.4925      0.170      2.904      0.005       0.156       0.829
Years_num                     -0.1060      0.067     -1.588      0.116      -0.239       0.027
==============================================================================
Omnibus:                       23.723   Durbin-Watson:                   1.605
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               36.124
Skew:                          -1.079   Prob(JB):                     1.43e-08
Kurtosis:                       5.048   Cond. No.                         72.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
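
The headline fit statistics in the summary are linked by simple identities, which gives a quick way to sanity-check the output. The sketch below recomputes the adjusted R² and the F statistic from the reported R², sample size, and number of predictors. Because R² is entered to three decimals, the results are approximate.

```python
# Values read from the regression summary above
n_obs = 98         # observations after listwise deletion
k_predictors = 5   # predictors, excluding the intercept
r_squared = 0.361

df_resid = n_obs - k_predictors - 1  # 92 residual degrees of freedom

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Overall F statistic: explained variance per predictor over unexplained variance per residual df
f_statistic = (r_squared / k_predictors) / ((1 - r_squared) / df_resid)

print(f"Adjusted R-squared: {adj_r_squared:.3f}")  # 0.326
print(f"F-statistic: {f_statistic:.1f}")           # 10.4
```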

Code

# Diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as st

fitted = ols.fittedvalues
resid = ols.resid
# Internally studentised residuals, so the Scale-Location panel matches its axis label
std_resid = ols.get_influence().resid_studentized_internal

fig, axes = plt.subplots(1, 3, figsize=(14, 5))

# Residuals vs Fitted: checks linearity and constant variance
axes[0].scatter(fitted, resid, alpha=0.5, color='#003087', s=30)
axes[0].axhline(0, color='red', linewidth=1.5)
axes[0].set_xlabel("Fitted Values")
axes[0].set_ylabel("Residuals")
axes[0].set_title("Residuals vs Fitted", fontweight='bold')

# Normal Q-Q plot: checks normality of residuals
(osm, osr), (slope, intercept, r) = st.probplot(resid, dist='norm')
axes[1].scatter(osm, osr, alpha=0.5, color='#003087', s=30)
axes[1].plot(osm, slope * np.asarray(osm) + intercept, 'r-', linewidth=2)
axes[1].set_xlabel("Theoretical Quantiles")
axes[1].set_ylabel("Sample Quantiles")
axes[1].set_title("Normal Q-Q Plot", fontweight='bold')

# Scale-Location: checks whether residual spread changes with the fitted values
axes[2].scatter(fitted, np.sqrt(np.abs(std_resid)), alpha=0.5, color='#003087', s=30)
axes[2].set_xlabel("Fitted Values")
axes[2].set_ylabel("√|Standardised Residuals|")
axes[2].set_title("Scale-Location", fontweight='bold')

plt.tight_layout()
plt.savefig("reg_diag_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

print("Diagnostic plots rendered.")

Diagnostic plots rendered.
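
As a numerical cross-check on the Q-Q plot, the Jarque-Bera statistic printed in the regression summary can be reproduced approximately from the reported skew and kurtosis. The sketch below is a minimal illustration that uses only the rounded values shown in the summary (n = 98, skew = −1.079, kurtosis = 5.048). The function name is illustrative and not part of the original analysis, and small differences from the printed 36.124 are expected because the inputs are rounded.

```python
import math

def jarque_bera_from_moments(n, skew, kurtosis):
    """Jarque-Bera statistic and p-value from sample skewness and kurtosis.

    JB = n/6 * (S^2 + (K - 3)^2 / 4). Under normality JB is approximately
    chi-squared with 2 degrees of freedom, whose survival function is exp(-JB/2).
    """
    jb = n / 6.0 * (skew ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Rounded values taken from the OLS summary output
jb, p_value = jarque_bera_from_moments(n=98, skew=-1.079, kurtosis=5.048)
print(f"JB = {jb:.2f}, p = {p_value:.2e}")
```

A statistic this large relative to a chi-squared distribution with 2 degrees of freedom suggests the residuals depart from normality, which is worth keeping in mind when reading the p-values in the coefficient table.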

Regression Interpretation

The OLS regression model (n = 98; F(5, 92) = 10.4, p < 0.001; Adjusted R² = 0.326) predicts Price Sensitivity Score from five predictors. The model explains approximately 33% of the variance in price sensitivity, a meaningful result for survey-based behavioural data. Two predictors are statistically significant; three are not. The table below interprets all five coefficients for a non-technical manager:

Predictor	Estimate (β, p-value)	Business Interpretation
Competitor Influence Score	β = +0.517, p < 0.001 ✅ Significant	The strongest driver of price sensitivity in the model. Every 1-point increase in competitor-awareness is associated with a 0.52-point rise in price sensitivity score. Action: Proactively manage competitor-aware distributors with advance promotion alerts, price-match assurances, and dedicated account attention before competitor campaigns hit the market.
Business Type (Reseller = 1)	β = +0.493, p = 0.005 ✅ Significant	After controlling for all other factors, Resellers score 0.49 points higher on price sensitivity than Distributors — confirming the hypothesis test result. Action: Apply distinct commercial strategies by business type. Resellers need visible short-term price promotions; Distributors respond better to volume-tier rebates and credit terms.
Brand Switching Score	β = −0.080, p = 0.595 ❌ Not significant	Once competitor influence is controlled for, brand switching does not independently predict price sensitivity. This is consistent with the partial correlation finding — switching behaviour is driven by competitor influence, not price sensitivity acting alone. Action: Focus retention efforts on reducing competitor influence rather than treating brand switching as a separate problem.
Purchase Volume Score	β = +0.104, p = 0.403 ❌ Not significant	Purchase volume stability does not significantly predict price sensitivity after accounting for other factors. Action: No immediate intervention implied by this variable alone; however, building volume habits remains a long-term loyalty strategy.
Years in Business	β = −0.106, p = 0.116 ❌ Not significant (trend)	Tenure shows a negative directional trend — more experienced distributors are marginally less price-sensitive — but the effect does not reach significance in this sample. Action: Monitor this relationship with a larger sample; the direction supports investing in early-stage distributor loyalty programmes as a preventive measure.
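
To make the coefficients concrete, the fitted equation can be applied to a single respondent profile. The sketch below is illustrative only: the coefficients are copied from the OLS summary, but the profile values are hypothetical, the coding Distributor = 0 is an assumption (the table only states Reseller = 1), and the value used for Years_num is a placeholder because the tenure coding is not shown in this section.

```python
# Coefficients copied from the OLS summary output
COEF = {
    "const": 1.8527,
    "Competitor_Influence_Score": 0.5166,
    "Brand_Switching_Score": -0.0797,
    "Purchase_Volume_Score": 0.1044,
    "Business_Type_num": 0.4925,
    "Years_num": -0.1060,
}

def predict_price_sensitivity(profile):
    """Return the fitted Price Sensitivity Score for one respondent profile."""
    return COEF["const"] + sum(COEF[name] * value for name, value in profile.items())

# Hypothetical respondent (not taken from the survey data)
reseller = {
    "Competitor_Influence_Score": 4.0,
    "Brand_Switching_Score": 3.0,
    "Purchase_Volume_Score": 3.0,
    "Business_Type_num": 1,   # Reseller = 1, as stated in the table
    "Years_num": 2,           # placeholder tenure code (assumed)
}
# Same profile, but coded as a Distributor (assumed coding: 0)
distributor = dict(reseller, Business_Type_num=0)

pred_reseller = predict_price_sensitivity(reseller)
pred_distributor = predict_price_sensitivity(distributor)
print(round(pred_reseller, 4), round(pred_distributor, 4))
print("Reseller minus Distributor:", round(pred_reseller - pred_distributor, 4))
```

With every other score held equal, the gap between the two predictions equals the Business Type coefficient, which is what "after controlling for all other factors" means in practice.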

Key model insight: Competitor Influence Score and Business Type (Reseller) are the only two statistically significant predictors of price sensitivity. This narrows the commercial priority sharply: the most effective intervention is reducing competitor influence among Resellers, who are both the most price-sensitive segment and the most exposed to competitor activity.

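Model diagnostics: VIF values for all predictors are below 2.0, indicating no multicollinearity concern. Residual normality is less clear-cut: the Omnibus and Jarque-Bera tests in the summary output both reject normality (p < 0.001), with negatively skewed (skew = −1.08) and heavy-tailed (kurtosis = 5.05) residuals. The Residuals vs Fitted, Q-Q, and Scale-Location plots should therefore be inspected for outliers as well as for linearity and constant variance, and the p-values above are best read as approximate.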
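
For readers who want to see how a VIF is obtained, the sketch below computes it from first principles with NumPy: each predictor is regressed on the remaining predictors and VIF = 1 / (1 − R²). This is a minimal illustration on synthetic data, not the code used for the values reported above; the function name and the demo matrix are assumptions introduced for the example.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_predictors).

    VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing column j
    on all remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic, illustrative data only (not the survey responses)
rng = np.random.default_rng(0)
X = rng.normal(size=(98, 3))
x_collinear = X[:, 0] + 0.1 * rng.normal(size=98)  # nearly duplicates column 0
X_demo = np.column_stack([X, x_collinear])

vifs = variance_inflation_factors(X_demo)
print(np.round(vifs, 2))
```

In the synthetic example the two nearly duplicated columns receive large VIFs while the independent columns stay close to 1, which is the pattern a VIF threshold is meant to detect.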

10. Integrated Findings

The five analyses converge on a single, coherent commercial picture:

The TotalEnergies lubricants distributor base is moderately to highly price-sensitive, and competitor influence is the common thread linking price sensitivity to brand switching and potential revenue loss.

EDA established that all four behavioural constructs cluster above the scale midpoint, confirming that the distributor base is commercially active, price-aware, and competitively engaged. Resellers form the larger and more price-sensitive segment.
Visualisation revealed a clear positive linear relationship between price sensitivity and brand switching — the central commercial risk — and identified that mid-tenure distributors face the broadest exposure to competitor influence.
Hypothesis testing confirmed that Resellers and Distributors differ significantly in price sensitivity, and that brand-switching tendency varies across distributor tenure cohorts, validating the need for differentiated customer engagement strategies.
Correlation analysis demonstrated that competitor influence is the primary correlate of brand switching, and that the bivariate relationship between price sensitivity and brand switching is largely accounted for by competitor influence: the partial correlation becomes non-significant (r = 0.041, p = 0.686) once competitor influence is controlled. This reframes the commercial problem: it is competitor exposure, not price sensitivity alone, that is most closely tied to switching.
Regression quantified that only two predictors independently drive price sensitivity: Competitor Influence Score (β = +0.517, p < 0.001) and Business Type — Reseller (β = +0.493, p = 0.005). Brand switching, purchase volume, and tenure do not reach significance in the multivariate model, reinforcing that competitor influence is the central lever to manage.
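
The partial correlation logic referred to in the correlation finding can be sketched with the standard first-order formula, which removes the influence of a third variable from a pairwise correlation. The input correlations below are hypothetical placeholders chosen for illustration, not the study's estimates. The same formula can be applied to rank (Spearman) correlations, which is an assumption about how the reported partial correlation was obtained.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator

# Hypothetical inputs: x = price sensitivity, y = brand switching,
# z = competitor influence (placeholder values, not survey results)
r_partial = partial_correlation(r_xy=0.50, r_xz=0.70, r_yz=0.70)
print(round(r_partial, 4))
```

When both variables correlate strongly with the control variable, a sizeable bivariate correlation can shrink towards zero, which is the pattern described for price sensitivity and brand switching.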

Unified Recommendation: TotalEnergies Marketing Nigeria PLC should implement a Risk-Stratified Distributor Retention Programme with three components:

1. Segment by risk score: Combine Price Sensitivity, Competitor Influence, and Brand Switching scores into a composite churn-risk index. Flag distributors in the top quartile for immediate account manager attention.
2. Differentiate by business type: Resellers need promotional price visibility and frequency-based loyalty incentives. Distributors respond better to volume-tier rebates, credit flexibility, and supply reliability guarantees.
3. Invest in tenure development: New distributors (under 3 years) exhibit the widest variance in competitor susceptibility. Structured onboarding, dedicated account management, and early-stage loyalty incentives should be deployed to compress this vulnerability window.
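
One possible way to operationalise the composite churn-risk index in the first component is sketched below. It assumes equal weighting of the three scores and flags the highest-scoring quarter of distributors; the weighting scheme, the distributor IDs, and the score values are all hypothetical and would need to be agreed with the sales team before use.

```python
import math

# Hypothetical records: (distributor_id, price_sensitivity,
# competitor_influence, brand_switching), each on the 1-5 scale
distributors = [
    ("D01", 4.5, 4.0, 4.2),
    ("D02", 3.0, 2.5, 2.0),
    ("D03", 4.8, 4.6, 4.4),
    ("D04", 2.2, 3.0, 2.8),
    ("D05", 3.6, 3.4, 3.2),
    ("D06", 4.0, 3.8, 3.0),
    ("D07", 2.8, 2.0, 2.4),
    ("D08", 3.9, 4.1, 3.7),
]

def churn_risk_index(price, competitor, switching):
    """Equal-weight composite of the three scores (assumed weighting)."""
    return (price + competitor + switching) / 3.0

# Rank distributors from highest to lowest composite risk
ranked = sorted(
    ((dist_id, churn_risk_index(p, c, s)) for dist_id, p, c, s in distributors),
    key=lambda pair: pair[1],
    reverse=True,
)

# Flag the top quartile (rounded up) for account manager attention
n_flagged = math.ceil(len(ranked) / 4)
flagged = [dist_id for dist_id, _ in ranked[:n_flagged]]
print(flagged)
```

Equal weights keep the index easy to explain to account managers; weights derived from the regression or from observed churn would be a natural refinement once transaction data are available.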

11. Limitations & Further Work

Limitations:

Self-report bias: All variables are based on self-reported Likert responses. Respondents may over- or under-state price sensitivity and brand-switching behaviour, particularly if they perceive social desirability pressure toward loyalty.
Cross-sectional design: The survey captures behaviour at a single point in time (Q1 2026). Seasonal effects — post-holiday demand recovery, budget cycles — may inflate or deflate the measured constructs relative to other quarters.
Sample representativeness: Convenience and purposive sampling within the Rivers State territory limits how far the findings can be generalised beyond it. Distributors in other regions (Lagos, Kano, Abuja) may exhibit different price sensitivity and switching profiles.
Absence of transaction data: All constructs are perceptual. Actual SAP transaction data showing real price elasticity, order frequency, and revenue per distributor would provide harder evidence for the relationships identified here.
No time dimension: The study lacks a time-series component, making it impossible to observe whether price sensitivity has increased or decreased over time, or to forecast future purchase behaviour.

Further Work:

Future studies should integrate SAP/ERP transaction records to compute objective price elasticity coefficients and real purchase volumes. Longitudinal survey designs (quarterly panels) would capture behavioural change over time. Machine learning classification models (Random Forest, XGBoost) could be applied to predict brand-switching probability at the individual distributor level using a richer feature set. Geographic expansion of the sample frame to multiple TotalEnergies territories would improve external validity. Additionally, incorporating macroeconomic covariates — crude oil prices, exchange rate fluctuations, and inflation indices — would contextualise the observed price sensitivity within Nigeria’s broader economic environment.

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). Sage Publications.

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate data analysis (8th ed.). Cengage Learning.

Montgomery, D. C., Peck, E. A., & Vining, G. G. (2021). Introduction to linear regression analysis (6th ed.). Wiley.

Onyenwosa, V. I. (2026). Lubricants distributor purchase behaviour survey dataset [Dataset]. Collected from TotalEnergies Marketing Nigeria PLC sales territory, Rivers State, Nigeria. Data available on request from the author.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with structuring the Quarto document template, writing initial R and Python code scaffolds for the five analytical techniques, and formatting the reference list. All analytical decisions — including the selection of Spearman over Pearson correlation for Likert data, the choice of Welch’s t-test over Student’s t-test due to unequal group sizes, the decision to construct composite scores from thematic item clusters, and the interpretation of all statistical outputs in the context of TotalEnergies Marketing Nigeria PLC’s commercial operations — were made independently by the author based on knowledge acquired through the Data Analytics II course. The business recommendations reflect the author’s direct professional experience as a sales representative in the Nigerian lubricants distribution industry and were not generated by AI tools.