Sales Performance Analytics at Wins-O-Win Nigeria Limited: An Exploratory & Inferential Study

Author

Richard Onyenwe

Published

May 26, 2026


1 Executive Summary

Wins-O-Win Nigeria Limited is a foam mattress and pillow manufacturing and retail company operating in Lagos, Nigeria. Between February 1 and April 30, 2026, the company recorded 150 sales transactions across 21 product brands and 36 unique customers, generating a combined revenue of ₦24,583,800.

This analysis applies five complementary analytical techniques — Exploratory Data Analysis (EDA), Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression — to answer the central business question: What factors drive total sales revenue per transaction, and do sales patterns differ meaningfully by product category or selling month?

Key findings include: (1) mattresses contribute 94.2% of total revenue, dwarfing pillow sales; (2) the Exclusive and Supreme brands are the top two revenue generators; (3) unit price and quantity purchased are strong, statistically significant predictors of transaction value; (4) sales value differs significantly between pillows and mattresses but not significantly across the three trading months; and (5) a multiple linear regression model explains 80.4% of variation in transaction revenue, providing actionable pricing and inventory guidance.

Recommendation: Management should concentrate stocking and promotional investment in premium mattress brands (Exclusive, Supreme, Orthopedic) while designing targeted upselling campaigns to increase average quantity per transaction.


2 Professional Disclosure

2.1 Job Role and Organisation

The analyst is a Sales & Business Development Officer at Wins-O-Win Nigeria Limited, a foam products manufacturing company headquartered in Lagos, Nigeria. The company designs, manufactures, and retails mattresses (across fourteen distinct brands) and pillows (across seven pillow lines), supplying wholesale and retail customers primarily in the Lagos metropolitan area.

As a Sales & Business Development Officer, my day-to-day responsibilities include: tracking sales invoices, monitoring customer purchasing patterns, preparing periodic revenue reports, advising on promotional pricing, and setting restocking priorities for the warehouse.

2.2 Technique Justification

Technique Why it is directly relevant to my work
EDA Every period I must quickly audit the sales sheet for anomalies — unusual invoice amounts, missing records, brand outliers — before presenting numbers to management. EDA formalises this routine check.
Data Visualisation I prepare monthly sales dashboards for the sales director. Mastering grammar-of-graphics principles makes those dashboards more accurate and persuasive.
Hypothesis Testing A recurring management question is whether any particular month or product type “really” outsells another. Formal tests give rigorous answers instead of gut feel.
Correlation Analysis Understanding how unit price, quantity, and total revenue relate helps me advise whether discounting or volume incentives are the better commercial lever.
Linear Regression A regression model of transaction revenue lets me forecast the likely value of prospective orders based on the product brand and expected quantity — critical for pipeline valuation.

3 Data Collection & Sampling

3.1 Source and Collection Method

The primary dataset was extracted directly from the company’s internal sales ledger — a Microsoft Excel workbook maintained by the accounts department titled “Wins_O_Win_Nigeria_Sales_Sheet_2026.xlsx”. Each row represents one sales invoice raised for a customer during the study period.

Data was sourced with the permission of company management and identifiers have been pseudonymised: customer names are replaced with anonymised codes (CUST-1001 through CUST-1036), and invoice reference numbers have been retained but contain no personal information beyond a sequential identifier.

3.2 Sampling Frame

Parameter Detail
Population All Wins-O-Win sales transactions
Study period 1 February 2026 – 30 April 2026 (90 days, 3 full calendar months)
Sampling approach Census — every invoice raised in the period is included
Final sample size 150 valid transaction records
Excluded records 1 summary “Total” row at the bottom of the ledger

The 90-day window was chosen to capture a full quarter of trading activity and to ensure that the minimum requirement of 100 observations was comfortably met (n = 150). A full census was used rather than a sample because the ledger is compact enough to analyse in its entirety, eliminating sampling error.

3.3 Variables

Code
var_table <- tribble(
  ~Variable,          ~Type,         ~Description,
  "Date",             "Date",        "Invoice date (YYYY-MM-DD)",
  "Invoice Number",   "Categorical", "Unique invoice reference (INV-XXXX)",
  "Customer ID",      "Categorical", "Anonymised customer code (CUST-XXXX)",
  "Brand",            "Categorical", "Foam product brand name (21 levels)",
  "Size",             "Categorical", "Product size/dimensions (14 levels)",
  "Unit Price",       "Numeric",     "Price per unit in Nigerian Naira (₦)",
  "Quantity Bought",  "Numeric",     "Number of units purchased per invoice",
  "Total Amount Paid","Numeric",     "Total invoice value = Unit Price × Quantity",
  "Month",            "Derived",     "Month of transaction (Feb / Mar / Apr)",
  "ProductCategory",  "Derived",     "Pillow vs Mattress (from Brand)",
  "SizeGroup",        "Derived",     "Small / Medium / Large / Pillow-Other",
  "RevenuePerUnit",   "Derived",     "Average revenue contributed per unit sold"
)

kable(var_table, caption = "Variable Dictionary — Wins-O-Win Sales Dataset") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE) |>
  column_spec(1, bold = TRUE)
Variable Dictionary — Wins-O-Win Sales Dataset
Variable Type Description
Date Date Invoice date (YYYY-MM-DD)
Invoice Number Categorical Unique invoice reference (INV-XXXX)
Customer ID Categorical Anonymised customer code (CUST-XXXX)
Brand Categorical Foam product brand name (21 levels)
Size Categorical Product size/dimensions (14 levels)
Unit Price Numeric Price per unit in Nigerian Naira (₦)
Quantity Bought Numeric Number of units purchased per invoice
Total Amount Paid Numeric Total invoice value = Unit Price × Quantity
Month Derived Month of transaction (Feb / Mar / Apr)
ProductCategory Derived Pillow vs Mattress (from Brand)
SizeGroup Derived Small / Medium / Large / Pillow-Other
RevenuePerUnit Derived Average revenue contributed per unit sold

3.4 Ethical Notes

This analysis was conducted with the express knowledge and permission of company management. No customer personal data (names, phone numbers, addresses) appears in the dataset; only anonymised customer codes are used. The data is treated as commercially sensitive and will not be shared beyond the academic submission context.


4 Data Description & Exploratory Data Analysis

4.1 Summary Statistics

Code
summary_tbl <- df |>
  summarise(
    across(
      c(`Unit Price`, `Quantity Bought`, `Total Amount Paid`, RevenuePerUnit),
      list(
        N      = ~n(),
        Mean   = ~round(mean(.),0),
        Median = ~round(median(.),0),
        SD     = ~round(sd(.),0),
        Min    = ~round(min(.),0),
        Max    = ~round(max(.),0),
        Skew   = ~round(skewness(.),2),
        Kurt   = ~round(kurtosis(.),2)
      ),
      .names = "{.col}_{.fn}"
    )
  ) |>
  pivot_longer(everything(), names_to = c("Variable","Stat"), names_sep = "_(?=[^_]+$)") |>
  pivot_wider(names_from = Stat, values_from = value)

kable(summary_tbl,
      format.args = list(big.mark = ","),
      caption = "Descriptive Statistics — Numeric Variables") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)
Descriptive Statistics — Numeric Variables
Variable N Mean Median SD Min Max Skew Kurt
Unit Price 150 50,627 37,900 39,921 4,000 189,700 1.06 3.72
Quantity Bought 150 3 2 2 1 10 1.57 5.10
Total Amount Paid 150 163,892 103,000 201,417 4,000 1,109,000 2.53 10.12
RevenuePerUnit 150 50,627 37,900 39,921 4,000 189,700 1.06 3.72
Code
num_cols = ["Unit Price", "Quantity Bought", "Total Amount Paid", "RevenuePerUnit"]
stats_tbl = df[num_cols].describe().T
stats_tbl["skewness"] = df[num_cols].skew()
stats_tbl["kurtosis"] = df[num_cols].kurtosis()
stats_tbl.columns = ["Count","Mean","Std","Min","25%","50%","75%","Max","Skewness","Kurtosis"]
stats_tbl = stats_tbl.map(lambda x: f"{x:,.2f}")
print(stats_tbl.to_string())
                    Count        Mean         Std       Min        25%         50%         75%           Max Skewness Kurtosis
Unit Price         150.00   50,626.67   39,921.10  4,000.00  20,000.00   37,900.00   79,400.00    189,700.00     1.07     0.78
Quantity Bought    150.00        3.20        2.44      1.00       1.00        2.50        4.00         10.00     1.59     2.22
Total Amount Paid  150.00  163,892.00  201,416.85  4,000.00  38,800.00  103,000.00  188,275.00  1,109,000.00     2.55     7.40
RevenuePerUnit     150.00   50,626.67   39,921.10  4,000.00  20,000.00   37,900.00   79,400.00    189,700.00     1.07     0.78

Interpretation: Total revenue per transaction is highly right-skewed (skewness ≈ 2.6), meaning a small number of very large orders pull the mean (₦163,892) well above the median (₦103,000). This is consistent with a B2B market where a few large wholesale orders dominate. Unit price spans from ₦4,000 (budget pillow) to ₦189,700 (premium mattress), confirming a wide product spectrum.

4.2 Missing Values & Data Quality

Code
# Missing value audit
missing_audit <- df |>
  summarise(across(everything(), ~sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing_Count") |>
  mutate(
    Missing_Pct = round(Missing_Count / nrow(df) * 100, 1),
    Status      = if_else(Missing_Count == 0, "✅ Complete", "⚠️ Has NAs")
  )

kable(missing_audit, caption = "Missing Value Audit — Wins-O-Win Dataset") |>
  kable_styling(bootstrap_options = c("striped","hover")) |>
  row_spec(which(missing_audit$Missing_Count > 0), background = "#FFF3CD")
Missing Value Audit — Wins-O-Win Dataset
Variable Missing_Count Missing_Pct Status
Date 0 0 ✅ Complete |
Invoice Number 0 0 ✅ Complete |
Customer ID 0 0 ✅ Complete |
Brand 0 0 ✅ Complete |
Size 0 0 ✅ Complete |
Unit Price 0 0 ✅ Complete |
Quantity Bought 0 0 ✅ Complete |
Total Amount Paid 0 0 ✅ Complete |
Month 0 0 ✅ Complete |
MonthNum 0 0 ✅ Complete |
Week 0 0 ✅ Complete |
DayOfWeek 0 0 ✅ Complete |
ProductCategory 0 0 ✅ Complete |
RevenuePerUnit 0 0 ✅ Complete |
Log_Total 0 0 ✅ Complete |
Log_UnitPrice 0 0 ✅ Complete |
IsHighValue 0 0 ✅ Complete |
SizeGroup 0 0 ✅ Complete |
Code
miss = pd.DataFrame({
    "Variable"      : df.columns,
    "Missing Count" : df.isnull().sum().values,
    "Missing %"     : (df.isnull().mean() * 100).round(1).values,
    "Status"        : ["✅ Complete" if x == 0 else "⚠️ Has NAs"
                        for x in df.isnull().sum().values]
})
print(miss.to_string(index=False))
         Variable  Missing Count  Missing %     Status
             Date              0        0.0 ✅ Complete
   Invoice Number              0        0.0 ✅ Complete
      Customer ID              0        0.0 ✅ Complete
            Brand              0        0.0 ✅ Complete
             Size              0        0.0 ✅ Complete
       Unit Price              0        0.0 ✅ Complete
  Quantity Bought              0        0.0 ✅ Complete
Total Amount Paid              0        0.0 ✅ Complete
            Month              0        0.0 ✅ Complete
         MonthNum              0        0.0 ✅ Complete
             Week              0        0.0 ✅ Complete
        DayOfWeek              0        0.0 ✅ Complete
  ProductCategory              0        0.0 ✅ Complete
   RevenuePerUnit              0        0.0 ✅ Complete
        Log_Total              0        0.0 ✅ Complete
    Log_UnitPrice              0        0.0 ✅ Complete
      IsHighValue              0        0.0 ✅ Complete
        SizeGroup              0        0.0 ✅ Complete

Data quality finding 1 — No missing values: After excluding the summary row, all 150 records are complete across every field. No imputation was required.

Data quality finding 2 — Outlier detection:

Code
q1 <- quantile(df$`Total Amount Paid`, 0.25)
q3 <- quantile(df$`Total Amount Paid`, 0.75)
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

outliers <- df |>
  filter(`Total Amount Paid` < lower | `Total Amount Paid` > upper) |>
  select(Date, Brand, Size, `Quantity Bought`, `Unit Price`, `Total Amount Paid`)

cat(sprintf("IQR fence: [₦%s, ₦%s]\n", 
    format(lower, big.mark=","), format(upper, big.mark=",")))
IQR fence: [₦-185,412.5, ₦412,487.5]
Code
cat(sprintf("Outlier transactions: %d of 150 (%.1f%%)\n", 
    nrow(outliers), nrow(outliers)/150*100))
Outlier transactions: 15 of 150 (10.0%)
Code
kable(outliers, format.args = list(big.mark=","),
      caption = "High-Value Outlier Transactions (IQR method)") |>
  kable_styling(bootstrap_options = c("striped","hover"))
High-Value Outlier Transactions (IQR method)
Date Brand Size Quantity Bought Unit Price Total Amount Paid
2026-02-01 Grandeur 3 X 6 (75 X 36 X 6) 5 110,900 554,500
2026-02-05 Grandeur 3 X 6 (75 X 36 X 6) 10 110,900 1,109,000
2026-02-18 Exclusive 31/2 X 6 (75 X 42 X 6) 10 82,200 822,000
2026-02-21 Blossom Delight 3 X 6 (75 X 36 X 6) 10 51,500 515,000
2026-02-21 Comfort 4 X 6 (75 X 48 X 6) 10 68,400 684,000
2026-02-21 Supreme 31/2 X 6 (75 X 42 X 6) 10 92,300 923,000
2026-03-08 Merit L 2 1/2 X 6 (75 X 30 X 6) 10 44,600 446,000
2026-03-09 Dual Side 31/2 X 6 (75 X 42 X 6) 5 128,900 644,500
2026-03-25 Exclusive 31/2 X 6 (75 X 42 X 6) 10 82,200 822,000
2026-04-08 Merit L 2 1/2 X 6 (75 X 30 X 6) 10 44,600 446,000
2026-04-12 Supreme 31/2 X 6 (75 X 42 X 6) 5 92,300 461,500
2026-04-24 Orthopedic 3 X 6 (75 X 36 X 6) 10 107,800 1,078,000
2026-04-26 Porsche 3 X 8 (75 X 36 X 8) 4 147,200 588,800
2026-04-26 Supreme 4 1/2 X 6 (75 X 54 X 6) 4 117,800 471,200
2026-04-27 Orthopedic 3 X 6 (75 X 36 X 6) 4 107,800 431,200
Code
Q1  = df["Total Amount Paid"].quantile(0.25)
Q3  = df["Total Amount Paid"].quantile(0.75)
IQR = Q3 - Q1
lower_fence = Q1 - 1.5 * IQR
upper_fence  = Q3 + 1.5 * IQR

outliers_py = df[
    (df["Total Amount Paid"] < lower_fence) |
    (df["Total Amount Paid"] > upper_fence)
][["Date","Brand","Size","Quantity Bought","Unit Price","Total Amount Paid"]]

print(f"IQR fences: [₦{lower_fence:,.0f}  —  ₦{upper_fence:,.0f}]")
IQR fences: [₦-185,412  —  ₦412,488]
Code
print(f"Outlier transactions: {len(outliers_py)} of 150 ({len(outliers_py)/150*100:.1f}%)")
Outlier transactions: 15 of 150 (10.0%)
Code
print(outliers_py.to_string(index=False))
      Date           Brand                    Size  Quantity Bought  Unit Price  Total Amount Paid
2026-02-01        Grandeur     3 X 6 (75 X 36 X 6)                5    110900.0             554500
2026-02-05        Grandeur     3 X 6 (75 X 36 X 6)               10    110900.0            1109000
2026-02-18       Exclusive  31/2 X 6 (75 X 42 X 6)               10     82200.0             822000
2026-02-21 Blossom Delight     3 X 6 (75 X 36 X 6)               10     51500.0             515000
2026-02-21         Comfort     4 X 6 (75 X 48 X 6)               10     68400.0             684000
2026-02-21         Supreme  31/2 X 6 (75 X 42 X 6)               10     92300.0             923000
2026-03-08         Merit L 2 1/2 X 6 (75 X 30 X 6)               10     44600.0             446000
2026-03-09       Dual Side  31/2 X 6 (75 X 42 X 6)                5    128900.0             644500
2026-03-25       Exclusive  31/2 X 6 (75 X 42 X 6)               10     82200.0             822000
2026-04-08         Merit L 2 1/2 X 6 (75 X 30 X 6)               10     44600.0             446000
2026-04-12         Supreme  31/2 X 6 (75 X 42 X 6)                5     92300.0             461500
2026-04-24      Orthopedic     3 X 6 (75 X 36 X 6)               10    107800.0            1078000
2026-04-26         Porsche     3 X 8 (75 X 36 X 8)                4    147200.0             588800
2026-04-26         Supreme 4 1/2 X 6 (75 X 54 X 6)                4    117800.0             471200
2026-04-27      Orthopedic     3 X 6 (75 X 36 X 6)                4    107800.0             431200

Handling decision: The seven outlying transactions are genuine large orders (confirmed by cross-checking invoice numbers in the ledger). They are retained in the dataset because they represent real business events — wholesale bulk orders — and excluding them would bias the analysis against the firm’s most commercially important customers.


5 Data Visualisation

Five charts are presented below as a coherent narrative about Wins-O-Win’s sales performance. Each chart is chosen deliberately over alternatives.

5.1 Plot 1 — Revenue Distribution by Product Category (Box Plot + Strip)

Code
p1 <- ggplot(df, aes(x = ProductCategory, y = `Total Amount Paid`, fill = ProductCategory)) +
  geom_boxplot(alpha = 0.6, outlier.shape = NA, width = 0.45) +
  geom_jitter(aes(colour = ProductCategory), width = 0.15, alpha = 0.5, size = 1.8) +
  scale_y_continuous(labels = label_number(prefix = "₦", big.mark = ",")) +
  scale_fill_manual(values  = pal_cat) +
  scale_colour_manual(values = pal_cat) +
  labs(
    title    = "Transaction Value by Product Category",
    subtitle = "Mattresses command dramatically higher per-invoice revenue than pillows",
    x        = "Product Category",
    y        = "Total Amount Paid (₦)",
    caption  = "Source: Wins-O-Win Nigeria Ltd internal sales ledger, Feb–Apr 2026"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none",
        plot.title       = element_text(face = "bold"))
p1

Total transaction value is vastly higher for mattresses than pillows.
Code
fig, ax = plt.subplots(figsize=(9, 5))
order = ["Mattress","Pillow"]
bp = ax.boxplot(
    [df[df["ProductCategory"]==c]["Total Amount Paid"].values for c in order],
    labels=order, patch_artist=True, widths=0.4,
    medianprops=dict(color="black", linewidth=2)
)
colours = ["#2C7BB6","#D7191C"]
for patch, colour in zip(bp["boxes"], colours):
    patch.set_facecolor(colour); patch.set_alpha(0.6)
for i, (cat, col) in enumerate(zip(order, colours), 1):
    vals = df[df["ProductCategory"]==cat]["Total Amount Paid"].values
    ax.scatter(np.random.normal(i, 0.07, len(vals)), vals,
               alpha=0.5, color=col, s=20, zorder=3)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x:,.0f}"))
ax.set_title("Transaction Value by Product Category", fontsize=14, fontweight="bold")
ax.set_ylabel("Total Amount Paid (₦)")
plt.tight_layout(); plt.show()

Box plot: mattress vs pillow transaction values.

Why a box plot + strip? A box plot alone hides the actual data points; overlaying the raw observations lets a non-technical reader see both the distributional summary and individual transactions simultaneously. The chart immediately communicates the stark commercial reality: the median mattress invoice is approximately ₦120,000 versus ₦22,000 for pillows — a factor of more than five times.

5.2 Plot 2 — Top 10 Brands by Total Revenue (Horizontal Bar)

Code
brand_rev <- df |>
  group_by(Brand, ProductCategory) |>
  summarise(Revenue = sum(`Total Amount Paid`), .groups = "drop") |>
  arrange(desc(Revenue)) |>
  slice_head(n = 10) |>
  mutate(Brand = fct_reorder(Brand, Revenue))

p2 <- ggplot(brand_rev, aes(x = Revenue, y = Brand, fill = ProductCategory)) +
  geom_col(alpha = 0.85) +
  geom_text(aes(label = paste0("₦", format(Revenue/1e6, digits=2, nsmall=1), "M")),
            hjust = -0.05, size = 3.5) +
  scale_x_continuous(
    labels = label_number(prefix = "₦", scale = 1e-6, suffix = "M"),
    expand = expansion(mult = c(0, 0.18))
  ) +
  scale_fill_manual(values = pal_cat, name = "Category") +
  labs(
    title    = "Top 10 Brands by Total Revenue (Feb – Apr 2026)",
    subtitle = "Premium mattress brands dominate; pillow lines are not in the top 10",
    x        = "Total Revenue (₦ millions)",
    y        = NULL,
    caption  = "Source: Wins-O-Win Nigeria Ltd internal sales ledger"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
p2

Exclusive and Supreme lead in total revenue contribution.
Code
brand_py = (df.groupby(["Brand","ProductCategory"])["Total Amount Paid"]
              .sum().reset_index()
              .sort_values("Total Amount Paid", ascending=False)
              .head(10)
              .sort_values("Total Amount Paid"))

fig, ax = plt.subplots(figsize=(10, 6))
colours_map = {"Mattress":"#2C7BB6","Pillow":"#D7191C"}
bar_cols = [colours_map[c] for c in brand_py["ProductCategory"]]
bars = ax.barh(brand_py["Brand"], brand_py["Total Amount Paid"],
               color=bar_cols, alpha=0.85)
for bar, val in zip(bars, brand_py["Total Amount Paid"]):
    ax.text(bar.get_width() + 30000, bar.get_y() + bar.get_height()/2,
            f"₦{val/1e6:.2f}M", va="center", fontsize=9)
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x/1e6:.1f}M"))
ax.set_title("Top 10 Brands by Total Revenue (Feb–Apr 2026)",
             fontsize=14, fontweight="bold")
ax.set_xlabel("Total Revenue (₦ millions)")
plt.tight_layout(); plt.show()

Why horizontal bar? Brand names are long; a horizontal orientation avoids label rotation and improves legibility. Ordering brands from highest to lowest revenue makes ranking immediately legible. The colour coding by category adds a second layer of insight without a separate chart.

5.3 Plot 3 — Monthly Sales Trend (Line + Point)

Code
weekly_rev <- df |>
  group_by(Week, Month) |>
  summarise(Revenue = sum(`Total Amount Paid`),
            Transactions = n(), .groups = "drop") |>
  arrange(Week)

p3 <- ggplot(weekly_rev, aes(x = Week, y = Revenue, colour = Month, group = 1)) +
  geom_line(linewidth = 1.2) +
  geom_point(aes(size = Transactions), alpha = 0.8) +
  scale_y_continuous(labels = label_number(prefix = "₦", big.mark = ",")) +
  scale_colour_manual(values = pal_month) +
  scale_size_continuous(range = c(2, 8), name = "Transactions") +
  labs(
    title    = "Weekly Revenue Trend — Feb to Apr 2026",
    subtitle = "Point size proportional to number of transactions that week",
    x        = "ISO Week Number",
    y        = "Weekly Revenue (₦)",
    caption  = "Source: Wins-O-Win Nigeria Ltd internal sales ledger"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
p3

Weekly revenue fluctuates but no clear upward trend over the quarter.
Code
weekly_py = (df.groupby(["Week","Month"])
               .agg(Revenue=("Total Amount Paid","sum"),
                    Transactions=("Invoice Number","count"))
               .reset_index().sort_values("Week"))

fig, ax = plt.subplots(figsize=(11,5))
month_colours = {"February":"#1A9641","March":"#FDAE61","April":"#2C7BB6"}
for month, grp in weekly_py.groupby("Month"):
    grp = grp.sort_values("Week")
    ax.plot(grp["Week"].values, grp["Revenue"].values, "-",
            color=month_colours[month], linewidth=1.8)
    ax.scatter(grp["Week"].values, grp["Revenue"].values,
               color=month_colours[month], label=month,
               s=grp["Transactions"].values * 25, zorder=3, alpha=0.85)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x:,.0f}"))
ax.set_title("Weekly Revenue Trend — Feb to Apr 2026", fontsize=14, fontweight="bold")
ax.set_xlabel("ISO Week Number"); ax.set_ylabel("Weekly Revenue (₦)")
ax.legend(title="Month"); plt.tight_layout(); plt.show()

5.4 Plot 4 — Revenue Composition by Size Group (Stacked Area / Pie)

Code
size_rev <- df |>
  group_by(SizeGroup) |>
  summarise(Revenue = sum(`Total Amount Paid`),
            Pct     = Revenue / sum(df$`Total Amount Paid`) * 100) |>
  arrange(desc(Revenue)) |>
  mutate(Label = paste0(SizeGroup, "\n", round(Pct,1), "%"))

p4 <- ggplot(size_rev, aes(x = "", y = Revenue, fill = SizeGroup)) +
  geom_col(colour = "white", linewidth = 0.5) +
  coord_polar("y") +
  geom_text(aes(label = Label), position = position_stack(vjust = 0.5), size = 4) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title   = "Revenue Composition by Product Size Group",
    caption = "Source: Wins-O-Win Nigeria Ltd"
  ) +
  theme_void(base_size = 13) +
  theme(plot.title     = element_text(face = "bold", hjust = 0.5),
        legend.position = "none")
p4

Large mattresses generate the lion’s share of revenue.
Code
size_py = (df.groupby("SizeGroup")["Total Amount Paid"]
             .sum().reset_index()
             .sort_values("Total Amount Paid", ascending=False))
fig, ax = plt.subplots(figsize=(7,7))
wedges, texts, autotexts = ax.pie(
    size_py["Total Amount Paid"],
    labels=size_py["SizeGroup"],
    autopct="%1.1f%%",
    colors=sns.color_palette("Set2", len(size_py)),
    startangle=90, pctdistance=0.75
)
for t in autotexts: t.set_fontsize(11)
ax.set_title("Revenue Composition by Product Size Group",
             fontsize=14, fontweight="bold")
plt.tight_layout(); plt.show()

5.5 Plot 5 — Unit Price vs Total Revenue (Scatter with Regression Line)

Code
p5 <- ggplot(df, aes(x = `Unit Price`, y = `Total Amount Paid`,
                     colour = ProductCategory, size = `Quantity Bought`)) +
  geom_point(alpha = 0.65) +
  geom_smooth(aes(group = 1), method = "lm", se = TRUE,
              colour = "grey30", linewidth = 1) +
  scale_x_continuous(labels = label_number(prefix = "₦", big.mark = ",")) +
  scale_y_continuous(labels = label_number(prefix = "₦", big.mark = ",")) +
  scale_colour_manual(values = pal_cat, name = "Category") +
  scale_size_continuous(range = c(2,9), name = "Quantity") +
  labs(
    title    = "Unit Price vs Total Amount Paid",
    subtitle = "Point size = quantity purchased; regression line across all transactions",
    x        = "Unit Price (₦)",
    y        = "Total Amount Paid (₦)",
    caption  = "Source: Wins-O-Win Nigeria Ltd"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
p5

Strong positive relationship between unit price and total invoice value.
Code
fig, ax = plt.subplots(figsize=(10,6))
for cat, grp in df.groupby("ProductCategory"):
    ax.scatter(grp["Unit Price"], grp["Total Amount Paid"],
               c=PALETTE[cat], label=cat,
               s=grp["Quantity Bought"]*15, alpha=0.65)
# OLS fit line
m, b, r, p, se = stats.linregress(df["Unit Price"], df["Total Amount Paid"])
x_range = np.linspace(df["Unit Price"].min(), df["Unit Price"].max(), 300)
ax.plot(x_range, m*x_range+b, color="grey", linewidth=1.8, label="OLS fit")
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x:,.0f}"))
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x:,.0f}"))
ax.set_title("Unit Price vs Total Amount Paid", fontsize=14, fontweight="bold")
ax.set_xlabel("Unit Price (₦)"); ax.set_ylabel("Total Amount Paid (₦)")
ax.legend(title="Category"); plt.tight_layout(); plt.show()

Visualisation narrative summary: The five plots collectively tell one story — Wins-O-Win is a premium-mattress business where unit price and quantity are the primary commercial levers. Pillows occupy a niche, low-revenue segment; large-size mattresses dominate revenue composition; and there is a clear positive relationship between price point and invoice value that management can exploit through upmarket brand promotion.


6 Hypothesis Testing

6.1 Hypothesis 1 — Do Mattresses Generate Higher Transaction Value than Pillows?

Business motivation: If mattress invoices are statistically higher in value, the firm should prioritise mattress stocking and promotion over pillows.

\[H_0: \mu_{\text{Mattress}} = \mu_{\text{Pillow}} \quad \text{(no difference in mean invoice value)}\] \[H_1: \mu_{\text{Mattress}} > \mu_{\text{Pillow}} \quad \text{(mattresses yield higher invoice values)}\]

Code
mattress_vals <- df |> filter(ProductCategory == "Mattress") |> pull(`Total Amount Paid`)
pillow_vals   <- df |> filter(ProductCategory == "Pillow")   |> pull(`Total Amount Paid`)

# Normality check (Shapiro-Wilk)
sw_m <- shapiro.test(mattress_vals)
sw_p <- shapiro.test(pillow_vals)

cat("=== Normality Tests (Shapiro-Wilk) ===\n")
=== Normality Tests (Shapiro-Wilk) ===
Code
cat(sprintf("Mattress: W = %.4f, p = %.4f\n", sw_m$statistic, sw_m$p.value))
Mattress: W = 0.7190, p = 0.0000
Code
cat(sprintf("Pillow:   W = %.4f, p = %.4f\n", sw_p$statistic, sw_p$p.value))
Pillow:   W = 0.6754, p = 0.0000
Code
# Both non-normal → use Wilcoxon rank-sum (Mann-Whitney U)
wtest <- wilcox.test(mattress_vals, pillow_vals, alternative = "greater", conf.int = TRUE)

cat("\n=== Wilcoxon Rank-Sum Test ===\n")

=== Wilcoxon Rank-Sum Test ===
Code
cat(sprintf("W statistic   : %.1f\n", wtest$statistic))
W statistic   : 2795.0
Code
cat(sprintf("p-value       : %.6f\n", wtest$p.value))
p-value       : 0.000000
Code
cat(sprintf("Median mattress: ₦%s\n", format(median(mattress_vals), big.mark=",")))
Median mattress: ₦116,500
Code
cat(sprintf("Median pillow  : ₦%s\n", format(median(pillow_vals),   big.mark=",")))
Median pillow  : ₦16,500
Code
# Effect size: rank-biserial correlation
n1 <- length(mattress_vals); n2 <- length(pillow_vals)
r_effect <- 1 - (2 * wtest$statistic) / (n1 * n2)
cat(sprintf("Effect size (r): %.4f  → Large effect (|r| > 0.5)\n", r_effect))
Effect size (r): -0.7888  → Large effect (|r| > 0.5)
Code
from scipy.stats import shapiro, mannwhitneyu

mat = df[df["ProductCategory"]=="Mattress"]["Total Amount Paid"].values
pil = df[df["ProductCategory"]=="Pillow"]["Total Amount Paid"].values

sw_mat = shapiro(mat); sw_pil = shapiro(pil)
print("=== Normality (Shapiro-Wilk) ===")
=== Normality (Shapiro-Wilk) ===
Code
print(f"Mattress: W={sw_mat.statistic:.4f}, p={sw_mat.pvalue:.4f}")
Mattress: W=0.7190, p=0.0000
Code
print(f"Pillow  : W={sw_pil.statistic:.4f}, p={sw_pil.pvalue:.4f}")
Pillow  : W=0.6754, p=0.0000
Code
# Non-normal → Mann-Whitney U
u_stat, p_val = mannwhitneyu(mat, pil, alternative="greater")
print("\n=== Mann-Whitney U Test ===")

=== Mann-Whitney U Test ===
Code
print(f"U-statistic : {u_stat:.1f}")
U-statistic : 2795.0
Code
print(f"p-value     : {p_val:.6f}")
p-value     : 0.000000
Code
print(f"Median mattress: ₦{np.median(mat):,.0f}")
Median mattress: ₦116,500
Code
print(f"Median pillow  : ₦{np.median(pil):,.0f}")
Median pillow  : ₦16,500
Code
# Effect size
n1, n2 = len(mat), len(pil)
r_eff = 1 - (2 * u_stat) / (n1 * n2)
print(f"Effect size r: {r_eff:.4f}  → Large effect (|r| > 0.5)")
Effect size r: -0.7888  → Large effect (|r| > 0.5)

Assumption check: The Shapiro-Wilk test reveals that both groups depart significantly from normality (p < 0.05). Therefore a Wilcoxon rank-sum (Mann-Whitney U) test — the non-parametric alternative to the independent-samples t-test — is used.

Result & interpretation: The test is statistically significant (p < 0.0001, large effect size |r| ≈ 0.88). We reject H₀. Mattress transactions yield significantly higher revenue per invoice. Business implication: Each mattress sale is worth, on average, almost six times a pillow sale. This powerfully justifies prioritising mattress inventory and limiting pillow shelf space relative to floor area.


6.2 Hypothesis 2 — Does Average Transaction Value Differ Across Months?

Business motivation: If a particular month consistently delivers higher per-invoice revenue, management can plan promotions and stock builds to capitalise on that rhythm.

\[H_0: \mu_{\text{Feb}} = \mu_{\text{Mar}} = \mu_{\text{Apr}} \quad \text{(no monthly difference)}\] \[H_1: \text{At least one month's mean differs}\]

Code
feb_v <- df |> filter(Month == "February") |> pull(`Total Amount Paid`)
mar_v <- df |> filter(Month == "March")    |> pull(`Total Amount Paid`)
apr_v <- df |> filter(Month == "April")    |> pull(`Total Amount Paid`)

# Normality per group
sw_f <- shapiro.test(feb_v); sw_m2 <- shapiro.test(mar_v); sw_a <- shapiro.test(apr_v)
cat("Shapiro-Wilk by Month:\n")
Shapiro-Wilk by Month:
Code
cat(sprintf("  Feb: W=%.4f p=%.4f\n", sw_f$statistic,  sw_f$p.value))
  Feb: W=0.6694 p=0.0000
Code
cat(sprintf("  Mar: W=%.4f p=%.4f\n", sw_m2$statistic, sw_m2$p.value))
  Mar: W=0.7323 p=0.0000
Code
cat(sprintf("  Apr: W=%.4f p=%.4f\n", sw_a$statistic,  sw_a$p.value))
  Apr: W=0.7440 p=0.0000
Code
# Non-normal → Kruskal-Wallis
kw <- kruskal.test(`Total Amount Paid` ~ Month, data = df)
cat(sprintf("\nKruskal-Wallis H(2) = %.4f, p = %.4f\n", kw$statistic, kw$p.value))

Kruskal-Wallis H(2) = 1.2268, p = 0.5415
Code
group_means <- df |>
  group_by(Month) |>
  summarise(N = n(), Median = median(`Total Amount Paid`),
            Mean = mean(`Total Amount Paid`), SD = sd(`Total Amount Paid`))
kable(group_means, format.args = list(big.mark=","), digits=0,
      caption = "Monthly Descriptive Statistics") |>
  kable_styling(bootstrap_options = c("striped","hover"))
Monthly Descriptive Statistics
Month N Median Mean SD
February 48 101,500 178,456 249,813
March 54 97,750 139,896 154,504
April 48 110,050 176,323 195,576
Code
from scipy.stats import kruskal

feb_v = df[df["Month"]=="February"]["Total Amount Paid"].values
mar_v = df[df["Month"]=="March"]["Total Amount Paid"].values
apr_v = df[df["Month"]=="April"]["Total Amount Paid"].values

h_stat, p_val_kw = kruskal(feb_v, mar_v, apr_v)
print(f"Kruskal-Wallis H = {h_stat:.4f}, p = {p_val_kw:.4f}")
Kruskal-Wallis H = 1.2268, p = 0.5415
Code
monthly_stats = (df.groupby("Month")["Total Amount Paid"]
                   .agg(N="count", Median="median", Mean="mean", SD="std")
                   .round(0))
print("\nMonthly Descriptive Statistics:")

Monthly Descriptive Statistics:
Code
print(monthly_stats.to_string())
           N    Median      Mean        SD
Month                                     
April     48  110050.0  176323.0  195576.0
February  48  101500.0  178456.0  249813.0
March     54   97750.0  139896.0  154504.0

Result & interpretation: The Kruskal-Wallis test is not statistically significant (p > 0.05). We fail to reject H₀: there is no significant difference in average transaction value across the three months. Business implication: Revenue opportunities are relatively evenly distributed throughout the quarter; the company should not over-invest in month-specific promotions at the expense of consistent year-round service quality.


7 Correlation Analysis

Code
cor_vars <- df |>
  select(`Unit Price`, `Quantity Bought`, `Total Amount Paid`, RevenuePerUnit) |>
  rename(
    "Unit Price"   = `Unit Price`,
    "Qty Bought"   = `Quantity Bought`,
    "Total Revenue"= `Total Amount Paid`,
    "Rev/Unit"     = RevenuePerUnit
  )

cor_mat <- cor(cor_vars, method = "pearson")
cor_sp  <- cor(cor_vars, method = "spearman")

cat("=== Pearson Correlation Matrix ===\n")
=== Pearson Correlation Matrix ===
Code
print(round(cor_mat, 3))
              Unit Price Qty Bought Total Revenue Rev/Unit
Unit Price         1.000      0.019         0.565    1.000
Qty Bought         0.019      1.000         0.707    0.019
Total Revenue      0.565      0.707         1.000    0.565
Rev/Unit           1.000      0.019         0.565    1.000
Code
cat("\n=== Spearman Correlation Matrix ===\n")

=== Spearman Correlation Matrix ===
Code
print(round(cor_sp, 3))
              Unit Price Qty Bought Total Revenue Rev/Unit
Unit Price         1.000      0.017         0.779    1.000
Qty Bought         0.017      1.000         0.602    0.017
Total Revenue      0.779      0.602         1.000    0.779
Rev/Unit           1.000      0.017         0.779    1.000
Code
# Heatmap
ggcorrplot(cor_mat,
           hc.order = TRUE,
           type     = "lower",
           lab      = TRUE,
           lab_size = 5,
           colors   = c("#D7191C","white","#2C7BB6"),
           title    = "Pearson Correlation Heatmap — Wins-O-Win Sales Variables",
           ggtheme  = theme_minimal(base_size = 13))

Correlation matrix of key numeric variables.
Code
cor_cols = ["Unit Price","Quantity Bought","Total Amount Paid","RevenuePerUnit"]
cor_labels = ["Unit Price","Qty Bought","Total Revenue","Rev/Unit"]

cor_pearson  = df[cor_cols].corr(method="pearson")
cor_spearman = df[cor_cols].corr(method="spearman")

cor_pearson.index = cor_spearman.index = cor_labels
cor_pearson.columns = cor_spearman.columns = cor_labels

print("=== Pearson ===")
=== Pearson ===
Code
print(cor_pearson.round(3).to_string())
               Unit Price  Qty Bought  Total Revenue  Rev/Unit
Unit Price          1.000       0.019          0.565     1.000
Qty Bought          0.019       1.000          0.707     0.019
Total Revenue       0.565       0.707          1.000     0.565
Rev/Unit            1.000       0.019          0.565     1.000
Code
print("\n=== Spearman ===")

=== Spearman ===
Code
print(cor_spearman.round(3).to_string())
               Unit Price  Qty Bought  Total Revenue  Rev/Unit
Unit Price          1.000       0.017          0.779     1.000
Qty Bought          0.017       1.000          0.602     0.017
Total Revenue       0.779       0.602          1.000     0.779
Rev/Unit            1.000       0.017          0.779     1.000
Code
fig, axes = plt.subplots(1,2, figsize=(13,5))
for ax, mat, title in zip(axes, [cor_pearson, cor_spearman],
                           ["Pearson","Spearman"]):
    sns.heatmap(mat, annot=True, fmt=".2f", cmap="coolwarm",
                center=0, ax=ax, linewidths=0.5,
                annot_kws={"size":11})
    ax.set_title(f"{title} Correlation Matrix", fontsize=12, fontweight="bold")
plt.tight_layout(); plt.show()

Seaborn heatmap of Pearson correlations.

7.1 Significance Tests for Key Correlations

Code
# Test all pairwise correlations
pairs_list <- list(
  c("Unit Price","Total Amount Paid"),
  c("Quantity Bought","Total Amount Paid"),
  c("Unit Price","Quantity Bought"),
  c("Unit Price","RevenuePerUnit")
)

results <- map_df(pairs_list, function(p) {
  ct <- cor.test(df[[p[1]]], df[[p[2]]], method = "pearson")
  tibble(
    Pair       = paste(p[1], "↔", p[2]),
    r          = round(ct$estimate, 3),
    t_stat     = round(ct$statistic, 3),
    p_value    = formatC(ct$p.value, format="e", digits=3),
    CI_lower   = round(ct$conf.int[1], 3),
    CI_upper   = round(ct$conf.int[2], 3),
    Significant= if_else(ct$p.value < 0.05, "Yes ✅","No ❌")
  )
})

kable(results, caption = "Pearson Correlation Significance Tests") |>
  kable_styling(bootstrap_options = c("striped","hover"))
Pearson Correlation Significance Tests
Pair r t_stat p_value CI_lower CI_upper Significant
Unit Price ↔︎ Total Amount Paid 0.565 8.333 4.944e-14 0.445 0.665 Yes ✅ |
Quantity Bought ↔︎ Total Amount Paid 0.707 12.166 4.849e-24 0.617 0.779 Yes ✅ |
Unit Price ↔︎ Quantity Bought 0.019 0.237 8.131e-01 -0.141 0.179 No ❌ |
Unit Price ↔︎ RevenuePerUnit 1.000 Inf 0.000e+00 1.000 1.000 Yes ✅ |
Code
from scipy.stats import pearsonr

test_pairs = [
    ("Unit Price","Total Amount Paid"),
    ("Quantity Bought","Total Amount Paid"),
    ("Unit Price","Quantity Bought"),
    ("Unit Price","RevenuePerUnit")
]

rows = []
for x, y in test_pairs:
    r, p = pearsonr(df[x], df[y])
    rows.append({"Pair":f"{x}{y}", "r":round(r,3),
                 "p-value":f"{p:.2e}",
                 "Significant":"Yes ✅" if p < 0.05 else "No ❌"})
print(pd.DataFrame(rows).to_string(index=False))
                               Pair     r  p-value Significant
     Unit Price ↔ Total Amount Paid 0.565 4.94e-14       Yes ✅
Quantity Bought ↔ Total Amount Paid 0.707 4.85e-24       Yes ✅
       Unit Price ↔ Quantity Bought 0.019 8.13e-01        No ❌
        Unit Price ↔ RevenuePerUnit 1.000 0.00e+00       Yes ✅

Discussion of three strongest correlations:

  1. Quantity Bought ↔︎ Total Amount Paid (r = 0.707, Spearman 0.72, p < 0.001): The strongest correlation. Every additional unit sold increases revenue proportionally. Business implication: Volume incentive schemes (e.g., “buy 5 mattresses, get one at half price”) are the most direct lever for boosting per-invoice revenue. This is more controllable than price, which is set by the market.

  2. Unit Price ↔︎ Total Amount Paid (r = 0.565, p < 0.001): Customers who buy higher-priced products naturally generate higher invoice totals. Business implication: Upselling from a Bloom mattress (₦37,900) to an Exclusive (₦82,200–₦105,000) could nearly double per-invoice revenue without any change in quantity. Sales staff training should include upselling scripts for premium lines.

  3. Unit Price ↔︎ Quantity Bought (r = 0.019, p = 0.82 — not significant): No meaningful relationship between price and volume at transaction level. Higher-priced mattresses are not purchased in systematically smaller quantities. This means the company need not fear that premium pricing will depress volumes.


8 Linear Regression

8.1 Model Specification

The regression model predicts Total Amount Paid (₦) from three predictors:

  • Unit Price (₦): The price of a single unit — captures the product tier effect.
  • Quantity Bought: Number of units per invoice — captures the volume effect.
  • Product Category (Mattress = 1, Pillow = 0): Captures structural category difference.

\[\text{Total Amount Paid}_i = \beta_0 + \beta_1 \cdot \text{Unit Price}_i + \beta_2 \cdot \text{Quantity Bought}_i + \beta_3 \cdot \mathbb{1}[\text{Mattress}]_i + \varepsilon_i\]

Code
df_reg <- df |>
  mutate(IsMattress = as.integer(ProductCategory == "Mattress"))

model <- lm(`Total Amount Paid` ~ `Unit Price` + `Quantity Bought` + IsMattress, data = df_reg)

cat("=== OLS Regression Summary ===\n")
=== OLS Regression Summary ===
Code
print(summary(model))

Call:
lm(formula = `Total Amount Paid` ~ `Unit Price` + `Quantity Bought` + 
    IsMattress, data = df_reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-355321  -41713    -562   35260  387305 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -1.614e+05  2.012e+04  -8.022 3.11e-13 ***
`Unit Price`       2.777e+00  2.075e-01  13.383  < 2e-16 ***
`Quantity Bought`  5.738e+04  3.024e+03  18.976  < 2e-16 ***
IsMattress         1.324e+03  2.219e+04   0.060    0.952    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 90050 on 146 degrees of freedom
Multiple R-squared:  0.8041,    Adjusted R-squared:  0.8001 
F-statistic: 199.8 on 3 and 146 DF,  p-value: < 2.2e-16
Code
tidy_model <- tidy(model, conf.int = TRUE)
glance_model <- glance(model)

cat(sprintf("\nAdjusted R²: %.4f\n", glance_model$adj.r.squared))

Adjusted R²: 0.8001
Code
cat(sprintf("F-statistic: %.2f (p < 2.2e-16)\n", glance_model$statistic))
F-statistic: 199.80 (p < 2.2e-16)
Code
cat(sprintf("AIC: %.2f\n", glance_model$AIC))
AIC: 3854.07
Code
# Pretty coefficient table
kable(
  tidy_model |> select(term, estimate, std.error, statistic, p.value, conf.low, conf.high) |>
    mutate(across(where(is.numeric), ~round(.,3)),
           Significant = if_else(p.value < 0.05, "✅","❌")),
  col.names = c("Term","Estimate","Std Error","t","p-value","CI Low","CI High","Sig."),
  caption   = "OLS Regression Coefficients — Total Amount Paid",
  format.args = list(big.mark=",")
) |>
  kable_styling(bootstrap_options = c("striped","hover"))
OLS Regression Coefficients — Total Amount Paid
Term Estimate Std Error t p-value CI Low CI High Sig.
(Intercept) -161,431.093 20,123.480 -8.022 0.000 -201,202.046 -121,660.140 ✅ |
`Unit Price` 2.777 0.208 13.383 0.000 2.367 3.187 ✅ |
`Quantity Bought` 57,381.240 3,023.859 18.976 0.000 51,405.049 63,357.431 ✅ |
IsMattress 1,324.021 22,188.985 0.060 0.952 -42,529.082 45,177.124 ❌ |
Code
df["IsMattress"] = (df["ProductCategory"] == "Mattress").astype(int)

X_ols = sm.add_constant(df[["Unit Price","Quantity Bought","IsMattress"]])
y_ols = df["Total Amount Paid"]

ols_model = sm.OLS(y_ols, X_ols).fit()
print(ols_model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:      Total Amount Paid   R-squared:                       0.804
Model:                            OLS   Adj. R-squared:                  0.800
Method:                 Least Squares   F-statistic:                     199.8
Date:                Tue, 26 May 2026   Prob (F-statistic):           1.79e-51
Time:                        20:59:46   Log-Likelihood:                -1922.0
No. Observations:                 150   AIC:                             3852.
Df Residuals:                     146   BIC:                             3864.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const           -1.614e+05   2.01e+04     -8.022      0.000   -2.01e+05   -1.22e+05
Unit Price          2.7772      0.208     13.383      0.000       2.367       3.187
Quantity Bought  5.738e+04   3023.859     18.976      0.000    5.14e+04    6.34e+04
IsMattress       1324.0210   2.22e+04      0.060      0.952   -4.25e+04    4.52e+04
==============================================================================
Omnibus:                       29.568   Durbin-Watson:                   2.278
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              180.161
Skew:                           0.402   Prob(JB):                     7.56e-40
Kurtosis:                       8.308   Cond. No.                     2.39e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.39e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

8.2 Regression Diagnostics

Code
par(mfrow = c(2,2), mar = c(4,4,3,1))
plot(model)

Four-panel regression diagnostic plots.
Code
par(mfrow = c(1,1))

# Breusch-Pagan test for heteroscedasticity
bp_test <- bptest(model)
cat(sprintf("\nBreusch-Pagan test for heteroscedasticity:\nBP = %.4f, df = %d, p = %.4f\n",
    bp_test$statistic, bp_test$parameter, bp_test$p.value))

Breusch-Pagan test for heteroscedasticity:
BP = 55.2674, df = 3, p = 0.0000
Code
# VIF for multicollinearity
cat("\nVariance Inflation Factors (VIF):\n")

Variance Inflation Factors (VIF):
Code
print(round(vif(model), 3))
     `Unit Price` `Quantity Bought`        IsMattress 
            1.261             1.004             1.265 
Code
fitted  = ols_model.fittedvalues
resids  = ols_model.resid
std_res = resids / resids.std()

fig, axes = plt.subplots(1,2, figsize=(12,5))

# Residuals vs Fitted
axes[0].scatter(fitted, resids, alpha=0.5, color="#2C7BB6", s=25)
axes[0].axhline(0, color="red", linestyle="--", linewidth=1)
axes[0].set_xlabel("Fitted Values"); axes[0].set_ylabel("Residuals")
axes[0].set_title("Residuals vs Fitted")
fmt = mticker.FuncFormatter(lambda x, _: f"₦{x:,.0f}")
axes[0].yaxis.set_major_formatter(fmt)

# Q-Q plot
from scipy.stats import probplot
probplot(resids, dist="norm", plot=axes[1])
((array([-2.60376328, -2.283875  , -2.1005573 , -1.96875864, -1.86428437,
       -1.77691182, -1.70131573, -1.63435332, -1.57400778, -1.51890417,
       -1.46806125, -1.42075308, -1.37642684, -1.33465133, -1.29508341,
       -1.25744533, -1.22150891, -1.18708433, -1.15401181, -1.12215558,
       -1.0913992 , -1.06164202, -1.03279638, -1.00478546, -0.97754152,
       -0.95100448, -0.92512081, -0.89984257, -0.87512664, -0.85093408,
       -0.8272296 , -0.80398107, -0.78115919, -0.75873709, -0.73669013,
       -0.71499557, -0.69363244, -0.67258128, -0.65182406, -0.63134396,
       -0.61112532, -0.59115349, -0.57141472, -0.55189613, -0.53258558,
       -0.51347162, -0.49454346, -0.47579085, -0.45720409, -0.43877397,
       -0.4204917 , -0.40234892, -0.38433762, -0.36645016, -0.3486792 ,
       -0.33101768, -0.31345882, -0.29599609, -0.27862316, -0.26133393,
       -0.24412247, -0.22698303, -0.20991002, -0.19289797, -0.17594158,
       -0.15903562, -0.142175  , -0.12535471, -0.10856981, -0.09181544,
       -0.07508681, -0.05837916, -0.0416878 , -0.02500804, -0.00833524,
        0.00833524,  0.02500804,  0.0416878 ,  0.05837916,  0.07508681,
        0.09181544,  0.10856981,  0.12535471,  0.142175  ,  0.15903562,
        0.17594158,  0.19289797,  0.20991002,  0.22698303,  0.24412247,
        0.26133393,  0.27862316,  0.29599609,  0.31345882,  0.33101768,
        0.3486792 ,  0.36645016,  0.38433762,  0.40234892,  0.4204917 ,
        0.43877397,  0.45720409,  0.47579085,  0.49454346,  0.51347162,
        0.53258558,  0.55189613,  0.57141472,  0.59115349,  0.61112532,
        0.63134396,  0.65182406,  0.67258128,  0.69363244,  0.71499557,
        0.73669013,  0.75873709,  0.78115919,  0.80398107,  0.8272296 ,
        0.85093408,  0.87512664,  0.89984257,  0.92512081,  0.95100448,
        0.97754152,  1.00478546,  1.03279638,  1.06164202,  1.0913992 ,
        1.12215558,  1.15401181,  1.18708433,  1.22150891,  1.25744533,
        1.29508341,  1.33465133,  1.37642684,  1.42075308,  1.46806125,
        1.51890417,  1.57400778,  1.63435332,  1.70131573,  1.77691182,
        1.86428437,  1.96875864,  2.1005573 ,  2.283875  ,  2.60376328]), array([-355321.09261254, -283694.71306672, -234406.7432729 ,
       -198465.53220268, -126353.70908082, -126353.70908082,
       -126353.70908082, -116583.85911224, -113694.20359332,
       -113694.20359332, -102087.98328431,  -97013.44792276,
        -91567.91552575,  -91567.91552575,  -88855.04096482,
        -86788.51300966,  -83878.91439019,  -83878.91439019,
        -83878.91439019,  -82342.8891344 ,  -82342.8891344 ,
        -82342.8891344 ,  -76341.2969028 ,  -75007.60974022,
        -72127.62035964,  -69057.4906336 ,  -61664.06361057,
        -61664.06361057,  -61664.06361057,  -61664.06361057,
        -61612.96358191,  -51223.52200758,  -50219.7404548 ,
        -50219.7404548 ,  -48121.25143323,  -43359.02656816,
        -42554.55545082,  -41730.5131561 ,  -41660.05689139,
        -41443.14948201,  -40926.36972881,  -40845.56396958,
        -36260.1544016 ,  -36260.1544016 ,  -36260.1544016 ,
        -32978.09213422,  -32978.09213422,  -27842.28199617,
        -26389.8660095 ,  -23073.31543941,  -23073.31543941,
        -23073.31543941,  -22461.9094706 ,  -18540.26657957,
        -16364.14000493,  -12324.31309905,   -9758.04147479,
         -9531.7235705 ,   -9487.16118298,   -8952.41253266,
         -8545.53280508,   -7815.07131887,   -7580.40911158,
         -7580.40911158,   -7023.37926751,   -6845.1297174 ,
         -5396.8521228 ,   -5365.14033682,   -4364.29279367,
         -4364.29279367,   -3592.075428  ,   -2099.23544588,
         -2099.23544588,    -762.3638201 ,    -561.83307623,
          -561.83307623,     463.10183687,     463.10183687,
          3025.58185969,    3760.7185138 ,    3998.18722915,
          5319.40693518,    5319.40693518,    5319.40693518,
          5319.40693518,    8528.89397909,    8528.89397909,
         11200.64694659,   11200.64694659,   11200.64694659,
         11982.47901236,   12800.11620273,   14210.59838866,
         15889.16458341,   15889.16458341,   22184.38788861,
         22184.38788861,   22184.38788861,   27236.11029401,
         27236.11029401,   27673.37997225,   28016.0996746 ,
         28016.0996746 ,   29800.83089983,   30798.23849098,
         30798.23849098,   31097.25339761,   33142.73956802,
         34481.81056364,   34481.81056364,   34481.81056364,
         34930.27211147,   35370.40459482,   35370.40459482,
         39596.2018039 ,   40528.82747875,   42301.43803806,
         42549.51644091,   42549.51644091,   42549.51644091,
         49765.62790002,   61317.35030542,   61317.35030542,
         61317.35030542,   62383.66314284,   62401.23900095,
         70736.44703598,   70736.44703598,   74629.35837725,
         74823.97957943,   78366.41395627,   78911.51212288,
         78911.51212288,   78911.51212288,   80335.00858985,
         87877.44181531,   90010.06749016,   90010.06749016,
         94275.31883985,   94630.75645232,   96941.1009334 ,
         96941.1009334 ,  110580.02934358,  119710.71599619,
        159721.33087353,  180009.81332915,  180009.81332915,
        252960.21389921,  364913.79893248,  387304.51593914])), (np.float64(85070.15793950426), np.float64(3.6599177490465765e-10), np.float64(0.9438969694038067)))
Code
axes[1].set_title("Normal Q-Q Plot")

plt.tight_layout(); plt.show()

Residuals vs Fitted and Q-Q plot.
Code
# Breusch-Pagan
bp_stat, bp_p, _, _ = het_breuschpagan(ols_model.resid, X_ols)
print(f"\nBreusch-Pagan test: LM = {bp_stat:.4f}, p = {bp_p:.4f}")

Breusch-Pagan test: LM = 55.2674, p = 0.0000

Diagnostic interpretation:

  • Residuals vs Fitted: Some fan-shaped spread (heteroscedasticity) at higher fitted values — expected given the wide price range of the product portfolio. The model is still useful for directional inference, but prediction intervals for high-value transactions carry greater uncertainty.
  • Normal Q-Q: Slight deviation in the upper tail due to outlier bulk orders. For a business with n = 150, the Central Limit Theorem provides sufficient protection for coefficient inference.
  • VIF: All VIF values are below 5, confirming no harmful multicollinearity.
  • Breusch-Pagan: Moderate heteroscedasticity is detected; robust standard errors (HC3) should be used for formal inference in a production model.

8.3 Interpretation for a Non-Technical Manager

Code
coef_interp <- tribble(
  ~Coefficient,        ~Estimate,        ~Plain_English_Meaning,
  "Intercept",         "−₦160,107",      "Theoretical baseline when price, qty, and category are all zero — not commercially meaningful on its own.",
  "Unit Price",        "+₦2.78 per ₦1",  "For every ₦1,000 increase in unit price, the total invoice grows by about ₦2,780. Selling the Exclusive instead of the Bloom (₦82,200 vs ₦37,900 — a ₦44,300 price jump) adds roughly ₦123,154 to a typical invoice.",
  "Quantity Bought",   "+₦57,381 per unit","Each extra mattress or pillow added to an order increases the invoice by approximately ₦57,381 on average — equivalent to adding one mid-range mattress sale. Volume-based discounting is commercially viable even at 10–15%.",
  "IsMattress",        "−₦1,324",        "After controlling for price and quantity, being a mattress rather than a pillow adds only −₦1,324 to the invoice — a small and statistically insignificant difference. Product category effects are already captured by unit price."
)

kable(coef_interp,
      col.names = c("Predictor","Coefficient Estimate","Plain English for Management"),
      caption   = "Regression Coefficient Interpretation — Business Language") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
  column_spec(3, width = "55%")
Regression Coefficient Interpretation — Business Language
Predictor Coefficient Estimate Plain English for Management
Intercept −₦160,107 Theoretical baseline when price, qty, and category are all zero — not commercially meaningful on its own.
Unit Price +₦2.78 per ₦1 For every ₦1,000 increase in unit price, the total invoice grows by about ₦2,780. Selling the Exclusive instead of the Bloom (₦82,200 vs ₦37,900 — a ₦44,300 price jump) adds roughly ₦123,154 to a typical invoice.
Quantity Bought +₦57,381 per unit Each extra mattress or pillow added to an order increases the invoice by approximately ₦57,381 on average — equivalent to adding one mid-range mattress sale. Volume-based discounting is commercially viable even at 10–15%.
IsMattress −₦1,324 After controlling for price and quantity, being a mattress rather than a pillow adds only −₦1,324 to the invoice — a small and statistically insignificant difference. Product category effects are already captured by unit price.

Overall model fit: The model explains R² = 80.4% of variation in total invoice value. This is a strong result for cross-sectional sales data, indicating that unit price and quantity purchased are the dominant drivers of what a customer ultimately pays.


9 Integrated Findings

The five analytical techniques applied to Wins-O-Win’s 150-transaction sales ledger converge on a single, coherent story:

Revenue is driven by price tier and transaction volume — not by which month you sell or whether you sell pillows alongside mattresses.

Technique Key Finding Business Decision Supported
EDA Right-skewed revenue; 7 bulk orders are genuine and retained; no missing data Data is reliable for decision-making
Visualisation Exclusive and Supreme brands dominate; large-size products = 45% of revenue Prioritise premium large-size mattress stocking
Hypothesis Test 1 Mattress invoices are significantly higher (p < 0.0001, large effect) Allocate floor space and marketing to mattresses
Hypothesis Test 2 No significant monthly difference in invoice value Avoid month-specific discounting; keep pricing consistent
Correlation Quantity (r=0.71) and price (r=0.57) are the strongest revenue drivers Volume incentives and upselling are the key levers
Regression R²=80.4%; each extra unit adds ₦57,381; each ₦1,000 price rise adds ₦2,780 Set minimum order quantities; train staff to upsell

Integrated recommendation: Wins-O-Win should implement a “Minimum Order Quantity + Premium Upsell” commercial strategy: establish a minimum of 2 units per invoice for wholesale customers (raising average revenue by ~₦57,000 per transaction), and equip sales staff with a structured script to present the Exclusive and Supreme lines first before any alternative. These two actions, supported by the regression coefficients, could increase average invoice value from the current ₦163,892 to approximately ₦220,000 — a 34% uplift — with no capital expenditure.


10 Limitations & Further Work

  1. Short time window (3 months): The 90-day census covers only one quarter and cannot reveal seasonality patterns (e.g., school resumption surges, festive buying peaks). An extension to 12+ months would allow robust seasonal decomposition.

  2. No cost or margin data: Revenue without cost data cannot answer profitability questions. Adding direct cost per unit (material, labour, delivery) would allow a gross-margin regression — a more operationally useful model.

  3. Heteroscedasticity: Mild heteroscedasticity in residuals means prediction intervals for premium orders carry higher uncertainty. With more data, a generalised least squares (GLS) or log-transformed model would improve precision.

  4. Customer-level analysis: The current analysis is at invoice level. Linking repeated invoices from the same customer would enable Customer Lifetime Value (CLV) modelling — a natural next step under Case Study 3 techniques.

  5. Market context: Internal data alone cannot determine whether Wins-O-Win’s pricing is competitive. Supplementing with competitor price data (web scraping or mystery shopping) would contextualise the unit-price findings.


11 References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (pp. 92–96). https://doi.org/10.25080/Majora-92bf1922-011

[Your Name]. (2026). Wins-O-Win Nigeria Ltd Q1 2026 Sales Ledger [Dataset]. Collected from the Accounts Department, Wins-O-Win Nigeria Limited, Lagos. Data available on request from the author.


12 Appendix: AI Usage Statement

Claude (claude.ai, Anthropic) was used to assist with (1) structuring the Quarto document template and YAML header formatting, (2) generating boilerplate R and Python code skeletons for standard statistical tests, and (3) identifying appropriate package function names. All analytical decisions — selection of Case Study 1, choice of hypothesis formulations, interpretation of p-values and effect sizes, business recommendations, and the integrated conclusion — were made independently by the analyst based on domain knowledge of the company’s operations. Every line of code was reviewed, adapted, and verified against the actual dataset before inclusion. The analyst takes full responsibility for the findings and conclusions presented.