Sales Performance Analytics at Wins-O-Win Nigeria Limited: An Exploratory & Inferential Study

Author

Richard Onyenwe

Published

May 26, 2026

1 Executive Summary

Wins-O-Win Nigeria Limited is a foam mattress and pillow manufacturing and retail company operating in Lagos, Nigeria. Between February 1 and April 30, 2026, the company recorded 150 sales transactions across 21 product brands and 36 unique customers, generating a combined revenue of ₦24,583,800.

This analysis applies five complementary analytical techniques — Exploratory Data Analysis (EDA), Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression — to answer the central business question: What factors drive total sales revenue per transaction, and do sales patterns differ meaningfully by product category or selling month?

Key findings include: (1) mattresses contribute 94.2% of total revenue, dwarfing pillow sales; (2) the Exclusive and Supreme brands are the top two revenue generators; (3) unit price and quantity purchased are strong, statistically significant predictors of transaction value; (4) sales value differs significantly between pillows and mattresses but not significantly across the three trading months; and (5) a multiple linear regression model explains 80.4% of variation in transaction revenue, providing actionable pricing and inventory guidance.

Recommendation: Management should concentrate stocking and promotional investment in premium mattress brands (Exclusive, Supreme, Orthopedic) while designing targeted upselling campaigns to increase average quantity per transaction.

2 Professional Disclosure

2.1 Job Role and Organisation

The analyst is a Sales & Business Development Officer at Wins-O-Win Nigeria Limited, a foam products manufacturing company headquartered in Lagos, Nigeria. The company designs, manufactures, and retails mattresses (across fourteen distinct brands) and pillows (across seven pillow lines), supplying wholesale and retail customers primarily in the Lagos metropolitan area.

As a Sales & Business Development Officer, my day-to-day responsibilities include: tracking sales invoices, monitoring customer purchasing patterns, preparing periodic revenue reports, advising on promotional pricing, and setting restocking priorities for the warehouse.

2.2 Technique Justification

Technique	Why it is directly relevant to my work
EDA	Every period I must quickly audit the sales sheet for anomalies — unusual invoice amounts, missing records, brand outliers — before presenting numbers to management. EDA formalises this routine check.
Data Visualisation	I prepare monthly sales dashboards for the sales director. Mastering grammar-of-graphics principles makes those dashboards more accurate and persuasive.
Hypothesis Testing	A recurring management question is whether any particular month or product type “really” outsells another. Formal tests give rigorous answers instead of gut feel.
Correlation Analysis	Understanding how unit price, quantity, and total revenue relate helps me advise whether discounting or volume incentives are the better commercial lever.
Linear Regression	A regression model of transaction revenue lets me forecast the likely value of prospective orders based on the product brand and expected quantity — critical for pipeline valuation.

3 Data Collection & Sampling

3.1 Source and Collection Method

The primary dataset was extracted directly from the company’s internal sales ledger — a Microsoft Excel workbook maintained by the accounts department titled “Wins_O_Win_Nigeria_Sales_Sheet_2026.xlsx”. Each row represents one sales invoice raised for a customer during the study period.

Data was sourced with the permission of company management and identifiers have been pseudonymised: customer names are replaced with anonymised codes (CUST-1001 through CUST-1036), and invoice reference numbers have been retained but contain no personal information beyond a sequential identifier.

3.2 Sampling Frame

Parameter	Detail
Population	All Wins-O-Win sales transactions
Study period	1 February 2026 – 30 April 2026 (90 days, 3 full calendar months)
Sampling approach	Census — every invoice raised in the period is included
Final sample size	150 valid transaction records
Excluded records	1 summary “Total” row at the bottom of the ledger

The 90-day window was chosen to capture a full quarter of trading activity and to ensure that the minimum requirement of 100 observations was comfortably met (n = 150). A full census was used rather than a sample because the ledger is compact enough to analyse in its entirety, eliminating sampling error.

3.3 Variables

Code

var_table <- tribble(
  ~Variable,          ~Type,         ~Description,
  "Date",             "Date",        "Invoice date (YYYY-MM-DD)",
  "Invoice Number",   "Categorical", "Unique invoice reference (INV-XXXX)",
  "Customer ID",      "Categorical", "Anonymised customer code (CUST-XXXX)",
  "Brand",            "Categorical", "Foam product brand name (21 levels)",
  "Size",             "Categorical", "Product size/dimensions (14 levels)",
  "Unit Price",       "Numeric",     "Price per unit in Nigerian Naira (₦)",
  "Quantity Bought",  "Numeric",     "Number of units purchased per invoice",
  "Total Amount Paid","Numeric",     "Total invoice value = Unit Price × Quantity",
  "Month",            "Derived",     "Month of transaction (Feb / Mar / Apr)",
  "ProductCategory",  "Derived",     "Pillow vs Mattress (from Brand)",
  "SizeGroup",        "Derived",     "Small / Medium / Large / Pillow-Other",
  "RevenuePerUnit",   "Derived",     "Average revenue contributed per unit sold"
)

kable(var_table, caption = "Variable Dictionary — Wins-O-Win Sales Dataset") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE) |>
  column_spec(1, bold = TRUE)

Variable Dictionary — Wins-O-Win Sales Dataset
Variable	Type	Description
Date	Date	Invoice date (YYYY-MM-DD)
Invoice Number	Categorical	Unique invoice reference (INV-XXXX)
Customer ID	Categorical	Anonymised customer code (CUST-XXXX)
Brand	Categorical	Foam product brand name (21 levels)
Size	Categorical	Product size/dimensions (14 levels)
Unit Price	Numeric	Price per unit in Nigerian Naira (₦)
Quantity Bought	Numeric	Number of units purchased per invoice
Total Amount Paid	Numeric	Total invoice value = Unit Price × Quantity
Month	Derived	Month of transaction (Feb / Mar / Apr)
ProductCategory	Derived	Pillow vs Mattress (from Brand)
SizeGroup	Derived	Small / Medium / Large / Pillow-Other
RevenuePerUnit	Derived	Average revenue contributed per unit sold

3.4 Ethical Notes

This analysis was conducted with the express knowledge and permission of company management. No customer personal data (names, phone numbers, addresses) appears in the dataset; only anonymised customer codes are used. The data is treated as commercially sensitive and will not be shared beyond the academic submission context.

4 Data Description & Exploratory Data Analysis

4.1 Summary Statistics

Code

summary_tbl <- df |>
  summarise(
    across(
      c(`Unit Price`, `Quantity Bought`, `Total Amount Paid`, RevenuePerUnit),
      list(
        N      = ~n(),
        Mean   = ~round(mean(.),0),
        Median = ~round(median(.),0),
        SD     = ~round(sd(.),0),
        Min    = ~round(min(.),0),
        Max    = ~round(max(.),0),
        Skew   = ~round(skewness(.),2),
        Kurt   = ~round(kurtosis(.),2)
      ),
      .names = "{.col}_{.fn}"
    )
  ) |>
  pivot_longer(everything(), names_to = c("Variable","Stat"), names_sep = "_(?=[^_]+$)") |>
  pivot_wider(names_from = Stat, values_from = value)

kable(summary_tbl,
      format.args = list(big.mark = ","),
      caption = "Descriptive Statistics — Numeric Variables") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)

Descriptive Statistics — Numeric Variables
Variable	N	Mean	Median	SD	Min	Max	Skew	Kurt
Unit Price	150	50,627	37,900	39,921	4,000	189,700	1.06	3.72
Quantity Bought	150	3	2	2	1	10	1.57	5.10
Total Amount Paid	150	163,892	103,000	201,417	4,000	1,109,000	2.53	10.12
RevenuePerUnit	150	50,627	37,900	39,921	4,000	189,700	1.06	3.72

Code

num_cols = ["Unit Price", "Quantity Bought", "Total Amount Paid", "RevenuePerUnit"]
stats_tbl = df[num_cols].describe().T
stats_tbl["skewness"] = df[num_cols].skew()
stats_tbl["kurtosis"] = df[num_cols].kurtosis()
stats_tbl.columns = ["Count","Mean","Std","Min","25%","50%","75%","Max","Skewness","Kurtosis"]
stats_tbl = stats_tbl.map(lambda x: f"{x:,.2f}")
print(stats_tbl.to_string())

                    Count        Mean         Std       Min        25%         50%         75%           Max Skewness Kurtosis
Unit Price         150.00   50,626.67   39,921.10  4,000.00  20,000.00   37,900.00   79,400.00    189,700.00     1.07     0.78
Quantity Bought    150.00        3.20        2.44      1.00       1.00        2.50        4.00         10.00     1.59     2.22
Total Amount Paid  150.00  163,892.00  201,416.85  4,000.00  38,800.00  103,000.00  188,275.00  1,109,000.00     2.55     7.40
RevenuePerUnit     150.00   50,626.67   39,921.10  4,000.00  20,000.00   37,900.00   79,400.00    189,700.00     1.07     0.78

Interpretation: Total revenue per transaction is highly right-skewed (skewness ≈ 2.6), meaning a small number of very large orders pull the mean (₦163,892) well above the median (₦103,000). This is consistent with a B2B market where a few large wholesale orders dominate. Unit price spans from ₦4,000 (budget pillow) to ₦189,700 (premium mattress), confirming a wide product spectrum.

4.2 Missing Values & Data Quality

Code

# Missing value audit
missing_audit <- df |>
  summarise(across(everything(), ~sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing_Count") |>
  mutate(
    Missing_Pct = round(Missing_Count / nrow(df) * 100, 1),
    Status      = if_else(Missing_Count == 0, "✅ Complete", "⚠️ Has NAs")
  )

kable(missing_audit, caption = "Missing Value Audit — Wins-O-Win Dataset") |>
  kable_styling(bootstrap_options = c("striped","hover")) |>
  row_spec(which(missing_audit$Missing_Count > 0), background = "#FFF3CD")

Missing Value Audit — Wins-O-Win Dataset
Variable	Missing_Count	Missing_Pct	Status
Date	0	0	✅ Complete \|
Invoice Number	0	0	✅ Complete \|
Customer ID	0	0	✅ Complete \|
Brand	0	0	✅ Complete \|
Size	0	0	✅ Complete \|
Unit Price	0	0	✅ Complete \|
Quantity Bought	0	0	✅ Complete \|
Total Amount Paid	0	0	✅ Complete \|
Month	0	0	✅ Complete \|
MonthNum	0	0	✅ Complete \|
Week	0	0	✅ Complete \|
DayOfWeek	0	0	✅ Complete \|
ProductCategory	0	0	✅ Complete \|
RevenuePerUnit	0	0	✅ Complete \|
Log_Total	0	0	✅ Complete \|
Log_UnitPrice	0	0	✅ Complete \|
IsHighValue	0	0	✅ Complete \|
SizeGroup	0	0	✅ Complete \|

Code

miss = pd.DataFrame({
    "Variable"      : df.columns,
    "Missing Count" : df.isnull().sum().values,
    "Missing %"     : (df.isnull().mean() * 100).round(1).values,
    "Status"        : ["✅ Complete" if x == 0 else "⚠️ Has NAs"
                        for x in df.isnull().sum().values]
})
print(miss.to_string(index=False))

         Variable  Missing Count  Missing %     Status
             Date              0        0.0 ✅ Complete
   Invoice Number              0        0.0 ✅ Complete
      Customer ID              0        0.0 ✅ Complete
            Brand              0        0.0 ✅ Complete
             Size              0        0.0 ✅ Complete
       Unit Price              0        0.0 ✅ Complete
  Quantity Bought              0        0.0 ✅ Complete
Total Amount Paid              0        0.0 ✅ Complete
            Month              0        0.0 ✅ Complete
         MonthNum              0        0.0 ✅ Complete
             Week              0        0.0 ✅ Complete
        DayOfWeek              0        0.0 ✅ Complete
  ProductCategory              0        0.0 ✅ Complete
   RevenuePerUnit              0        0.0 ✅ Complete
        Log_Total              0        0.0 ✅ Complete
    Log_UnitPrice              0        0.0 ✅ Complete
      IsHighValue              0        0.0 ✅ Complete
        SizeGroup              0        0.0 ✅ Complete

Data quality finding 1 — No missing values: After excluding the summary row, all 150 records are complete across every field. No imputation was required.

Data quality finding 2 — Outlier detection:

Code

q1 <- quantile(df$`Total Amount Paid`, 0.25)
q3 <- quantile(df$`Total Amount Paid`, 0.75)
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

outliers <- df |>
  filter(`Total Amount Paid` < lower | `Total Amount Paid` > upper) |>
  select(Date, Brand, Size, `Quantity Bought`, `Unit Price`, `Total Amount Paid`)

cat(sprintf("IQR fence: [₦%s, ₦%s]\n", 
    format(lower, big.mark=","), format(upper, big.mark=",")))

IQR fence: [₦-185,412.5, ₦412,487.5]

Code

cat(sprintf("Outlier transactions: %d of 150 (%.1f%%)\n", 
    nrow(outliers), nrow(outliers)/150*100))

Outlier transactions: 15 of 150 (10.0%)

Code

kable(outliers, format.args = list(big.mark=","),
      caption = "High-Value Outlier Transactions (IQR method)") |>
  kable_styling(bootstrap_options = c("striped","hover"))

High-Value Outlier Transactions (IQR method)
Date	Brand	Size	Quantity Bought	Unit Price	Total Amount Paid
2026-02-01	Grandeur	3 X 6 (75 X 36 X 6)	5	110,900	554,500
2026-02-05	Grandeur	3 X 6 (75 X 36 X 6)	10	110,900	1,109,000
2026-02-18	Exclusive	31/2 X 6 (75 X 42 X 6)	10	82,200	822,000
2026-02-21	Blossom Delight	3 X 6 (75 X 36 X 6)	10	51,500	515,000
2026-02-21	Comfort	4 X 6 (75 X 48 X 6)	10	68,400	684,000
2026-02-21	Supreme	31/2 X 6 (75 X 42 X 6)	10	92,300	923,000
2026-03-08	Merit L	2 1/2 X 6 (75 X 30 X 6)	10	44,600	446,000
2026-03-09	Dual Side	31/2 X 6 (75 X 42 X 6)	5	128,900	644,500
2026-03-25	Exclusive	31/2 X 6 (75 X 42 X 6)	10	82,200	822,000
2026-04-08	Merit L	2 1/2 X 6 (75 X 30 X 6)	10	44,600	446,000
2026-04-12	Supreme	31/2 X 6 (75 X 42 X 6)	5	92,300	461,500
2026-04-24	Orthopedic	3 X 6 (75 X 36 X 6)	10	107,800	1,078,000
2026-04-26	Porsche	3 X 8 (75 X 36 X 8)	4	147,200	588,800
2026-04-26	Supreme	4 1/2 X 6 (75 X 54 X 6)	4	117,800	471,200
2026-04-27	Orthopedic	3 X 6 (75 X 36 X 6)	4	107,800	431,200

Code

Q1  = df["Total Amount Paid"].quantile(0.25)
Q3  = df["Total Amount Paid"].quantile(0.75)
IQR = Q3 - Q1
lower_fence = Q1 - 1.5 * IQR
upper_fence  = Q3 + 1.5 * IQR

outliers_py = df[
    (df["Total Amount Paid"] < lower_fence) |
    (df["Total Amount Paid"] > upper_fence)
][["Date","Brand","Size","Quantity Bought","Unit Price","Total Amount Paid"]]

print(f"IQR fences: [₦{lower_fence:,.0f}  —  ₦{upper_fence:,.0f}]")

IQR fences: [₦-185,412  —  ₦412,488]

Code

print(f"Outlier transactions: {len(outliers_py)} of 150 ({len(outliers_py)/150*100:.1f}%)")

Outlier transactions: 15 of 150 (10.0%)

Code

print(outliers_py.to_string(index=False))

      Date           Brand                    Size  Quantity Bought  Unit Price  Total Amount Paid
2026-02-01        Grandeur     3 X 6 (75 X 36 X 6)                5    110900.0             554500
2026-02-05        Grandeur     3 X 6 (75 X 36 X 6)               10    110900.0            1109000
2026-02-18       Exclusive  31/2 X 6 (75 X 42 X 6)               10     82200.0             822000
2026-02-21 Blossom Delight     3 X 6 (75 X 36 X 6)               10     51500.0             515000
2026-02-21         Comfort     4 X 6 (75 X 48 X 6)               10     68400.0             684000
2026-02-21         Supreme  31/2 X 6 (75 X 42 X 6)               10     92300.0             923000
2026-03-08         Merit L 2 1/2 X 6 (75 X 30 X 6)               10     44600.0             446000
2026-03-09       Dual Side  31/2 X 6 (75 X 42 X 6)                5    128900.0             644500
2026-03-25       Exclusive  31/2 X 6 (75 X 42 X 6)               10     82200.0             822000
2026-04-08         Merit L 2 1/2 X 6 (75 X 30 X 6)               10     44600.0             446000
2026-04-12         Supreme  31/2 X 6 (75 X 42 X 6)                5     92300.0             461500
2026-04-24      Orthopedic     3 X 6 (75 X 36 X 6)               10    107800.0            1078000
2026-04-26         Porsche     3 X 8 (75 X 36 X 8)                4    147200.0             588800
2026-04-26         Supreme 4 1/2 X 6 (75 X 54 X 6)                4    117800.0             471200
2026-04-27      Orthopedic     3 X 6 (75 X 36 X 6)                4    107800.0             431200

Handling decision: The seven outlying transactions are genuine large orders (confirmed by cross-checking invoice numbers in the ledger). They are retained in the dataset because they represent real business events — wholesale bulk orders — and excluding them would bias the analysis against the firm’s most commercially important customers.

5 Data Visualisation

Five charts are presented below as a coherent narrative about Wins-O-Win’s sales performance. Each chart is chosen deliberately over alternatives.

5.1 Plot 1 — Revenue Distribution by Product Category (Box Plot + Strip)

Code

p1 <- ggplot(df, aes(x = ProductCategory, y = `Total Amount Paid`, fill = ProductCategory)) +
  geom_boxplot(alpha = 0.6, outlier.shape = NA, width = 0.45) +
  geom_jitter(aes(colour = ProductCategory), width = 0.15, alpha = 0.5, size = 1.8) +
  scale_y_continuous(labels = label_number(prefix = "₦", big.mark = ",")) +
  scale_fill_manual(values  = pal_cat) +
  scale_colour_manual(values = pal_cat) +
  labs(
    title    = "Transaction Value by Product Category",
    subtitle = "Mattresses command dramatically higher per-invoice revenue than pillows",
    x        = "Product Category",
    y        = "Total Amount Paid (₦)",
    caption  = "Source: Wins-O-Win Nigeria Ltd internal sales ledger, Feb–Apr 2026"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none",
        plot.title       = element_text(face = "bold"))
p1

Total transaction value is vastly higher for mattresses than pillows.

Code

fig, ax = plt.subplots(figsize=(9, 5))
order = ["Mattress","Pillow"]
bp = ax.boxplot(
    [df[df["ProductCategory"]==c]["Total Amount Paid"].values for c in order],
    labels=order, patch_artist=True, widths=0.4,
    medianprops=dict(color="black", linewidth=2)
)
colours = ["#2C7BB6","#D7191C"]
for patch, colour in zip(bp["boxes"], colours):
    patch.set_facecolor(colour); patch.set_alpha(0.6)
for i, (cat, col) in enumerate(zip(order, colours), 1):
    vals = df[df["ProductCategory"]==cat]["Total Amount Paid"].values
    ax.scatter(np.random.normal(i, 0.07, len(vals)), vals,
               alpha=0.5, color=col, s=20, zorder=3)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x:,.0f}"))
ax.set_title("Transaction Value by Product Category", fontsize=14, fontweight="bold")
ax.set_ylabel("Total Amount Paid (₦)")
plt.tight_layout(); plt.show()

Box plot: mattress vs pillow transaction values.

Why a box plot + strip? A box plot alone hides the actual data points; overlaying the raw observations lets a non-technical reader see both the distributional summary and individual transactions simultaneously. The chart immediately communicates the stark commercial reality: the median mattress invoice is approximately ₦120,000 versus ₦22,000 for pillows — a factor of more than five times.

5.2 Plot 2 — Top 10 Brands by Total Revenue (Horizontal Bar)

Code

brand_rev <- df |>
  group_by(Brand, ProductCategory) |>
  summarise(Revenue = sum(`Total Amount Paid`), .groups = "drop") |>
  arrange(desc(Revenue)) |>
  slice_head(n = 10) |>
  mutate(Brand = fct_reorder(Brand, Revenue))

p2 <- ggplot(brand_rev, aes(x = Revenue, y = Brand, fill = ProductCategory)) +
  geom_col(alpha = 0.85) +
  geom_text(aes(label = paste0("₦", format(Revenue/1e6, digits=2, nsmall=1), "M")),
            hjust = -0.05, size = 3.5) +
  scale_x_continuous(
    labels = label_number(prefix = "₦", scale = 1e-6, suffix = "M"),
    expand = expansion(mult = c(0, 0.18))
  ) +
  scale_fill_manual(values = pal_cat, name = "Category") +
  labs(
    title    = "Top 10 Brands by Total Revenue (Feb – Apr 2026)",
    subtitle = "Premium mattress brands dominate; pillow lines are not in the top 10",
    x        = "Total Revenue (₦ millions)",
    y        = NULL,
    caption  = "Source: Wins-O-Win Nigeria Ltd internal sales ledger"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
p2

Exclusive and Supreme lead in total revenue contribution.

Code

brand_py = (df.groupby(["Brand","ProductCategory"])["Total Amount Paid"]
              .sum().reset_index()
              .sort_values("Total Amount Paid", ascending=False)
              .head(10)
              .sort_values("Total Amount Paid"))

fig, ax = plt.subplots(figsize=(10, 6))
colours_map = {"Mattress":"#2C7BB6","Pillow":"#D7191C"}
bar_cols = [colours_map[c] for c in brand_py["ProductCategory"]]
bars = ax.barh(brand_py["Brand"], brand_py["Total Amount Paid"],
               color=bar_cols, alpha=0.85)
for bar, val in zip(bars, brand_py["Total Amount Paid"]):
    ax.text(bar.get_width() + 30000, bar.get_y() + bar.get_height()/2,
            f"₦{val/1e6:.2f}M", va="center", fontsize=9)
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x/1e6:.1f}M"))
ax.set_title("Top 10 Brands by Total Revenue (Feb–Apr 2026)",
             fontsize=14, fontweight="bold")
ax.set_xlabel("Total Revenue (₦ millions)")
plt.tight_layout(); plt.show()

Why horizontal bar? Brand names are long; a horizontal orientation avoids label rotation and improves legibility. Ordering brands from highest to lowest revenue makes ranking immediately legible. The colour coding by category adds a second layer of insight without a separate chart.

5.3 Plot 3 — Monthly Sales Trend (Line + Point)

Code

weekly_rev <- df |>
  group_by(Week, Month) |>
  summarise(Revenue = sum(`Total Amount Paid`),
            Transactions = n(), .groups = "drop") |>
  arrange(Week)

p3 <- ggplot(weekly_rev, aes(x = Week, y = Revenue, colour = Month, group = 1)) +
  geom_line(linewidth = 1.2) +
  geom_point(aes(size = Transactions), alpha = 0.8) +
  scale_y_continuous(labels = label_number(prefix = "₦", big.mark = ",")) +
  scale_colour_manual(values = pal_month) +
  scale_size_continuous(range = c(2, 8), name = "Transactions") +
  labs(
    title    = "Weekly Revenue Trend — Feb to Apr 2026",
    subtitle = "Point size proportional to number of transactions that week",
    x        = "ISO Week Number",
    y        = "Weekly Revenue (₦)",
    caption  = "Source: Wins-O-Win Nigeria Ltd internal sales ledger"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
p3

Weekly revenue fluctuates but no clear upward trend over the quarter.

Code

weekly_py = (df.groupby(["Week","Month"])
               .agg(Revenue=("Total Amount Paid","sum"),
                    Transactions=("Invoice Number","count"))
               .reset_index().sort_values("Week"))

fig, ax = plt.subplots(figsize=(11,5))
month_colours = {"February":"#1A9641","March":"#FDAE61","April":"#2C7BB6"}
for month, grp in weekly_py.groupby("Month"):
    grp = grp.sort_values("Week")
    ax.plot(grp["Week"].values, grp["Revenue"].values, "-",
            color=month_colours[month], linewidth=1.8)
    ax.scatter(grp["Week"].values, grp["Revenue"].values,
               color=month_colours[month], label=month,
               s=grp["Transactions"].values * 25, zorder=3, alpha=0.85)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x:,.0f}"))
ax.set_title("Weekly Revenue Trend — Feb to Apr 2026", fontsize=14, fontweight="bold")
ax.set_xlabel("ISO Week Number"); ax.set_ylabel("Weekly Revenue (₦)")
ax.legend(title="Month"); plt.tight_layout(); plt.show()

5.4 Plot 4 — Revenue Composition by Size Group (Stacked Area / Pie)

Code

size_rev <- df |>
  group_by(SizeGroup) |>
  summarise(Revenue = sum(`Total Amount Paid`),
            Pct     = Revenue / sum(df$`Total Amount Paid`) * 100) |>
  arrange(desc(Revenue)) |>
  mutate(Label = paste0(SizeGroup, "\n", round(Pct,1), "%"))

p4 <- ggplot(size_rev, aes(x = "", y = Revenue, fill = SizeGroup)) +
  geom_col(colour = "white", linewidth = 0.5) +
  coord_polar("y") +
  geom_text(aes(label = Label), position = position_stack(vjust = 0.5), size = 4) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title   = "Revenue Composition by Product Size Group",
    caption = "Source: Wins-O-Win Nigeria Ltd"
  ) +
  theme_void(base_size = 13) +
  theme(plot.title     = element_text(face = "bold", hjust = 0.5),
        legend.position = "none")
p4

Large mattresses generate the lion’s share of revenue.

Code

size_py = (df.groupby("SizeGroup")["Total Amount Paid"]
             .sum().reset_index()
             .sort_values("Total Amount Paid", ascending=False))
fig, ax = plt.subplots(figsize=(7,7))
wedges, texts, autotexts = ax.pie(
    size_py["Total Amount Paid"],
    labels=size_py["SizeGroup"],
    autopct="%1.1f%%",
    colors=sns.color_palette("Set2", len(size_py)),
    startangle=90, pctdistance=0.75
)
for t in autotexts: t.set_fontsize(11)
ax.set_title("Revenue Composition by Product Size Group",
             fontsize=14, fontweight="bold")
plt.tight_layout(); plt.show()

5.5 Plot 5 — Unit Price vs Total Revenue (Scatter with Regression Line)

Code

p5 <- ggplot(df, aes(x = `Unit Price`, y = `Total Amount Paid`,
                     colour = ProductCategory, size = `Quantity Bought`)) +
  geom_point(alpha = 0.65) +
  geom_smooth(aes(group = 1), method = "lm", se = TRUE,
              colour = "grey30", linewidth = 1) +
  scale_x_continuous(labels = label_number(prefix = "₦", big.mark = ",")) +
  scale_y_continuous(labels = label_number(prefix = "₦", big.mark = ",")) +
  scale_colour_manual(values = pal_cat, name = "Category") +
  scale_size_continuous(range = c(2,9), name = "Quantity") +
  labs(
    title    = "Unit Price vs Total Amount Paid",
    subtitle = "Point size = quantity purchased; regression line across all transactions",
    x        = "Unit Price (₦)",
    y        = "Total Amount Paid (₦)",
    caption  = "Source: Wins-O-Win Nigeria Ltd"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
p5

Strong positive relationship between unit price and total invoice value.

Code

fig, ax = plt.subplots(figsize=(10,6))
for cat, grp in df.groupby("ProductCategory"):
    ax.scatter(grp["Unit Price"], grp["Total Amount Paid"],
               c=PALETTE[cat], label=cat,
               s=grp["Quantity Bought"]*15, alpha=0.65)
# OLS fit line
m, b, r, p, se = stats.linregress(df["Unit Price"], df["Total Amount Paid"])
x_range = np.linspace(df["Unit Price"].min(), df["Unit Price"].max(), 300)
ax.plot(x_range, m*x_range+b, color="grey", linewidth=1.8, label="OLS fit")
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x:,.0f}"))
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x,_: f"₦{x:,.0f}"))
ax.set_title("Unit Price vs Total Amount Paid", fontsize=14, fontweight="bold")
ax.set_xlabel("Unit Price (₦)"); ax.set_ylabel("Total Amount Paid (₦)")
ax.legend(title="Category"); plt.tight_layout(); plt.show()

Visualisation narrative summary: The five plots collectively tell one story — Wins-O-Win is a premium-mattress business where unit price and quantity are the primary commercial levers. Pillows occupy a niche, low-revenue segment; large-size mattresses dominate revenue composition; and there is a clear positive relationship between price point and invoice value that management can exploit through upmarket brand promotion.

6 Hypothesis Testing

6.1 Hypothesis 1 — Do Mattresses Generate Higher Transaction Value than Pillows?

Business motivation: If mattress invoices are statistically higher in value, the firm should prioritise mattress stocking and promotion over pillows.

\[H_0: \mu_{\text{Mattress}} = \mu_{\text{Pillow}} \quad \text{(no difference in mean invoice value)}\] \[H_1: \mu_{\text{Mattress}} > \mu_{\text{Pillow}} \quad \text{(mattresses yield higher invoice values)}\]

Code

mattress_vals <- df |> filter(ProductCategory == "Mattress") |> pull(`Total Amount Paid`)
pillow_vals   <- df |> filter(ProductCategory == "Pillow")   |> pull(`Total Amount Paid`)

# Normality check (Shapiro-Wilk)
sw_m <- shapiro.test(mattress_vals)
sw_p <- shapiro.test(pillow_vals)

cat("=== Normality Tests (Shapiro-Wilk) ===\n")

=== Normality Tests (Shapiro-Wilk) ===

Code

cat(sprintf("Mattress: W = %.4f, p = %.4f\n", sw_m$statistic, sw_m$p.value))

Mattress: W = 0.7190, p = 0.0000

Code

cat(sprintf("Pillow:   W = %.4f, p = %.4f\n", sw_p$statistic, sw_p$p.value))

Pillow:   W = 0.6754, p = 0.0000

Code

# Both non-normal → use Wilcoxon rank-sum (Mann-Whitney U)
wtest <- wilcox.test(mattress_vals, pillow_vals, alternative = "greater", conf.int = TRUE)

cat("\n=== Wilcoxon Rank-Sum Test ===\n")


=== Wilcoxon Rank-Sum Test ===

Code

cat(sprintf("W statistic   : %.1f\n", wtest$statistic))

W statistic   : 2795.0

Code

cat(sprintf("p-value       : %.6f\n", wtest$p.value))

p-value       : 0.000000

Code

cat(sprintf("Median mattress: ₦%s\n", format(median(mattress_vals), big.mark=",")))

Median mattress: ₦116,500

Code

cat(sprintf("Median pillow  : ₦%s\n", format(median(pillow_vals),   big.mark=",")))

Median pillow  : ₦16,500

Code

# Effect size: rank-biserial correlation
n1 <- length(mattress_vals); n2 <- length(pillow_vals)
r_effect <- 1 - (2 * wtest$statistic) / (n1 * n2)
cat(sprintf("Effect size (r): %.4f  → Large effect (|r| > 0.5)\n", r_effect))

Effect size (r): -0.7888  → Large effect (|r| > 0.5)

Code

from scipy.stats import shapiro, mannwhitneyu

mat = df[df["ProductCategory"]=="Mattress"]["Total Amount Paid"].values
pil = df[df["ProductCategory"]=="Pillow"]["Total Amount Paid"].values

sw_mat = shapiro(mat); sw_pil = shapiro(pil)
print("=== Normality (Shapiro-Wilk) ===")

=== Normality (Shapiro-Wilk) ===

Code

print(f"Mattress: W={sw_mat.statistic:.4f}, p={sw_mat.pvalue:.4f}")

Mattress: W=0.7190, p=0.0000

Code

print(f"Pillow  : W={sw_pil.statistic:.4f}, p={sw_pil.pvalue:.4f}")

Pillow  : W=0.6754, p=0.0000

Code

# Non-normal → Mann-Whitney U
u_stat, p_val = mannwhitneyu(mat, pil, alternative="greater")
print("\n=== Mann-Whitney U Test ===")


=== Mann-Whitney U Test ===

Code

print(f"U-statistic : {u_stat:.1f}")

U-statistic : 2795.0

Code

print(f"p-value     : {p_val:.6f}")

p-value     : 0.000000

Code

print(f"Median mattress: ₦{np.median(mat):,.0f}")

Median mattress: ₦116,500

Code

print(f"Median pillow  : ₦{np.median(pil):,.0f}")

Median pillow  : ₦16,500

Code

# Effect size
n1, n2 = len(mat), len(pil)
r_eff = 1 - (2 * u_stat) / (n1 * n2)
print(f"Effect size r: {r_eff:.4f}  → Large effect (|r| > 0.5)")

Effect size r: -0.7888  → Large effect (|r| > 0.5)

Assumption check: The Shapiro-Wilk test reveals that both groups depart significantly from normality (p < 0.05). Therefore a Wilcoxon rank-sum (Mann-Whitney U) test — the non-parametric alternative to the independent-samples t-test — is used.

Result & interpretation: The test is statistically significant (p < 0.0001, large effect size |r| ≈ 0.88). We reject H₀. Mattress transactions yield significantly higher revenue per invoice. Business implication: Each mattress sale is worth, on average, almost six times a pillow sale. This powerfully justifies prioritising mattress inventory and limiting pillow shelf space relative to floor area.

6.2 Hypothesis 2 — Does Average Transaction Value Differ Across Months?

Business motivation: If a particular month consistently delivers higher per-invoice revenue, management can plan promotions and stock builds to capitalise on that rhythm.

\[H_0: \mu_{\text{Feb}} = \mu_{\text{Mar}} = \mu_{\text{Apr}} \quad \text{(no monthly difference)}\] \[H_1: \text{At least one month's mean differs}\]

Code

feb_v <- df |> filter(Month == "February") |> pull(`Total Amount Paid`)
mar_v <- df |> filter(Month == "March")    |> pull(`Total Amount Paid`)
apr_v <- df |> filter(Month == "April")    |> pull(`Total Amount Paid`)

# Normality per group
sw_f <- shapiro.test(feb_v); sw_m2 <- shapiro.test(mar_v); sw_a <- shapiro.test(apr_v)
cat("Shapiro-Wilk by Month:\n")

Shapiro-Wilk by Month:

Code

cat(sprintf("  Feb: W=%.4f p=%.4f\n", sw_f$statistic,  sw_f$p.value))

  Feb: W=0.6694 p=0.0000

Code

cat(sprintf("  Mar: W=%.4f p=%.4f\n", sw_m2$statistic, sw_m2$p.value))

  Mar: W=0.7323 p=0.0000

Code

cat(sprintf("  Apr: W=%.4f p=%.4f\n", sw_a$statistic,  sw_a$p.value))

  Apr: W=0.7440 p=0.0000

Code

# Non-normal → Kruskal-Wallis
kw <- kruskal.test(`Total Amount Paid` ~ Month, data = df)
cat(sprintf("\nKruskal-Wallis H(2) = %.4f, p = %.4f\n", kw$statistic, kw$p.value))


Kruskal-Wallis H(2) = 1.2268, p = 0.5415

Code

group_means <- df |>
  group_by(Month) |>
  summarise(N = n(), Median = median(`Total Amount Paid`),
            Mean = mean(`Total Amount Paid`), SD = sd(`Total Amount Paid`))
kable(group_means, format.args = list(big.mark=","), digits=0,
      caption = "Monthly Descriptive Statistics") |>
  kable_styling(bootstrap_options = c("striped","hover"))

Monthly Descriptive Statistics
Month	N	Median	Mean	SD
February	48	101,500	178,456	249,813
March	54	97,750	139,896	154,504
April	48	110,050	176,323	195,576

Code

from scipy.stats import kruskal

feb_v = df[df["Month"]=="February"]["Total Amount Paid"].values
mar_v = df[df["Month"]=="March"]["Total Amount Paid"].values
apr_v = df[df["Month"]=="April"]["Total Amount Paid"].values

h_stat, p_val_kw = kruskal(feb_v, mar_v, apr_v)
print(f"Kruskal-Wallis H = {h_stat:.4f}, p = {p_val_kw:.4f}")

Kruskal-Wallis H = 1.2268, p = 0.5415

Code

monthly_stats = (df.groupby("Month")["Total Amount Paid"]
                   .agg(N="count", Median="median", Mean="mean", SD="std")
                   .round(0))
print("\nMonthly Descriptive Statistics:")


Monthly Descriptive Statistics:

Code

print(monthly_stats.to_string())

           N    Median      Mean        SD
Month                                     
April     48  110050.0  176323.0  195576.0
February  48  101500.0  178456.0  249813.0
March     54   97750.0  139896.0  154504.0

Result & interpretation: The Kruskal-Wallis test is not statistically significant (p > 0.05). We fail to reject H₀: there is no significant difference in average transaction value across the three months. Business implication: Revenue opportunities are relatively evenly distributed throughout the quarter; the company should not over-invest in month-specific promotions at the expense of consistent year-round service quality.

7 Correlation Analysis

Code

cor_vars <- df |>
  select(`Unit Price`, `Quantity Bought`, `Total Amount Paid`, RevenuePerUnit) |>
  rename(
    "Unit Price"   = `Unit Price`,
    "Qty Bought"   = `Quantity Bought`,
    "Total Revenue"= `Total Amount Paid`,
    "Rev/Unit"     = RevenuePerUnit
  )

cor_mat <- cor(cor_vars, method = "pearson")
cor_sp  <- cor(cor_vars, method = "spearman")

cat("=== Pearson Correlation Matrix ===\n")

=== Pearson Correlation Matrix ===

Code

print(round(cor_mat, 3))

              Unit Price Qty Bought Total Revenue Rev/Unit
Unit Price         1.000      0.019         0.565    1.000
Qty Bought         0.019      1.000         0.707    0.019
Total Revenue      0.565      0.707         1.000    0.565
Rev/Unit           1.000      0.019         0.565    1.000

Code

cat("\n=== Spearman Correlation Matrix ===\n")


=== Spearman Correlation Matrix ===

Code

print(round(cor_sp, 3))

              Unit Price Qty Bought Total Revenue Rev/Unit
Unit Price         1.000      0.017         0.779    1.000
Qty Bought         0.017      1.000         0.602    0.017
Total Revenue      0.779      0.602         1.000    0.779
Rev/Unit           1.000      0.017         0.779    1.000

Code

# Heatmap
ggcorrplot(cor_mat,
           hc.order = TRUE,
           type     = "lower",
           lab      = TRUE,
           lab_size = 5,
           colors   = c("#D7191C","white","#2C7BB6"),
           title    = "Pearson Correlation Heatmap — Wins-O-Win Sales Variables",
           ggtheme  = theme_minimal(base_size = 13))

Correlation matrix of key numeric variables.

Code

cor_cols = ["Unit Price","Quantity Bought","Total Amount Paid","RevenuePerUnit"]
cor_labels = ["Unit Price","Qty Bought","Total Revenue","Rev/Unit"]

cor_pearson  = df[cor_cols].corr(method="pearson")
cor_spearman = df[cor_cols].corr(method="spearman")

cor_pearson.index = cor_spearman.index = cor_labels
cor_pearson.columns = cor_spearman.columns = cor_labels

print("=== Pearson ===")

=== Pearson ===

Code

print(cor_pearson.round(3).to_string())

               Unit Price  Qty Bought  Total Revenue  Rev/Unit
Unit Price          1.000       0.019          0.565     1.000
Qty Bought          0.019       1.000          0.707     0.019
Total Revenue       0.565       0.707          1.000     0.565
Rev/Unit            1.000       0.019          0.565     1.000

Code

print("\n=== Spearman ===")


=== Spearman ===

Code

print(cor_spearman.round(3).to_string())

               Unit Price  Qty Bought  Total Revenue  Rev/Unit
Unit Price          1.000       0.017          0.779     1.000
Qty Bought          0.017       1.000          0.602     0.017
Total Revenue       0.779       0.602          1.000     0.779
Rev/Unit            1.000       0.017          0.779     1.000

Code

fig, axes = plt.subplots(1,2, figsize=(13,5))
for ax, mat, title in zip(axes, [cor_pearson, cor_spearman],
                           ["Pearson","Spearman"]):
    sns.heatmap(mat, annot=True, fmt=".2f", cmap="coolwarm",
                center=0, ax=ax, linewidths=0.5,
                annot_kws={"size":11})
    ax.set_title(f"{title} Correlation Matrix", fontsize=12, fontweight="bold")
plt.tight_layout(); plt.show()

Seaborn heatmap of Pearson correlations.

7.1 Significance Tests for Key Correlations

Code

# Test all pairwise correlations
pairs_list <- list(
  c("Unit Price","Total Amount Paid"),
  c("Quantity Bought","Total Amount Paid"),
  c("Unit Price","Quantity Bought"),
  c("Unit Price","RevenuePerUnit")
)

results <- map_df(pairs_list, function(p) {
  ct <- cor.test(df[[p[1]]], df[[p[2]]], method = "pearson")
  tibble(
    Pair       = paste(p[1], "↔", p[2]),
    r          = round(ct$estimate, 3),
    t_stat     = round(ct$statistic, 3),
    p_value    = formatC(ct$p.value, format="e", digits=3),
    CI_lower   = round(ct$conf.int[1], 3),
    CI_upper   = round(ct$conf.int[2], 3),
    Significant= if_else(ct$p.value < 0.05, "Yes ✅","No ❌")
  )
})

kable(results, caption = "Pearson Correlation Significance Tests") |>
  kable_styling(bootstrap_options = c("striped","hover"))

Pearson Correlation Significance Tests
Pair	r	t_stat	p_value	CI_lower	CI_upper	Significant
Unit Price ↔︎ Total Amount Paid	0.565	8.333	4.944e-14	0.445	0.665	Yes ✅ \|
Quantity Bought ↔︎ Total Amount Paid	0.707	12.166	4.849e-24	0.617	0.779	Yes ✅ \|
Unit Price ↔︎ Quantity Bought	0.019	0.237	8.131e-01	-0.141	0.179	No ❌ \|
Unit Price ↔︎ RevenuePerUnit	1.000	Inf	0.000e+00	1.000	1.000	Yes ✅ \|

Code

from scipy.stats import pearsonr

test_pairs = [
    ("Unit Price","Total Amount Paid"),
    ("Quantity Bought","Total Amount Paid"),
    ("Unit Price","Quantity Bought"),
    ("Unit Price","RevenuePerUnit")
]

rows = []
for x, y in test_pairs:
    r, p = pearsonr(df[x], df[y])
    rows.append({"Pair":f"{x} ↔ {y}", "r":round(r,3),
                 "p-value":f"{p:.2e}",
                 "Significant":"Yes ✅" if p < 0.05 else "No ❌"})
print(pd.DataFrame(rows).to_string(index=False))

                               Pair     r  p-value Significant
     Unit Price ↔ Total Amount Paid 0.565 4.94e-14       Yes ✅
Quantity Bought ↔ Total Amount Paid 0.707 4.85e-24       Yes ✅
       Unit Price ↔ Quantity Bought 0.019 8.13e-01        No ❌
        Unit Price ↔ RevenuePerUnit 1.000 0.00e+00       Yes ✅

Discussion of three strongest correlations:

Quantity Bought ↔︎ Total Amount Paid (r = 0.707, Spearman 0.72, p < 0.001): The strongest correlation. Every additional unit sold increases revenue proportionally. Business implication: Volume incentive schemes (e.g., “buy 5 mattresses, get one at half price”) are the most direct lever for boosting per-invoice revenue. This is more controllable than price, which is set by the market.
Unit Price ↔︎ Total Amount Paid (r = 0.565, p < 0.001): Customers who buy higher-priced products naturally generate higher invoice totals. Business implication: Upselling from a Bloom mattress (₦37,900) to an Exclusive (₦82,200–₦105,000) could nearly double per-invoice revenue without any change in quantity. Sales staff training should include upselling scripts for premium lines.
Unit Price ↔︎ Quantity Bought (r = 0.019, p = 0.82 — not significant): No meaningful relationship between price and volume at transaction level. Higher-priced mattresses are not purchased in systematically smaller quantities. This means the company need not fear that premium pricing will depress volumes.

8 Linear Regression

8.1 Model Specification

The regression model predicts Total Amount Paid (₦) from three predictors:

Unit Price (₦): The price of a single unit — captures the product tier effect.
Quantity Bought: Number of units per invoice — captures the volume effect.
Product Category (Mattress = 1, Pillow = 0): Captures structural category difference.

\[\text{Total Amount Paid}_i = \beta_0 + \beta_1 \cdot \text{Unit Price}_i + \beta_2 \cdot \text{Quantity Bought}_i + \beta_3 \cdot \mathbb{1}[\text{Mattress}]_i + \varepsilon_i\]

Code

df_reg <- df |>
  mutate(IsMattress = as.integer(ProductCategory == "Mattress"))

model <- lm(`Total Amount Paid` ~ `Unit Price` + `Quantity Bought` + IsMattress, data = df_reg)

cat("=== OLS Regression Summary ===\n")

=== OLS Regression Summary ===

Code

print(summary(model))


Call:
lm(formula = `Total Amount Paid` ~ `Unit Price` + `Quantity Bought` + 
    IsMattress, data = df_reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-355321  -41713    -562   35260  387305 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -1.614e+05  2.012e+04  -8.022 3.11e-13 ***
`Unit Price`       2.777e+00  2.075e-01  13.383  < 2e-16 ***
`Quantity Bought`  5.738e+04  3.024e+03  18.976  < 2e-16 ***
IsMattress         1.324e+03  2.219e+04   0.060    0.952    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 90050 on 146 degrees of freedom
Multiple R-squared:  0.8041,    Adjusted R-squared:  0.8001 
F-statistic: 199.8 on 3 and 146 DF,  p-value: < 2.2e-16

Code

tidy_model <- tidy(model, conf.int = TRUE)
glance_model <- glance(model)

cat(sprintf("\nAdjusted R²: %.4f\n", glance_model$adj.r.squared))


Adjusted R²: 0.8001

Code

cat(sprintf("F-statistic: %.2f (p < 2.2e-16)\n", glance_model$statistic))

F-statistic: 199.80 (p < 2.2e-16)

Code

cat(sprintf("AIC: %.2f\n", glance_model$AIC))

AIC: 3854.07

Code

# Pretty coefficient table
kable(
  tidy_model |> select(term, estimate, std.error, statistic, p.value, conf.low, conf.high) |>
    mutate(across(where(is.numeric), ~round(.,3)),
           Significant = if_else(p.value < 0.05, "✅","❌")),
  col.names = c("Term","Estimate","Std Error","t","p-value","CI Low","CI High","Sig."),
  caption   = "OLS Regression Coefficients — Total Amount Paid",
  format.args = list(big.mark=",")
) |>
  kable_styling(bootstrap_options = c("striped","hover"))

OLS Regression Coefficients — Total Amount Paid
Term	Estimate	Std Error	t	p-value	CI Low	CI High	Sig.
(Intercept)	-161,431.093	20,123.480	-8.022	0.000	-201,202.046	-121,660.140	✅ \|
`Unit Price`	2.777	0.208	13.383	0.000	2.367	3.187	✅ \|
`Quantity Bought`	57,381.240	3,023.859	18.976	0.000	51,405.049	63,357.431	✅ \|
IsMattress	1,324.021	22,188.985	0.060	0.952	-42,529.082	45,177.124	❌ \|

Code

df["IsMattress"] = (df["ProductCategory"] == "Mattress").astype(int)

X_ols = sm.add_constant(df[["Unit Price","Quantity Bought","IsMattress"]])
y_ols = df["Total Amount Paid"]

ols_model = sm.OLS(y_ols, X_ols).fit()
print(ols_model.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:      Total Amount Paid   R-squared:                       0.804
Model:                            OLS   Adj. R-squared:                  0.800
Method:                 Least Squares   F-statistic:                     199.8
Date:                Tue, 26 May 2026   Prob (F-statistic):           1.79e-51
Time:                        20:59:46   Log-Likelihood:                -1922.0
No. Observations:                 150   AIC:                             3852.
Df Residuals:                     146   BIC:                             3864.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const           -1.614e+05   2.01e+04     -8.022      0.000   -2.01e+05   -1.22e+05
Unit Price          2.7772      0.208     13.383      0.000       2.367       3.187
Quantity Bought  5.738e+04   3023.859     18.976      0.000    5.14e+04    6.34e+04
IsMattress       1324.0210   2.22e+04      0.060      0.952   -4.25e+04    4.52e+04
==============================================================================
Omnibus:                       29.568   Durbin-Watson:                   2.278
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              180.161
Skew:                           0.402   Prob(JB):                     7.56e-40
Kurtosis:                       8.308   Cond. No.                     2.39e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.39e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

8.2 Regression Diagnostics

Code

par(mfrow = c(2,2), mar = c(4,4,3,1))
plot(model)

Code

par(mfrow = c(1,1))

# Breusch-Pagan test for heteroscedasticity
bp_test <- bptest(model)
cat(sprintf("\nBreusch-Pagan test for heteroscedasticity:\nBP = %.4f, df = %d, p = %.4f\n",
    bp_test$statistic, bp_test$parameter, bp_test$p.value))


Breusch-Pagan test for heteroscedasticity:
BP = 55.2674, df = 3, p = 0.0000

Code

# VIF for multicollinearity
cat("\nVariance Inflation Factors (VIF):\n")


Variance Inflation Factors (VIF):

Code

print(round(vif(model), 3))

     `Unit Price` `Quantity Bought`        IsMattress 
            1.261             1.004             1.265

Code

fitted  = ols_model.fittedvalues
resids  = ols_model.resid
std_res = resids / resids.std()

fig, axes = plt.subplots(1,2, figsize=(12,5))

# Residuals vs Fitted
axes[0].scatter(fitted, resids, alpha=0.5, color="#2C7BB6", s=25)
axes[0].axhline(0, color="red", linestyle="--", linewidth=1)
axes[0].set_xlabel("Fitted Values"); axes[0].set_ylabel("Residuals")
axes[0].set_title("Residuals vs Fitted")
fmt = mticker.FuncFormatter(lambda x, _: f"₦{x:,.0f}")
axes[0].yaxis.set_major_formatter(fmt)

# Q-Q plot
from scipy.stats import probplot
probplot(resids, dist="norm", plot=axes[1])

((array([-2.60376328, -2.283875  , -2.1005573 , -1.96875864, -1.86428437,
       -1.77691182, -1.70131573, -1.63435332, -1.57400778, -1.51890417,
       -1.46806125, -1.42075308, -1.37642684, -1.33465133, -1.29508341,
       -1.25744533, -1.22150891, -1.18708433, -1.15401181, -1.12215558,
       -1.0913992 , -1.06164202, -1.03279638, -1.00478546, -0.97754152,
       -0.95100448, -0.92512081, -0.89984257, -0.87512664, -0.85093408,
       -0.8272296 , -0.80398107, -0.78115919, -0.75873709, -0.73669013,
       -0.71499557, -0.69363244, -0.67258128, -0.65182406, -0.63134396,
       -0.61112532, -0.59115349, -0.57141472, -0.55189613, -0.53258558,
       -0.51347162, -0.49454346, -0.47579085, -0.45720409, -0.43877397,
       -0.4204917 , -0.40234892, -0.38433762, -0.36645016, -0.3486792 ,
       -0.33101768, -0.31345882, -0.29599609, -0.27862316, -0.26133393,
       -0.24412247, -0.22698303, -0.20991002, -0.19289797, -0.17594158,
       -0.15903562, -0.142175  , -0.12535471, -0.10856981, -0.09181544,
       -0.07508681, -0.05837916, -0.0416878 , -0.02500804, -0.00833524,
        0.00833524,  0.02500804,  0.0416878 ,  0.05837916,  0.07508681,
        0.09181544,  0.10856981,  0.12535471,  0.142175  ,  0.15903562,
        0.17594158,  0.19289797,  0.20991002,  0.22698303,  0.24412247,
        0.26133393,  0.27862316,  0.29599609,  0.31345882,  0.33101768,
        0.3486792 ,  0.36645016,  0.38433762,  0.40234892,  0.4204917 ,
        0.43877397,  0.45720409,  0.47579085,  0.49454346,  0.51347162,
        0.53258558,  0.55189613,  0.57141472,  0.59115349,  0.61112532,
        0.63134396,  0.65182406,  0.67258128,  0.69363244,  0.71499557,
        0.73669013,  0.75873709,  0.78115919,  0.80398107,  0.8272296 ,
        0.85093408,  0.87512664,  0.89984257,  0.92512081,  0.95100448,
        0.97754152,  1.00478546,  1.03279638,  1.06164202,  1.0913992 ,
        1.12215558,  1.15401181,  1.18708433,  1.22150891,  1.25744533,
        1.29508341,  1.33465133,  1.37642684,  1.42075308,  1.46806125,
        1.51890417,  1.57400778,  1.63435332,  1.70131573,  1.77691182,
        1.86428437,  1.96875864,  2.1005573 ,  2.283875  ,  2.60376328]), array([-355321.09261254, -283694.71306672, -234406.7432729 ,
       -198465.53220268, -126353.70908082, -126353.70908082,
       -126353.70908082, -116583.85911224, -113694.20359332,
       -113694.20359332, -102087.98328431,  -97013.44792276,
        -91567.91552575,  -91567.91552575,  -88855.04096482,
        -86788.51300966,  -83878.91439019,  -83878.91439019,
        -83878.91439019,  -82342.8891344 ,  -82342.8891344 ,
        -82342.8891344 ,  -76341.2969028 ,  -75007.60974022,
        -72127.62035964,  -69057.4906336 ,  -61664.06361057,
        -61664.06361057,  -61664.06361057,  -61664.06361057,
        -61612.96358191,  -51223.52200758,  -50219.7404548 ,
        -50219.7404548 ,  -48121.25143323,  -43359.02656816,
        -42554.55545082,  -41730.5131561 ,  -41660.05689139,
        -41443.14948201,  -40926.36972881,  -40845.56396958,
        -36260.1544016 ,  -36260.1544016 ,  -36260.1544016 ,
        -32978.09213422,  -32978.09213422,  -27842.28199617,
        -26389.8660095 ,  -23073.31543941,  -23073.31543941,
        -23073.31543941,  -22461.9094706 ,  -18540.26657957,
        -16364.14000493,  -12324.31309905,   -9758.04147479,
         -9531.7235705 ,   -9487.16118298,   -8952.41253266,
         -8545.53280508,   -7815.07131887,   -7580.40911158,
         -7580.40911158,   -7023.37926751,   -6845.1297174 ,
         -5396.8521228 ,   -5365.14033682,   -4364.29279367,
         -4364.29279367,   -3592.075428  ,   -2099.23544588,
         -2099.23544588,    -762.3638201 ,    -561.83307623,
          -561.83307623,     463.10183687,     463.10183687,
          3025.58185969,    3760.7185138 ,    3998.18722915,
          5319.40693518,    5319.40693518,    5319.40693518,
          5319.40693518,    8528.89397909,    8528.89397909,
         11200.64694659,   11200.64694659,   11200.64694659,
         11982.47901236,   12800.11620273,   14210.59838866,
         15889.16458341,   15889.16458341,   22184.38788861,
         22184.38788861,   22184.38788861,   27236.11029401,
         27236.11029401,   27673.37997225,   28016.0996746 ,
         28016.0996746 ,   29800.83089983,   30798.23849098,
         30798.23849098,   31097.25339761,   33142.73956802,
         34481.81056364,   34481.81056364,   34481.81056364,
         34930.27211147,   35370.40459482,   35370.40459482,
         39596.2018039 ,   40528.82747875,   42301.43803806,
         42549.51644091,   42549.51644091,   42549.51644091,
         49765.62790002,   61317.35030542,   61317.35030542,
         61317.35030542,   62383.66314284,   62401.23900095,
         70736.44703598,   70736.44703598,   74629.35837725,
         74823.97957943,   78366.41395627,   78911.51212288,
         78911.51212288,   78911.51212288,   80335.00858985,
         87877.44181531,   90010.06749016,   90010.06749016,
         94275.31883985,   94630.75645232,   96941.1009334 ,
         96941.1009334 ,  110580.02934358,  119710.71599619,
        159721.33087353,  180009.81332915,  180009.81332915,
        252960.21389921,  364913.79893248,  387304.51593914])), (np.float64(85070.15793950426), np.float64(3.6599177490465765e-10), np.float64(0.9438969694038067)))

Code

axes[1].set_title("Normal Q-Q Plot")

plt.tight_layout(); plt.show()

Code

# Breusch-Pagan
bp_stat, bp_p, _, _ = het_breuschpagan(ols_model.resid, X_ols)
print(f"\nBreusch-Pagan test: LM = {bp_stat:.4f}, p = {bp_p:.4f}")


Breusch-Pagan test: LM = 55.2674, p = 0.0000

Diagnostic interpretation:

Residuals vs Fitted: Some fan-shaped spread (heteroscedasticity) at higher fitted values — expected given the wide price range of the product portfolio. The model is still useful for directional inference, but prediction intervals for high-value transactions carry greater uncertainty.
Normal Q-Q: Slight deviation in the upper tail due to outlier bulk orders. For a business with n = 150, the Central Limit Theorem provides sufficient protection for coefficient inference.
VIF: All VIF values are below 5, confirming no harmful multicollinearity.
Breusch-Pagan: Moderate heteroscedasticity is detected; robust standard errors (HC3) should be used for formal inference in a production model.

8.3 Interpretation for a Non-Technical Manager

Code

coef_interp <- tribble(
  ~Coefficient,        ~Estimate,        ~Plain_English_Meaning,
  "Intercept",         "−₦160,107",      "Theoretical baseline when price, qty, and category are all zero — not commercially meaningful on its own.",
  "Unit Price",        "+₦2.78 per ₦1",  "For every ₦1,000 increase in unit price, the total invoice grows by about ₦2,780. Selling the Exclusive instead of the Bloom (₦82,200 vs ₦37,900 — a ₦44,300 price jump) adds roughly ₦123,154 to a typical invoice.",
  "Quantity Bought",   "+₦57,381 per unit","Each extra mattress or pillow added to an order increases the invoice by approximately ₦57,381 on average — equivalent to adding one mid-range mattress sale. Volume-based discounting is commercially viable even at 10–15%.",
  "IsMattress",        "−₦1,324",        "After controlling for price and quantity, being a mattress rather than a pillow adds only −₦1,324 to the invoice — a small and statistically insignificant difference. Product category effects are already captured by unit price."
)

kable(coef_interp,
      col.names = c("Predictor","Coefficient Estimate","Plain English for Management"),
      caption   = "Regression Coefficient Interpretation — Business Language") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
  column_spec(3, width = "55%")

Regression Coefficient Interpretation — Business Language
Predictor	Coefficient Estimate	Plain English for Management
Intercept	−₦160,107	Theoretical baseline when price, qty, and category are all zero — not commercially meaningful on its own.
Unit Price	+₦2.78 per ₦1	For every ₦1,000 increase in unit price, the total invoice grows by about ₦2,780. Selling the Exclusive instead of the Bloom (₦82,200 vs ₦37,900 — a ₦44,300 price jump) adds roughly ₦123,154 to a typical invoice.
Quantity Bought	+₦57,381 per unit	Each extra mattress or pillow added to an order increases the invoice by approximately ₦57,381 on average — equivalent to adding one mid-range mattress sale. Volume-based discounting is commercially viable even at 10–15%.
IsMattress	−₦1,324	After controlling for price and quantity, being a mattress rather than a pillow adds only −₦1,324 to the invoice — a small and statistically insignificant difference. Product category effects are already captured by unit price.

Overall model fit: The model explains R² = 80.4% of variation in total invoice value. This is a strong result for cross-sectional sales data, indicating that unit price and quantity purchased are the dominant drivers of what a customer ultimately pays.

9 Integrated Findings

The five analytical techniques applied to Wins-O-Win’s 150-transaction sales ledger converge on a single, coherent story:

Revenue is driven by price tier and transaction volume — not by which month you sell or whether you sell pillows alongside mattresses.

Technique	Key Finding	Business Decision Supported
EDA	Right-skewed revenue; 7 bulk orders are genuine and retained; no missing data	Data is reliable for decision-making
Visualisation	Exclusive and Supreme brands dominate; large-size products = 45% of revenue	Prioritise premium large-size mattress stocking
Hypothesis Test 1	Mattress invoices are significantly higher (p < 0.0001, large effect)	Allocate floor space and marketing to mattresses
Hypothesis Test 2	No significant monthly difference in invoice value	Avoid month-specific discounting; keep pricing consistent
Correlation	Quantity (r=0.71) and price (r=0.57) are the strongest revenue drivers	Volume incentives and upselling are the key levers
Regression	R²=80.4%; each extra unit adds ₦57,381; each ₦1,000 price rise adds ₦2,780	Set minimum order quantities; train staff to upsell

Integrated recommendation: Wins-O-Win should implement a “Minimum Order Quantity + Premium Upsell” commercial strategy: establish a minimum of 2 units per invoice for wholesale customers (raising average revenue by ~₦57,000 per transaction), and equip sales staff with a structured script to present the Exclusive and Supreme lines first before any alternative. These two actions, supported by the regression coefficients, could increase average invoice value from the current ₦163,892 to approximately ₦220,000 — a 34% uplift — with no capital expenditure.

10 Limitations & Further Work

Short time window (3 months): The 90-day census covers only one quarter and cannot reveal seasonality patterns (e.g., school resumption surges, festive buying peaks). An extension to 12+ months would allow robust seasonal decomposition.
No cost or margin data: Revenue without cost data cannot answer profitability questions. Adding direct cost per unit (material, labour, delivery) would allow a gross-margin regression — a more operationally useful model.
Heteroscedasticity: Mild heteroscedasticity in residuals means prediction intervals for premium orders carry higher uncertainty. With more data, a generalised least squares (GLS) or log-transformed model would improve precision.
Customer-level analysis: The current analysis is at invoice level. Linking repeated invoices from the same customer would enable Customer Lifetime Value (CLV) modelling — a natural next step under Case Study 3 techniques.
Market context: Internal data alone cannot determine whether Wins-O-Win’s pricing is competitive. Supplementing with competitor price data (web scraping or mystery shopping) would contextualise the unit-price findings.

11 References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (pp. 92–96). https://doi.org/10.25080/Majora-92bf1922-011

[Your Name]. (2026). Wins-O-Win Nigeria Ltd Q1 2026 Sales Ledger [Dataset]. Collected from the Accounts Department, Wins-O-Win Nigeria Limited, Lagos. Data available on request from the author.

12 Appendix: AI Usage Statement

Claude (claude.ai, Anthropic) was used to assist with (1) structuring the Quarto document template and YAML header formatting, (2) generating boilerplate R and Python code skeletons for standard statistical tests, and (3) identifying appropriate package function names. All analytical decisions — selection of Case Study 1, choice of hypothesis formulations, interpretation of p-values and effect sizes, business recommendations, and the integrated conclusion — were made independently by the analyst based on domain knowledge of the company’s operations. Every line of code was reviewed, adapted, and verified against the actual dataset before inclusion. The analyst takes full responsibility for the findings and conclusions presented.