Exploratory & Inferential Analytics: Odes by Mirabelle

Author

[Your Full Name]

Published

May 26, 2026

1. Executive Summary

Odes by Mirabelle is a bespoke fashion atelier based in Nigeria, specialising in bridal, traditional, and occasion wear. This analysis examines 539 orders recorded between January 2024 and November 2025, covering order value, production cost, profit margin, order type, and client retention behaviour.

Five analytical techniques (Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression) were applied to answer a core business question: What drives profitability, and how can the business grow revenue while protecting margins?

Key findings reveal that bridal orders generate the highest average revenue (₦374,033) and higher profit margins than non-bridal orders (a mean of 86.8% against 75.5%). Two orders recorded negative margins, indicating underpricing or cost overruns that require attention. Repeat clients account for 86% of all orders, signalling strong retention but a need to expand the client base. Production cost is the strongest predictor of profit margin percentage; once cost and order value are accounted for, bridal status does not significantly explain further margin variation. The integrated recommendation is to prioritise bridal and reception order acquisition while implementing a cost-cap policy for Traditional orders to protect margins.

2. Professional Disclosure

I double as the founder and lead designer at Odes by Mirabelle Ojaruega, a bespoke fashion business operating in Nigeria. The business creates custom-made outfits primarily for weddings, traditional ceremonies, and everyday occasions.

Relevance of each technique to my role:

EDA: Understanding the distribution of order values and margins helps me identify pricing outliers and flag orders that may have been underquoted.
Data Visualisation: Revenue and order trend charts inform monthly planning, staffing, and fabric procurement decisions.
Hypothesis Testing: Formally testing whether bridal orders are more profitable than non-bridal orders supports strategic decisions about which order types to prioritise in marketing.
Correlation Analysis: Understanding which variables move together (e.g. production cost and order value) helps identify whether cost controls translate to better margins.
Regression: Predicting profit margin from order characteristics allows me to build a pricing model that flags low-margin orders before production begins.

3. Data Collection & Sampling

Source: Primary data extracted from Odes by Mirabelle Ojaruega's internal order management record, a manually maintained spreadsheet tracking all client orders from January 2024 to November 2025.

Collection method: Direct field observation and record extraction. Each row represents one outfit order placed by a client. Data was recorded at point of sale and updated upon completion of production.

Sampling frame: All completed orders within the business period January 2024 – November 2025. No random sampling was applied; this is a census of all 539 recorded orders.

Sample size: 539 observations across 14 variables.

Time period covered: January 2024 – November 2025 (23 months).

Variables:

Variable	Type	Description
Order_ID	Character	Unique order identifier
Client_Name	Character	Client first name
Month	Character	Month of order
Year	Integer	Year of order
Period	Character	Year-Month (YYYY-MM)
Outfit_Description	Character	Brief description of the outfit
Order_Type	Character	Category of order (7 types)
Order_Value_NGN	Numeric	Total order price in Naira
Production_Cost_NGN	Numeric	Material and labour cost
Profit_Margin_NGN	Numeric	Order Value minus Production Cost
Profit_Margin_Pct	Numeric	Profit as % of order value
Is_Bridal	Binary (0/1)	1 = Bridal order
Is_Repeat_Client	Binary (0/1)	1 = Client has ordered before
Total_Orders_Client	Integer	Cumulative orders per client
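
The two derived margin columns follow directly from the definitions in the table, so they can be recomputed as a consistency check. The sketch below is a minimal illustration in base R, assuming the percentage is rounded to one decimal place. The five rows are copied from the first records shown in the data overview in Section 4, and the helper name `check_margins` is hypothetical.

```r
# First five orders as shown in the glimpse() output (values in NGN)
orders <- data.frame(
  Order_Value_NGN     = c(380000, 150000, 90000, 85000, 125000),
  Production_Cost_NGN = c(160000, 30000, 25000, 15000, 10000),
  Profit_Margin_NGN   = c(220000, 120000, 65000, 70000, 115000),
  Profit_Margin_Pct   = c(57.9, 80.0, 72.2, 82.4, 92.0)
)

# Hypothetical helper: recompute both derived columns and compare them with
# the recorded values (percentage assumed to be rounded to 1 decimal place)
check_margins <- function(d) {
  margin_ngn <- d$Order_Value_NGN - d$Production_Cost_NGN
  margin_pct <- round(100 * margin_ngn / d$Order_Value_NGN, 1)
  data.frame(
    ngn_ok = margin_ngn == d$Profit_Margin_NGN,
    pct_ok = abs(margin_pct - d$Profit_Margin_Pct) < 1e-8
  )
}

result <- check_margins(orders)
print(result)
```

Running the same helper on the full dataset would surface any row where the recorded margin does not match the recorded value and cost.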

Ethical notes: All client names are first names only. No sensitive personal or financial data beyond order transactions is stored. Data sharing is limited to this academic submission and will not be published publicly.

4. Data Description

Code

library(tidyverse)
library(ggplot2)
library(corrplot)
library(scales)
library(knitr)
library(kableExtra)

# Load data
df <- read_csv("odes by mirabelle dataset.csv")

# Convert types
df <- df %>%
  mutate(
    Order_Type = as.factor(Order_Type),
    Is_Bridal = as.factor(Is_Bridal),
    Is_Repeat_Client = as.factor(Is_Repeat_Client),
    Period = as.character(Period)
  )

# Overview
glimpse(df)

Rows: 539
Columns: 14
$ Order_ID            <chr> "OM0001", "OM0002", "OM0003", "OM0004", "OM0005", …
$ Client_Name         <chr> "Tio", "Tio", "Tio", "Uche", "Uche", "Uche", "Mumm…
$ Month               <chr> "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "…
$ Year                <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 20…
$ Period              <chr> "2024-01", "2024-01", "2024-01", "2024-01", "2024-…
$ Outfit_Description  <chr> "Wedding dress", "Pink after party dress", "Tan As…
$ Order_Type          <fct> Bridal, Reception/After Party, Traditional/Occasio…
$ Order_Value_NGN     <dbl> 380000, 150000, 90000, 85000, 125000, 40000, 55000…
$ Production_Cost_NGN <dbl> 160000, 30000, 25000, 15000, 10000, 12000, 12000, …
$ Profit_Margin_NGN   <dbl> 220000, 120000, 65000, 70000, 115000, 28000, 43000…
$ Profit_Margin_Pct   <dbl> 57.9, 80.0, 72.2, 82.4, 92.0, 70.0, 78.2, 73.3, 80…
$ Is_Bridal           <fct> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Is_Repeat_Client    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,…
$ Total_Orders_Client <dbl> 4, 4, 4, 3, 3, 3, 9, 17, 17, 17, 17, 17, 3, 2, 1, …

Code

df %>%
  select(Order_Value_NGN, Production_Cost_NGN, Profit_Margin_NGN,
         Profit_Margin_Pct, Total_Orders_Client) %>%
  summary() %>%
  kable(caption = "Summary Statistics — Numeric Variables") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Summary Statistics — Numeric Variables
Order_Value_NGN	Production_Cost_NGN	Profit_Margin_NGN	Profit_Margin_Pct	Total_Orders_Client
Min. : 5000	Min. : 0	Min. :-210500	Min. :-1403.30	Min. : 1.000
1st Qu.: 17000	1st Qu.: 0	1st Qu.: 13000	1st Qu.: 66.20	1st Qu.: 2.000
Median : 30000	Median : 4000	Median : 22000	Median : 85.60	Median : 5.000
Mean : 87353	Mean : 11341	Mean : 76012	Mean : 76.78	Mean : 6.262
3rd Qu.: 70000	3rd Qu.: 10000	3rd Qu.: 63500	3rd Qu.: 100.00	3rd Qu.: 9.000
Max. :800000	Max. :500000	Max. : 800000	Max. : 100.00	Max. :18.000

Code

df %>%
  count(Order_Type, sort = TRUE) %>%
  mutate(Pct = round(n / sum(n) * 100, 1)) %>%
  kable(col.names = c("Order Type", "Count", "% of Orders"),
        caption = "Distribution of Order Types") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Distribution of Order Types
Order Type	Count	% of Orders
Traditional/Occasion Wear	435	80.7
Bridal	60	11.1
Reception/After Party	18	3.3
Bridesmaid	12	2.2
Corporate/RTW	8	1.5
Children	4	0.7
Costume/Uniform	2	0.4

Variable distributions: Order values are heavily right-skewed, ranging from ₦5,000 (children’s outfit) to ₦800,000 (bridal gown), with a median of ₦30,000. Profit margin percentages show a bimodal pattern: many orders cluster at or near 100% margin (orders with zero or very low recorded production cost), and another group sits in the 60–85% range. Two orders recorded negative profit margins, flagged as outliers below.

5. Exploratory Data Analysis (EDA)

Theory: EDA involves summarising the main characteristics of data using statistics and visualisation, before formal modelling. Key tasks include identifying distributions, detecting missing values, and flagging outliers (Ch. 4 — Anscombe’s Quartet, missing-value analysis, outlier detection).

Business justification: Before making any pricing or production decisions, I need to understand what the data actually contains — whether values are plausible, whether margins are consistent, and whether any records contain errors. EDA is the foundation of all subsequent analysis.

Code

# Missing values check
missing_summary <- df %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing")

kable(missing_summary, caption = "Missing Value Check — All Variables") %>%
  kable_styling(bootstrap_options = c("striped"))

Missing Value Check — All Variables
Variable	Missing
Order_ID	0
Client_Name	0
Month	0
Year	0
Period	0
Outfit_Description	0
Order_Type	0
Order_Value_NGN	0
Production_Cost_NGN	0
Profit_Margin_NGN	0
Profit_Margin_Pct	0
Is_Bridal	0
Is_Repeat_Client	0
Total_Orders_Client	0

No missing values were found across all 14 variables. The dataset is complete.

Code

# Identify outlier orders (negative margins)
negative_margin <- df %>%
  filter(Profit_Margin_NGN < 0) %>%
  select(Order_ID, Client_Name, Order_Type, Order_Value_NGN,
         Production_Cost_NGN, Profit_Margin_NGN, Profit_Margin_Pct)

kable(negative_margin,
      caption = "Data Quality Issue 1: Orders with Negative Profit Margins") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Data Quality Issue 1: Orders with Negative Profit Margins
Order_ID	Client_Name	Order_Type	Order_Value_NGN	Production_Cost_NGN	Profit_Margin_NGN	Profit_Margin_Pct
OM0302	Lady O	Traditional/Occasion Wear	15000	225500	-210500	-1403.3
OM0480	Doyin	Traditional/Occasion Wear	35000	50000	-15000	-42.9

Data Quality Issue 1 — Negative margins: Two orders (OM0302, OM0480) recorded negative profit margins, meaning production cost exceeded the price charged to the client. This suggests either a pricing error or significant cost overruns. These records are retained in the dataset but flagged; they are excluded from the regression model.
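
One common way to formalise outlier flagging is Tukey's 1.5 × IQR rule. The sketch below applies it to Profit_Margin_Pct using the quartiles reported in the summary statistics table (Q1 = 66.2, Q3 = 100). It is an illustrative check in base R, not the method used to produce the table above, and `lower_fence` and `upper_fence` are names chosen here.

```r
# Quartiles of Profit_Margin_Pct taken from the summary statistics table
q1 <- 66.2
q3 <- 100.0
iqr <- q3 - q1

# Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Margins of the two loss-making orders plus one ordinary order (OM0001)
margins <- c(OM0302 = -1403.3, OM0480 = -42.9, OM0001 = 57.9)
flagged <- margins < lower_fence | margins > upper_fence

print(c(lower = lower_fence, upper = upper_fence))
print(flagged)
```

Under this rule any margin below 15.5% would be flagged, which casts a wider net than the negative-margin filter used above.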

Code

# High value outliers — orders over 300,000 NGN
high_value <- df %>%
  filter(Order_Value_NGN > 300000) %>%
  select(Order_ID, Order_Type, Order_Value_NGN, Profit_Margin_Pct)

kable(high_value,
      caption = "Data Quality Issue 2: High-Value Orders (> ₦300,000)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Data Quality Issue 2: High-Value Orders (> ₦300,000)
Order_ID	Order_Type	Order_Value_NGN	Profit_Margin_Pct
OM0001	Bridal	380000	57.9
OM0022	Bridal	800000	37.5
OM0046	Bridal	380000	100.0
OM0050	Bridal	650000	58.5
OM0058	Bridal	800000	100.0
OM0064	Bridal	650000	100.0
OM0071	Bridal	750000	100.0
OM0072	Traditional/Occasion Wear	350000	100.0
OM0083	Traditional/Occasion Wear	350000	100.0
OM0085	Bridal	750000	100.0
OM0091	Bridal	320000	100.0
OM0094	Bridal	650000	100.0
OM0109	Bridal	470000	100.0
OM0117	Bridal	470000	100.0
OM0119	Reception/After Party	370000	100.0
OM0122	Bridal	450000	85.6
OM0126	Bridal	650000	76.9
OM0127	Bridal	550000	81.8
OM0129	Bridal	720000	100.0
OM0131	Bridal	550000	100.0
OM0134	Bridal	440000	100.0
OM0137	Bridal	750000	100.0
OM0140	Bridal	450000	100.0
OM0141	Bridal	717000	100.0
OM0145	Bridal	440000	100.0
OM0150	Bridal	500000	100.0
OM0152	Bridal	470000	100.0
OM0154	Bridal	550000	100.0
OM0155	Reception/After Party	370000	100.0
OM0156	Bridal	600000	100.0
OM0159	Bridal	550000	100.0
OM0160	Bridal	450000	100.0
OM0163	Bridal	400000	100.0
OM0164	Reception/After Party	450000	100.0
OM0263	Traditional/Occasion Wear	450000	100.0
OM0285	Bridal	650000	100.0
OM0287	Bridal	600000	100.0
OM0320	Bridal	400000	100.0
OM0381	Bridal	450000	62.2
OM0462	Bridal	450000	62.2
OM0473	Bridal	500000	62.0

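Data Quality Issue 2: High-value outliers. Of the 41 orders above ₦300,000, 35 are Bridal; the other six are Reception/After Party (3) and Traditional/Occasion Wear (3) orders. These are legitimate but create strong right-skew in the order value distribution. Note that 32 of the 41 show a 100% margin, meaning zero or negligible recorded production cost. Analyses using order value will apply log transformation where appropriate.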
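
To illustrate why a log transformation helps with right-skewed order values, the sketch below compares a simple moment-based skewness before and after taking logs. The five values are the minimum, quartiles, and maximum of Order_Value_NGN from the summary table, used here only as a stand-in for the full column. `skewness` is a hypothetical helper, not a base R function.

```r
# Stand-in values: min, Q1, median, Q3, and max of Order_Value_NGN
order_values <- c(5000, 17000, 30000, 70000, 800000)

# Hypothetical helper: moment-based sample skewness (0 for symmetric data)
skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

skew_raw <- skewness(order_values)
skew_log <- skewness(log(order_values))

cat("Skewness, raw values:", round(skew_raw, 2), "\n")
cat("Skewness, log values:", round(skew_log, 2), "\n")
```

The log-scale skewness is clearly smaller on these values, which is the motivation for using log(Order_Value_NGN) in later modelling.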

Code

# Distribution of order values
ggplot(df, aes(x = Order_Value_NGN)) +
  geom_histogram(bins = 40, fill = "#2c7a7b", colour = "white", alpha = 0.85) +
  scale_x_continuous(labels = label_comma()) +
  labs(title = "Distribution of Order Values (NGN)",
       subtitle = "Right-skewed: most orders are under ₦100,000; a few bridal orders exceed ₦500,000",
       x = "Order Value (NGN)", y = "Number of Orders") +
  theme_minimal()

Code

ggplot(df, aes(x = Profit_Margin_Pct)) +
  geom_histogram(bins = 30, fill = "#d4a574", colour = "white", alpha = 0.85) +
  labs(title = "Distribution of Profit Margin %",
       subtitle = "Bimodal: a cluster around 80–100% and a tail into negative values",
       x = "Profit Margin (%)", y = "Number of Orders") +
  theme_minimal()

6. Data Visualisation

Theory: Effective data visualisation uses the grammar of graphics, choosing chart types that match the data structure and the question being asked. A visualisation narrative uses multiple charts cohesively to tell a single analytical story (Ch. 5).

Business justification: Charts translate raw numbers into decisions. Monthly revenue trends reveal seasonality for planning; order type breakdowns inform marketing focus; client retention visuals guide loyalty strategy.

Code

# Monthly revenue trend
monthly_rev <- df %>%
  group_by(Period) %>%
  summarise(Total_Revenue = sum(Order_Value_NGN),
            Orders = n()) %>%
  arrange(Period)

ggplot(monthly_rev, aes(x = Period, y = Total_Revenue, group = 1)) +
  geom_line(colour = "#2c7a7b", linewidth = 1.2) +
  geom_point(colour = "#d4a574", size = 2.5) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Monthly Revenue Trend — Odes by Mirabelle",
       subtitle = "Jan 2024 – Nov 2025 | Peaks visible around May, Sep–Oct each year",
       x = "Month", y = "Total Revenue (NGN)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Story: Revenue is not uniform — clear peaks occur around May and September/October, likely tied to the Nigerian wedding and event season. June 2024 shows a pronounced dip. This pattern supports seasonal staffing and procurement planning.

Code

# Revenue and margin by order type
df %>%
  group_by(Order_Type) %>%
  summarise(Avg_Value = mean(Order_Value_NGN),
            Avg_Margin_Pct = mean(Profit_Margin_Pct),
            Count = n()) %>%
  arrange(desc(Avg_Value)) %>%
  ggplot(aes(x = reorder(Order_Type, Avg_Value), y = Avg_Value, fill = Avg_Margin_Pct)) +
  geom_col() +
  coord_flip() +
  scale_fill_gradient(low = "#f4c27b", high = "#1a6b5a",
                      name = "Avg Margin %") +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Average Order Value by Type (coloured by Margin %)",
       subtitle = "Bridal orders are highest value; Traditional orders dominate volume",
       x = "", y = "Average Order Value (NGN)") +
  theme_minimal()

Code

# Repeat vs new clients
df %>%
  mutate(Client_Type = ifelse(Is_Repeat_Client == 1, "Repeat Client", "New Client")) %>%
  count(Client_Type) %>%
  ggplot(aes(x = "", y = n, fill = Client_Type)) +
  geom_col(width = 0.5) +
  coord_flip() +
  scale_fill_manual(values = c("#2c7a7b", "#d4a574")) +
  labs(title = "Order Share: Repeat vs New Clients",
       subtitle = "86% of all orders come from repeat clients",
       x = "", y = "Number of Orders", fill = "") +
  theme_minimal()

Code

# Top 10 clients by revenue
df %>%
  group_by(Client_Name) %>%
  summarise(Revenue = sum(Order_Value_NGN)) %>%
  slice_max(Revenue, n = 10) %>%
  ggplot(aes(x = reorder(Client_Name, Revenue), y = Revenue)) +
  geom_col(fill = "#2c7a7b", alpha = 0.85) +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Top 10 Clients by Total Revenue",
       subtitle = "Wura leads with ₦3.77M across 18 orders",
       x = "Client", y = "Total Revenue (NGN)") +
  theme_minimal()

Code

# Profit margin by order type
ggplot(df, aes(x = reorder(Order_Type, Profit_Margin_Pct, median),
               y = Profit_Margin_Pct, fill = Order_Type)) +
  geom_boxplot(alpha = 0.8, outlier.colour = "red", outlier.shape = 16) +
  coord_flip() +
  labs(title = "Profit Margin % Distribution by Order Type",
       subtitle = "Bridal orders have narrower spread; Traditional orders show high variance",
       x = "", y = "Profit Margin (%)") +
  theme_minimal() +
  theme(legend.position = "none")

Visualisation narrative summary: Together these five charts tell one story: Odes by Mirabelle is a repeat-client-driven business with a seasonal revenue rhythm. Bridal orders are the highest-value, highest-margin work, but Traditional/Occasion Wear dominates volume. A small set of loyal clients (Wura, Aize, Tomi) generate a disproportionate share of revenue, making client retention critical.

7. Hypothesis Testing

Theory: Hypothesis testing uses sample data to make inferences about populations. We state a null hypothesis (H₀), choose a test statistic, check assumptions, and evaluate whether the observed result could plausibly arise by chance (Ch. 6 — t-test, effect sizes).

Business justification: Rather than guessing whether bridal orders are truly more profitable, formal testing provides statistically defensible evidence to justify prioritising bridal marketing spend and dedicated bridal production capacity.

Hypothesis 1: Do bridal orders have a higher profit margin % than non-bridal orders?

H₀: Mean profit margin % for bridal orders = Mean profit margin % for non-bridal orders
H₁: Mean profit margin % for bridal orders > Mean profit margin % for non-bridal orders

Code

bridal <- df %>% filter(Is_Bridal == 1) %>% pull(Profit_Margin_Pct)
non_bridal <- df %>% filter(Is_Bridal == 0) %>% pull(Profit_Margin_Pct)

# Check normality with Shapiro-Wilk (sample up to 5000)
sw_bridal <- shapiro.test(bridal)
sw_nonbridal <- shapiro.test(sample(non_bridal, min(length(non_bridal), 5000)))

cat("Shapiro-Wilk — Bridal: W =", round(sw_bridal$statistic, 4),
    "p =", round(sw_bridal$p.value, 4), "\n")

Shapiro-Wilk — Bridal: W = 0.7234 p = 0

Code

cat("Shapiro-Wilk — Non-Bridal: W =", round(sw_nonbridal$statistic, 4),
    "p =", round(sw_nonbridal$p.value, 4), "\n")

Shapiro-Wilk — Non-Bridal: W = 0.187 p = 0

Code

# Non-normal distribution → Wilcoxon rank-sum test (non-parametric alternative)
wilcox_result <- wilcox.test(bridal, non_bridal, alternative = "greater")
print(wilcox_result)


    Wilcoxon rank sum test with continuity correction

data:  bridal and non_bridal
W = 17756, p-value = 0.001121
alternative hypothesis: true location shift is greater than 0

Code

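# Effect size: rank-biserial correlation
# W from wilcox.test() counts the pairs where bridal > non-bridal (ties count
# one half), so 2W/(n1*n2) - 1 is positive when bridal margins tend to be higher
n1 <- length(bridal); n2 <- length(non_bridal)
r_effect <- (2 * wilcox_result$statistic) / (n1 * n2) - 1
cat("\nEffect size (rank-biserial r):", round(r_effect, 3), "\n")


Effect size (rank-biserial r): 0.236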

Code

# Group means
cat("\nMean Margin % — Bridal:", round(mean(bridal), 2), "%\n")


Mean Margin % — Bridal: 86.83 %

Code

cat("Mean Margin % — Non-Bridal:", round(mean(non_bridal), 2), "%\n")

Mean Margin % — Non-Bridal: 75.52 %

Interpretation for a non-technical manager: The test shows that bridal orders do tend to have higher profit margins than non-bridal orders (p ≈ 0.001). On average, bridal orders yield an 86.8% margin versus 75.5% for non-bridal. The effect size is small-to-moderate (about 0.24 in magnitude), meaning the difference is real but not dramatic. Business implication: every bridal order we take is, on average, more profitable per naira charged. Prioritising bridal capacity is financially justified.
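
The rank-biserial effect size has a direct pairwise reading: it is the share of pairs (one value from each group) in which the first group's value is larger, minus the share in which it is smaller. The sketch below checks that reading against the W statistic on a small made-up sample. The numbers are illustrative and not taken from the dataset, and the sign convention (positive when the first group tends to be larger) is an assumption of this sketch.

```r
# Made-up margins (%) for two small groups, no ties
group_a <- c(90, 85, 100.5, 72, 95)
group_b <- c(60, 88, 70, 55, 81, 99)

# W from wilcox.test() counts the pairs where group_a > group_b
w <- as.numeric(wilcox.test(group_a, group_b)$statistic)
n_pairs <- length(group_a) * length(group_b)
r_from_w <- 2 * w / n_pairs - 1

# Same quantity computed directly from all pairwise comparisons
diffs <- outer(group_a, group_b, "-")
r_pairwise <- mean(diffs > 0) - mean(diffs < 0)

cat("Rank-biserial r from W:    ", round(r_from_w, 3), "\n")
cat("Rank-biserial r from pairs:", round(r_pairwise, 3), "\n")
```

Both routes give the same number, so the effect size can be explained to a non-technical audience as "how much more often a bridal order beats a non-bridal order on margin than the reverse".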

Hypothesis 2: Do repeat clients place higher-value orders than new clients?

H₀: Mean order value for repeat clients = Mean order value for new clients
H₁: Mean order value for repeat clients ≠ Mean order value for new clients

Code

repeat_clients <- df %>% filter(Is_Repeat_Client == 1) %>% pull(Order_Value_NGN)
new_clients <- df %>% filter(Is_Repeat_Client == 0) %>% pull(Order_Value_NGN)

# Wilcoxon (data is non-normal due to skew)
wilcox2 <- wilcox.test(repeat_clients, new_clients, alternative = "two.sided")
print(wilcox2)


    Wilcoxon rank sum test with continuity correction

data:  repeat_clients and new_clients
W = 17134, p-value = 0.9548
alternative hypothesis: true location shift is not equal to 0

Code

cat("\nMean Order Value — Repeat:", comma(round(mean(repeat_clients), 0)), "NGN\n")


Mean Order Value — Repeat: 87,525 NGN

Code

cat("Mean Order Value — New:", comma(round(mean(new_clients), 0)), "NGN\n")

Mean Order Value — New: 86,270 NGN

Code

# Positive values would mean repeat clients tend to place larger orders
r2 <- (2 * wilcox2$statistic) / (length(repeat_clients) * length(new_clients)) - 1
cat("Effect size (rank-biserial r):", round(r2, 3), "\n")

Effect size (rank-biserial r): -0.004

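Interpretation for a non-technical manager: There is no evidence that repeat clients place larger orders than new clients (p = 0.95). The average order is almost identical for the two groups (₦87,525 for repeat clients versus ₦86,270 for new clients) and the effect size is essentially zero. This means the value of long-term client relationships comes from how often clients order, not from how much they spend per order. Business implication: invest in client retention (after-care follow-ups, loyalty perks) to sustain order frequency, but do not expect loyalty alone to raise the value of each order.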

8. Correlation Analysis

Theory: Correlation measures the strength and direction of linear relationships between numeric variables. Pearson’s r is used for normally distributed data; Spearman’s ρ is used when data is non-normal or ordinal. Partial correlation controls for confounders. Correlation does not imply causation (Ch. 8).
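
The difference between the two coefficients shows up clearly on skewed, non-linear data. In the minimal sketch below (made-up values, base R only), y rises with x in a perfectly monotonic but strongly curved way, so Spearman's ρ equals 1 while Pearson's r is noticeably lower.

```r
# Made-up data: y increases with x every time, but not in a straight line
x <- 1:10
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

cat("Pearson r:   ", round(pearson, 3), "\n")
cat("Spearman rho:", round(spearman, 3), "\n")
```

This is why a rank-based coefficient is the safer default for heavily right-skewed variables such as order value.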

Business justification: Understanding which cost and revenue variables move together helps identify whether production cost is being correctly priced into orders, and whether client order frequency is linked to higher spending.

Code

library(corrplot)

# Select numeric variables
corr_data <- df %>%
  select(Order_Value_NGN, Production_Cost_NGN, Profit_Margin_NGN,
         Profit_Margin_Pct, Total_Orders_Client)

# Spearman correlation (non-normal data)
corr_matrix <- cor(corr_data, method = "spearman", use = "complete.obs")

corrplot(corr_matrix,
         method = "color",
         type = "upper",
         addCoef.col = "black",
         number.cex = 0.8,
         tl.cex = 0.75,
         col = colorRampPalette(c("#d4a574", "white", "#2c7a7b"))(200),
         title = "Spearman Correlation Matrix — Odes by Mirabelle",
         mar = c(0,0,2,0))

Code

# Print top correlations
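corr_long <- as.data.frame(as.table(corr_matrix)) %>%
  # Keep each pair once (lower triangle) instead of de-duplicating on the
  # coefficient, which could drop distinct pairs that share a rounded value
  filter(as.integer(Var1) > as.integer(Var2)) %>%
  mutate(Freq = round(Freq, 3)) %>%
  arrange(desc(abs(Freq))) %>%
  head(6)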

kable(corr_long, col.names = c("Variable 1", "Variable 2", "Spearman r"),
      caption = "Strongest Correlations") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Strongest Correlations
Variable 1	Variable 2	Spearman r
Profit_Margin_NGN	Order_Value_NGN	0.935
Profit_Margin_Pct	Production_Cost_NGN	-0.898
Profit_Margin_Pct	Profit_Margin_NGN	0.491
Profit_Margin_Pct	Order_Value_NGN	0.235
Profit_Margin_NGN	Production_Cost_NGN	-0.215
Total_Orders_Client	Production_Cost_NGN	-0.057

Three key correlations and their business implications:

Order Value ↔︎ Profit Margin NGN (ρ ≈ 0.94): Very strongly correlated. This is expected, since profit margin in NGN is derived from order value. Higher-priced orders generate more absolute profit.
Production Cost ↔︎ Profit Margin % (ρ ≈ -0.90): A strong negative relationship. Orders with a higher recorded production cost earn a markedly lower margin percentage, which indicates that pricing is not consistently cost-plus: some low-cost orders are priced very highly (good), while some high-cost orders are not priced proportionally (risk of loss).
Total Orders per Client ↔︎ Order Value: Essentially uncorrelated. The pair does not appear among the six strongest coefficients, and the strongest listed coefficient involving Total Orders per Client is only -0.057. Loyal clients do not inherently spend more per order; frequent clients are not necessarily high spenders per outfit.

Correlation vs causation note: The strong negative correlation between production cost and margin percentage does not mean that cost alone determines margin. Margin also depends on the price charged, and the pricing decision is made by the business, not automatically by cost.

9. Linear Regression

Theory: Ordinary Least Squares (OLS) regression estimates the relationship between a dependent variable and one or more predictors. Coefficients represent the expected change in the outcome per unit increase in a predictor, holding others constant. Diagnostic plots check assumptions (linearity, homoscedasticity, normality of residuals) (Ch. 9).

Business justification: A regression model predicting profit margin percentage from order characteristics would allow the business to flag low-margin orders at the quoting stage, before production begins — turning analysis into a pricing tool.

Code

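# Keep only positive-margin rows and log-transform order value.
# Note: `> 0` removes the 2 negative-margin anomalies and also any rows with
# exactly 0% margin; the model output below has 532 rows (527 residual df +
# 5 coefficients), so 7 of the 539 orders are dropped in total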
df_reg <- df %>%
  filter(Profit_Margin_Pct > 0) %>%
  mutate(
    Log_Order_Value = log(Order_Value_NGN),
    Log_Prod_Cost = log(Production_Cost_NGN + 1),  # +1 to handle zero costs
    Order_Type = relevel(Order_Type, ref = "Traditional/Occasion Wear")
  )

# Multiple regression model
model <- lm(Profit_Margin_Pct ~ Log_Order_Value + Log_Prod_Cost +
              Is_Bridal + Is_Repeat_Client, data = df_reg)

summary(model)


Call:
lm(formula = Profit_Margin_Pct ~ Log_Order_Value + Log_Prod_Cost + 
    Is_Bridal + Is_Repeat_Client, data = df_reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.945  -3.389  -0.615   5.368  25.481 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        85.2569     6.7064  12.713   <2e-16 ***
Log_Order_Value     1.3725     0.6110   2.246   0.0251 *  
Log_Prod_Cost      -3.6644     0.1261 -29.057   <2e-16 ***
Is_Bridal1         -1.4686     2.1266  -0.691   0.4901    
Is_Repeat_Client1   1.4587     1.6098   0.906   0.3653    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.61 on 527 degrees of freedom
Multiple R-squared:  0.6377,    Adjusted R-squared:  0.635 
F-statistic: 231.9 on 4 and 527 DF,  p-value: < 2.2e-16

Code

library(broom)

tidy(model, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~round(., 3))) %>%
  kable(caption = "Regression Results — Predictors of Profit Margin %") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Regression Results — Predictors of Profit Margin %
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	85.257	6.706	12.713	0.000	72.082	98.431
Log_Order_Value	1.373	0.611	2.246	0.025	0.172	2.573
Log_Prod_Cost	-3.664	0.126	-29.057	0.000	-3.912	-3.417
Is_Bridal1	-1.469	2.127	-0.691	0.490	-5.646	2.709
Is_Repeat_Client1	1.459	1.610	0.906	0.365	-1.704	4.621

Code

# Diagnostic plots
par(mfrow = c(2, 2))
plot(model, main = "Regression Diagnostics")

Code

glance(model) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df) %>%
  mutate(across(where(is.numeric), ~round(., 4))) %>%
  kable(caption = "Model Fit Statistics") %>%
  kable_styling(bootstrap_options = "striped")

Model Fit Statistics
r.squared	adj.r.squared	sigma	statistic	p.value	df
0.6377	0.635	12.609	231.9063	0	4

Interpretation for a non-technical manager:

The model explains approximately 64% of the variation in profit margins (R² = 0.638).

Log Production Cost is the strongest predictor of margin: as production costs increase, profit margin percentage falls. Action: set a production cost ceiling for each order type. Any order where materials and labour are likely to exceed that ceiling should either be re-quoted at a higher price or declined.
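Is_Bridal: Once production cost and order value are held constant, bridal status has no statistically significant effect on margin (estimate -1.47, p = 0.49). The higher bridal margins found in the hypothesis test are therefore explained by cost and price rather than by the bridal label itself. Action: bridal orders remain the highest-value work and should still be protected in the production schedule, but the margin case for them rests on cost control, not on order type.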
Log Order Value: Higher-priced orders tend to have somewhat better margins, suggesting that premium pricing is working. Action: continue the practice of premium pricing for complex outfits; resist pressure to discount high-value orders.
Diagnostic check: The residuals vs fitted plot should show no strong pattern (if it does, a non-linear term may be needed). The Q-Q plot checks whether residuals are approximately normal; moderate deviation at the tails is acceptable given the sample size.
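
Because cost and order value enter the model on a log scale, each coefficient is easiest to read as the effect of doubling the predictor: the change in margin is the coefficient multiplied by ln(2). The sketch below does this arithmetic with the estimates from the regression table. It is approximate for production cost, since the model uses log(cost + 1), and `effect_of_doubling` is a name introduced here.

```r
# Coefficient estimates copied from the regression output
beta_log_cost  <- -3.6644
beta_log_value <-  1.3725

# Hypothetical helper: change in margin (percentage points) when a
# log-transformed predictor doubles, holding the other predictors constant
effect_of_doubling <- function(beta) beta * log(2)

cost_effect  <- effect_of_doubling(beta_log_cost)
value_effect <- effect_of_doubling(beta_log_value)

cat("Doubling production cost:", round(cost_effect, 2), "points\n")
cat("Doubling order value:    ", round(value_effect, 2), "points\n")
```

On these estimates, doubling production cost at a fixed price is associated with a margin roughly 2.5 percentage points lower, while doubling the price at a fixed cost is associated with a margin roughly 1 point higher.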

10. Integrated Findings

The five analyses converge on a single story:

Odes by Mirabelle is a profitable, loyalty-driven business with a clear premium niche (bridal) that is being underutilised relative to volume work (Traditional/Occasion Wear).

Finding	Technique	Business Implication
Bridal orders have higher margins and values	Hypothesis Test + EDA	Invest in bridal marketing and capacity
Revenue peaks in May and Sep–Oct	Visualisation	Plan staffing and fabric stock around event seasons
Repeat clients do not place larger orders than new clients (p = 0.95)	Hypothesis Test	Formalise a client retention/loyalty programme aimed at order frequency
Production cost is strongly negatively correlated with margin % (ρ ≈ -0.90)	Correlation	Implement cost-tracking per order type
2 orders had negative margins	EDA	Introduce a pre-production cost check at quoting stage

Single integrated recommendation: Introduce a minimum margin policy under which no order is accepted below a 50% margin target. Use the regression model as a quoting tool: input the quoted price, the estimated production cost, and whether the order is bridal or from a repeat client to predict the expected margin before confirming any order. This single operational change would have prevented both loss-making orders identified in the data.
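
The minimum margin policy itself can also be checked at the quoting stage straight from the margin definition in Section 3. The sketch below is one possible form of that check. The function name `check_quote` and its arguments are hypothetical, and the 50% threshold is the target proposed above.

```r
# Hypothetical quoting check: margin % from quoted price and estimated cost
check_quote <- function(price_ngn, est_cost_ngn, min_margin_pct = 50) {
  margin_pct <- 100 * (price_ngn - est_cost_ngn) / price_ngn
  data.frame(
    price_ngn    = price_ngn,
    est_cost_ngn = est_cost_ngn,
    margin_pct   = round(margin_pct, 1),
    accept       = margin_pct >= min_margin_pct
  )
}

# The two loss-making orders (OM0302, OM0480) and one bridal order (OM0001)
quotes <- check_quote(
  price_ngn    = c(15000, 35000, 380000),
  est_cost_ngn = c(225500, 50000, 160000)
)
print(quotes)
```

Both loss-making orders fail the check, while OM0001 passes at a 57.9% margin.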

11. Limitations & Further Work

Limitations:

Census, not random sample: All 539 records are used; there is no sampling error, but any systematic error in the original records (e.g. missing orders, rounding in cost estimates) flows directly into results.
Client name only: Without demographic data (age, location, income bracket), it is not possible to segment clients beyond order behaviour.
No time-series decomposition: Seasonal patterns are observed visually but not formally decomposed (trend, seasonality, noise).
Profit margin definition: Margin percentage assumes all costs are captured in Production_Cost_NGN. Overhead, studio rental, utilities, and designer time are not included, meaning true economic profit is lower than reported.

With more data, time, or computing power:

Time-series forecasting (ARIMA or Prophet) to predict monthly revenue for the next 6 months and plan procurement.
Customer Lifetime Value (CLV) modelling to quantify the long-run value of retaining each client tier.
Logistic regression to predict the probability that a new client becomes a repeat buyer, enabling targeted follow-up.
Cluster analysis (K-means) to segment clients into value tiers for differentiated pricing and communication strategies.

References

[Your Textbook Author(s)]. (Year). [Textbook Title]. [Publisher]. (Replace with your course textbook in APA format.)

R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/

Wei, T., & Simko, V. (2024). R package ‘corrplot’: Visualization of a Correlation Matrix (Version 0.95). https://github.com/taiyun/corrplot

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., & Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/

Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with structuring this document and generating initial R code templates for the five analytical techniques. All data was collected personally from Odes by Mirabelle’s order records. The analytical judgements (choice of techniques, interpretation of outputs, business recommendations, and the decision to flag negative-margin orders as a key finding) were made independently by the author. Every line of code was reviewed, tested, and understood before inclusion. The author is prepared to explain all outputs during the viva voce defence.