Exploratory & Inferential Analytics: Odes by Mirabelle

Author

[Your Full Name]

Published

May 26, 2026


1. Executive Summary

Odes by Mirabelle is a bespoke fashion atelier based in Nigeria, specialising in bridal, traditional, and occasion wear. This analysis examines 539 orders recorded between January 2024 and November 2025, covering order value, production cost, profit margin, order type, and client retention behaviour.

Five analytical techniques (Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression) were applied to answer a core business question: what drives profitability, and how can the business grow revenue while protecting margins?

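Key findings reveal that bridal orders generate the highest average revenue (₦374,033) and significantly higher average profit margins than non-bridal orders (86.8% vs 75.5%), although this margin advantage is no longer detectable once production cost and order value are controlled for in the regression. Two orders recorded negative margins, indicating underpricing or cost overruns that require attention. Repeat clients account for 86% of all orders, signalling strong retention but a need to expand the client base; they do not, however, place larger orders than new clients. Production cost shows almost no correlation with order value, suggesting prices are not set on a cost-plus basis, yet it is the strongest predictor of profit margin percentage in a regression that explains about 64% of margin variation. The integrated recommendation is to prioritise bridal and reception order acquisition while implementing a cost-cap policy for Traditional orders to protect margins.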


2. Professional Disclosure

I am both the founder and the lead designer of Odes by Mirabelle Ojaruega, a bespoke fashion business operating in Nigeria. The business creates custom-made outfits primarily for weddings, traditional ceremonies, and everyday occasions.

Relevance of each technique to my role:

  • EDA: Understanding the distribution of order values and margins helps me identify pricing outliers and flag orders that may have been underquoted.
  • Data Visualisation: Revenue and order trend charts inform monthly planning, staffing, and fabric procurement decisions.
  • Hypothesis Testing: Formally testing whether bridal orders are more profitable than non-bridal orders supports strategic decisions about which order types to prioritise in marketing.
  • Correlation Analysis: Understanding which variables move together (e.g. production cost and order value) helps identify whether cost controls translate to better margins.
  • Regression: Predicting profit margin from order characteristics allows me to build a pricing model that flags low-margin orders before production begins.

3. Data Collection & Sampling

Source: Primary data extracted from the internal order management records of Odes by Mirabelle Ojaruega: a manually maintained spreadsheet tracking all client orders from January 2024 to November 2025.

Collection method: Direct field observation and record extraction. Each row represents one outfit order placed by a client. Data was recorded at point of sale and updated upon completion of production.

Sampling frame: All completed orders within the business period January 2024 – November 2025. No random sampling was applied; this is a census of all 539 recorded orders.

Sample size: 539 observations across 14 variables.

Time period covered: January 2024 – November 2025 (23 months).

Variables:

Variable | Type | Description
Order_ID | Character | Unique order identifier
Client_Name | Character | Client first name
Month | Character | Month of order
Year | Integer | Year of order
Period | Character | Year-Month (YYYY-MM)
Outfit_Description | Character | Brief description of the outfit
Order_Type | Character | Category of order (7 types)
Order_Value_NGN | Numeric | Total order price in Naira
Production_Cost_NGN | Numeric | Material and labour cost
Profit_Margin_NGN | Numeric | Order Value minus Production Cost
Profit_Margin_Pct | Numeric | Profit as % of order value
Is_Bridal | Binary (0/1) | 1 = Bridal order
Is_Repeat_Client | Binary (0/1) | 1 = Client has ordered before
Total_Orders_Client | Integer | Cumulative orders per client
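The two derived margin columns can be recomputed from the definitions in the table above, which makes a quick consistency check possible. The sketch below is a minimal base-R version; the helper name `recompute_margins()` is hypothetical, and rounding the percentage to one decimal place is an assumption based on the values shown in the data overview in Section 4. The example inputs are the first five orders listed there plus the loss-making order OM0302.

```r
# Hypothetical helper: derive both margin columns from order value and cost (NGN)
recompute_margins <- function(order_value, production_cost) {
  margin_ngn <- order_value - production_cost               # Profit_Margin_NGN
  margin_pct <- round(100 * margin_ngn / order_value, 1)    # Profit_Margin_Pct (1 d.p., assumed)
  data.frame(Order_Value_NGN = order_value,
             Production_Cost_NGN = production_cost,
             Profit_Margin_NGN = margin_ngn,
             Profit_Margin_Pct = margin_pct)
}

# OM0001 to OM0005 (from the data overview) and OM0302 (negative margin)
recompute_margins(
  order_value     = c(380000, 150000, 90000, 85000, 125000, 15000),
  production_cost = c(160000,  30000, 25000, 15000,  10000, 225500)
)
```

Any order with a recorded production cost of ₦0 comes out at exactly 100%, which is worth keeping in mind when reading the margin distribution later.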

Ethical notes: Client names are recorded as first names only. No sensitive personal or financial data beyond order transactions is stored. Data sharing is limited to this academic submission, and the data will not be made public.


4. Data Description

Code
library(tidyverse)
library(ggplot2)
library(corrplot)
library(scales)
library(knitr)
library(kableExtra)

# Load data
df <- read_csv("odes by mirabelle dataset.csv")

# Convert types
df <- df %>%
  mutate(
    Order_Type = as.factor(Order_Type),
    Is_Bridal = as.factor(Is_Bridal),
    Is_Repeat_Client = as.factor(Is_Repeat_Client),
    Period = as.character(Period)
  )

# Overview
glimpse(df)
Rows: 539
Columns: 14
$ Order_ID            <chr> "OM0001", "OM0002", "OM0003", "OM0004", "OM0005", …
$ Client_Name         <chr> "Tio", "Tio", "Tio", "Uche", "Uche", "Uche", "Mumm…
$ Month               <chr> "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "…
$ Year                <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 20…
$ Period              <chr> "2024-01", "2024-01", "2024-01", "2024-01", "2024-…
$ Outfit_Description  <chr> "Wedding dress", "Pink after party dress", "Tan As…
$ Order_Type          <fct> Bridal, Reception/After Party, Traditional/Occasio…
$ Order_Value_NGN     <dbl> 380000, 150000, 90000, 85000, 125000, 40000, 55000…
$ Production_Cost_NGN <dbl> 160000, 30000, 25000, 15000, 10000, 12000, 12000, …
$ Profit_Margin_NGN   <dbl> 220000, 120000, 65000, 70000, 115000, 28000, 43000…
$ Profit_Margin_Pct   <dbl> 57.9, 80.0, 72.2, 82.4, 92.0, 70.0, 78.2, 73.3, 80…
$ Is_Bridal           <fct> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Is_Repeat_Client    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,…
$ Total_Orders_Client <dbl> 4, 4, 4, 3, 3, 3, 9, 17, 17, 17, 17, 17, 3, 2, 1, …
Code
df %>%
  select(Order_Value_NGN, Production_Cost_NGN, Profit_Margin_NGN,
         Profit_Margin_Pct, Total_Orders_Client) %>%
  summary() %>%
  kable(caption = "Summary Statistics — Numeric Variables") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Summary Statistics — Numeric Variables
Statistic | Order_Value_NGN | Production_Cost_NGN | Profit_Margin_NGN | Profit_Margin_Pct | Total_Orders_Client
Min. | 5000 | 0 | -210500 | -1403.30 | 1.000
1st Qu. | 17000 | 0 | 13000 | 66.20 | 2.000
Median | 30000 | 4000 | 22000 | 85.60 | 5.000
Mean | 87353 | 11341 | 76012 | 76.78 | 6.262
3rd Qu. | 70000 | 10000 | 63500 | 100.00 | 9.000
Max. | 800000 | 500000 | 800000 | 100.00 | 18.000
Code
df %>%
  count(Order_Type, sort = TRUE) %>%
  mutate(Pct = round(n / sum(n) * 100, 1)) %>%
  kable(col.names = c("Order Type", "Count", "% of Orders"),
        caption = "Distribution of Order Types") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Distribution of Order Types
Order Type Count % of Orders
Traditional/Occasion Wear 435 80.7
Bridal 60 11.1
Reception/After Party 18 3.3
Bridesmaid 12 2.2
Corporate/RTW 8 1.5
Children 4 0.7
Costume/Uniform 2 0.4

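Variable distributions: Order values are heavily right-skewed, ranging from ₦5,000 (children's outfit) to ₦800,000 (bridal gown), with a median of ₦30,000. Profit margin percentages show a bimodal pattern: many orders sit at exactly 100% margin, which corresponds to a recorded production cost of ₦0 (the first-quartile cost is ₦0, so at least a quarter of all orders fall in this group, and some of these may reflect unrecorded rather than genuinely zero costs), while another group lies in the 60–85% range. Two orders recorded negative profit margins, flagged as outliers below.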


5. Exploratory Data Analysis (EDA)

Theory: EDA involves summarising the main characteristics of data using statistics and visualisation, before formal modelling. Key tasks include identifying distributions, detecting missing values, and flagging outliers (Ch. 4 — Anscombe’s Quartet, missing-value analysis, outlier detection).

Business justification: Before making any pricing or production decisions, I need to understand what the data actually contains — whether values are plausible, whether margins are consistent, and whether any records contain errors. EDA is the foundation of all subsequent analysis.
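The theory above cites Anscombe's Quartet as the classic case for exploring data before modelling. R ships the quartet as the built-in `anscombe` data frame, so the point can be checked in a few lines of base R: all four x-y pairs share the same mean of x and nearly the same mean of y and correlation, even though their scatter plots look completely different. This is an illustration of the textbook idea, not part of the order analysis.

```r
# Anscombe's Quartet: identical-looking summaries, very different data
data(anscombe)  # built into R's datasets package

quartet_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y))
})
colnames(quartet_stats) <- paste("Set", 1:4)
round(quartet_stats, 3)

# The differences only appear when each pair is plotted, e.g.:
# plot(anscombe$x2, anscombe$y2)   # a clear curve, not a straight line
```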

Code
# Missing values check
missing_summary <- df %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing")

kable(missing_summary, caption = "Missing Value Check — All Variables") %>%
  kable_styling(bootstrap_options = c("striped"))
Missing Value Check — All Variables
Variable Missing
Order_ID 0
Client_Name 0
Month 0
Year 0
Period 0
Outfit_Description 0
Order_Type 0
Order_Value_NGN 0
Production_Cost_NGN 0
Profit_Margin_NGN 0
Profit_Margin_Pct 0
Is_Bridal 0
Is_Repeat_Client 0
Total_Orders_Client 0

No missing values were found across all 14 variables. The dataset is complete.

Code
# Identify outlier orders (negative margins)
negative_margin <- df %>%
  filter(Profit_Margin_NGN < 0) %>%
  select(Order_ID, Client_Name, Order_Type, Order_Value_NGN,
         Production_Cost_NGN, Profit_Margin_NGN, Profit_Margin_Pct)

kable(negative_margin,
      caption = "Data Quality Issue 1: Orders with Negative Profit Margins") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Data Quality Issue 1: Orders with Negative Profit Margins
Order_ID | Client_Name | Order_Type | Order_Value_NGN | Production_Cost_NGN | Profit_Margin_NGN | Profit_Margin_Pct
OM0302 | Lady O | Traditional/Occasion Wear | 15000 | 225500 | -210500 | -1403.3
OM0480 | Doyin | Traditional/Occasion Wear | 35000 | 50000 | -15000 | -42.9

Data Quality Issue 1 — Negative margins: Two orders (OM0302, OM0480) recorded negative profit margins, meaning production cost exceeded the price charged to the client. This suggests either a pricing error or significant cost overruns. These records are retained in the dataset but flagged; they are excluded from the regression model.

Code
# High value outliers — orders over 300,000 NGN
high_value <- df %>%
  filter(Order_Value_NGN > 300000) %>%
  select(Order_ID, Order_Type, Order_Value_NGN, Profit_Margin_Pct)

kable(high_value,
      caption = "Data Quality Issue 2: High-Value Orders (> ₦300,000)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Data Quality Issue 2: High-Value Orders (> ₦300,000)
Order_ID Order_Type Order_Value_NGN Profit_Margin_Pct
OM0001 Bridal 380000 57.9
OM0022 Bridal 800000 37.5
OM0046 Bridal 380000 100.0
OM0050 Bridal 650000 58.5
OM0058 Bridal 800000 100.0
OM0064 Bridal 650000 100.0
OM0071 Bridal 750000 100.0
OM0072 Traditional/Occasion Wear 350000 100.0
OM0083 Traditional/Occasion Wear 350000 100.0
OM0085 Bridal 750000 100.0
OM0091 Bridal 320000 100.0
OM0094 Bridal 650000 100.0
OM0109 Bridal 470000 100.0
OM0117 Bridal 470000 100.0
OM0119 Reception/After Party 370000 100.0
OM0122 Bridal 450000 85.6
OM0126 Bridal 650000 76.9
OM0127 Bridal 550000 81.8
OM0129 Bridal 720000 100.0
OM0131 Bridal 550000 100.0
OM0134 Bridal 440000 100.0
OM0137 Bridal 750000 100.0
OM0140 Bridal 450000 100.0
OM0141 Bridal 717000 100.0
OM0145 Bridal 440000 100.0
OM0150 Bridal 500000 100.0
OM0152 Bridal 470000 100.0
OM0154 Bridal 550000 100.0
OM0155 Reception/After Party 370000 100.0
OM0156 Bridal 600000 100.0
OM0159 Bridal 550000 100.0
OM0160 Bridal 450000 100.0
OM0163 Bridal 400000 100.0
OM0164 Reception/After Party 450000 100.0
OM0263 Traditional/Occasion Wear 450000 100.0
OM0285 Bridal 650000 100.0
OM0287 Bridal 600000 100.0
OM0320 Bridal 400000 100.0
OM0381 Bridal 450000 62.2
OM0462 Bridal 450000 62.2
OM0473 Bridal 500000 62.0

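Data Quality Issue 2 — High-value outliers: Most orders above ₦300,000 (35 of 41) are Bridal; the other six are three Traditional/Occasion Wear orders (OM0072, OM0083, OM0263) and three Reception/After Party orders (OM0119, OM0155, OM0164). These are legitimate orders, but they create strong right skew in the order value distribution. Analyses using order value will apply a log transformation where appropriate.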
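To see why the log transformation helps, the sketch below compares a simple moment-based skewness coefficient before and after taking logs. As a stand-in for the full column it uses only the minimum, quartiles and maximum of Order_Value_NGN from the summary statistics table; the `skewness()` helper is written here for illustration rather than taken from a package.

```r
# Moment-based sample skewness: m3 / m2^(3/2)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Stand-in values: min, Q1, median, Q3 and max of Order_Value_NGN
order_values <- c(5000, 17000, 30000, 70000, 800000)

c(raw = skewness(order_values), log = skewness(log(order_values)))
```

On the log scale the ₦800,000 orders are pulled much closer to the rest, so the skewness drops sharply; running the same comparison on the full Order_Value_NGN column would be the real check.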

Code
# Distribution of order values
ggplot(df, aes(x = Order_Value_NGN)) +
  geom_histogram(bins = 40, fill = "#2c7a7b", colour = "white", alpha = 0.85) +
  scale_x_continuous(labels = label_comma()) +
  labs(title = "Distribution of Order Values (NGN)",
       subtitle = "Right-skewed: most orders are under ₦100,000; a few bridal orders exceed ₦500,000",
       x = "Order Value (NGN)", y = "Number of Orders") +
  theme_minimal()

Code
ggplot(df, aes(x = Profit_Margin_Pct)) +
  geom_histogram(bins = 30, fill = "#d4a574", colour = "white", alpha = 0.85) +
  labs(title = "Distribution of Profit Margin %",
       subtitle = "Bimodal: a cluster around 80–100% and a tail into negative values",
       x = "Profit Margin (%)", y = "Number of Orders") +
  theme_minimal()


6. Data Visualisation

Theory: Effective data visualisation uses the grammar of graphics, choosing chart types that match the data structure and the question being asked. A visualisation narrative uses multiple charts cohesively to tell a single analytical story (Ch. 5).

Business justification: Charts translate raw numbers into decisions. Monthly revenue trends reveal seasonality for planning; order type breakdowns inform marketing focus; client retention visuals guide loyalty strategy.

Code
# Monthly revenue trend
monthly_rev <- df %>%
  group_by(Period) %>%
  summarise(Total_Revenue = sum(Order_Value_NGN),
            Orders = n()) %>%
  arrange(Period)

ggplot(monthly_rev, aes(x = Period, y = Total_Revenue, group = 1)) +
  geom_line(colour = "#2c7a7b", linewidth = 1.2) +
  geom_point(colour = "#d4a574", size = 2.5) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Monthly Revenue Trend — Odes by Mirabelle",
       subtitle = "Jan 2024 – Nov 2025 | Peaks visible around May, Sep–Oct each year",
       x = "Month", y = "Total Revenue (NGN)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Story: Revenue is not uniform — clear peaks occur around May and September/October, likely tied to the Nigerian wedding and event season. June 2024 shows a pronounced dip. This pattern supports seasonal staffing and procurement planning.
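One simple way to put a number on these peaks, short of the formal decomposition listed under Limitations, is a seasonal index: the average revenue for each calendar month divided by the average across all months. The sketch below uses base R and hypothetical revenue figures (not the business's data) laid out over the same 23 periods, January 2024 to November 2025. With only one or two observations per calendar month, any index computed from the real data would be rough.

```r
# Hypothetical monthly revenue over the 23 periods Jan 2024 - Nov 2025
periods   <- c(sprintf("2024-%02d", 1:12), sprintf("2025-%02d", 1:11))
cal_month <- substr(periods, 6, 7)                 # "01" ... "12"

revenue <- 100000 +
  80000 * (cal_month %in% c("05", "09", "10")) -   # assumed event-season peaks
  40000 * (cal_month == "06")                      # assumed June dip

# Seasonal index: mean revenue per calendar month / overall monthly mean
seasonal_index <- tapply(revenue, cal_month, mean) / mean(revenue)
round(seasonal_index, 2)
```

Values above 1 mark months that run ahead of the average month; on the real data, `monthly_rev` from the chunk below could be substituted for the hypothetical `revenue` vector.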

Code
# Revenue and margin by order type
df %>%
  group_by(Order_Type) %>%
  summarise(Avg_Value = mean(Order_Value_NGN),
            Avg_Margin_Pct = mean(Profit_Margin_Pct),
            Count = n()) %>%
  arrange(desc(Avg_Value)) %>%
  ggplot(aes(x = reorder(Order_Type, Avg_Value), y = Avg_Value, fill = Avg_Margin_Pct)) +
  geom_col() +
  coord_flip() +
  scale_fill_gradient(low = "#f4c27b", high = "#1a6b5a",
                      name = "Avg Margin %") +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Average Order Value by Type (coloured by Margin %)",
       subtitle = "Bridal orders are highest value; Traditional orders dominate volume",
       x = "", y = "Average Order Value (NGN)") +
  theme_minimal()

Code
# Repeat vs new clients
df %>%
  mutate(Client_Type = ifelse(Is_Repeat_Client == 1, "Repeat Client", "New Client")) %>%
  count(Client_Type) %>%
  ggplot(aes(x = "", y = n, fill = Client_Type)) +
  geom_col(width = 0.5) +
  coord_flip() +
  scale_fill_manual(values = c("#2c7a7b", "#d4a574")) +
  labs(title = "Order Share: Repeat vs New Clients",
       subtitle = "86% of all orders come from repeat clients",
       x = "", y = "Number of Orders", fill = "") +
  theme_minimal()

Code
# Top 10 clients by revenue
df %>%
  group_by(Client_Name) %>%
  summarise(Revenue = sum(Order_Value_NGN)) %>%
  slice_max(Revenue, n = 10) %>%
  ggplot(aes(x = reorder(Client_Name, Revenue), y = Revenue)) +
  geom_col(fill = "#2c7a7b", alpha = 0.85) +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Top 10 Clients by Total Revenue",
       subtitle = "Wura leads with ₦3.77M across 18 orders",
       x = "Client", y = "Total Revenue (NGN)") +
  theme_minimal()

Code
# Profit margin by order type
ggplot(df, aes(x = reorder(Order_Type, Profit_Margin_Pct, median),
               y = Profit_Margin_Pct, fill = Order_Type)) +
  geom_boxplot(alpha = 0.8, outlier.colour = "red", outlier.shape = 16) +
  coord_flip() +
  labs(title = "Profit Margin % Distribution by Order Type",
       subtitle = "Bridal orders have narrower spread; Traditional orders show high variance",
       x = "", y = "Profit Margin (%)") +
  theme_minimal() +
  theme(legend.position = "none")

Visualisation narrative summary: Together these five charts tell a single story. Odes by Mirabelle is a repeat-client-driven business with a seasonal revenue rhythm. Bridal orders are the highest-value work and carry above-average margins, but Traditional/Occasion Wear dominates volume (about 81% of orders). A small set of loyal clients (Wura, Aize, Tomi) generates a disproportionate share of revenue, making client retention critical.


7. Hypothesis Testing

Theory: Hypothesis testing uses sample data to make inferences about populations. We state a null hypothesis (H₀), choose a test statistic, check assumptions, and evaluate whether the observed result could plausibly arise by chance (Ch. 6 — t-test, effect sizes).

Business justification: Rather than guessing whether bridal orders are truly more profitable, formal testing provides statistically defensible evidence to justify prioritising bridal marketing spend and dedicated bridal production capacity.

Hypothesis 1: Do bridal orders have a higher profit margin % than non-bridal orders?

H₀: Mean profit margin % for bridal orders = Mean profit margin % for non-bridal orders
H₁: Mean profit margin % for bridal orders > Mean profit margin % for non-bridal orders

Code
bridal <- df %>% filter(Is_Bridal == 1) %>% pull(Profit_Margin_Pct)
non_bridal <- df %>% filter(Is_Bridal == 0) %>% pull(Profit_Margin_Pct)

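# Check normality with Shapiro-Wilk. shapiro.test() accepts 3 to 5,000 values;
# both groups are well under that cap, so sample() here only reorders non_bridal
# (the test statistic does not depend on order)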
sw_bridal <- shapiro.test(bridal)
sw_nonbridal <- shapiro.test(sample(non_bridal, min(length(non_bridal), 5000)))

cat("Shapiro-Wilk — Bridal: W =", round(sw_bridal$statistic, 4),
    "p =", round(sw_bridal$p.value, 4), "\n")
Shapiro-Wilk — Bridal: W = 0.7234 p = 0 
Code
cat("Shapiro-Wilk — Non-Bridal: W =", round(sw_nonbridal$statistic, 4),
    "p =", round(sw_nonbridal$p.value, 4), "\n")
Shapiro-Wilk — Non-Bridal: W = 0.187 p = 0 
Code
# Non-normal distribution → Wilcoxon rank-sum test (non-parametric alternative)
wilcox_result <- wilcox.test(bridal, non_bridal, alternative = "greater")
print(wilcox_result)

    Wilcoxon rank sum test with continuity correction

data:  bridal and non_bridal
W = 17756, p-value = 0.001121
alternative hypothesis: true location shift is greater than 0
Code
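# Effect size: rank-biserial correlation, r = 2W / (n1 * n2) - 1
# (W counts bridal/non-bridal pairs in which the bridal order ranks higher,
# so a positive r means bridal margins tend to be higher)
n1 <- length(bridal); n2 <- length(non_bridal)
r_effect <- (2 * wilcox_result$statistic) / (n1 * n2) - 1
cat("\nEffect size (rank-biserial r):", round(r_effect, 3), "\n")

Effect size (rank-biserial r): 0.236 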
Code
# Group means
cat("\nMean Margin % — Bridal:", round(mean(bridal), 2), "%\n")

Mean Margin % — Bridal: 86.83 %
Code
cat("Mean Margin % — Non-Bridal:", round(mean(non_bridal), 2), "%\n")
Mean Margin % — Non-Bridal: 75.52 %

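Interpretation for a non-technical manager: The test shows that bridal orders tend to have higher profit margins than non-bridal orders (p ≈ 0.001, well below the 0.05 threshold). On average, bridal orders yield an 86.8% margin versus 75.5% for non-bridal orders; the non-bridal average is pulled down by the -1,403% outlier (OM0302), but the rank-based test is not sensitive to that single value. The effect size (rank-biserial r ≈ 0.24) is small-to-moderate: in roughly 62% of bridal/non-bridal comparisons, the bridal order has the higher margin. The difference is real but not dramatic. Business implication: bridal orders are, on average, more profitable per naira charged, so prioritising bridal capacity is financially justified.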
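The effect size above comes straight from the Wilcoxon W statistic. In R, W for wilcox.test(x, y) equals the number of (x, y) pairs in which x is larger, with ties counting half, so W divided by n1 × n2 is the probability that a randomly chosen bridal order has the higher margin, and the rank-biserial correlation is twice that probability minus one. The helper name below is hypothetical; the inputs are the W and group sizes printed in this section (479 = 539 - 60 non-bridal orders).

```r
# Turn a Wilcoxon rank-sum W (the Mann-Whitney U for the first sample) into
# two effect sizes. Hypothetical helper, not part of base R.
wilcoxon_effect <- function(W, n1, n2) {
  prob_superiority <- W / (n1 * n2)          # share of pairs where group 1 ranks higher (ties count half)
  rank_biserial    <- 2 * prob_superiority - 1
  c(prob_superiority = prob_superiority, rank_biserial = rank_biserial)
}

# Hypothesis 1: W = 17756 with 60 bridal and 479 non-bridal orders
round(wilcoxon_effect(17756, 60, 479), 3)

# Sanity check on toy data: every x beats every y, so W = 3 * 2 = 6 and r = 1
toy <- wilcox.test(c(3, 4, 5), c(1, 2))
wilcoxon_effect(unname(toy$statistic), 3, 2)
```

Writing the effect as a probability of superiority is often easier to explain to a non-technical audience than the correlation form.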


Hypothesis 2: Do repeat clients place higher-value orders than new clients?

H₀: Mean order value for repeat clients = Mean order value for new clients
H₁: Mean order value for repeat clients ≠ Mean order value for new clients

Code
repeat_clients <- df %>% filter(Is_Repeat_Client == 1) %>% pull(Order_Value_NGN)
new_clients <- df %>% filter(Is_Repeat_Client == 0) %>% pull(Order_Value_NGN)

# Wilcoxon (data is non-normal due to skew)
wilcox2 <- wilcox.test(repeat_clients, new_clients, alternative = "two.sided")
print(wilcox2)

    Wilcoxon rank sum test with continuity correction

data:  repeat_clients and new_clients
W = 17134, p-value = 0.9548
alternative hypothesis: true location shift is not equal to 0
Code
cat("\nMean Order Value — Repeat:", comma(round(mean(repeat_clients), 0)), "NGN\n")

Mean Order Value — Repeat: 87,525 NGN
Code
cat("Mean Order Value — New:", comma(round(mean(new_clients), 0)), "NGN\n")
Mean Order Value — New: 86,270 NGN
Code
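# Rank-biserial r = 2W / (n1 * n2) - 1; a positive value would mean repeat clients' orders rank higher
r2 <- (2 * wilcox2$statistic) / (length(repeat_clients) * length(new_clients)) - 1
cat("Effect size (rank-biserial r):", round(r2, 3), "\n")
Effect size (rank-biserial r): -0.004 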

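Interpretation for a non-technical manager: There is no evidence that repeat clients place larger orders than new clients (p = 0.95). Average order values are almost identical (₦87,525 for repeat clients vs ₦86,270 for new clients), and the effect size is essentially zero. The value of long-term client relationships therefore lies in how often clients return, not in how much they spend per order: repeat clients account for 86% of all orders. Business implication: invest in client retention (after-care follow-ups, loyalty perks) to protect order volume, but do not expect retention alone to raise the average order value.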


8. Correlation Analysis

Theory: Correlation measures the strength and direction of linear relationships between numeric variables. Pearson’s r is used for normally distributed data; Spearman’s ρ is used when data is non-normal or ordinal. Partial correlation controls for confounders. Correlation does not imply causation (Ch. 8).

Business justification: Understanding which cost and revenue variables move together helps identify whether production cost is being correctly priced into orders, and whether client order frequency is linked to higher spending.
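Spearman's ρ, as described in the theory above, is Pearson's r computed on ranks, which is why it copes with the skew and the extreme negative margin in this data. A quick base-R check on made-up numbers (not the order data) shows the two calculations agree, while Pearson's r on the raw values is dragged down by a single extreme point.

```r
# Made-up values with one extreme point, loosely mimicking a skewed price column
x <- c(5, 10, 15, 20, 25, 30, 800)
y <- c(1, 3, 2, 5, 4, 6, 7)

spearman_builtin <- cor(x, y, method = "spearman")
spearman_by_hand <- cor(rank(x), rank(y))   # Pearson on ranks (ties get average ranks)
pearson_raw      <- cor(x, y)

c(spearman_builtin = spearman_builtin,
  spearman_by_hand = spearman_by_hand,
  pearson_raw      = pearson_raw)
```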

Code
library(corrplot)

# Select numeric variables
corr_data <- df %>%
  select(Order_Value_NGN, Production_Cost_NGN, Profit_Margin_NGN,
         Profit_Margin_Pct, Total_Orders_Client)

# Spearman correlation (non-normal data)
corr_matrix <- cor(corr_data, method = "spearman", use = "complete.obs")

corrplot(corr_matrix,
         method = "color",
         type = "upper",
         addCoef.col = "black",
         number.cex = 0.8,
         tl.cex = 0.75,
         col = colorRampPalette(c("#d4a574", "white", "#2c7a7b"))(200),
         title = "Spearman Correlation Matrix — Odes by Mirabelle",
         mar = c(0,0,2,0))

Code
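# Print top correlations (each pair once: keep the lower triangle of the matrix,
# which is more robust than de-duplicating on the coefficient value)
corr_long <- as.data.frame(as.table(corr_matrix)) %>%
  filter(as.integer(Var1) > as.integer(Var2)) %>%
  mutate(Freq = round(Freq, 3)) %>%
  arrange(desc(abs(Freq))) %>%
  head(6)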

kable(corr_long, col.names = c("Variable 1", "Variable 2", "Spearman r"),
      caption = "Strongest Correlations") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Strongest Correlations
Variable 1 Variable 2 Spearman r
Profit_Margin_NGN Order_Value_NGN 0.935
Profit_Margin_Pct Production_Cost_NGN -0.898
Profit_Margin_Pct Profit_Margin_NGN 0.491
Profit_Margin_Pct Order_Value_NGN 0.235
Profit_Margin_NGN Production_Cost_NGN -0.215
Total_Orders_Client Production_Cost_NGN -0.057

Three key correlations and their business implications:

  1. Order Value ↔ Profit Margin NGN (ρ = 0.935): Very strongly correlated. This is expected, since profit in NGN is derived from order value: higher-priced orders generate more absolute profit.

  2. Production Cost ↔ Profit Margin % (ρ = -0.898): The strongest relationship with a business lever behind it. Orders with higher production costs earn much lower percentage margins. Part of this is arithmetic, because margin % is calculated from cost, but it confirms that controlling production cost is the main way to protect margins.

  3. Production Cost ↔ Order Value and Total Orders per Client ↔ Order Value: Neither pair appears among the six strongest correlations above, so both coefficients are below about 0.06 in absolute value. Prices are therefore not set on a consistent cost-plus basis: some low-cost orders are priced very highly (good), while some high-cost orders are not priced proportionally (risk of loss, as with the two negative-margin orders). Likewise, loyal clients do not spend more per order; ordering frequency and per-order value are essentially independent.

Correlation vs causation note: A correlation, however strong, does not show that one variable drives the other. Even where cost and margin move together, the price charged is a decision made by the business, not an automatic consequence of cost.


9. Linear Regression

Theory: Ordinary Least Squares (OLS) regression estimates the relationship between a dependent variable and one or more predictors. Coefficients represent the expected change in the outcome per unit increase in a predictor, holding others constant. Diagnostic plots check assumptions (linearity, homoscedasticity, normality of residuals) (Ch. 9).

Business justification: A regression model predicting profit margin percentage from order characteristics would allow the business to flag low-margin orders at the quoting stage, before production begins — turning analysis into a pricing tool.
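The OLS estimates described in the theory above have a closed form, beta_hat = (X'X)^(-1) X'y, where X is the design matrix (a column of ones plus the predictors). The sketch below checks this against lm() on simulated data; the variable names echo the model in this section, but the numbers are randomly generated, not taken from the order records, and the coefficients used to simulate them are arbitrary.

```r
set.seed(42)
n <- 200

# Simulated predictors and outcome (hypothetical, for illustration only)
log_prod_cost   <- log(runif(n, 1000, 200000) + 1)
log_order_value <- log(runif(n, 5000, 800000))
margin_pct      <- 85 - 3.5 * log_prod_cost + 1.5 * log_order_value + rnorm(n, sd = 10)

# Design matrix: intercept column plus predictors
X <- cbind(intercept = 1, log_order_value, log_prod_cost)

# Normal equations: solve (X'X) beta = X'y rather than inverting X'X explicitly
beta_hat <- solve(crossprod(X), crossprod(X, margin_pct))

fit <- lm(margin_pct ~ log_order_value + log_prod_cost)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
```

Solving the linear system instead of forming the inverse is the usual numerically safer choice; lm() itself uses a QR decomposition, but both routes give the same estimates here.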

Code
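# Keep strictly positive margins and log-transform order value and cost.
# Note: Profit_Margin_Pct > 0 drops the 2 negative-margin orders AND any
# zero-margin orders; the 527 residual df below imply 532 rows remain (7 dropped).
df_reg <- df %>%
  filter(Profit_Margin_Pct > 0) %>%
  mutate(
    Log_Order_Value = log(Order_Value_NGN),
    Log_Prod_Cost = log(Production_Cost_NGN + 1),  # +1 to handle zero costs
    Order_Type = relevel(Order_Type, ref = "Traditional/Occasion Wear")  # releveled, but not used in the model below
  )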

# Multiple regression model
model <- lm(Profit_Margin_Pct ~ Log_Order_Value + Log_Prod_Cost +
              Is_Bridal + Is_Repeat_Client, data = df_reg)

summary(model)

Call:
lm(formula = Profit_Margin_Pct ~ Log_Order_Value + Log_Prod_Cost + 
    Is_Bridal + Is_Repeat_Client, data = df_reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.945  -3.389  -0.615   5.368  25.481 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        85.2569     6.7064  12.713   <2e-16 ***
Log_Order_Value     1.3725     0.6110   2.246   0.0251 *  
Log_Prod_Cost      -3.6644     0.1261 -29.057   <2e-16 ***
Is_Bridal1         -1.4686     2.1266  -0.691   0.4901    
Is_Repeat_Client1   1.4587     1.6098   0.906   0.3653    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.61 on 527 degrees of freedom
Multiple R-squared:  0.6377,    Adjusted R-squared:  0.635 
F-statistic: 231.9 on 4 and 527 DF,  p-value: < 2.2e-16
Code
library(broom)

tidy(model, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~round(., 3))) %>%
  kable(caption = "Regression Results — Predictors of Profit Margin %") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Regression Results — Predictors of Profit Margin %
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 85.257 6.706 12.713 0.000 72.082 98.431
Log_Order_Value 1.373 0.611 2.246 0.025 0.172 2.573
Log_Prod_Cost -3.664 0.126 -29.057 0.000 -3.912 -3.417
Is_Bridal1 -1.469 2.127 -0.691 0.490 -5.646 2.709
Is_Repeat_Client1 1.459 1.610 0.906 0.365 -1.704 4.621
Code
# Diagnostic plots
par(mfrow = c(2, 2))
plot(model, main = "Regression Diagnostics")

Code
glance(model) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df) %>%
  mutate(across(where(is.numeric), ~round(., 4))) %>%
  kable(caption = "Model Fit Statistics") %>%
  kable_styling(bootstrap_options = "striped")
Model Fit Statistics
r.squared adj.r.squared sigma statistic p.value df
0.6377 0.635 12.609 231.9063 0 4

Interpretation for a non-technical manager:

The model explains approximately 64% of the variation in profit margins (R² = 0.638; adjusted R² = 0.635).

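  • Log Production Cost is the strongest predictor of margin (coefficient -3.66, p < 0.001): as production costs increase, profit margin percentage falls. Holding order value fixed, doubling production cost is associated with a margin roughly 2.5 percentage points lower. Part of this link is mechanical, because margin % is calculated from cost and price. Action: set a production cost ceiling for each order type. Any order where materials and labour are likely to exceed that ceiling should either be re-quoted at a higher price or declined.

  • Is_Bridal and Is_Repeat_Client: Neither is statistically significant once cost and order value are in the model (bridal coefficient -1.47, p = 0.49; repeat-client coefficient 1.46, p = 0.37). The bridal margin advantage found in Hypothesis 1 therefore appears to run through price and cost rather than through order type itself. Action: bridal orders should still be protected in the production schedule and not deprioritised in favour of volume Traditional orders, because of their high value, but their margin advantage depends on keeping production costs in check.

  • Log Order Value: Higher-priced orders tend to have slightly better margins (coefficient 1.37, p = 0.025); doubling the price is associated with a margin about 1 percentage point higher. This suggests that premium pricing is working, if modestly. Action: continue the practice of premium pricing for complex outfits; resist pressure to discount high-value orders.

  • Diagnostic check: The residuals vs fitted plot should show no strong pattern; if it does, a non-linear term may be needed. The Q-Q plot checks whether residuals are approximately normal; moderate deviation at the tails is acceptable given the sample size. The residual summary (minimum -52.9, maximum 25.5) already points to a longer lower tail, so some departure there is likely.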
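Because both continuous predictors enter the model on the log scale, each coefficient is easiest to read as the change in margin for a proportional change in its input: multiplying a predictor by a factor k shifts the predicted margin by the coefficient times log(k). The sketch below applies this to the estimates printed above. The helper name is hypothetical, and for production cost it treats log(cost + 1) as approximately log(cost), an assumption that holds for costs well above ₦1 but not for the many ₦0-cost orders.

```r
# Coefficients copied from summary(model) above
b_log_order_value <- 1.3725
b_log_prod_cost   <- -3.6644

# Hypothetical helper: change in predicted margin (percentage points) when a
# log-scale predictor is multiplied by k, holding the other predictors fixed.
# For Log_Prod_Cost this assumes log(cost + 1) is close to log(cost).
margin_shift <- function(coef, k) coef * log(k)

round(c(double_price   = margin_shift(b_log_order_value, 2),
        double_cost    = margin_shift(b_log_prod_cost, 2),
        price_up_10pct = margin_shift(b_log_order_value, 1.1)), 2)
```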


10. Integrated Findings

The five analyses converge on a single story:

Odes by Mirabelle is a profitable, loyalty-driven business with a clear premium niche (bridal) that is being underutilised relative to volume work (Traditional/Occasion Wear).

Finding | Technique | Business Implication
Bridal orders have higher values and higher average margins | Hypothesis Test + EDA | Invest in bridal marketing and capacity
Revenue peaks in May and Sep–Oct | Visualisation | Plan staffing and fabric stock around event seasons
Repeat clients generate 86% of orders but do not place larger orders | Hypothesis Test + Visualisation | Formalise a client retention/loyalty programme to protect order volume
Production cost is nearly uncorrelated with order value but strongly, negatively correlated with margin % | Correlation + Regression | Implement cost-tracking per order type
2 orders had negative margins | EDA | Introduce a pre-production cost check at quoting stage

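Single integrated recommendation: Introduce a minimum margin policy, under which no order is accepted below a 50% margin target. Use the regression model as a quoting tool: input the estimated production cost and the proposed price to predict the expected margin before confirming any order. Because margin % is fully determined by price and cost, the same check can also be done directly as 100 × (price - cost) ÷ price. Provided production costs are estimated accurately at the quoting stage, this single operational change would have prevented both loss-making orders identified in the data.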
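Since margin % is 100 × (price - cost) ÷ price, a target margin m (as a fraction) translates directly into a minimum price of cost ÷ (1 - m); at the proposed 50% target, that is twice the estimated production cost. The sketch below implements the check in base R. The function names are hypothetical, the third quote is invented for illustration, and the first two rows are the negative-margin orders from Section 5, using their recorded costs as if those had been the estimates at quoting.

```r
# Minimum price that achieves a target margin (target given as a fraction)
min_price_for_margin <- function(est_cost, target = 0.5) {
  stopifnot(target >= 0, target < 1)
  est_cost / (1 - target)
}

# Flag quotes whose expected margin falls below the target
quote_check <- function(order_id, quoted_price, est_cost, target = 0.5) {
  expected_margin_pct <- 100 * (quoted_price - est_cost) / quoted_price
  data.frame(
    Order_ID            = order_id,
    Expected_Margin_Pct = round(expected_margin_pct, 1),
    Min_Price_NGN       = min_price_for_margin(est_cost, target),
    Below_Target        = expected_margin_pct < 100 * target
  )
}

# The two loss-making orders from Section 5, plus a hypothetical compliant quote
q <- quote_check(order_id     = c("OM0302", "OM0480", "HYPOTHETICAL"),
                 quoted_price = c(15000, 35000, 120000),
                 est_cost     = c(225500, 50000, 40000))
q
```

Both historical orders are flagged, and the Min_Price_NGN column shows what each would have needed to be quoted at to meet the 50% target.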


11. Limitations & Further Work

Limitations:

  • Census, not random sample: All 539 records are used; there is no sampling error, but any systematic error in the original records (e.g. missing orders, rounding in cost estimates) flows directly into results.
  • Client name only: Without demographic data (age, location, income bracket), it is not possible to segment clients beyond order behaviour.
  • No time-series decomposition: Seasonal patterns are observed visually but not formally decomposed (trend, seasonality, noise).
  • Profit margin definition: Margin percentage assumes all costs are captured in Production_Cost_NGN. Overhead, studio rental, utilities, and designer time are not included, meaning true economic profit is lower than reported.

With more data, time, or computing power:

  • Time-series forecasting (ARIMA or Prophet) to predict monthly revenue for the next 6 months and plan procurement.
  • Customer Lifetime Value (CLV) modelling to quantify the long-run value of retaining each client tier.
  • Logistic regression to predict the probability that a new client becomes a repeat buyer, enabling targeted follow-up.
  • Cluster analysis (K-means) to segment clients into value tiers for differentiated pricing and communication strategies.

References

[Your Textbook Author(s)]. (Year). [Textbook Title]. [Publisher]. (Replace with your course textbook in APA format.)

R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., . . . Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org

Wei, T., & Simko, V. (2024). R package 'corrplot': Visualization of a correlation matrix (Version 0.95). https://github.com/taiyun/corrplot

Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/


Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with structuring this document and generating initial R code templates for the five analytical techniques. All data was collected personally from Odes by Mirabelle's order records. The analytical judgements (choice of techniques, interpretation of outputs, business recommendations, and the decision to flag negative-margin orders as a key finding) were made independently by the author. Every line of code was reviewed, tested, and understood before inclusion. The author is prepared to explain all outputs during the viva voce defence.