Sales Performance Analytics at Danica Farms

Case Study 1 — Exploratory & Inferential Analytics | Data Analytics II

Author: Anita Amadi | Lagos Business School

Published: May 12, 2026

Jan – May 2026 · Danica Farms · Rivers State, Nigeria
Total Revenue: ₦82.1M
Transactions: 490
Named Customers: 25
Egg(Big) Share of Revenue: 69%
Avg Weekly Revenue: ₦4.3M
Top 3 Buyer Share: 48%

1 Executive Summary

Danica Farms is a poultry farm focused on egg production and distribution in Rivers State, Nigeria, supplying wholesale buyers and retail walk-in customers with eggs across four size grades — Big, Medium, Small, and Pullet — as well as ancillary farm products including Manure and Sack. The farm operates a direct-to-buyer distribution model, serving a network of approximately 25 named wholesale accounts comprising market traders, food vendors, canteen operators, and retail shop owners, alongside a growing volume of anonymous walk-in retail customers.

This study analyses 490 verified sales transactions recorded between 1 January 2026 and 6 May 2026. The central research question is:

What factors drive sales revenue at Danica Farms, and do product category, payment method, and customer gender significantly influence transaction volume and value?

Key findings:

  • Total revenue for the period is ₦82.1 million across 19 weeks — averaging ₦4.3M weekly.
  • Paul, Uchechi, and Uche Utochukwu together contribute ₦39.6 million (48%) — a severe concentration risk.
  • Egg(Big) generates ₦56.6M (69% of revenue), and per-transaction revenue differs significantly across egg grades (Kruskal-Wallis p < .001), with Egg(Big) showing the highest median.
  • Quantity ordered is the single strongest predictor of revenue (Spearman ρ = 0.97).
  • Price increases from ₦5,300 to ₦5,500 for Egg(Big) across the period did not visibly suppress demand.
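
The headline ratios above follow from simple arithmetic on the reported totals. A minimal R check, using only the rounded figures quoted in this summary (the variable names are illustrative, and results agree with the report only to the rounding shown):

```r
# Rounded figures quoted in the key findings (in millions of naira)
total_rev   <- 82.1   # total revenue, Jan to May 2026
n_weeks     <- 19     # ISO weeks covered
top3_rev    <- 39.6   # combined revenue of the three largest buyers
egg_big_rev <- 56.6   # Egg(Big) revenue

avg_weekly    <- total_rev / n_weeks            # average weekly revenue
top3_share    <- 100 * top3_rev / total_rev     # top-3 concentration, percent
egg_big_share <- 100 * egg_big_rev / total_rev  # Egg(Big) share, percent

cat(sprintf("Avg weekly: %.1fM | Top-3 share: %.0f%% | Egg(Big) share: %.0f%%\n",
            avg_weekly, top3_share, egg_big_share))
```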

Primary Recommendation

Danica Farms must urgently diversify its customer base: losing any one top-three buyer would remove roughly ₦12–14M of revenue over a comparable 19-week period. In parallel, Egg(Big) production must be protected and expanded as the main revenue engine of the business.


2 Professional Disclosure

Name: Anita Amadi
Role: Chief Growth Officer, Danica Farms
Organisation: Danica Farms (poultry production and egg distribution, Rivers State, Nigeria)
Sector: Agri-food / Livestock
GitHub: github.com/anibaby1/danica-farms-analytics
Published Report: rpubs.com/AnitaA_/1431928

2.1 About Danica Farms

Danica Farms is a registered poultry enterprise headquartered in Rivers State, Nigeria, engaged in the commercial production, grading, and distribution of table eggs and ancillary poultry products. The farm operates a vertically integrated model — from in-house layer flock management through to direct distribution to end buyers — serving approximately 25 named wholesale accounts comprising market traders, food vendors, canteen operators, and retail shop owners, as well as a growing volume of anonymous retail walk-in customers.

The farm’s core product range spans four egg size grades: Big (the primary revenue driver, sold wholesale at ₦5,300–₦5,500 per crate), Medium (₦4,800–₦5,200), Small (₦4,000–₦4,800), and Pullet (₦3,000–₦3,500 per crate, sourced from young hens). Supplementary products include organic Manure (₦500–₦600 per unit) and Sacks (₦70 each) used for packaging and farm operations. All transactions are settled through two digital payment channels: First Monie wallet (mobile money, accounting for ~81% of transactions) and First Bank direct transfer (~19%).

As Chief Growth Officer, Anita Amadi is responsible for commercial strategy, customer acquisition, pricing decisions, and the analytical frameworks that inform how the farm allocates production capacity and targets new wholesale accounts.

2.2 Why Each Technique Is Relevant to My Work

1. Exploratory Data Analysis. I maintain the farm’s daily sales ledger personally. Before any business decision — production volumes, pricing, credit terms — I need to understand revenue distribution, identify data entry errors, and flag anomalies. EDA formalises a process I already do informally every month.

2. Data Visualisation. I present monthly summaries to the farm owner and potential investors. Converting tables into charts of weekly trends and customer concentration makes these reviews actionable and persuasive in a way numbers alone cannot.

3. Hypothesis Testing. Questions like “does egg size really matter for revenue?” need statistical answers, not guesswork. Hypothesis testing gives me evidence I can defend when recommending production changes to the farm owner.

4. Correlation Analysis. Understanding whether our unit price increases reduce order volumes is the most important pricing question we face. Correlation analysis quantifies this relationship rigorously so I can advise on pricing with confidence.

5. Linear Regression. A regression model lets me estimate expected revenue from any proposed order — by size, quantity, and channel. This becomes a practical forecasting tool for monthly sales targets and evaluating proposed price changes before implementing them.


3 Data Collection & Sampling

3.1 Data Source and Collection Method

The dataset used in this study was extracted directly from Danica Farms’ proprietary internal sales ledger — a Microsoft Excel workbook maintained by Anita Amadi, Chief Growth Officer, and updated at the point of every sales transaction. Data entry is performed manually by the farm’s sales team at the point of sale, capturing each transaction in real time. No data was simulated, synthetically generated, aggregated from secondary sources, or publicly downloaded.

| Field | Details |
|---|---|
| Data source | Danica Farms internal sales ledger (Microsoft Excel workbook) |
| Collection method | Manual point-of-sale entry by farm sales manager (Anita Amadi) |
| Extraction tool | Microsoft Excel → imported into R via the readxl package |
| Time period covered | 1 January 2026 – 6 May 2026 (approximately 19 ISO weeks) |
| Raw rows in workbook | 492 (1 header row + 1 blank row + 490 transaction records) |
| Usable observations | 490 (after removing structural non-data rows) |
| Missing values | Zero: all 8 variables are fully populated across all 490 rows |
| Data integrity check | Amount = Unit Price × Qty verified for all 490 rows: zero mismatches |

3.2 Variables

| Variable | Type | Description |
|---|---|---|
| Date | Date | Transaction date |
| Customer | Categorical | Customer name (25 unique after cleaning) |
| Gender | Categorical | Female / Male |
| Category | Categorical | Product type (7 categories) |
| Unit Price | Numeric | Price per crate in ₦ |
| Qty (Crates) | Numeric | Crates per transaction |
| Amount | Numeric | Total value (outcome variable) |
| Payment Method | Categorical | First Monie wallet / First Bank |

3.3 Sampling Frame

This is a complete census — every transaction logged during the period is included, not a random sample. The sampling frame is the full universe of Danica Farms transactions from January to May 2026.

3.4 Data Cleaning Applied

| Issue Found | Detail | Fix Applied |
|---|---|---|
| Dual labels for Egg(Big) | “Egg(Big)” (wholesale, ₦5,300–5,500) and “Egg(Big) retail” (₦5,500–5,600) were the same product | Created Category (clean size label) and Channel (Wholesale/Retail) |
| Multiple walk-in labels | “Retail 1”, “Retail 2”, “Retail”, “Others”, “others”, “Customer”, “customer” all = anonymous buyers | Consolidated to single “Walk-in” group |
| Customer name variants | “eze” vs “Eze”, “Beatrice” vs “Beatrice Amadi Eze”, trailing spaces | Standardised via strip + lookup table |
| Customer = product name | 3 rows had “Manure”/“Sack” in Customer column | Reclassified to “Walk-in” |

Data quality assessment: Excellent. After the cleaning above, the dataset is fully analysis-ready: 490 rows, zero missing values, all amounts verified to match (Unit Price × Qty = Amount with zero exceptions). The cleaning was minor and did not alter any transaction values — only labels were standardised.
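
The cleaning steps and the integrity check described above can be sketched in base R. This is an illustration on invented rows, not the script used for the report: the toy data, the lookup entries, and the choice of canonical names are all assumptions.

```r
# Toy rows imitating the label problems described above (values are illustrative)
raw <- data.frame(
  Customer   = c("eze", "Eze ", "Retail 1", "others", "Manure", "Beatrice"),
  Unit_Price = c(5300, 5300, 5500, 4800, 500, 5300),
  Qty        = c(10, 20, 1, 2, 4, 15),
  stringsAsFactors = FALSE
)
raw$Amount <- raw$Unit_Price * raw$Qty

# 1. Strip stray leading and trailing whitespace
cust <- trimws(raw$Customer)

# 2. Collapse anonymous-buyer labels, and product names typed into the
#    Customer column, into a single "Walk-in" group (case-insensitive)
walk_in_labels <- c("retail 1", "retail 2", "retail", "others", "customer",
                    "manure", "sack")
cust[tolower(cust) %in% walk_in_labels] <- "Walk-in"

# 3. Map remaining variants to one canonical name via a lookup table
#    (hypothetical entries; the real table is not shown in the report)
lookup <- c("eze" = "Eze", "Beatrice" = "Beatrice Amadi Eze")
hit <- cust %in% names(lookup)
cust[hit] <- unname(lookup[cust[hit]])
raw$Customer <- cust

# 4. Integrity check: Amount must equal Unit_Price * Qty on every row
n_mismatch <- sum(abs(raw$Amount - raw$Unit_Price * raw$Qty) > 1e-8)
print(table(raw$Customer))
cat("Mismatched rows:", n_mismatch, "\n")
```

Only labels are rewritten in this sketch; the numeric columns are never modified, which mirrors the statement that cleaning did not alter any transaction values.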

3.5 Ethical Statement and Data Privacy

This dataset is the proprietary commercial property of Danica Farms. The following ethical considerations govern its collection and use:

Consent and identifiers. Customer identifiers consist exclusively of first names or informal trading names used voluntarily by buyers in the course of commercial transactions. No sensitive personal data — including national identity numbers, bank account details, home addresses, or telephone numbers — is recorded anywhere in this dataset. All customers transact openly and knowingly with the farm.

Data controller. Anita Amadi, as Chief Growth Officer and part-owner of Danica Farms, is both the primary data custodian and the data controller for this dataset. No third-party ethical review board approval is required for the analysis of internally generated commercial transaction data by the organisation’s own management.

Publication. Customer first names are retained in this published document as they are already known within the farm’s commercial network and carry no identifying risk beyond that context. All transaction amounts reflect actual business dealings and are published with the knowledge and consent of the farm’s ownership.

Data citation (APA 7th edition):

Amadi, A. (2026). Danica Farms sales transaction record, January–May 2026 [Dataset]. Collected from Danica Farms internal sales ledger, Rivers State, Nigeria. Data available on request from the author.


4 Data Description & Exploratory Data Analysis

4.1 Data Loading and Cleaning

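# Packages used throughout the report. These calls are assumed to live in a
# setup chunk that is not displayed. ppcor attaches MASS, so it is loaded
# first to keep dplyr::select() unmasked.
library(ppcor); library(tidyverse); library(knitr); library(kableExtra)
library(scales); library(moments); library(patchwork); library(dunn.test)
library(corrplot); library(broom)

# Load the cleaned extract. The absolute path is machine-specific, so the
# file only resolves on the author's computer.
df <- read_csv("C:/Users/Anita/Desktop/Year 2, 1st sem/DA2 Exam/danica_farms_clean.csv",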
               col_types = cols(Date = col_date())) |>
  mutate(
    Category = factor(Category, levels = c("Egg(Big)","Egg(Medium)","Egg(Small)",
                                            "Egg(Big) Retail","Pullet","Manure","Sack")),
    Gender   = factor(Gender),
    Payment  = factor(Payment),
    Channel  = factor(Channel),
    Month    = factor(Month, levels = c("Jan","Feb","Mar","Apr","May"))
  )

cat("Rows:", nrow(df), "| Cols:", ncol(df), "\n")
Rows: 490 | Cols: 12 
cat("Date range:", format(min(df$Date)), "to", format(max(df$Date)), "\n")
Date range: 2026-01-01 to 2026-05-06 
cat("Missing values:", sum(is.na(df)), "\n")
Missing values: 0 

4.2 Summary Statistics

df |>
  dplyr::select(Unit_Price, Qty, Amount) |>
  tidyr::pivot_longer(everything(), names_to="Variable", values_to="v") |>
  dplyr::group_by(Variable) |>
  dplyr::summarise(N=n(), Mean=round(mean(v),2), Median=round(median(v),2),
            Std_Dev=round(sd(v),2), Min=min(v), Max=max(v),
            Skewness=round(moments::skewness(v),3), .groups="drop") |>
  kable(caption="Table 1. Descriptive statistics — numeric variables",
        format.args=list(big.mark=",")) |>
  kable_styling(bootstrap_options=c("striped","hover","condensed"),
                full_width=FALSE, position="left") |>
  row_spec(0, background="#2e7d52", color="white", bold=TRUE)
Table 1. Descriptive statistics — numeric variables
Variable N Mean Median Std_Dev Min Max Skewness
Amount 490 167,505.7 72,500 224,627.70 2,500.0 1,060,000 1.979
Qty 490 34.7 15 53.74 0.5 740 5.581
Unit_Price 490 4,995.8 5,300 760.72 70.0 5,600 -3.451

What the numbers tell us. Amount is strongly right-skewed (≈ 2.0): most transactions are modest (median ₦72,500), but Uche Utochukwu’s 200-crate orders (up to ₦1,060,000) pull the mean to ₦167,506. Qty is even more skewed (≈ 5.6), with a maximum of 740 units; that record must be a low-priced Manure or Sack sale, because 740 crates of any egg grade would exceed the ₦1,060,000 maximum Amount. Unit_Price is negatively skewed (≈ −3.5) because a few very cheap Manure and Sack transactions (₦70–₦600) pull the mean (₦4,996) below the median (₦5,300) of the dominant egg-price band.
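
To make the skewness figures concrete, the sketch below computes moment-based skewness by hand on a small invented vector. It assumes the population form (third central moment divided by the variance raised to 3/2), which is, to my understanding, what moments::skewness() returns; the amounts are illustrative and not taken from the ledger.

```r
# Moment-based skewness: third central moment over (second central moment)^1.5
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Illustrative amounts: many modest sales plus two bulk orders
amounts <- c(rep(50000, 8), 72500, 95000, 960000, 1060000)

cat(sprintf("mean = %.0f | median = %.0f | skewness = %.2f\n",
            mean(amounts), median(amounts), skew(amounts)))
```

A handful of large orders is enough to drag the mean well above the median and produce positive skewness, the same pattern seen in Amount and Qty.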

4.3 Outlier Detection

Q1 <- quantile(df$Amount,0.25); Q3 <- quantile(df$Amount,0.75)
upper <- Q3 + 1.5*(Q3-Q1)
n_out <- sum(df$Amount > upper)
cat(sprintf("Upper IQR fence: ₦%s | Outliers: %d (%.1f%%)\n",
            comma(upper), n_out, 100*n_out/nrow(df)))
Upper IQR fence: ₦495,750 | Outliers: 57 (11.6%)
df |> filter(Amount > upper) |> arrange(desc(Amount)) |> slice_head(n=10) |>
  select(Date, Customer, Category, Qty, Amount) |>
  mutate(Amount=comma(Amount), Qty=comma(Qty)) |>
  kable(caption="Table 2. Ten largest transactions (above IQR upper fence)") |>
  kable_styling(bootstrap_options=c("striped","hover"), full_width=FALSE, position="left") |>
  row_spec(0, background="#2e7d52", color="white", bold=TRUE)
Table 2. Ten largest transactions (above IQR upper fence)
Date Customer Category Qty Amount
2026-02-14 Uche Utochukwu Egg(Big) 200 1,060,000
2026-02-23 Uche Utochukwu Egg(Big) 200 1,060,000
2026-03-10 Uche Utochukwu Egg(Big) 200 1,060,000
2026-03-23 Uche Utochukwu Egg(Big) 200 1,060,000
2026-01-07 Uche Utochukwu Egg(Medium) 200 960,000
2026-01-31 Uche Utochukwu Egg(Medium) 190 950,000
2026-05-01 Uche Utochukwu Egg(Medium) 190 950,000
2026-01-14 Uche Utochukwu Egg(Medium) 188 940,000
2026-01-22 Uche Utochukwu Egg(Medium) 184 920,000
2026-02-07 Uche Utochukwu Egg(Big) 170 901,000

Decision: All 57 flagged outliers are verified legitimate bulk wholesale transactions from known repeat customers. They are retained in all analyses. Cook’s Distance in the regression section identifies whether they unduly influence results.


5 Data Visualisation

The five charts below tell one story: Danica Farms has steady, gently rising weekly revenue, one dominant product (Egg(Big)), and dangerous concentration in three customers.

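# Palette objects (clr_green, clr_red, clr_dark, clr_blue, clr_amber) are
# defined outside the code shown here.
# Plot 1: Weekly revenue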
weekly <- df |> group_by(Week) |> summarise(Revenue=sum(Amount), .groups="drop")
p1 <- ggplot(weekly, aes(Week, Revenue)) +
  geom_col(fill=clr_green, alpha=0.8, width=0.75) +
  geom_smooth(method="loess", se=TRUE, span=0.65,
              colour=clr_red, fill=clr_red, alpha=0.15, linewidth=1.1) +
  scale_x_continuous(breaks=1:19) +
  scale_y_continuous(labels=label_number(prefix="₦", scale=1e-6, suffix="M"),
                     expand=expansion(mult=c(0,0.05))) +
  labs(title="Plot 1 — Weekly Sales Revenue (Jan–May 2026)",
       subtitle="₦82.1M total · LOESS trend confirms positive momentum · Week 18 = peak (₦5.86M)",
       x="ISO Week", y=NULL)

# Plot 2: Revenue by category
cat_rev <- df |> group_by(Category) |>
  summarise(Total=sum(Amount), .groups="drop") |>
  mutate(Pct=Total/sum(Total), Category=fct_reorder(Category,Total))
p2 <- ggplot(cat_rev, aes(Total, Category, fill=Category)) +
  geom_col(show.legend=FALSE, width=0.7) +
  geom_text(aes(label=percent(Pct, accuracy=0.1)),
            hjust=-0.12, size=3.5, colour=clr_dark, fontface="bold") +
  scale_x_continuous(labels=label_number(prefix="₦", scale=1e-6, suffix="M"),
                     expand=expansion(mult=c(0,0.2))) +
  scale_fill_brewer(palette="Greens", direction=1) +
  labs(title="Plot 2 — Revenue by Product Category",
       subtitle="Egg(Big) wholesale alone = 69% of total farm revenue", x=NULL, y=NULL)

# Plot 3: Top 10 customers
top_cust <- df |> group_by(Customer) |>
  summarise(Revenue=sum(Amount), .groups="drop") |>
  arrange(desc(Revenue)) |> slice_head(n=10) |>
  mutate(Customer=fct_reorder(Customer,Revenue),
         Highlight=ifelse(Revenue>=12e6,"Top 3","Others"))
p3 <- ggplot(top_cust, aes(Revenue, Customer, fill=Highlight)) +
  geom_col(width=0.7) +
  scale_x_continuous(labels=label_number(prefix="₦", scale=1e-6, suffix="M")) +
  scale_fill_manual(values=c("Top 3"=clr_red,"Others"=clr_green), name=NULL) +
  labs(title="Plot 3 — Top 10 Customers by Total Revenue",
       subtitle="Red = top 3 customers contributing 48% of all revenue (concentration risk)",
       x=NULL, y=NULL) + theme(legend.position="top")

# Plot 4: Quantity by egg category
egg_df <- df |> filter(Category %in% c("Egg(Big)","Egg(Medium)","Egg(Small)"))
p4 <- ggplot(egg_df, aes(Category, Qty, fill=Category)) +
  geom_boxplot(outlier.colour=clr_red, outlier.alpha=0.5,
               outlier.size=1.5, show.legend=FALSE, width=0.5) +
  scale_fill_manual(values=c("Egg(Big)"="#a8d5b8","Egg(Medium)"="#5da88d","Egg(Small)"="#2e7d52")) +
  scale_y_log10(labels=comma_format()) +
  labs(title="Plot 4 — Order Quantity by Egg Category (log scale)",
       subtitle="Egg(Big) spans widest range: single crates to 200-crate bulk deliveries",
       x=NULL, y="Qty in Crates (log scale)")

# Plot 5: Payment by gender
pay_gen <- df |> count(Gender, Payment) |>
  group_by(Gender) |> mutate(Pct=n/sum(n))
p5 <- ggplot(pay_gen, aes(Gender, Pct, fill=Payment)) +
  geom_col(position="fill", width=0.5) +
  geom_text(aes(label=percent(Pct,accuracy=1)),
            position=position_fill(vjust=0.5),
            colour="white", fontface="bold", size=4.5) +
  scale_y_continuous(labels=percent_format()) +
  scale_fill_manual(values=c("First Bank"=clr_blue,"First Monie wallet"=clr_amber), name="Payment") +
  labs(title="Plot 5 — Payment Method by Gender",
       subtitle="Male customers use First Bank at a higher rate than female customers",
       x=NULL, y="Share of Transactions")

# Combine
(p1 / (p2 + p3)) / (p4 + p5) +
  plot_annotation(
    title="Danica Farms — Sales Analytics Dashboard · Jan–May 2026",
    caption="Data: Danica Farms internal sales ledger",
    theme=theme(plot.title=element_text(face="bold",size=16,colour=clr_dark,hjust=0.5),
                plot.caption=element_text(colour="grey55"))
  )

Visual narrative. Plot 1: stable ₦3.8–5.9M weekly, gently rising — Week 18 is the record week. Plot 2: Egg(Big) = 69%; Egg(Medium) a distant 22%. Plot 3: three customers (red) = 48% of revenue from just 89 visits — the concentration risk in plain sight. Plot 4: Egg(Big) spans the widest order range, from 1 crate to 200. Plot 5: male customers lean more toward First Bank — tested formally below.


6 Hypothesis Testing

6.1 H1 — Do egg categories generate significantly different revenue per transaction?

Business question: Danica Farms currently stocks three egg size grades. If Egg(Big) transactions statistically generate higher revenue per visit than Medium or Small, the farm has a data-backed justification to prioritise Egg(Big) flock expansion over other grades — an investment that requires concrete evidence, not intuition.

H₀: Median transaction amount is equal across Egg(Big), Egg(Medium), and Egg(Small)
H₁: At least one egg category has a different median transaction amount
α: 0.05
Assumption check: Shapiro-Wilk normality test per group; if violated, Kruskal-Wallis replaces ANOVA
Effect size: Epsilon-squared (ε²): < 0.04 small · 0.04–0.16 medium · > 0.16 large
egg_only <- df |>
  filter(Category %in% c("Egg(Big)","Egg(Medium)","Egg(Small)")) |>
  mutate(Category=droplevels(Category))

# Step 1: Shapiro-Wilk per group
sw_tbl <- egg_only |> group_by(Category) |>
  summarise(n=n(), SW_W=round(shapiro.test(Amount)$statistic,4),
            SW_p=round(shapiro.test(Amount)$p.value,5),
            Normal=ifelse(shapiro.test(Amount)$p.value>0.05,"Yes","No -> non-parametric"),
            .groups="drop")
kable(sw_tbl, caption="Table 3. Shapiro-Wilk normality test by egg category") |>
  kable_styling(bootstrap_options=c("striped","hover"), full_width=FALSE) |>
  row_spec(0, background="#2e7d52", color="white", bold=TRUE)
Table 3. Shapiro-Wilk normality test by egg category
Category n SW_W SW_p Normal
Egg(Big) 228 0.8273 0 No -> non-parametric
Egg(Medium) 117 0.6566 0 No -> non-parametric
Egg(Small) 46 0.7820 0 No -> non-parametric
# Step 2: Kruskal-Wallis
kw   <- kruskal.test(Amount ~ Category, data=egg_only)
eps2 <- kw$statistic / (nrow(egg_only)-1)
cat(sprintf("\nKruskal-Wallis: H=%.3f, df=%d, p=%.2e\nEffect size (epsilon-squared)=%.4f\n",
            kw$statistic, kw$parameter, kw$p.value, eps2))

Kruskal-Wallis: H=21.778, df=2, p=1.87e-05
Effect size (epsilon-squared)=0.0558
cat(ifelse(kw$p.value<0.05,"=> REJECT H0\n","=> Fail to reject H0\n"))
=> REJECT H0
# Step 3: Post-hoc Dunn test
cat("\nPost-hoc Dunn test (Bonferroni-corrected):\n")

Post-hoc Dunn test (Bonferroni-corrected):
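# dunn.test() prints one-sided p-values by default (reject H0 if p <= alpha/2);
# pass altp=TRUE for two-sided p-values.
dunn.test(egg_only$Amount, egg_only$Category, method="bonferroni")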

# Step 4: Descriptive table
egg_only |> group_by(Category) |>
  summarise(n=n(), Median=comma(median(Amount)), Mean=comma(round(mean(Amount))), .groups="drop") |>
  kable(caption="Table 4. Amount descriptives by egg category") |>
  kable_styling(bootstrap_options=c("striped","hover"), full_width=FALSE) |>
  row_spec(0, background="#2e7d52", color="white", bold=TRUE)
Table 4. Amount descriptives by egg category
Category n Median Mean
Egg(Big) 228 132,500 248,130
Egg(Medium) 117 96,000 153,735
Egg(Small) 46 45,000 96,446

Statistical result. All three groups fail the Shapiro-Wilk normality test (p < .001), confirming that transaction amounts are not normally distributed within any egg category. The Kruskal-Wallis test is therefore used in place of ANOVA.

The Kruskal-Wallis result is highly significant (H = 21.78, df = 2, p < .001). The epsilon-squared of 0.056 falls in the medium band (0.04–0.16), near its lower bound, so the difference in per-transaction revenue across egg categories is unlikely to be due to chance but is moderate rather than large in size.
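
The effect size can be reproduced from the printed test statistic alone. A short sketch, using the H value and group sizes reported above and the bands listed in the test specification (the variable names are illustrative):

```r
# Values printed by the Kruskal-Wallis step and Table 3
H <- 21.778          # Kruskal-Wallis test statistic
n <- 228 + 117 + 46  # Egg(Big) + Egg(Medium) + Egg(Small) transactions

# Epsilon-squared: H * (n + 1) / (n^2 - 1), which simplifies to H / (n - 1)
eps2 <- H / (n - 1)

# Classify against the bands stated in the test specification
band <- cut(eps2, breaks = c(0, 0.04, 0.16, Inf),
            labels = c("small", "medium", "large"), right = FALSE)
cat(sprintf("epsilon-squared = %.4f (%s)\n", eps2, as.character(band)))
```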

Bonferroni-corrected Dunn post-hoc tests confirm that Egg(Big) generates significantly higher per-transaction revenue than both Egg(Medium) and Egg(Small). The Egg(Medium) vs Egg(Small) comparison is also reported.

Business action for Danica Farms: This result provides statistical justification for Anita Amadi to formally propose a flock expansion plan focused on Egg(Big) layer hens. The data shows that Egg(Big) is not merely more popular — it generates fundamentally more revenue per customer visit. Any supply disruption (feed cost spike, hen mortality event, or logistics failure) affecting Egg(Big) specifically will cause disproportionate damage to total farm revenue.

6.2 H2 — Does payment method affect quantity ordered per transaction?

Business question: First Bank transfers are typically used by more established business buyers, while First Monie wallet is a mobile money tool used across all customer types. If First Bank customers systematically order larger quantities, it signals that Danica Farms’ high-volume wholesale buyers cluster around the formal banking channel — informing where the farm should invest in payment infrastructure reliability.

H₀: Median quantity ordered per transaction is the same for First Bank and First Monie wallet customers
H₁: Median quantity ordered differs by payment method
α: 0.05 (two-tailed)
Assumption check: Shapiro-Wilk per group; Mann-Whitney U used if normality is violated
Effect size: Rank-biserial correlation r: < 0.10 negligible · 0.10–0.30 small · > 0.30 medium
bank  <- df |> filter(Payment=="First Bank")         |> pull(Qty)
monie <- df |> filter(Payment=="First Monie wallet") |> pull(Qty)
cat("Shapiro-Wilk — First Bank p=", round(shapiro.test(bank)$p.value,4),
    "| First Monie p=", round(shapiro.test(monie)$p.value,4), "\n")
Shapiro-Wilk — First Bank p= 0 | First Monie p= 0 
mw   <- wilcox.test(bank, monie, alternative="two.sided", conf.int=TRUE)
r_rb <- abs(1 - (2*mw$statistic)/(length(bank)*length(monie)))
print(mw)

    Wilcoxon rank sum test with continuity correction

data:  bank and monie
W = 27359, p-value = 1.259e-12
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 18.00004 40.00004
sample estimates:
difference in location 
              29.99997 
cat(sprintf("Effect size (rank-biserial r) = %.4f\n", r_rb))
Effect size (rank-biserial r) = 0.4699
df |> group_by(Payment) |>
  summarise(n=n(), Median_Qty=median(Qty), Mean_Qty=round(mean(Qty),1),
            SD_Qty=round(sd(Qty),1), .groups="drop") |>
  kable(caption="Table 5. Quantity by payment method") |>
  kable_styling(bootstrap_options=c("striped","hover"), full_width=FALSE) |>
  row_spec(0, background="#2e7d52", color="white", bold=TRUE)
Table 5. Quantity by payment method
Payment n Median_Qty Mean_Qty SD_Qty
First Bank 94 50 61.0 50.2
First Monie wallet 396 10 28.5 52.7

Statistical result. Both payment groups fail the Shapiro-Wilk normality test (p < .001), which means the data is not normally distributed in either group. The Mann-Whitney U test — the non-parametric equivalent of an independent samples t-test — is therefore used. This test compares the rank-ordered quantities across both payment groups rather than their raw means, making it robust to the extreme outliers present in the quantity data (e.g., Uche Utochukwu’s 200-crate orders).

Result: H₀ is rejected (W = 27,359, p = 1.26 × 10⁻¹², well below .05). First Bank customers order significantly larger quantities: a median of 50 crates per transaction against 10 for First Monie wallet (Table 5), with an estimated location shift of about 30 crates (95% CI: 18 to 40).

The rank-biserial effect size is r = 0.47, which falls in the medium band (> 0.30) of the scale stated above, so the difference is practically meaningful as well as statistically significant.
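
The rank-biserial correlation has a direct reading as a probability, which the sketch below recovers from the printed Wilcoxon output. It uses the W value as displayed (the printout rounds it, so the last decimal can differ slightly from the figure computed on the full data); the variable names are illustrative.

```r
# Figures taken from the Wilcoxon output and Table 5
W  <- 27359   # rank-sum statistic as printed (rounded in the printout)
n1 <- 94      # First Bank transactions
n2 <- 396     # First Monie wallet transactions

# W counts the (bank, monie) pairs in which the First Bank order is larger,
# with ties counted as one half, so W / (n1 * n2) is the probability that a
# randomly chosen First Bank order exceeds a randomly chosen First Monie order
p_sup <- W / (n1 * n2)

# Rank-biserial correlation: 2 * p_sup - 1 (same magnitude as the abs() form above)
r_rb <- 2 * p_sup - 1

cat(sprintf("P(bank order > monie order) = %.3f | r = %.2f\n", p_sup, r_rb))
```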

Business action for Danica Farms: Because First Bank customers order significantly more crates per transaction, Anita Amadi should: (1) prioritise fast and reliable First Bank payment confirmation for all orders above 50 crates; (2) consider a dedicated WhatsApp business account for First Bank wholesale buyers to streamline order-to-payment communication; and (3) explore whether First Monie wallet transaction limits are preventing some buyers from placing larger single orders.

6.3 H3 — Is gender associated with payment method choice?

H₀: Gender and payment method are independent
H₁: Gender and payment method are associated
α: 0.05 · Test: Chi-squared + Cramér’s V
ctab <- table(df$Gender, df$Payment)
print(addmargins(ctab))
        
         First Bank First Monie wallet Sum
  Female         51                248 299
  Male           43                148 191
  Sum            94                396 490
chi  <- chisq.test(ctab)
print(chi)

    Pearson's Chi-squared test with Yates' continuity correction

data:  ctab
X-squared = 1.8999, df = 1, p-value = 0.1681
cv <- sqrt(chi$statistic/(sum(ctab)*(min(dim(ctab))-1)))
cat(sprintf("Cramer's V = %.4f  (< 0.10 negligible | 0.10-0.30 small | > 0.30 moderate)\n", cv))
Cramer's V = 0.0623  (< 0.10 negligible | 0.10-0.30 small | > 0.30 moderate)

Statistical result. The Pearson chi-squared test of independence examines whether knowing a customer’s gender tells us anything about their payment method choice. The test works by comparing the observed counts in the contingency table (what we actually see in the data) against expected counts (what we would expect if gender and payment were completely unrelated).

Result: p = .168 (χ² = 1.90, df = 1), so we fail to reject H₀. There is no statistically detectable association between gender and payment method at Danica Farms. Descriptively, male customers paid by First Bank in 22.5% of transactions (43 of 191) against 17.1% for female customers (51 of 299), but a gap of this size is compatible with chance.

Note: Cramér’s V = 0.06 falls in the negligible band (< 0.10), so the association is weak in practical terms as well as non-significant. With 490 observations, a large association would very likely have been detected.
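
Because the full contingency table is printed above, the test can be re-run from the four counts alone. The sketch below rebuilds the table, adds the row percentages behind the descriptive comparison, and recomputes Cramér’s V with the same formula used in the report (object names other than ctab and chi are illustrative).

```r
# Observed counts copied from the contingency table above
ctab <- matrix(c(51, 248,
                 43, 148),
               nrow = 2, byrow = TRUE,
               dimnames = list(Gender  = c("Female", "Male"),
                               Payment = c("First Bank", "First Monie wallet")))

# Row percentages: share of each gender's transactions by payment channel
row_pct <- round(100 * prop.table(ctab, margin = 1), 1)
print(row_pct)

# Chi-squared test (Yates-corrected by default for 2x2 tables) and Cramer's V
chi <- chisq.test(ctab)
v   <- sqrt(unname(chi$statistic) / (sum(ctab) * (min(dim(ctab)) - 1)))
cat(sprintf("X-squared = %.4f | p = %.4f | Cramer's V = %.4f\n",
            chi$statistic, chi$p.value, v))
```

Note that V here is based on the Yates-corrected statistic, as in the report; using the uncorrected statistic (correct = FALSE) would give a slightly larger value.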

Business action for Danica Farms: Because the association is non-significant and negligible in size (V = 0.06, below the 0.15 level treated here as meaningful), tailoring payment promotion by gender is not warranted. A single unified payment communication strategy applies to all customers regardless of gender.


7 Correlation Analysis

Technique 4 of 5

Business question: Does raising the unit price suppress order volumes — or are Danica Farms’ loyal wholesale buyers price-insensitive? Understanding this relationship is the most strategically important pricing question the farm faces, because it determines whether future price increases will grow or shrink revenue.

cor_mat <- cor(df |> select(Unit_Price, Qty, Amount),
               method="spearman", use="complete.obs")
print(round(cor_mat, 4))
           Unit_Price     Qty  Amount
Unit_Price     1.0000 -0.1167 -0.0081
Qty           -0.1167  1.0000  0.9730
Amount        -0.0081  0.9730  1.0000
corrplot(cor_mat, method="color", type="upper", addCoef.col="white",
         number.cex=1.2, tl.col="#1a3a2a", tl.srt=45, tl.cex=1.0,
         col=colorRampPalette(c(clr_red,"#f9f9f9",clr_green))(200),
         title="Spearman Correlation — Danica Farms", mar=c(0,0,2,0))

egg3 <- df |> filter(Category %in% c("Egg(Big)","Egg(Medium)","Egg(Small)")) |>
  mutate(Cat_code=as.numeric(droplevels(Category)))
pcr <- pcor.test(egg3$Unit_Price, egg3$Qty, egg3$Cat_code, method="spearman")
cat(sprintf("\nPartial r (Unit_Price ~ Qty | Category): r=%.4f, p=%.4f\n",
            pcr$estimate, pcr$p.value))

Partial r (Unit_Price ~ Qty | Category): r=-0.0244, p=0.6310

Interpreting the three key correlations:

1. Amount ~ Qty (Spearman ρ = 0.97, very strong positive). This is the dominant relationship in the data. Transaction revenue is overwhelmingly driven by how many crates a customer orders in a single visit. This is partly algebraic (Amount = Unit Price × Qty) but confirms the operational reality: the single highest-return action Danica Farms can take is to increase the average order size per customer visit. If the median order rose from 15 crates (the median across all 490 transactions) to 25 crates, annual revenue would increase substantially without adding a single new customer.

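2. Unit_Price ~ Amount (ρ = −0.01, effectively zero). Across all transactions, the unit price paid has no monotonic relationship with transaction value, because order size rather than price determines how large a sale is. The result therefore does not show that the incremental price increases (₦5,300 → ₦5,500 for Egg(Big) wholesale over the observation period) raised revenue; it shows only that higher-priced transactions were not systematically smaller in value. The raw Unit_Price ~ Qty correlation is weakly negative (ρ = −0.12), and it mixes price differences between products with price changes over time, which is why the partial correlation in point 3 is the better guide for pricing decisions.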

3. Partial r (Unit_Price ~ Qty | Category): the strategic pricing insight. The partial correlation isolates the relationship between unit price and order quantity within the same egg category, removing the confounding effect of category-level price differences (i.e., the fact that Egg(Big) is simply priced higher than Egg(Small)). The result is near zero and non-significant (r = −0.02, p = .631): within each category, higher prices were not associated with smaller orders. This is consistent with price-inelastic wholesale demand over the narrow price band observed, and it suggests room for further gradual increases. A strongly negative result would have been a warning sign to pause pricing increases.

Causation caveat. All correlations are observational. Prices and volumes both rose during the observation window, so a shared time trend could bias the correlations. A controlled pricing experiment, in which price is varied for a randomly chosen subset of buyers while other conditions are held fixed and the resulting order quantities are compared, would be needed to establish true price elasticity.
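
For readers who want to see what a partial Spearman correlation does, here is a base-R sketch on synthetic data (not farm data). It applies the standard first-order partial correlation formula to rank correlations, which is, to my understanding, equivalent to what ppcor::pcor.test() computes for a single control variable. The function name partial_spearman and all simulated values are illustrative assumptions.

```r
# Spearman partial correlation of x and y given z: apply the first-order
# partial correlation formula to the three pairwise rank correlations
partial_spearman <- function(x, y, z) {
  rxy <- cor(x, y, method = "spearman")
  rxz <- cor(x, z, method = "spearman")
  ryz <- cor(y, z, method = "spearman")
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

# Synthetic example: price and quantity both depend on grade, and price
# has no effect on quantity within a grade
set.seed(42)
grade <- rep(1:3, each = 50)                       # 1 = Big, 2 = Medium, 3 = Small
price <- 5600 - 400 * grade + rnorm(150, sd = 60)
qty   <- round(exp(3.5 - 0.4 * grade + rnorm(150, sd = 0.6)))

raw_rho     <- cor(price, qty, method = "spearman")
partial_rho <- partial_spearman(price, qty, grade)

# Equivalent residual view: correlate what is left of the ranks
# after regressing each on the ranks of grade
res_rho <- cor(resid(lm(rank(price) ~ rank(grade))),
               resid(lm(rank(qty)   ~ rank(grade))))

cat(sprintf("raw rho = %.3f | partial rho = %.3f | residual check = %.3f\n",
            raw_rho, partial_rho, res_rho))
```

The simulation is constructed so that any raw association between price and quantity comes only from grade; the partial coefficient removes that shared dependence, and the residual-based calculation gives the same number by a different route.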


8 Linear Regression

Business question: Which transaction characteristics predict revenue, and by how much does each additional crate increase expected earnings?

Model: log(Amount) ~ Qty + Unit_Price + Category + Gender + Payment + Week using egg transactions only (n = 391 rows). Log-transformation corrects the skewed outcome.

reg_df <- df |>
  filter(Category %in% c("Egg(Big)","Egg(Medium)","Egg(Small)")) |>
  mutate(log_Amount=log(Amount),
         Category=relevel(droplevels(Category), ref="Egg(Medium)"),
         Gender=relevel(factor(Gender), ref="Male"),
         Payment=relevel(factor(Payment), ref="First Monie wallet"))

mod <- lm(log_Amount ~ Qty + Unit_Price + Category + Gender + Payment + Week, data=reg_df)

tidy(mod) |>
  mutate(`% Effect`=paste0(round((exp(estimate)-1)*100,2),"%"),
         estimate=round(estimate,4), std.error=round(std.error,4),
         statistic=round(statistic,3),
         p.value=ifelse(p.value<0.001,"< 0.001", as.character(round(p.value,4))),
         Sig=case_when(p.value=="< 0.001"~"***",
                       suppressWarnings(as.numeric(p.value))<0.01~"**",
                       suppressWarnings(as.numeric(p.value))<0.05~"*", TRUE~"")) |>
  kable(caption="Table 6. Regression output — outcome: log(Amount)") |>
  kable_styling(bootstrap_options=c("striped","hover","condensed"), full_width=FALSE) |>
  row_spec(0, background="#2e7d52", color="white", bold=TRUE)
Table 6. Regression output — outcome: log(Amount)
| term | estimate | std.error | statistic | p.value | % Effect | Sig |
|---|---|---|---|---|---|---|
| (Intercept) | 11.9821 | 1.8531 | 6.466 | < 0.001 | 15986979.91% | *** |
| Qty | 0.0236 | 0.0009 | 25.008 | < 0.001 | 2.38% | *** |
| Unit_Price | -0.0004 | 0.0004 | -1.061 | 0.2892 | -0.04% | |
| CategoryEgg(Big) | 0.1993 | 0.1500 | 1.329 | 0.1847 | 22.05% | |
| CategoryEgg(Small) | -0.5865 | 0.3213 | -1.826 | 0.0687 | -44.38% | |
| GenderFemale | 0.1713 | 0.0815 | 2.102 | 0.0362 | 18.68% | * |
| PaymentFirst Bank | 0.2831 | 0.0990 | 2.859 | 0.0045 | 32.72% | ** |
| Week | 0.0350 | 0.0077 | 4.562 | < 0.001 | 3.57% | *** |
# Model-level fit statistics from broom::glance()
g <- glance(mod)
cat(sprintf("R²=%.4f | Adj.R²=%.4f | Residual SE=%.4f | n=%d\n",
            g$r.squared, g$adj.r.squared, g$sigma, g$nobs))
R²=0.7012 | Adj.R²=0.6958 | Residual SE=0.7785 | n=391
# Base-R diagnostic plots for the fitted model, arranged in a 2 x 2 grid
par(mfrow=c(2,2), mar=c(4,4,3,1), bg="#f8f9fb")
plot(mod, which=1, main="Diagnostic 1: Residuals vs Fitted")
plot(mod, which=2, main="Diagnostic 2: Normal Q-Q")
plot(mod, which=3, main="Diagnostic 3: Scale-Location")
plot(mod, which=4, main="Diagnostic 4: Cook's Distance")

# Restore the single-panel plotting layout
par(mfrow=c(1,1))

8.1 Reading the Results (Plain Language)

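The log-linear model means each coefficient β is read as a percentage change in transaction amount for a one-unit change in the predictor, holding all other variables constant. For small coefficients, β × 100 is a good approximation; the exact figure, reported in the % Effect column of Table 6, is (exp(β) − 1) × 100.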
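The percentage reading of a log-linear coefficient can be sketched as below. The helper `pct_effect()` is a hypothetical name introduced for illustration, and the inputs are the rounded coefficients printed in Table 6, so the last digit can differ slightly from the report's % Effect column, which used unrounded estimates.

```r
# Convert a log-linear coefficient into an exact percentage effect.
# `units` is the size of the change in the predictor.
pct_effect <- function(beta, units = 1) {
  (exp(beta * units) - 1) * 100
}

# Rounded coefficients as printed in Table 6
beta_qty  <- 0.0236   # Qty
beta_bank <- 0.2831   # PaymentFirst Bank

one_crate  <- pct_effect(beta_qty)              # one extra crate
ten_crates <- pct_effect(beta_qty, units = 10)  # ten extra crates compound rather than add
first_bank <- pct_effect(beta_bank)             # First Bank vs First Monie wallet

cat(sprintf("1 crate: %.2f%% | 10 crates: %.1f%% | First Bank: %.2f%%\n",
            one_crate, ten_crates, first_bank))
```

Because effects multiply on the original scale, ten extra crates correspond to roughly 27% more expected transaction value under this model, a little more than ten times the single-crate effect.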

| Predictor | Statistical interpretation | Business meaning for Danica Farms |
|---|---|---|
| Qty ★★★ | Each additional crate is associated with about 2.4% higher expected transaction value (β = 0.0236, p < .001) | The dominant revenue driver; Anita Amadi’s primary sales target metric should be average crates per customer visit |
| Unit_Price (n.s.) | No detectable effect once quantity and category are held constant (β = −0.0004, p = 0.289) | The model neither shows a revenue gain from higher prices nor any harm; the case for price increases rests on the correlation evidence, not on this coefficient |
| Egg(Big) vs Egg(Medium) (n.s.) | +22.05% at equal quantity, but not statistically significant (p = 0.185) | Egg(Big)’s revenue dominance appears to come mainly from order volume rather than a per-transaction premium at equal quantity; protecting its production remains the priority |
| Egg(Small) vs Egg(Medium) (n.s.) | −44.38% at equal quantity; marginal, not significant at the 5% level (p = 0.069) | Suggests Egg(Small) transactions are lower in value than Medium at the same volume; treat as tentative |
| Week ★★★ | +3.57% per week (p < .001) | Transaction values are rising over time with the other predictors held constant, which supports the farm’s growth trajectory and investment decisions |
| Gender (Female) ★ | +18.68% relative to male buyers (p = 0.036) | Female buyers’ transactions are somewhat higher in value, other things equal |
| Payment (First Bank) ★★ | +32.72% relative to First Monie wallet (p = 0.0045) | Bank-paying customers generate more revenue per transaction, consistent with H2; prioritise their service experience |

Overall model fit: R² = 0.7012 (adjusted R² = 0.6958), so the predictors together explain about 70% of the variation in log(Amount). This is a strong fit for transaction-level data of this type.

Diagnostics note: Uche Utochukwu’s 200-crate orders are likely influential points (high Cook’s Distance). As a robustness check, re-run the model excluding rows where Qty > 100; if the key coefficients barely change, the results are robust to these bulk orders.
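One way to run such a robustness check is sketched below, assuming simulated stand-in data and a simplified formula. The helper `compare_without_bulk()` and the data frame `sim` are hypothetical and not part of the original analysis; with the real model, the full formula from Section 8 would be passed in instead.

```r
# Simulated stand-in data (NOT the farm ledger): mostly small orders plus a
# handful of 200-crate bulk orders.
set.seed(1)
n <- 200
sim <- data.frame(
  Qty  = c(sample(5:60, n - 5, replace = TRUE), rep(200, 5)),
  Week = sample(1:19, n, replace = TRUE)
)
sim$log_Amount <- 11 + 0.02 * sim$Qty + 0.03 * sim$Week + rnorm(n, sd = 0.3)

# Hypothetical helper: refit the same formula with and without large orders
# and place the two sets of coefficients side by side.
compare_without_bulk <- function(data, formula, qty_cutoff = 100) {
  full    <- lm(formula, data = data)
  trimmed <- lm(formula, data = data[data$Qty <= qty_cutoff, ])
  data.frame(
    term       = names(coef(full)),
    full       = unname(coef(full)),
    trimmed    = unname(coef(trimmed)),
    pct_change = unname(100 * (coef(trimmed) - coef(full)) / coef(full))
  )
}

robustness <- compare_without_bulk(sim, log_Amount ~ Qty + Week)
print(robustness, digits = 3)
```

Small values in the `pct_change` column would indicate that the bulk orders are not driving the estimates. With factor predictors, it is worth confirming that trimming does not remove an entire level, since the two coefficient sets would then no longer line up.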


9 Integrated Findings

Finding 1 — Quantity is the master revenue lever. EDA, correlation (ρ > 0.95), and regression all agree. Encouraging mid-volume buyers (currently 20–40 crates) to order just 10 more crates per visit would materially lift revenue — this is the highest-return sales activity.

Finding 2 — Customer concentration is the most serious strategic risk. Paul (₦14.3M), Uchechi (₦13.2M), and Uche Utochukwu (₦12.1M) together account for 48% of revenue from 89 transactions. Losing any one of these three buyers would remove ₦12–14M of revenue within the observation window alone, and proportionally more over a full year. Diversification is urgent, not optional.
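The concentration figure can be reproduced from the numbers quoted in this report. The short sketch below uses only those reported totals (in millions of naira); the variable names are illustrative.

```r
# Figures quoted in the report, in millions of naira
top_buyers <- c(Paul = 14.3, Uchechi = 13.2, `Uche Utochukwu` = 12.1)
total_revenue <- 82.1

top3_total <- sum(top_buyers)                             # combined revenue of the three accounts
top3_share <- top3_total / total_revenue                  # fraction of all revenue
share_each <- round(100 * top_buyers / total_revenue, 1)  # each buyer's own share, in %

cat(sprintf("Top 3: NGN %.1fM of NGN %.1fM = %.0f%%\n",
            top3_total, total_revenue, 100 * top3_share))
print(share_each)
```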

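Finding 3 — Egg(Big) is the revenue engine — protect it above everything else. The Kruskal-Wallis test (p < .001) confirms that transaction values differ by category, with Egg(Big) highest. In the regression, the Egg(Big) premium at equal quantity (+22%) is not statistically significant (p = 0.185), which suggests the advantage comes mainly from order volume rather than a per-transaction premium. At ₦56.6M (69% of total), any supply disruption would be catastrophic.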

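Finding 4 — Incremental price increases appear safe. Price rose from ₦5,300 to ₦5,500 for Egg(Big) across the period, and demand from loyal wholesale buyers did not visibly drop; the regression coefficient on Unit_Price is small and not statistically significant (p = 0.289). A further ₦100–200 increase in H2 2026 is defensible, subject to the causation caveat in the correlation section.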

Final Strategic Recommendations for Danica Farms

The five analyses collectively support the following evidence-based action plan for Anita Amadi and the Danica Farms leadership team:


🥚 Recommendation 1 — Protect and Expand Egg(Big) Production (Immediate)

Egg(Big) generates ₦56.6M — 69% of total revenue — and is statistically confirmed as the highest-value product category (Kruskal-Wallis p < .001). The farm must treat this product line as its core asset:

  • Implement a formal flock health monitoring protocol with weekly mortality tracking
  • Identify and contract a secondary feed supplier to eliminate single-source feed risk
  • Maintain a minimum 2-week buffer stock of Egg(Big) crates before peak demand periods (Christmas, Easter, end-of-month pay cycles)
  • Set a target of increasing Egg(Big) production capacity by 20% by Q3 2026

👥 Recommendation 2 — Urgently Diversify the Customer Base (0–6 months)

Paul (₦14.3M), Uchechi (₦13.2M), and Uche Utochukwu (₦12.1M) together contribute 48% of total revenue. The loss of any one of these relationships would leave a revenue hole of ₦12–14M, measured over the observation window alone, that cannot be filled quickly. This is Danica Farms’ single greatest business risk:

  • Set a formal target of onboarding 8 new wholesale accounts by December 2026, each with a minimum commitment of 30 crates per visit and ₦300,000+ monthly revenue
  • Identify target customer segments: canteen operators, hotels, fast-food restaurants, and supermarkets in Port Harcourt who are not currently served
  • Offer a trial rate of ₦5,300/crate (vs. standard ₦5,400–5,500) for the first month to new wholesale accounts ordering 50+ crates per visit
  • Track customer acquisition monthly and report progress to the farm owner

📈 Recommendation 3 — Grow Average Order Size Among Existing Mid-Tier Buyers (Ongoing)

The regression model confirms that quantity is the single most powerful predictor of transaction revenue: each additional crate is associated with about 2.4% higher transaction value (p < .001). The median transaction is only 15 crates, yet the farm’s infrastructure can handle 50–100 crate orders easily:

  • Introduce a volume discount ladder: 40–60 crates at ₦5,350/crate, 61–100 crates at ₦5,300/crate — incentivising larger single orders
  • Track the 10 wholesale customers currently averaging 10–30 crates and personally contact each with a larger-order offer
  • Target: raise the median transaction size from 15 to 25 crates within 6 months
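The proposed discount ladder can be sketched as a small pricing function. Two points are assumptions not stated in the report: orders below 40 crates are taken to pay a standard ₦5,500, and orders above 100 crates are taken to stay on the 61–100 tier price. The function name `ladder_price()` is hypothetical.

```r
# Unit price per crate under the proposed volume discount ladder.
# ASSUMPTIONS (not in the report): orders below 40 crates pay a standard
# NGN 5,500, and orders above 100 crates stay on the 61-100 tier price.
ladder_price <- function(crates, standard_price = 5500) {
  ifelse(crates >= 61, 5300,
         ifelse(crates >= 40, 5350, standard_price))
}

# Order value at a few sizes, including both sides of each tier boundary
orders <- data.frame(crates = c(15, 25, 39, 40, 60, 61, 100))
orders$unit_price  <- ladder_price(orders$crates)
orders$order_value <- orders$crates * orders$unit_price
print(orders)
```

Under these assumptions, a 39-crate order at the standard price is worth slightly more than a 40-crate order on the first tier, so the ladder only pays off when buyers move well past each threshold. That is worth checking against actual order sizes before the tiers are finalised.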

💰 Recommendation 4 — Continue Incremental Price Increases (Annual)

The correlation analysis (positive Unit_Price ~ Amount) shows no sign that Danica Farms’ price increases from ₦5,300 to ₦5,500 across the observation period suppressed demand from loyal wholesale buyers, and the regression coefficient on Unit_Price is small and not statistically significant (p = 0.289). The evidence is consistent with price-inelastic demand in the core customer base, although it is observational rather than experimental:

  • Apply a ₦100–200 per-crate price increase to Egg(Big) wholesale in January 2027
  • Communicate price changes to wholesale buyers 4 weeks in advance with a brief justification (feed cost inflation, improved quality standards)
  • Monitor order volumes for 6 weeks post-increase; if volumes hold steady, proceed with a similar increase in January 2028

💳 Recommendation 5 — Optimise Payment Infrastructure

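First Bank customers show a pattern of larger order quantities (tested in H2), and in the regression First Bank transactions are about 33% higher in value than First Monie wallet transactions with the other predictors held constant (p = 0.0045). Protecting and improving this payment channel is commercially important: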

  • Ensure First Bank account details are always current and communicated to all buyers
  • For orders above ₦200,000, actively encourage First Bank transfer (more reliable for large amounts than mobile wallet transaction limits)
  • Explore whether First Monie wallet transaction limits are constraining some wholesale buyers from placing larger orders; if so, escalate with the payment provider

10 Limitations & Further Work

  1. No cost data — revenue ≠ profit; integrate cost ledger for margin analysis.
  2. Short window — the 19 weeks of data cannot capture Nigerian seasonal demand patterns; 12+ months are needed for seasonal decomposition.
  3. No stockout records — lost sales are invisible; potential revenue is underestimated.
  4. Outlier influence — quantile regression would be more robust for 200-crate bulk orders.
  5. Customer-level clustering ignored — repeat purchases by the same customer are not independent, so the reported standard errors are likely understated; a mixed-effects model with customer random intercepts is recommended for future work.
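A minimal sketch of the mixed-effects approach named in the last limitation, using `lme()` from the nlme package, which is distributed with R as a recommended package. The data are simulated stand-ins with hypothetical customer identifiers, and the formula is deliberately simplified; this is not a fit to the farm ledger.

```r
library(nlme)  # recommended package distributed with R

# Simulated stand-in data (NOT the farm ledger): 20 hypothetical customers,
# 15 transactions each, with a customer-specific baseline order value.
set.seed(7)
n_customers <- 20
n_each <- 15
sim <- data.frame(
  Customer = factor(rep(seq_len(n_customers), each = n_each)),
  Qty      = sample(5:60, n_customers * n_each, replace = TRUE)
)
customer_effect <- rnorm(n_customers, sd = 0.5)
sim$log_Amount <- 11 + 0.02 * sim$Qty +
  customer_effect[as.integer(sim$Customer)] + rnorm(nrow(sim), sd = 0.3)

# Random intercept per customer; Qty remains a fixed effect.
fit <- lme(log_Amount ~ Qty, random = ~ 1 | Customer, data = sim)

print(fixef(fit))    # fixed-effect estimates
print(VarCorr(fit))  # between-customer versus residual variation
```

The variance components show how much of the spread in log(Amount) sits between customers rather than between individual transactions, which is the information an ordinary `lm()` fit discards.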

11 References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Amadi, A. (2026). Danica Farms sales transaction record, January–May 2026 [Dataset]. Collected from Danica Farms internal sales ledger, Rivers State, Nigeria. Data available on request from the author.

R Core Team. (2025). R: A language and environment for statistical computing (Version 4.5.2). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Wei, T., & Simko, V. (2021). corrplot: Visualization of a correlation matrix (R package version 0.92). https://cran.r-project.org/package=corrplot

Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). SAGE Publications. [Package: car]

Zhu, H. (2024). kableExtra: Construct complex table with ‘kable’ and pipe syntax (R package version 1.4.0). https://cran.r-project.org/package=kableExtra

Pedersen, T. L. (2024). patchwork: The composer of plots (R package version 1.2.0). https://cran.r-project.org/package=patchwork

Robinson, D., Hayes, A., & Couch, S. (2023). broom: Convert statistical objects into tidy tibbles (R package version 1.0.5). https://cran.r-project.org/package=broom

Komsta, L., & Novomestky, F. (2022). moments: Moments, cumulants, skewness, kurtosis and related tests (R package version 0.14.1). https://cran.r-project.org/package=moments

Kim, S. (2015). ppcor: An R package for a fast calculation to semi-partial correlation coefficients. Communications for Statistical Applications and Methods, 22(6), 665–674. https://doi.org/10.5351/CSAM.2015.22.6.665

Dinno, A. (2017). dunn.test: Dunn’s test of multiple comparisons using rank sums (R package version 1.3.5). https://cran.r-project.org/package=dunn.test

Wickham, H., & Bryan, J. (2023). readxl: Read Excel files (R package version 1.4.3). https://cran.r-project.org/package=readxl


12 Appendix: AI Usage Statement

Claude (Anthropic, claude.ai) assisted with: (1) auditing and cleaning the Danica Farms dataset — standardising customer name variants and category labels; (2) generating R code templates for all five analytical sections; (3) designing the Quarto document layout, CSS styling, and KPI card interface; and (4) drafting plain-language interpretations of statistical outputs.

All analytical decisions were made independently by the author: Kruskal-Wallis over ANOVA (non-normality confirmed); Spearman over Pearson (skewed distributions); log-transformation of the regression outcome; identification of customer concentration as the primary strategic risk; and all final recommendations. All code was reviewed and tested personally in RStudio.



Danica Farms Sales Performance Analytics
Data Analytics II · Lagos Business School · April 2026
Prof Bongo Adi · badi@lbs.edu.ng
