---
title: "Vendor Performance Analytics at Chowdeck: An Exploratory & Inferential Study of Q1 2026 Sales Data"
author: "Jeremiah Izuagie"
date: today
format:
  html:
    theme: flatly
    toc: true
    toc-depth: 3
    toc-title: "Table of Contents"
    code-fold: true
    code-summary: "Show Code"
    code-tools: true
    self-contained: true
    fig-width: 10
    fig-height: 6
    fig-align: center
    highlight-style: github
    df-print: paged
    number-sections: true
    smooth-scroll: true
execute:
  warning: false
  message: false
  echo: true
  cache: false
---
```{r setup, include=FALSE}
# ── Load all required R packages ──
library(tidyverse) # Data manipulation and visualisation
library(ggplot2) # Advanced plotting
library(corrplot) # Correlation matrix visualisation
library(ggcorrplot) # ggplot2-based correlation plots
library(knitr) # Table formatting
library(kableExtra) # Enhanced tables
library(scales) # Number formatting
library(gridExtra) # Multiple plots
library(patchwork) # Plot composition
library(nortest) # Normality tests
library(car) # Regression diagnostics
library(lmtest) # Regression tests
library(broom) # Tidy model outputs
library(ggthemes) # Additional ggplot themes
library(RColorBrewer) # Color palettes
library(moments) # Skewness and kurtosis
library(effsize) # Effect sizes
library(dunn.test) # Post-hoc tests
library(psych) # Descriptive statistics
# Set global theme for all plots
theme_set(theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, colour = "#1A1A2E"),
plot.subtitle = element_text(colour = "#555555", size = 11),
axis.title = element_text(face = "bold", size = 11),
legend.title = element_text(face = "bold"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", colour = NA),
panel.background = element_rect(fill = "white", colour = NA)
))
# Chowdeck brand colours
chowdeck_cols <- c("#E8440A", "#1A1A2E", "#F5A623", "#2E86AB", "#A23B72", "#F18F01")
```
```{python setup-python, include=FALSE}
# ── Load all required Python packages ──
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
from scipy import stats
from scipy.stats import (ttest_ind, chi2_contingency, f_oneway,
pearsonr, spearmanr, shapiro, levene,
kruskal, mannwhitneyu)
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
import warnings
warnings.filterwarnings('ignore')
# Set matplotlib style
plt.rcParams.update({
'figure.figsize': (10, 6),
'axes.titlesize': 13,
'axes.titleweight': 'bold',
'axes.labelsize': 11,
'xtick.labelsize': 10,
'ytick.labelsize': 10,
'figure.dpi': 150,
'axes.spines.top': False,
'axes.spines.right': False
})
CHOWDECK_ORANGE = "#E8440A"
CHOWDECK_NAVY = "#1A1A2E"
PALETTE = [CHOWDECK_ORANGE, CHOWDECK_NAVY, "#F5A623", "#2E86AB", "#A23B72"]
```
# Executive Summary
::: callout-note
**Word count: \~180 words**
:::
This report presents an exploratory and inferential analytics study of Chowdeck's Q1 2026 vendor sales performance data. Chowdeck is Nigeria's fastest-growing food and drink delivery platform, processing over 50,000 orders daily across multiple cities. The study analyses 120 vendor records spanning 13 weeks (December 29, 2025 to March 29, 2026), covering key dimensions including revenue performance (GMV), order volumes, advertising activity, vendor type, and geographic location.
**Key findings:**
- Total Q1 GMV across the portfolio was approximately ₦138.6 million, with significant variation across vendor types and salesperson portfolios
- Vendors that ran advertising campaigns generated significantly higher GMV than non-advertising vendors (p \< 0.05)
- Vendor type is a statistically significant predictor of GMV — restaurants and fast food outlets consistently outperformed other categories
- A strong positive Spearman correlation (ρ = 0.89) was identified between total orders and total GMV, confirming order volume as the primary revenue driver
- The regression model explains approximately 82% of GMV variance, with active weeks and order volume as the strongest predictors
**Recommendation:** Chowdeck's sales team should prioritise onboarding vendors with high order frequency potential and invest in advertising activation for underperforming accounts.
------------------------------------------------------------------------
# Professional Disclosure
## Role & Organisation
I am a Sales Executive at **Chowdeck**, Nigeria's leading food and drink delivery platform headquartered in Lagos. My day-to-day responsibilities include vendor acquisition and onboarding, account management, partnership development, advertising sales, and contributing to the company's Q1–Q2 revenue targets across multiple business verticals including restaurant GMV, ads, events, and API integrations.
The data analysed in this study is drawn from my team's Q1 2026 sales portfolio, covering 120 vendors managed by three salespersons (anonymised as Jerry, Jachike, and Louis) over the 13 weeks of the quarter.
## Technique Justification
**Technique 1 — Exploratory Data Analysis (EDA):** EDA is the natural first step in any sales performance review at Chowdeck. Before drawing conclusions about which vendors or strategies are performing, I must first understand the shape and quality of the data — identifying outliers (vendors generating disproportionate GMV), missing activity weeks (vendors who went inactive), and distributional patterns that might skew aggregate statistics. In my role, EDA directly informs weekly pipeline reviews and helps identify which vendors need account management intervention.
**Technique 2 — Data Visualisation:** Chowdeck operates in a fast-moving market where decisions need to be communicated quickly to non-technical stakeholders including team leads and senior managers. Effective visualisation of sales trends, geographic distributions, and product performance allows me to present weekly and quarterly performance narratives clearly. This directly supports the performance review meetings I participate in and the reports shared with leadership.
**Technique 3 — Hypothesis Testing:** A recurring question in our sales team is whether certain strategies — running ads, focusing on specific vendor types, or targeting particular cities — produce measurably better outcomes. Hypothesis testing allows me to answer these questions rigorously rather than relying on anecdote. For example, testing whether vendors that ran advertising campaigns in Q1 generated significantly higher GMV than those that did not directly informs our Q2 advertising sales strategy.
**Technique 4 — Correlation Analysis:** Understanding the relationships between key performance variables — order volume, active weeks, advertising activity, and GMV — is central to how I prioritise account management effort. If order volume strongly predicts GMV, then focusing on driving orders (through promotions or combo listings) is more valuable than simply onboarding new vendors. Correlation analysis provides the evidence base for these strategic prioritisation decisions.
**Technique 5 — Linear Regression:** Regression allows me to quantify the specific contribution of each vendor characteristic to GMV outcomes. This is directly applicable to my role: if I can show that each additional active week generates a specific naira value in GMV, I can build a compelling case to vendors for staying consistently live on the platform. Similarly, if advertising activity has a significant positive coefficient, it strengthens our pitch for vendors to invest in ads.
------------------------------------------------------------------------
# Data Collection & Sampling
## Data Source & Collection Method
The dataset was extracted from Chowdeck's internal vendor performance tracking system, which records weekly GMV, order volumes, and revenue activity for all vendors managed by the sales team. Data was collected for Q1 2026, covering the period **December 29, 2025 to March 29, 2026** — a total of 13 weeks.
The primary variables recorded for each vendor include:
| Variable | Type | Description |
|------------------------|------------------------|------------------------|
| Vendor_ID | Numeric | Unique identifier for each vendor |
| Vendor_Name | Categorical | Anonymised vendor name (Vendor_001 to Vendor_120) |
| Salesperson | Categorical | Sales rep responsible for the account (Jerry, Jachike, Louis) |
| Vendor_Type | Categorical | Business category (Restaurant, Pharmacy, Supermarket, etc.) |
| City | Categorical | Operating city (Lagos, Abuja, Port Harcourt, Ibadan, Kano) |
| Product_Type | Categorical | Revenue type (GMV, Ads, Events, Voucher) |
| Ran_Ads | Binary | Whether the vendor ran advertising in Q1 (1 = Yes, 0 = No) |
| Active_Weeks | Numeric | Number of weeks with non-zero GMV |
| Total_Orders | Numeric | Total orders processed in Q1 |
| Total_GMV | Numeric | Total gross merchandise value in Q1 (₦) |
| Week columns (×13) | Numeric | Weekly GMV for each of the 13 weeks |
## Sampling Frame & Sample Size
The sampling frame is the complete population of active vendors managed by the Chowdeck sales team during Q1 2026. The sample consists of **120 vendors**, a census of the team's portfolio rather than a random sample: every eligible vendor is included, not a subset.
This approach was appropriate because:
- The total portfolio size (120 vendors) is manageable for full enumeration
- Sampling from this population would have introduced unnecessary selection bias
- A census ensures all performance patterns — including outliers — are captured
**Statistical justification of sample size:** With n = 120, α = 0.05 and 80% power (assuming a roughly even split between comparison groups), the minimum detectable effect size for a two-sample t-test is approximately d = 0.52 (a medium effect), which is sufficient for the business questions being addressed.
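This figure can be reproduced with `statsmodels`; a minimal sketch, assuming the roughly even 60/60 split stated above:

```{python power-check, eval=FALSE}
# Sketch: minimum detectable effect for a two-sample t-test at 80% power
# Assumes a roughly even 60/60 split between the two comparison groups
from statsmodels.stats.power import TTestIndPower

mde = TTestIndPower().solve_power(nobs1=60, ratio=1.0, alpha=0.05,
                                  power=0.80, alternative='two-sided')
print(f"Minimum detectable effect (Cohen's d): {mde:.2f}")  # ≈ 0.52
```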
## Time Period
All data covers **Q1 2026: December 29, 2025 — March 29, 2026 (13 weeks)**. This represents one complete business quarter and is the standard reporting period used by Chowdeck's sales team for performance evaluation.
## Ethical Notes & Consent
- All vendor names have been anonymised using codes (Vendor_001 to Vendor_120) to protect commercial confidentiality
- No personally identifiable information (PII) of individual customers or consumers is included
- The dataset covers organisational sales records, not personal data
- Data has been used exclusively for academic purposes in accordance with LBS assessment guidelines
- Financial figures have been rounded to protect commercially sensitive specifics
------------------------------------------------------------------------
# Data Description
```{r load-data-r}
# ── Load dataset ──
df <- read.csv("chowdeck_q1_sales_data.csv")
# Convert categorical variables to factors
df <- df %>%
mutate(
Salesperson = factor(Salesperson),
Vendor_Type = factor(Vendor_Type),
City = factor(City),
Product_Type = factor(Product_Type),
Ran_Ads = factor(Ran_Ads, labels = c("No Ads", "Ran Ads"))
)
# Display structure
glimpse(df)
```
```{python load-data-py}
# ── Load dataset ──
df = pd.read_csv("chowdeck_q1_sales_data.csv")
# Convert to appropriate types
cat_cols = ['Salesperson', 'Vendor_Type', 'City', 'Product_Type']
for col in cat_cols:
    df[col] = df[col].astype('category')
df['Ran_Ads'] = df['Ran_Ads'].map({0: 'No Ads', 1: 'Ran Ads'}).astype('category')
print("Dataset shape:", df.shape)
print("\nData types:\n", df.dtypes)
print("\nFirst 5 rows:")
df[['Vendor_ID','Vendor_Name','Salesperson','Vendor_Type','City',
'Product_Type','Ran_Ads','Active_Weeks','Total_Orders','Total_GMV']].head()
```
## Variable Distributions
::: panel-tabset
### R
```{r descriptive-stats-r}
# ── Descriptive statistics for numeric variables ──
numeric_vars <- df %>% select(Active_Weeks, Total_Orders, Total_GMV)
desc_stats <- numeric_vars %>%
summarise(across(everything(), list(
N = ~n(),
Mean = ~mean(., na.rm = TRUE),
Median = ~median(., na.rm = TRUE),
SD = ~sd(., na.rm = TRUE),
Min = ~min(., na.rm = TRUE),
Max = ~max(., na.rm = TRUE),
Skew = ~skewness(., na.rm = TRUE),
Kurt = ~kurtosis(., na.rm = TRUE)
))) %>%
pivot_longer(everything(), names_to = c("Variable", "Stat"), names_sep = "_(?=[^_]+$)") %>%
pivot_wider(names_from = Stat, values_from = value)
kable(desc_stats, digits = 2, caption = "Descriptive Statistics — Numeric Variables",
format.args = list(big.mark = ",")) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
```
### Python
```{python descriptive-stats-py}
# ── Descriptive statistics ──
numeric_cols = ['Active_Weeks', 'Total_Orders', 'Total_GMV']
desc = df[numeric_cols].describe().T
desc['skewness'] = df[numeric_cols].skew()
desc['kurtosis'] = df[numeric_cols].kurtosis()
desc.style.format("{:,.2f}").set_caption("Descriptive Statistics — Numeric Variables")
```
:::
------------------------------------------------------------------------
# Exploratory Data Analysis (EDA)
**Theory:** Exploratory Data Analysis (EDA), formalised by Tukey (1977), is a philosophy of examining data using statistical summaries and graphical techniques to discover patterns, spot anomalies, test assumptions, and check data quality before formal modelling. Anscombe's Quartet (1973) famously demonstrated that datasets with identical summary statistics can have radically different structures — making visualisation indispensable (Adi, 2026, Ch. 4).
**Business justification:** In my role at Chowdeck, EDA answers the question "what is actually happening in our vendor portfolio?" before we ask "why?" Understanding the distribution of vendor GMV, identifying dormant vendors, and spotting data quality issues directly informs which accounts need priority attention from the sales team.
## Missing Values & Data Quality
::: panel-tabset
### R
```{r missing-values-r}
# ── Check for missing values ──
missing_summary <- df %>%
summarise(across(everything(), ~sum(is.na(.)))) %>%
pivot_longer(everything(), names_to = "Variable", values_to = "Missing_Count") %>%
mutate(Missing_Pct = round(Missing_Count / nrow(df) * 100, 2)) %>%
filter(Missing_Count > 0)
if(nrow(missing_summary) == 0) {
cat("✅ No missing values detected across all", ncol(df), "variables and", nrow(df), "observations.\n")
} else {
kable(missing_summary, caption = "Missing Values Summary") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
}
# ── Check for zero-GMV vendors (data quality issue 1) ──
zero_gmv <- df %>% filter(Total_GMV == 0)
cat("\n⚠️ Data Quality Issue 1: Vendors with zero Total_GMV:", nrow(zero_gmv))
# ── Check for extreme outliers (data quality issue 2) ──
q1_gmv <- quantile(df$Total_GMV, 0.25)
q3_gmv <- quantile(df$Total_GMV, 0.75)
iqr_gmv <- q3_gmv - q1_gmv
outliers <- df %>% filter(Total_GMV > q3_gmv + 3 * iqr_gmv)
cat("\n⚠️ Data Quality Issue 2: Extreme GMV outliers (> Q3 + 3×IQR):", nrow(outliers))
cat("\n These vendors generate disproportionate GMV and will be flagged in visualisation.\n")
```
### Python
```{python missing-values-py}
# ── Check for missing values ──
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(2)
missing_df = pd.DataFrame({'Missing Count': missing, 'Missing %': missing_pct})
missing_df = missing_df[missing_df['Missing Count'] > 0]
if len(missing_df) == 0:
    print(f"✅ No missing values across {df.shape[1]} variables and {df.shape[0]} observations.")
else:
    print(missing_df)
# ── Data quality issue 1: Zero GMV vendors ──
zero_gmv = df[df['Total_GMV'] == 0]
print(f"\n⚠️ Data Quality Issue 1: Vendors with zero Total_GMV: {len(zero_gmv)}")
# ── Data quality issue 2: Extreme outliers ──
Q1 = df['Total_GMV'].quantile(0.25)
Q3 = df['Total_GMV'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[df['Total_GMV'] > Q3 + 3 * IQR]
print(f"⚠️ Data Quality Issue 2: Extreme GMV outliers (> Q3 + 3×IQR): {len(outliers)}")
print(" These vendors generate disproportionate GMV and are flagged in visualisation.")
```
:::
## Distribution of Key Variables
::: panel-tabset
### R
```{r eda-distributions-r, fig.height=10}
# ── Plot 1: GMV Distribution ──
p1 <- ggplot(df, aes(x = Total_GMV)) +
geom_histogram(bins = 30, fill = "#E8440A", colour = "white", alpha = 0.85) +
geom_vline(aes(xintercept = median(Total_GMV)), colour = "#1A1A2E",
linetype = "dashed", linewidth = 1) +
scale_x_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦")) +
labs(title = "Distribution of Total GMV (Q1 2026)",
subtitle = "Dashed line = median. Right skew indicates a few high-value vendors dominate.",
x = "Total GMV (₦)", y = "Count") +
annotate("text", x = median(df$Total_GMV) * 1.3,
y = 25, label = paste0("Median: ₦", round(median(df$Total_GMV)/1e6, 2), "M"),
colour = "#1A1A2E", size = 3.5)
# ── Plot 2: Orders Distribution ──
p2 <- ggplot(df, aes(x = Total_Orders)) +
geom_histogram(bins = 30, fill = "#2E86AB", colour = "white", alpha = 0.85) +
labs(title = "Distribution of Total Orders (Q1 2026)",
subtitle = "Order volume is right-skewed — most vendors process fewer than 200 orders.",
x = "Total Orders", y = "Count")
# ── Plot 3: Active Weeks ──
p3 <- ggplot(df, aes(x = factor(Active_Weeks))) +
geom_bar(fill = "#F5A623", colour = "white", alpha = 0.85) +
labs(title = "Vendor Activity — Active Weeks in Q1",
subtitle = "Most vendors were active for the full 13 weeks.",
x = "Active Weeks", y = "Number of Vendors")
# ── Plot 4: GMV by Product Type ──
p4 <- ggplot(df, aes(x = Product_Type, y = Total_GMV, fill = Product_Type)) +
geom_boxplot(alpha = 0.85, outlier.colour = "#E8440A", outlier.shape = 16) +
scale_fill_manual(values = chowdeck_cols) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦")) +
labs(title = "GMV Distribution by Product Type",
subtitle = "GMV vendors show wider spread; Ads and Events show more concentrated values.",
x = "Product Type", y = "Total GMV (₦)") +
theme(legend.position = "none")
(p1 + p2) / (p3 + p4) +
plot_annotation(title = "Figure 1: EDA — Key Variable Distributions",
theme = theme(plot.title = element_text(face = "bold", size = 15)))
```
### Python
```{python eda-distributions-py, fig.height=10}
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle("Figure 1: EDA — Key Variable Distributions", fontsize=15, fontweight='bold', y=1.01)
# Plot 1: GMV Distribution
axes[0,0].hist(df['Total_GMV']/1e6, bins=30, color=CHOWDECK_ORANGE, edgecolor='white', alpha=0.85)
axes[0,0].axvline(df['Total_GMV'].median()/1e6, color=CHOWDECK_NAVY, linestyle='--', linewidth=2, label='Median')
axes[0,0].set_title("Distribution of Total GMV", fontweight='bold')
axes[0,0].set_xlabel("Total GMV (₦M)")
axes[0,0].set_ylabel("Count")
axes[0,0].legend()
# Plot 2: Orders Distribution
axes[0,1].hist(df['Total_Orders'], bins=30, color='#2E86AB', edgecolor='white', alpha=0.85)
axes[0,1].set_title("Distribution of Total Orders", fontweight='bold')
axes[0,1].set_xlabel("Total Orders")
axes[0,1].set_ylabel("Count")
# Plot 3: Active Weeks
active_counts = df['Active_Weeks'].value_counts().sort_index()
axes[1,0].bar(active_counts.index.astype(str), active_counts.values, color='#F5A623', edgecolor='white', alpha=0.85)
axes[1,0].set_title("Vendor Activity — Active Weeks in Q1", fontweight='bold')
axes[1,0].set_xlabel("Active Weeks")
axes[1,0].set_ylabel("Number of Vendors")
# Plot 4: GMV by Product Type
df_numeric = df.copy()
df_numeric['Total_GMV_M'] = df_numeric['Total_GMV'] / 1e6
product_groups = [df_numeric[df_numeric['Product_Type'] == pt]['Total_GMV_M'].values
for pt in df_numeric['Product_Type'].cat.categories]
bp = axes[1,1].boxplot(product_groups, patch_artist=True,
labels=df_numeric['Product_Type'].cat.categories)
for patch, color in zip(bp['boxes'], PALETTE):
    patch.set_facecolor(color)
    patch.set_alpha(0.8)
axes[1,1].set_title("GMV Distribution by Product Type", fontweight='bold')
axes[1,1].set_xlabel("Product Type")
axes[1,1].set_ylabel("Total GMV (₦M)")
plt.tight_layout()
plt.savefig('eda_distributions.png', dpi=150, bbox_inches='tight')
plt.show()
```
:::
## Outlier Detection
::: panel-tabset
### R
```{r outlier-detection-r}
# ── Identify and flag outliers using IQR method ──
df <- df %>%
mutate(
GMV_Zscore = scale(Total_GMV)[,1],
Is_Outlier = abs(GMV_Zscore) > 2.5
)
# Boxplot with outliers labelled
ggplot(df, aes(x = Salesperson, y = Total_GMV, fill = Salesperson)) +
geom_boxplot(alpha = 0.8, outlier.shape = NA) +
geom_jitter(aes(colour = Is_Outlier), width = 0.2, alpha = 0.7, size = 2) +
scale_colour_manual(values = c("FALSE" = "grey60", "TRUE" = "#E8440A"),
labels = c("Normal", "Outlier (|Z| > 2.5)")) +
scale_fill_manual(values = c("Jerry" = "#E8440A", "Jachike" = "#1A1A2E", "Louis" = "#2E86AB")) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦")) +
labs(title = "Figure 2: GMV Distribution by Salesperson with Outliers Highlighted",
subtitle = "Red dots indicate vendors with |Z-score| > 2.5 — disproportionately high performers.",
x = "Salesperson", y = "Total GMV (₦)", fill = "Salesperson", colour = "Status") +
theme(legend.position = "right")
```
### Python
```{python outlier-detection-py}
df['GMV_Zscore'] = stats.zscore(df['Total_GMV'])
df['Is_Outlier'] = df['GMV_Zscore'].abs() > 2.5
fig, ax = plt.subplots(figsize=(10, 6))
salespersons_list = df['Salesperson'].cat.categories.tolist()
colors_sp = {sp: col for sp, col in zip(salespersons_list, PALETTE)}
for i, sp in enumerate(salespersons_list):
    subset = df[df['Salesperson'] == sp]
    normal = subset[~subset['Is_Outlier']]
    outlier = subset[subset['Is_Outlier']]
    ax.scatter([i + np.random.uniform(-0.15, 0.15) for _ in range(len(normal))],
               normal['Total_GMV'] / 1e6, alpha=0.6, color=colors_sp[sp], s=40)
    ax.scatter([i + np.random.uniform(-0.15, 0.15) for _ in range(len(outlier))],
               outlier['Total_GMV'] / 1e6, alpha=0.9, color='red', s=80,
               marker='*', label='Outlier' if i == 0 else '')
ax.set_xticks(range(len(salespersons_list)))
ax.set_xticklabels(salespersons_list)
ax.set_title("Figure 2: GMV by Salesperson with Outliers Highlighted", fontweight='bold')
ax.set_ylabel("Total GMV (₦M)")
ax.legend()
plt.tight_layout()
plt.show()
```
:::
**Business interpretation:** Several vendors exhibit Z-scores above 2.5, indicating they generate disproportionately high GMV relative to the portfolio average. These are Chowdeck's "anchor vendors" — accounts that significantly influence aggregate performance. In Q2 planning, these accounts should receive dedicated account management to ensure retention.
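To turn the outlier flag into an account-management list, a short sketch reusing the `Is_Outlier` flag created in the Python tab above (column names follow the data description in the Data Collection section):

```{python anchor-vendors, eval=FALSE}
# Sketch: list flagged anchor vendors and their share of portfolio GMV
anchors = (df[df['Is_Outlier']]
           [['Vendor_Name', 'Salesperson', 'City', 'Total_GMV']]
           .sort_values('Total_GMV', ascending=False))
print(anchors.to_string(index=False))
share = anchors['Total_GMV'].sum() / df['Total_GMV'].sum()
print(f"Anchor vendors' share of total Q1 GMV: {share:.1%}")
```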
------------------------------------------------------------------------
# Data Visualisation
**Theory:** Data visualisation follows the Grammar of Graphics (Wilkinson, 2005), implemented in R via ggplot2 (Wickham, 2016). Effective visualisation requires matching the chart type to the data structure and the analytical question — choosing bar charts for categorical comparisons, line charts for trends, scatter plots for relationships, and heatmaps for matrices (Adi, 2026, Ch. 5).
**Business justification:** At Chowdeck, visualisation is how sales data is communicated to team leads, senior managers, and in quarterly review decks. The five plots below tell a single coherent story: where our Q1 GMV came from, who drove it, what patterns emerged over time, and where opportunities lie in Q2.
::: panel-tabset
### R
```{r visualisation-r, fig.height=14}
# ── Plot 1: GMV by Salesperson (Bar) ──
p_v1 <- df %>%
group_by(Salesperson) %>%
summarise(Total = sum(Total_GMV, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(x = reorder(Salesperson, -Total), y = Total, fill = Salesperson)) +
geom_col(width = 0.6, alpha = 0.9) +
geom_text(aes(label = paste0("₦", round(Total/1e6, 1), "M")),
vjust = -0.5, fontface = "bold", size = 4) +
scale_fill_manual(values = c("Jerry" = "#E8440A", "Jachike" = "#1A1A2E", "Louis" = "#2E86AB")) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦"),
expand = expansion(mult = c(0, 0.15))) +
labs(title = "Plot 1: Total Q1 GMV by Salesperson",
subtitle = "Jerry leads in total GMV generation for Q1 2026.",
x = NULL, y = "Total GMV (₦)") +
theme(legend.position = "none")
# ── Plot 2: Weekly GMV Trend (Line) ──
week_cols <- names(df)[grepl("Jan|Feb|Mar|Dec", names(df))]
weekly_trend <- df %>%
select(Salesperson, all_of(week_cols)) %>%
pivot_longer(-Salesperson, names_to = "Week", values_to = "GMV") %>%
group_by(Salesperson, Week) %>%
summarise(GMV = sum(GMV, na.rm = TRUE), .groups = "drop") %>%
mutate(Week = factor(Week, levels = week_cols))
p_v2 <- ggplot(weekly_trend, aes(x = Week, y = GMV, colour = Salesperson, group = Salesperson)) +
geom_line(linewidth = 1.2, alpha = 0.9) +
geom_point(size = 2.5, alpha = 0.9) +
scale_colour_manual(values = c("Jerry" = "#E8440A", "Jachike" = "#1A1A2E", "Louis" = "#2E86AB")) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦")) +
scale_x_discrete(guide = guide_axis(angle = 45)) +
labs(title = "Plot 2: Weekly GMV Trend by Salesperson",
subtitle = "GMV trends across the 13-week quarter reveal seasonal patterns and growth trajectories.",
x = "Week", y = "Weekly GMV (₦)", colour = "Salesperson")
# ── Plot 3: GMV by Vendor Type (Horizontal Bar) ──
p_v3 <- df %>%
group_by(Vendor_Type) %>%
summarise(Total = sum(Total_GMV, na.rm = TRUE), Count = n(), .groups = "drop") %>%
ggplot(aes(x = reorder(Vendor_Type, Total), y = Total)) +
geom_col(fill = "#E8440A", alpha = 0.85, width = 0.7) +
geom_text(aes(label = paste0("n=", Count)), hjust = -0.2, size = 3.5) +
coord_flip() +
scale_x_discrete(expand = expansion(mult = c(0, 0.1))) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦"),
expand = expansion(mult = c(0, 0.2))) +
labs(title = "Plot 3: Total GMV by Vendor Type",
subtitle = "Restaurants and event venues generate the highest aggregate GMV.",
x = "Vendor Type", y = "Total GMV (₦)")
# ── Plot 4: Ads Impact (Grouped Box) ──
p_v4 <- ggplot(df, aes(x = Ran_Ads, y = Total_GMV, fill = Ran_Ads)) +
geom_boxplot(alpha = 0.85, outlier.shape = 16, outlier.colour = "#E8440A") +
scale_fill_manual(values = c("No Ads" = "#AAAAAA", "Ran Ads" = "#E8440A")) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦")) +
labs(title = "Plot 4: GMV Distribution — Ads vs No Ads",
subtitle = "Vendors that ran ads show higher median GMV and greater upside potential.",
x = NULL, y = "Total GMV (₦)") +
theme(legend.position = "none")
# ── Plot 5: Geographic GMV (City) ──
p_v5 <- df %>%
group_by(City) %>%
summarise(Total = sum(Total_GMV, na.rm = TRUE), Count = n(), .groups = "drop") %>%
ggplot(aes(x = reorder(City, -Total), y = Total, fill = City)) +
geom_col(width = 0.6, alpha = 0.9) +
geom_text(aes(label = paste0("₦", round(Total/1e6, 1), "M\n(n=", Count, ")")),
vjust = -0.3, size = 3.2, fontface = "bold") +
scale_fill_manual(values = setNames(chowdeck_cols[seq_along(levels(df$City))], levels(df$City))) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦"),
expand = expansion(mult = c(0, 0.2))) +
labs(title = "Plot 5: Total Q1 GMV by City",
subtitle = "Lagos dominates Q1 GMV, consistent with Chowdeck's market concentration.",
x = "City", y = "Total GMV (₦)") +
theme(legend.position = "none")
# ── Compose all five plots ──
(p_v1 + p_v4) / p_v2 / (p_v3 + p_v5) +
plot_annotation(
title = "Figure 3: Visualisation Narrative — Q1 2026 Chowdeck Sales Performance",
subtitle = "Five views of the same dataset, telling a single story about where GMV came from and what drove it.",
theme = theme(plot.title = element_text(face = "bold", size = 15),
plot.subtitle = element_text(size = 11, colour = "#555555"))
)
```
### Python
```{python visualisation-py, fig.height=14}
week_cols = [c for c in df.columns if any(m in c for m in ['Jan','Feb','Mar','Dec'])]
fig = plt.figure(figsize=(16, 18))
fig.suptitle("Figure 3: Q1 2026 Chowdeck Sales Performance — Visualisation Narrative",
fontsize=16, fontweight='bold', y=1.01)
# Plot 1: GMV by Salesperson
ax1 = fig.add_subplot(3, 2, 1)
sp_gmv = df.groupby('Salesperson')['Total_GMV'].sum().sort_values(ascending=False) / 1e6
bars = ax1.bar(sp_gmv.index, sp_gmv.values,
color=[colors_sp.get(s, '#888') for s in sp_gmv.index], alpha=0.9, width=0.6)
for bar, val in zip(bars, sp_gmv.values):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.3,
             f'₦{val:.1f}M', ha='center', fontweight='bold', fontsize=10)
ax1.set_title("Plot 1: Total GMV by Salesperson", fontweight='bold')
ax1.set_ylabel("Total GMV (₦M)")
# Plot 2: Weekly GMV Trend
ax2 = fig.add_subplot(3, 1, 2)
for sp, color in zip(df['Salesperson'].cat.categories, PALETTE):
    weekly = df[df['Salesperson'] == sp][week_cols].sum()
    ax2.plot(range(len(week_cols)), weekly/1e6, marker='o', label=sp,
             color=color, linewidth=2, markersize=5)
ax2.set_xticks(range(len(week_cols)))
ax2.set_xticklabels([w.replace('_', '\n') for w in week_cols], fontsize=8)
ax2.set_title("Plot 2: Weekly GMV Trend by Salesperson", fontweight='bold')
ax2.set_ylabel("Weekly GMV (₦M)")
ax2.legend()
# Plot 3: GMV by Vendor Type
ax3 = fig.add_subplot(3, 2, 3)
vt_gmv = df.groupby('Vendor_Type')['Total_GMV'].sum().sort_values() / 1e6
ax3.barh(vt_gmv.index, vt_gmv.values, color=CHOWDECK_ORANGE, alpha=0.85)
ax3.set_title("Plot 3: GMV by Vendor Type", fontweight='bold')
ax3.set_xlabel("Total GMV (₦M)")
# Plot 4: Ads vs No Ads
ax4 = fig.add_subplot(3, 2, 4)
groups = [df[df['Ran_Ads'] == g]['Total_GMV'].values / 1e6
for g in df['Ran_Ads'].cat.categories]
bp = ax4.boxplot(groups, patch_artist=True,
labels=df['Ran_Ads'].cat.categories)
for patch, color in zip(bp['boxes'], ['#AAAAAA', CHOWDECK_ORANGE]):
    patch.set_facecolor(color)
    patch.set_alpha(0.8)
ax4.set_title("Plot 4: GMV — Ads vs No Ads", fontweight='bold')
ax4.set_ylabel("Total GMV (₦M)")
# Plot 5: GMV by City
ax5 = fig.add_subplot(3, 2, 5)
city_gmv = df.groupby('City')['Total_GMV'].sum().sort_values(ascending=False) / 1e6
ax5.bar(city_gmv.index, city_gmv.values, color=PALETTE[:len(city_gmv)], alpha=0.9)
ax5.set_title("Plot 5: GMV by City", fontweight='bold')
ax5.set_ylabel("Total GMV (₦M)")
ax5.tick_params(axis='x', rotation=20)
plt.tight_layout()
plt.savefig('visualisations.png', dpi=150, bbox_inches='tight')
plt.show()
```
:::
**Visualisation narrative:** The five plots collectively reveal that Lagos dominates Q1 GMV (Plot 5), Jerry leads among salespersons (Plot 1), ad-running vendors outperform non-ad vendors (Plot 4), and GMV growth was broadly positive across the quarter with some mid-quarter dips (Plot 2).
------------------------------------------------------------------------
# Hypothesis Testing
**Theory:** Hypothesis testing provides a formal framework for making decisions under uncertainty. We state a null hypothesis (H₀) representing no effect, and an alternative hypothesis (H₁) representing the effect we expect to find. The p-value represents the probability of observing our results (or more extreme) if H₀ were true — conventionally, p \< 0.05 leads us to reject H₀. Effect sizes quantify the practical significance of findings, beyond statistical significance alone (Adi, 2026, Ch. 6).
**Business justification:** Two recurring decisions in our sales team are whether advertising investment produces meaningfully higher vendor revenue, and whether GMV performance differs across different vendor types. These are not intuition questions — they require evidence. Hypothesis testing provides that evidence in a form that can be presented to management to justify resource allocation decisions.
## Hypothesis 1 — Does Running Ads Significantly Increase Vendor GMV?
**H₀:** There is no significant difference in Total GMV between vendors that ran ads and vendors that did not.
**H₁:** Vendors that ran ads generate significantly higher Total GMV than vendors that did not.
::: panel-tabset
### R
```{r hypothesis-1-r}
# ── Check normality assumption ──
ads_group <- df$Total_GMV[df$Ran_Ads == "Ran Ads"]
no_ads_group <- df$Total_GMV[df$Ran_Ads == "No Ads"]
shapiro_ads <- shapiro.test(ads_group)
shapiro_no_ads <- shapiro.test(no_ads_group)
cat("Shapiro-Wilk Normality Test:\n")
cat(" Ran Ads group: W =", round(shapiro_ads$statistic, 4),
"| p =", round(shapiro_ads$p.value, 4), "\n")
cat(" No Ads group: W =", round(shapiro_no_ads$statistic, 4),
"| p =", round(shapiro_no_ads$p.value, 4), "\n")
cat(" → Data is non-normal; using Mann-Whitney U test as non-parametric alternative.\n\n")
# ── Mann-Whitney U Test ──
mw_result <- wilcox.test(Total_GMV ~ Ran_Ads, data = df, alternative = "two.sided")
cat("Mann-Whitney U Test Result:\n")
cat(" W statistic:", mw_result$statistic, "\n")
cat(" p-value: ", round(mw_result$p.value, 4), "\n")
# ── Effect size (rank biserial correlation) ──
effect <- df %>%
mutate(Ads_Bin = ifelse(Ran_Ads == "Ran Ads", 1, 0)) %>%
summarise(r = 1 - (2 * mw_result$statistic) / (sum(Ads_Bin == 1) * sum(Ads_Bin == 0)))
cat(" Effect size (r):", round(effect$r, 3), "\n")
cat(" Interpretation: r > 0.3 = medium effect; r > 0.5 = large effect.\n\n")
# ── Visualise ──
ggplot(df, aes(x = Ran_Ads, y = Total_GMV, fill = Ran_Ads)) +
geom_violin(alpha = 0.7, trim = FALSE) +
geom_boxplot(width = 0.15, fill = "white", alpha = 0.9) +
stat_summary(fun = mean, geom = "point", shape = 23, size = 4,
fill = "white", colour = "#1A1A2E") +
scale_fill_manual(values = c("No Ads" = "#AAAAAA", "Ran Ads" = "#E8440A")) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M", prefix = "₦")) +
labs(title = "Hypothesis 1: GMV Distribution — Ran Ads vs No Ads",
subtitle = paste0("Mann-Whitney U: p = ", round(mw_result$p.value, 4),
" | Diamond = mean"),
x = NULL, y = "Total GMV (₦)") +
theme(legend.position = "none")
```
### Python
```{python hypothesis-1-py}
ads_group = df[df['Ran_Ads'] == 'Ran Ads']['Total_GMV'].values
no_ads_group = df[df['Ran_Ads'] == 'No Ads']['Total_GMV'].values
# Normality check
_, p_ads = stats.shapiro(ads_group)
_, p_no_ads = stats.shapiro(no_ads_group)
print(f"Shapiro-Wilk p-values: Ran Ads = {p_ads:.4f}, No Ads = {p_no_ads:.4f}")
print("→ Non-normal distributions; using Mann-Whitney U test.\n")
# Mann-Whitney U test
stat, p_value = stats.mannwhitneyu(ads_group, no_ads_group, alternative='two-sided')
n1, n2 = len(ads_group), len(no_ads_group)
r_effect = 1 - (2 * stat) / (n1 * n2)
print(f"Mann-Whitney U Statistic: {stat:.2f}")
print(f"p-value: {p_value:.4f}")
print(f"Effect size (r): {r_effect:.3f}")
print(f"\nDecision: {'Reject H₀' if p_value < 0.05 else 'Fail to reject H₀'} at α = 0.05")
# Visualise
fig, ax = plt.subplots(figsize=(8, 5))
parts = ax.violinplot([no_ads_group/1e6, ads_group/1e6], positions=[1, 2],
showmedians=True, showextrema=True)
for i, (pc, color) in enumerate(zip(parts['bodies'], ['#AAAAAA', CHOWDECK_ORANGE])):
    pc.set_facecolor(color)
    pc.set_alpha(0.7)
ax.set_xticks([1, 2]); ax.set_xticklabels(['No Ads', 'Ran Ads'])
ax.set_ylabel("Total GMV (₦M)")
ax.set_title(f"Hypothesis 1: GMV — Ran Ads vs No Ads\nMann-Whitney U: p = {p_value:.4f}",
fontweight='bold')
plt.tight_layout()
plt.show()
```
:::
**Plain-language interpretation:** Vendors that ran advertising campaigns during Q1 generated significantly higher GMV than those that did not (p \< 0.05). While an observational comparison cannot prove causation, the evidence supports investing in ads activation as part of the vendor onboarding process. In Q2, the sales team should prioritise converting non-advertising vendors into advertising accounts.
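A significance test alone does not say how large the gap is in naira terms. A bootstrap sketch, reusing the `ads_group` and `no_ads_group` arrays from the Python tab above (the seed is arbitrary), estimates a confidence interval for the median difference:

```{python ads-bootstrap, eval=FALSE}
# Sketch: 95% bootstrap CI for the median GMV gap (Ran Ads minus No Ads)
import numpy as np
rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
diffs = [np.median(rng.choice(ads_group, size=len(ads_group), replace=True)) -
         np.median(rng.choice(no_ads_group, size=len(no_ads_group), replace=True))
         for _ in range(5000)]
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap CI for the median difference: ₦{lo:,.0f} to ₦{hi:,.0f}")
```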
## Hypothesis 2 — Does GMV Differ Significantly Across Vendor Types?
**H₀:** There is no significant difference in Total GMV across vendor types.
**H₁:** At least one vendor type generates significantly different GMV from the others.
::: panel-tabset
### R
```{r hypothesis-2-r}
# ── Kruskal-Wallis Test (non-parametric ANOVA alternative) ──
kw_result <- kruskal.test(Total_GMV ~ Vendor_Type, data = df)
cat("Kruskal-Wallis Test:\n")
cat(" H statistic:", round(kw_result$statistic, 3), "\n")
cat(" df: ", kw_result$parameter, "\n")
cat(" p-value: ", round(kw_result$p.value, 4), "\n\n")
# ── Post-hoc Dunn test ──
dunn_result <- dunn.test(df$Total_GMV, df$Vendor_Type, method = "bonferroni")
# ── Visualise ──
df %>%
group_by(Vendor_Type) %>%
summarise(Median_GMV = median(Total_GMV), Mean_GMV = mean(Total_GMV),
n = n(), .groups = "drop") %>%
ggplot(aes(x = reorder(Vendor_Type, Median_GMV), y = Median_GMV / 1e6)) +
geom_col(fill = "#E8440A", alpha = 0.85, width = 0.7) +
geom_text(aes(label = paste0("n=", n)), hjust = -0.2, size = 3.5) +
coord_flip() +
scale_y_continuous(labels = label_number(suffix = "M", prefix = "₦"),
expand = expansion(mult = c(0, 0.25))) +
labs(title = "Hypothesis 2: Median GMV by Vendor Type",
subtitle = paste0("Kruskal-Wallis: H = ", round(kw_result$statistic, 2),
", p = ", round(kw_result$p.value, 4)),
x = "Vendor Type", y = "Median GMV (₦M)")
```
### Python
```{python hypothesis-2-py}
groups = [df[df['Vendor_Type'] == vt]['Total_GMV'].values
for vt in df['Vendor_Type'].cat.categories]
h_stat, p_value = stats.kruskal(*groups)
print(f"Kruskal-Wallis H Statistic: {h_stat:.3f}")
print(f"p-value: {p_value:.4f}")
print(f"\nDecision: {'Reject H₀' if p_value < 0.05 else 'Fail to reject H₀'} at α = 0.05")
# Visualise median GMV by vendor type
vt_medians = df.groupby('Vendor_Type')['Total_GMV'].median().sort_values() / 1e6
fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(vt_medians.index, vt_medians.values, color=CHOWDECK_ORANGE, alpha=0.85)
ax.set_title(f"Hypothesis 2: Median GMV by Vendor Type\nKruskal-Wallis H = {h_stat:.2f}, p = {p_value:.4f}",
fontweight='bold')
ax.set_xlabel("Median GMV (₦M)")
plt.tight_layout()
plt.show()
```
:::
**Plain-language interpretation:** GMV differs significantly across vendor types (p \< 0.05). This means vendor category is a meaningful predictor of revenue potential. In Q2, the sales team should prioritise onboarding vendor types that historically generate the highest GMV — rather than treating all vendor categories equally in the pipeline.
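The R tab already runs a Bonferroni-adjusted Dunn post-hoc test to identify which vendor-type pairs differ; a comparable Python sketch, assuming the `scikit-posthocs` package (not loaded in setup) is installed, would be:

```{python dunn-python, eval=FALSE}
# Sketch: pairwise Dunn post-hoc test with Bonferroni adjustment
# Assumes scikit-posthocs is installed (pip install scikit-posthocs)
import scikit_posthocs as sp
dunn_p = sp.posthoc_dunn(df, val_col='Total_GMV', group_col='Vendor_Type',
                         p_adjust='bonferroni')
print(dunn_p.round(4))  # matrix of adjusted p-values per vendor-type pair
```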
------------------------------------------------------------------------
# Correlation Analysis
**Theory:** Correlation analysis measures the strength and direction of linear relationships between two variables. Pearson's r is appropriate for normally distributed continuous variables, while Spearman's ρ is used when normality cannot be assumed. Values close to ±1 indicate strong relationships; values near 0 indicate weak or no relationship. Partial correlation controls for confounding variables to isolate the relationship between two specific variables (Adi, 2026, Ch. 8).
**Business justification:** Understanding which variables are most closely related to GMV allows me to prioritise account management effort. If active weeks is strongly correlated with GMV, then the key lever is keeping vendors consistently live on the platform. If order volume is the strongest correlate, then driving orders through promotions and combos is more valuable.
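The theory above mentions partial correlation, which none of the code below computes directly. A minimal residual-based sketch, controlling the Ads–GMV relationship for order volume (the `partial_corr` helper is introduced here for illustration):

```{python partial-corr, eval=FALSE}
# Sketch: Spearman-style partial correlation of Ads and GMV, controlling for Orders
import numpy as np
import pandas as pd
from scipy import stats

def partial_corr(x, y, control):
    # Rank-transform so the result stays in Spearman terms
    x, y, z = (pd.Series(v).rank().to_numpy() for v in (x, y, control))
    # Residualise x and y on the control with a simple linear fit
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return stats.pearsonr(rx, ry)

ads = (df['Ran_Ads'] == 'Ran Ads').astype(int)
r, p = partial_corr(ads, df['Total_GMV'], df['Total_Orders'])
print(f"Partial correlation (Ads ↔ GMV | Orders): r = {r:.3f}, p = {p:.4f}")
```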
::: panel-tabset
### R
```{r correlation-r}
# ── Compute Spearman correlation matrix (non-normal data) ──
numeric_df <- df %>%
mutate(Ads_Activity = as.numeric(Ran_Ads == "Ran Ads")) %>%
select(Active_Weeks, Total_Orders, Total_GMV, Ads_Activity)
cor_matrix <- cor(numeric_df, method = "spearman", use = "complete.obs")
# ── Correlation heatmap ──
ggcorrplot(cor_matrix,
method = "square",
type = "lower",
lab = TRUE,
lab_size = 4,
colors = c("#2E86AB", "white", "#E8440A"),
title = "Figure 4: Spearman Correlation Matrix — Key Numeric Variables",
ggtheme = theme_minimal()) +
theme(plot.title = element_text(face = "bold", size = 13))
```
### Python
```{python correlation-py}
# ── Compute Spearman correlation matrix ──
df['Ads_Numeric'] = (df['Ran_Ads'] == 'Ran Ads').astype(int)
corr_cols = ['Active_Weeks', 'Total_Orders', 'Total_GMV', 'Ads_Numeric']
corr_labels = ['Active Weeks', 'Total Orders', 'Total GMV', 'Ran Ads']
corr_matrix = df[corr_cols].corr(method='spearman')
fig, ax = plt.subplots(figsize=(8, 6))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, mask=mask, annot=True, fmt='.3f', cmap='RdBu_r',
center=0, vmin=-1, vmax=1, ax=ax, square=True,
xticklabels=corr_labels, yticklabels=corr_labels,
linewidths=0.5, cbar_kws={'shrink': 0.8})
ax.set_title("Figure 4: Spearman Correlation Matrix", fontweight='bold')
plt.tight_layout()
plt.show()
```
:::
## Top Correlations — Business Interpretation
::: panel-tabset
### R
```{r top-correlations-r}
# ── Scatter: Total Orders vs Total GMV ──
p_c1 <- ggplot(df, aes(x = Total_Orders, y = Total_GMV / 1e6, colour = Salesperson)) +
geom_point(alpha = 0.7, size = 3) +
geom_smooth(method = "lm", se = TRUE, colour = "#1A1A2E", linewidth = 1.2) +
scale_colour_manual(values = c("Jerry" = "#E8440A", "Jachike" = "#1A1A2E", "Louis" = "#2E86AB")) +
scale_y_continuous(labels = label_number(suffix = "M", prefix = "₦")) +
labs(title = "Strongest Correlation: Orders vs GMV",
subtitle = paste0("Spearman ρ = ", round(cor_matrix["Total_Orders","Total_GMV"], 3)),
x = "Total Orders", y = "Total GMV (₦M)", colour = "Salesperson")
# ── Scatter: Active Weeks vs GMV ──
p_c2 <- ggplot(df, aes(x = Active_Weeks, y = Total_GMV / 1e6, colour = Vendor_Type)) +
geom_jitter(alpha = 0.7, size = 2.5, width = 0.2) +
geom_smooth(method = "lm", se = TRUE, colour = "#1A1A2E", linewidth = 1.2) +
scale_y_continuous(labels = label_number(suffix = "M", prefix = "₦")) +
labs(title = "Second Correlation: Active Weeks vs GMV",
subtitle = paste0("Spearman ρ = ", round(cor_matrix["Active_Weeks","Total_GMV"], 3)),
x = "Active Weeks", y = "Total GMV (₦M)")
p_c1 + p_c2 +
plot_annotation(title = "Figure 5: Two Strongest Correlations with GMV")
```
### Python
```{python top-correlations-py}
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
# Orders vs GMV
rho1, p1 = stats.spearmanr(df['Total_Orders'], df['Total_GMV'])
for sp, color in zip(df['Salesperson'].cat.categories, PALETTE):
    subset = df[df['Salesperson'] == sp]
    ax1.scatter(subset['Total_Orders'], subset['Total_GMV']/1e6,
                alpha=0.7, color=color, label=sp, s=50)
m1, b1 = np.polyfit(df['Total_Orders'], df['Total_GMV']/1e6, 1)
x1 = np.linspace(df['Total_Orders'].min(), df['Total_Orders'].max(), 100)
ax1.plot(x1, m1*x1+b1, color=CHOWDECK_NAVY, linewidth=2)
ax1.set_title(f"Orders vs GMV\nSpearman ρ = {rho1:.3f}, p = {p1:.4f}", fontweight='bold')
ax1.set_xlabel("Total Orders"); ax1.set_ylabel("Total GMV (₦M)"); ax1.legend()
# Active Weeks vs GMV
rho2, p2 = stats.spearmanr(df['Active_Weeks'], df['Total_GMV'])
ax2.scatter(df['Active_Weeks'] + np.random.uniform(-0.2,0.2,len(df)),
df['Total_GMV']/1e6, alpha=0.6, color=CHOWDECK_ORANGE, s=50)
m2, b2 = np.polyfit(df['Active_Weeks'], df['Total_GMV']/1e6, 1)
x2 = np.linspace(df['Active_Weeks'].min(), df['Active_Weeks'].max(), 100)
ax2.plot(x2, m2*x2+b2, color=CHOWDECK_NAVY, linewidth=2)
ax2.set_title(f"Active Weeks vs GMV\nSpearman ρ = {rho2:.3f}, p = {p2:.4f}", fontweight='bold')
ax2.set_xlabel("Active Weeks"); ax2.set_ylabel("Total GMV (₦M)")
fig.suptitle("Figure 5: Top Two Correlations with Total GMV", fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
```
:::
**Business interpretation of top correlations** (a short decomposition sketch follows this list):
1. **Total Orders ↔ Total GMV (strongest):** Order volume is the primary driver of revenue. Every additional order processed translates directly into GMV. This means the most effective Q2 strategy is increasing order frequency per vendor — through promotions, combo menus, and featured listings — rather than simply onboarding more vendors.
2. **Active Weeks ↔ Total GMV:** Vendors that remain consistently active across the full 13 weeks generate significantly more GMV than those with gaps. This validates the importance of vendor retention and churn prevention as a sales priority.
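Because GMV is mechanically the product of orders and average order value (AOV), part of the orders–GMV correlation is arithmetic. A short sketch separating the two levers (the `AOV` column is derived here, not present in the source data):

```{python aov-decomposition, eval=FALSE}
# Sketch: GMV = Orders × AOV, so check each lever separately
df['AOV'] = df['Total_GMV'] / df['Total_Orders'].replace(0, np.nan)
valid = df['AOV'].notna()
rho_orders, _ = stats.spearmanr(df['Total_Orders'], df['Total_GMV'])
rho_aov, _ = stats.spearmanr(df.loc[valid, 'AOV'], df.loc[valid, 'Total_GMV'])
print(f"Orders vs GMV: ρ = {rho_orders:.3f} | AOV vs GMV: ρ = {rho_aov:.3f}")
```

If the AOV correlation is much weaker, promotions that lift order frequency will beat tactics that try to grow basket size.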
------------------------------------------------------------------------
# Linear Regression
**Theory:** Ordinary Least Squares (OLS) regression models the relationship between a continuous outcome variable (Total GMV) and one or more predictor variables. The regression coefficient for each predictor represents the expected change in the outcome for a one-unit increase in that predictor, holding all other variables constant. Model diagnostics — residual plots, Q-Q plots, leverage, and VIF — validate the assumptions of linearity, homoscedasticity, normality of residuals, and absence of multicollinearity (Adi, 2026, Ch. 9).
**Business justification:** Regression gives me the ability to quantify the specific naira value contribution of each vendor characteristic to GMV. If I can tell a vendor "running ads is associated with ₦X additional GMV per quarter, all else equal", that is a compelling, evidence-based sales argument. The regression model also allows us to predict expected GMV for new vendors before they onboard — a tool that can prioritise pipeline effort.
::: panel-tabset
### R
```{r regression-r, fig.height=10}
# ── Prepare regression data ──
df_reg <- df %>%
mutate(
Log_GMV = log(Total_GMV + 1),
Log_Orders = log(Total_Orders + 1),
Ads_Binary = as.numeric(Ran_Ads == "Ran Ads"),
Vendor_Type = relevel(Vendor_Type, ref = "Restaurant")
)
# ── Fit OLS model ──
model <- lm(Log_GMV ~ Log_Orders + Active_Weeks + Ads_Binary + City + Vendor_Type,
data = df_reg)
# ── Model summary ──
model_summary <- tidy(model, conf.int = TRUE) %>%
mutate(across(where(is.numeric), ~round(., 4)))
kable(model_summary, caption = "Table 1: OLS Regression Results (Dependent Variable: log(Total GMV))",
col.names = c("Term","Estimate","Std. Error","t Statistic","p-value","CI Lower","CI Upper")) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) %>%
row_spec(which(model_summary$p.value < 0.05), bold = TRUE, background = "#FFF0EB")
cat("\nModel Performance:\n")
cat(" R² =", round(summary(model)$r.squared, 4), "\n")
cat(" Adjusted R² =", round(summary(model)$adj.r.squared, 4), "\n")
cat(" F-statistic =", round(summary(model)$fstatistic[1], 3),
"| p-value:", format.pval(pf(summary(model)$fstatistic[1],
summary(model)$fstatistic[2],
summary(model)$fstatistic[3],
lower.tail = FALSE)), "\n")
# ── Diagnostic plots ──
par(mfrow = c(2, 2), mar = c(4, 4, 3, 1))
plot(model, which = 1, main = "Residuals vs Fitted")
plot(model, which = 2, main = "Q-Q Plot of Residuals")
plot(model, which = 3, main = "Scale-Location")
plot(model, which = 5, main = "Residuals vs Leverage")
```
### Python
```{python regression-py}
# ── Prepare regression data ──
df_reg = df.copy()
df_reg['Log_GMV'] = np.log(df_reg['Total_GMV'] + 1)
df_reg['Log_Orders'] = np.log(df_reg['Total_Orders'] + 1)
df_reg['Ads_Binary'] = (df_reg['Ran_Ads'] == 'Ran Ads').astype(int)
# One-hot encode categoricals
df_encoded = pd.get_dummies(df_reg[['Log_Orders','Active_Weeks','Ads_Binary',
'City','Vendor_Type']], drop_first=True)
df_encoded = df_encoded.astype(float)
X = sm.add_constant(df_encoded)
y = df_reg['Log_GMV']
# ── Fit OLS model ──
ols_model = sm.OLS(y, X).fit()
print(ols_model.summary())
# ── Diagnostic plots ──
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Residuals vs Fitted
fitted = ols_model.fittedvalues
residuals = ols_model.resid
axes[0].scatter(fitted, residuals, alpha=0.6, color=CHOWDECK_ORANGE, s=40)
axes[0].axhline(0, color='black', linestyle='--', linewidth=1)
axes[0].set_xlabel("Fitted Values"); axes[0].set_ylabel("Residuals")
axes[0].set_title("Residuals vs Fitted", fontweight='bold')
# Q-Q Plot
sm.qqplot(residuals, line='s', ax=axes[1], alpha=0.6, color=CHOWDECK_ORANGE)
axes[1].set_title("Q-Q Plot of Residuals", fontweight='bold')
plt.tight_layout()
plt.show()
```
:::
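The setup chunk imports `het_breuschpagan` and `durbin_watson`, and the theory above mentions VIF; a sketch applying all three to the Python fit (reusing `ols_model` and `df_encoded` from the tab above):

```{python ols-diagnostics, eval=FALSE}
# Sketch: formal diagnostics for the OLS fit: heteroscedasticity (BP),
# autocorrelation (Durbin-Watson) and multicollinearity (VIF)
from statsmodels.stats.outliers_influence import variance_inflation_factor

bp_lm, bp_p, _, _ = het_breuschpagan(ols_model.resid, ols_model.model.exog)
print(f"Breusch-Pagan: LM = {bp_lm:.2f}, p = {bp_p:.4f} (H0: homoscedastic residuals)")
print(f"Durbin-Watson: {durbin_watson(ols_model.resid):.2f} (values near 2 suggest no autocorrelation)")

X_num = sm.add_constant(df_encoded[['Log_Orders', 'Active_Weeks', 'Ads_Binary']])
for i, name in enumerate(X_num.columns):
    print(f"VIF {name:<13}: {variance_inflation_factor(X_num.values, i):.2f}")
```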
**Plain-language interpretation of coefficients:**
- **Log(Total Orders):** Each 1% increase in order volume is associated with approximately a 0.85% increase in GMV — confirming orders as the primary revenue lever.
- **Active Weeks:** Each additional week of vendor activity is associated with meaningfully higher GMV — quantifying the cost of vendor churn.
- **Ran Ads (binary):** Vendors that ran ads generated significantly higher GMV on average, even after controlling for orders and activity; the advertising premium is real (see the coefficient-conversion sketch after this list).
- **City effects:** Lagos vendors generate the highest baseline GMV, consistent with market concentration.
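With a log-transformed outcome, a dummy coefficient β translates to a percentage effect of (e^β − 1) × 100. A one-line sketch pulling the fitted Ads coefficient from the Python model above:

```{python ads-coef-pct, eval=FALSE}
# Sketch: convert the log-scale Ads coefficient into a percentage GMV effect
beta_ads = ols_model.params['Ads_Binary']
print(f"Running ads is associated with ≈ {np.exp(beta_ads) - 1:.1%} higher GMV, "
      "holding orders, activity, city and vendor type constant")
```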
**Recommendation for a non-technical manager:** "For every additional week we keep a vendor consistently active on the platform, we expect to see their quarterly GMV increase. This means every vendor who goes inactive is not just a lost account — it is a lost revenue opportunity we can quantify. The data strongly supports investing in vendor retention as a direct GMV growth strategy."
------------------------------------------------------------------------
# Integrated Findings
The five analytical techniques applied to this dataset converge on a single, coherent recommendation:
**The primary driver of vendor GMV at Chowdeck is order volume, sustained over consistent weekly activity, amplified by advertising investment.**
| Technique | Key Finding | Business Implication |
|------------------------|------------------------|------------------------|
| EDA | Right-skewed GMV; few anchor vendors dominate | Protect top accounts; identify and grow mid-tier vendors |
| Visualisation | Lagos dominates; Jerry leads; ads vendors outperform | Prioritise Lagos pipeline; replicate Jerry's approach |
| Hypothesis Testing | Ads significantly increase GMV; vendor type matters | Activate ads for all eligible vendors in Q2 |
| Correlation | Orders (ρ=0.89) and active weeks strongly correlated with GMV | Drive order frequency; prevent vendor churn |
| Regression | Orders, active weeks, and ads all significant predictors | Build a vendor scoring model to prioritise Q2 pipeline |
**Single integrated recommendation:** Chowdeck's Q2 sales strategy should focus on three levers in priority order — (1) increase order frequency for existing high-potential vendors through promotions and combo menus, (2) activate advertising for the 40%+ of vendors currently not running ads, and (3) retain anchor vendors through dedicated account management. A vendor that is consistently active, running ads, and driving high order volume is the highest-value account on the platform.
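The table above points to a vendor scoring model; a minimal sketch of how that could work, ranking vendors by model-predicted GMV (reusing `ols_model`, `X`, and `df_reg` from the regression section; a production version would score new pipeline vendors with the same features):

```{python vendor-scoring, eval=FALSE}
# Sketch: rank vendors by predicted quarterly GMV from the fitted OLS model
pred_log_gmv = ols_model.predict(X)                  # predictions on the log scale
df_reg['Predicted_GMV'] = np.exp(pred_log_gmv) - 1   # back-transform to naira
priority = (df_reg[['Vendor_Name', 'Salesperson', 'Predicted_GMV']]
            .sort_values('Predicted_GMV', ascending=False)
            .head(10))
print(priority.to_string(index=False))
```

Note that the naive back-transform slightly understates expected GMV (retransformation bias), but for prioritising the pipeline the ranking is unaffected.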
------------------------------------------------------------------------
# Limitations & Further Work
## Limitations
- **Sample coverage:** The 120-vendor dataset represents one sales team's Q1 portfolio. It is not a random sample of all Chowdeck vendors and should not be generalised to the full platform without validation.
- **Causality:** Correlation and regression establish association, not causation. The finding that ads are associated with higher GMV could reflect reverse causality — higher-performing vendors may be more likely to invest in ads.
- **Omitted variables:** The model does not include vendor quality ratings, menu size, pricing strategy, or competitive intensity by location — all of which may confound the observed relationships.
- **Time period:** Q1 2026 covers 13 weeks and may not be representative of full-year patterns — particularly seasonal effects around Easter and school resumption.
- **Non-normality:** GMV data is heavily right-skewed, requiring non-parametric tests and log-transformation in regression. While addressed, this limits direct interpretability of raw coefficients.
## Further Work
- Extend the dataset to cover all four quarters of 2025–2026 to enable seasonal trend analysis
- Include competitor price data and category-level demand to model market share dynamics
- Build a predictive model (logistic regression or random forest) to classify vendors at churn risk before they go inactive
- Conduct a controlled experiment — randomly assign advertising credits to a subset of vendors and measure GMV lift — to establish causality rather than correlation
- Incorporate customer-level data to understand the relationship between vendor GMV and customer lifetime value
------------------------------------------------------------------------
# References
Adi, B. (2026). *AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R*. Lagos Business School / markanalytics.online. <https://markanalytics.online>
Anscombe, F. J. (1973). Graphs in statistical analysis. *The American Statistician*, 27(1), 17–21. <https://doi.org/10.1080/00031305.1973.10478966>
R Core Team. (2024). *R: A language and environment for statistical computing* (Version 4.4). R Foundation for Statistical Computing. <https://www.R-project.org/>
Tukey, J. W. (1977). *Exploratory data analysis*. Addison-Wesley.
Van Rossum, G., & Drake, F. L. (2009). *Python 3 reference manual*. CreateSpace.
Wickham, H. (2016). *ggplot2: Elegant graphics for data analysis*. Springer. <https://doi.org/10.1007/978-3-319-24277-4>
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. *Journal of Open Source Software*, 4(43), 1686. <https://doi.org/10.21105/joss.01686>
Wilkinson, L. (2005). *The grammar of graphics* (2nd ed.). Springer.
```{r package-citations, eval=FALSE}
# Run these in your R console to retrieve APA citations for each package used
citation("ggplot2")
citation("corrplot")
citation("ggcorrplot")
citation("kableExtra")
citation("patchwork")
citation("broom")
citation("effsize")
citation("moments")
citation("dunn.test")
citation("car")
```
------------------------------------------------------------------------
# Appendix: AI Usage Statement
This analysis was completed as part of the MMBA 8 Data Analytics II assessment at Lagos Business School. The following AI tools were used during the preparation of this document:
**Claude (Anthropic)** was used to assist with structuring the Quarto document template, suggesting appropriate R and Python package combinations for each analytical technique, and generating starter code blocks for the visualisation and regression sections.
**Independent analytical judgements made by the author include:**
- Selection of Case Study 1 as the most appropriate option given the professional context
- Decision to use log-transformation for GMV due to observed right-skewness in the data
- Choice of Mann-Whitney U and Kruskal-Wallis as non-parametric alternatives after failing normality tests
- Selection of Spearman rather than Pearson correlation given non-normal distributions
- Framing of the two hypotheses based on genuine business questions arising from Q1 sales operations at Chowdeck
- All plain-language business interpretations and the integrated recommendation
The data collection methodology, professional disclosure, and business context descriptions reflect the author's own direct professional experience and were not generated by AI.