What Drives Vendor GMV on Chowdeck? An Exploratory and Inferential Analysis of Q1 2025 Sales Data

Author

Jeremiah Izuagie

Published

May 15, 2026

1 Executive Summary

Chowdeck is a Nigerian food and drink delivery platform operating across Lagos, Abuja, Port Harcourt, Ibadan, and Kano. This report analyses 120 vendor records collected from Chowdeck’s internal sales operations system covering Q1 2025 (January 1 – March 29, 2025). The dataset captures vendor-level Gross Merchandise Value (GMV), order volumes, product types, vendor categories, salesperson assignments, onboarding dates, and weekly trading performance across 13 weeks.

The central business question is: which vendor characteristics and operational factors most strongly predict GMV performance? Five analytical techniques — Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression — are applied to answer this question from multiple angles.

Key findings reveal that product type is a significant differentiator of GMV, while GMV does not differ significantly by vendor type (one-way ANOVA on log GMV, p = 0.32) or by city (Kruskal-Wallis, p = 0.45); that vendors onboarded earlier in the quarter consistently outperform later-onboarded vendors; and that total orders and active weeks are the strongest predictors of GMV in a regression framework. The integrated recommendation is that the sales team should prioritise early onboarding and order volume activation as the primary levers for GMV growth in Q2 2025.


2 Professional Disclosure

2.1 Role and Organisation

I am a salesperson on Chowdeck’s three-person commercial sales team. My responsibilities include onboarding new vendor partners, managing existing vendor relationships, driving GMV through featured listings, advertisements, and promotional events, and building the supplier pipeline for Chowstore, Chowdeck’s new Lagos-based online supermarket product.

2.2 Technique Relevance

Exploratory Data Analysis (EDA): I review vendor sales data weekly to identify underperforming accounts that need intervention. EDA is the first step in that review: understanding the shape, spread, and anomalies in GMV data directly informs which vendors I prioritise for outreach.

Data Visualisation: Chowdeck’s commercial conversations — both internal (weekly team reviews, manager check-ins) and external (vendor pitches, partner decks) — rely heavily on visual storytelling. Being able to translate a GMV trend into a chart that a non-technical vendor partner or line manager immediately understands is a core job skill.

Hypothesis Testing: A recurring question on the sales team is whether certain vendor types, cities, or product formats generate materially different GMV. Hypothesis testing gives a statistically grounded answer to these questions, moving team strategy decisions from intuition to evidence.

Correlation Analysis: Understanding which variables move together — orders and GMV, active weeks and GMV, vendor type and performance — helps me identify the highest-leverage inputs to focus on. Correlation analysis is directly applicable to building a vendor scoring model that the team uses informally today.

Linear Regression: Regression allows me to quantify the marginal contribution of each vendor characteristic to GMV. A regression model with interpretable coefficients gives me a practical tool for forecasting expected GMV when onboarding a new vendor, which is a core input to prioritisation decisions.


3 Data Collection & Sampling

3.1 Source and Collection Method

The dataset was extracted from Chowdeck’s internal sales tracking system, which records vendor-level performance data on a weekly basis. The data was compiled and structured by the sales team as part of standard Q1 reporting operations. Variables were verified against the platform’s vendor management records.

3.2 Dataset Overview

| Attribute | Detail |
|---|---|
| Dataset name | Chowdeck Q1 2025 Vendor Sales Data |
| Collection period | January 1, 2025 – March 29, 2025 (13 weeks) |
| Unit of observation | Individual vendor |
| Total observations | 120 vendors |
| Variables | 23 (see Section 4) |
| Geographic scope | Lagos, Abuja, Port Harcourt, Ibadan, Kano |

3.3 Sampling Frame

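The sampling frame is the full population of vendor accounts managed by the Chowdeck sales team during Q1 2025. Each member of the three-person team (Jeremy, Jack, Lou) managed 40 vendors. No sampling was applied. This is a census of every active account in the team's portfolio for the quarter, so there is no sampling error within that frame. The findings describe this portfolio and may not generalise to Chowdeck vendors managed outside the sales team.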

3.4 Onboarding Timeline

Not all vendors were active from day one. Twenty vendors were onboarded at the start of Q1 (January 1). The remaining 100 vendors were progressively onboarded through the quarter across eight subsequent cohorts (January 8 through February 26), reflecting the sales team’s live pipeline-building activity.
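The cohort schedule implies a simple mapping from onboarding date to the number of weeks a vendor can trade. The sketch below assumes that weeks are counted in seven-day blocks from January 1. That rule is an inference, not a documented system rule. It does reproduce the values visible in Section 4.1: January 15 gives 11 weeks, February 5 gives 8, and February 12 gives 7. It also matches the 5-to-13 range of Active_Weeks in Section 4.3. The helper name `active_weeks` is hypothetical.

```python
from datetime import date, timedelta

Q1_START = date(2025, 1, 1)  # first day of week 1 (Section 3.4)
N_WEEKS = 13                 # length of the Q1 window in weeks


def active_weeks(onboarded: date) -> int:
    """Weeks a vendor could trade in Q1 if it went live on `onboarded`.

    ASSUMPTION (inferred, not documented): week k starts on
    Jan 1 + 7*(k-1) days, so a vendor that goes live in week k
    trades weeks k through 13.
    """
    week_index = (onboarded - Q1_START).days // 7  # 0 for the Jan 1 cohort
    return N_WEEKS - week_index


# The nine go-live dates: the Jan 1 cohort plus eight weekly cohorts to Feb 26
cohorts = [Q1_START + timedelta(weeks=i) for i in range(9)]
for d in cohorts:
    print(d.isoformat(), active_weeks(d))
```

If this rule holds, Active_Weeks carries no information beyond Onboarded_Date. That matters when both are candidates for the same regression model.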

3.5 Ethical Notes

All vendor names have been anonymised using codenames (e.g., Vendor_001) to comply with Chowdeck’s data-sharing policy. No personally identifiable information is published. Salesperson names have been replaced with codenames (Jeremy, Jack, Lou). The data is used solely for academic analysis under the author’s own access rights as a Chowdeck employee.

Data citation: Onuoha, J. (2026). Chowdeck Q1 2025 vendor sales performance data [Dataset]. Collected from Chowdeck Commercial Sales Team, Lagos, Nigeria. Data available on request from the author.


4 Data Description

4.1 Loading the Data

Code
library(tidyverse)
library(skimr)
library(janitor)
library(knitr)
library(kableExtra)

df <- read_csv("chowdeck_q1_sales_data.csv") |>
  clean_names() |>
  mutate(
    onboarded_date = as.Date(onboarded_date),
    salesperson    = as.factor(salesperson),
    vendor_type    = as.factor(vendor_type),
    city           = as.factor(city),
    product_type   = as.factor(product_type)
  )

glimpse(df)
Rows: 120
Columns: 23
$ vendor_id      <dbl> 100000, 100001, 100002, 100003, 100004, 100005, 100006,…
$ vendor_name    <chr> "Vendor_045", "Vendor_048", "Vendor_005", "Vendor_056",…
$ salesperson    <fct> Jeremy, Jack, Lou, Jeremy, Jack, Lou, Jeremy, Jack, Lou…
$ vendor_type    <fct> Event, Supermarket, Supermarket, Restaurant, Pharmacy, …
$ city           <fct> Abuja, Port Harcourt, Lagos, Kano, Lagos, Lagos, Ibadan…
$ product_type   <fct> Events, GMV, GMV, GMV, GMV, GMV, GMV, GMV, GMV, Ads, GM…
$ onboarded_date <date> 2025-01-15, 2025-02-12, 2025-02-05, 2025-01-15, 2025-0…
$ active_weeks   <dbl> 11, 7, 8, 11, 9, 7, 6, 6, 7, 9, 10, 7, 10, 10, 11, 13, …
$ total_orders   <dbl> 106, 32, 97, 1512, 66, 59, 26, 28, 423, 33, 199, 8, 334…
$ total_gmv      <dbl> 456597, 77816, 302908, 6465846, 223174, 134956, 82103, …
$ week_1         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ week_2         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2377…
$ week_3         <dbl> 32788, 0, 0, 557282, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1150…
$ week_4         <dbl> 48222, 0, 0, 688851, 0, 0, 0, 0, 0, 0, 138526, 0, 77313…
$ week_5         <dbl> 29422, 0, 0, 429074, 31077, 0, 0, 0, 0, 21607, 132214, …
$ week_6         <dbl> 46709, 0, 32259, 480979, 23191, 0, 0, 0, 0, 15504, 1143…
$ week_7         <dbl> 42264, 13907, 41631, 673388, 25196, 20538, 0, 0, 166692…
$ week_8         <dbl> 29658, 11613, 33978, 605177, 27460, 26246, 12956, 15872…
$ week_9         <dbl> 39980, 10199, 33579, 679833, 28152, 19994, 16273, 13969…
$ week_10        <dbl> 51538, 7506, 49641, 623911, 25404, 15611, 18170, 13619,…
$ week_11        <dbl> 50543, 9793, 35527, 608379, 17987, 16361, 7793, 14182, …
$ week_12        <dbl> 39166, 12515, 41274, 617542, 21987, 17032, 11358, 16521…
$ week_13        <dbl> 46307, 12283, 35019, 501430, 22720, 19174, 15553, 16489…
Code
import pandas as pd
import numpy as np

df = pd.read_csv("chowdeck_q1_sales_data.csv", parse_dates=["Onboarded_Date"])

for col in ["Salesperson", "Vendor_Type", "City", "Product_Type"]:
    df[col] = df[col].astype("category")

# info() prints its report directly and returns None, so it is not wrapped in print()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 23 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Vendor_ID       120 non-null    int64
 1   Vendor_Name     120 non-null    object
 2   Salesperson     120 non-null    category
 3   Vendor_Type     120 non-null    category
 4   City            120 non-null    category
 5   Product_Type    120 non-null    category
 6   Onboarded_Date  120 non-null    datetime64[ns]
 7   Active_Weeks    120 non-null    int64
 8   Total_Orders    120 non-null    int64
 9   Total_GMV       120 non-null    int64
 10  Week_1          120 non-null    int64
 11  Week_2          120 non-null    int64
 12  Week_3          120 non-null    int64
 13  Week_4          120 non-null    int64
 14  Week_5          120 non-null    int64
 15  Week_6          120 non-null    int64
 16  Week_7          120 non-null    int64
 17  Week_8          120 non-null    int64
 18  Week_9          120 non-null    int64
 19  Week_10         120 non-null    int64
 20  Week_11         120 non-null    int64
 21  Week_12         120 non-null    int64
 22  Week_13         120 non-null    int64
dtypes: category(4), datetime64[ns](1), int64(17), object(1)
memory usage: 19.1+ KB
Code
print(df.head())
   Vendor_ID Vendor_Name Salesperson  ... Week_11 Week_12 Week_13
0     100000  Vendor_045      Jeremy  ...   50543   39166   46307
1     100001  Vendor_048        Jack  ...    9793   12515   12283
2     100002  Vendor_005         Lou  ...   35527   41274   35019
3     100003  Vendor_056      Jeremy  ...  608379  617542  501430
4     100004  Vendor_027        Jack  ...   17987   21987   22720

[5 rows x 23 columns]

4.2 Variable Dictionary

Variable dictionary for the Chowdeck Q1 2025 dataset
| Variable | Type | Description |
|---|---|---|
| Vendor_ID | Numeric (ID) | Unique vendor identifier |
| Vendor_Name | Text | Anonymised vendor codename |
| Salesperson | Categorical | Sales team member responsible (Jeremy / Jack / Lou) |
| Vendor_Type | Categorical | Vendor business category (Restaurant / Supermarket / Pharmacy / Event) |
| City | Categorical | Vendor city (Lagos / Abuja / Port Harcourt / Ibadan / Kano) |
| Product_Type | Categorical | Chowdeck product used (GMV / Ads / Events / Voucher) |
| Onboarded_Date | Date | Date vendor went live on Chowdeck in Q1 2025 |
| Active_Weeks | Numeric | Number of weeks vendor traded during the quarter |
| Total_Orders | Numeric | Cumulative orders placed through the vendor in Q1 |
| Total_GMV | Numeric (₦) | Gross Merchandise Value generated in Q1 (primary outcome) |
| Week_1 to Week_13 | Numeric (₦) | Weekly GMV for each of the 13 weeks of Q1 2025 |

4.3 Summary Statistics

Code
df |>
  select(active_weeks, total_orders, total_gmv) |>
  skim()
Data summary
Name select(df, active_weeks, …
Number of rows 120
Number of columns 3
_______________________
Column type frequency:
numeric 3
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
active_weeks 0 1 9.34 2.68 5 7.0 9.5 12.00 13 ▆▆▃▆▇
total_orders 0 1 239.92 318.91 8 53.5 107.0 271.75 1726 ▇▁▁▁▁
total_gmv 0 1 923489.92 1450582.35 28674 195853.5 455276.5 1087548.75 11621850 ▇▁▁▁▁
Code
df[["Active_Weeks", "Total_Orders", "Total_GMV"]].describe().round(0)
       Active_Weeks  Total_Orders   Total_GMV
count         120.0         120.0       120.0
mean            9.0         240.0    923490.0
std             3.0         319.0   1450582.0
min             5.0           8.0     28674.0
25%             7.0          54.0    195854.0
50%            10.0         107.0    455276.0
75%            12.0         272.0   1087549.0
max            13.0        1726.0  11621850.0

4.4 Categorical Variable Distributions

Code
# One frequency table per categorical variable. adorn_pct_formatting() turns the
# proportion column into a percentage string; printing a list of kable objects
# would instead dump their raw HTML source into the report
for (v in c("salesperson", "vendor_type", "city", "product_type")) {
  df |>
    tabyl(!!sym(v)) |>
    adorn_pct_formatting(digits = 1) |>
    print()
}
 salesperson  n percent
        Jack 40   33.3%
      Jeremy 40   33.3%
         Lou 40   33.3%
 vendor_type  n percent
       Event 11    9.2%
    Pharmacy 31   25.8%
  Restaurant 37   30.8%
 Supermarket 41   34.2%
          city  n percent
         Abuja 22   18.3%
        Ibadan 11    9.2%
          Kano  7    5.8%
         Lagos 62   51.7%
 Port Harcourt 18   15.0%
 product_type  n percent
          Ads 21   17.5%
       Events 11    9.2%
          GMV 84   70.0%
      Voucher  4    3.3%
Code
for col in ["Salesperson", "Vendor_Type", "City", "Product_Type"]:
    print(f"\n{col}:")
    print(df[col].value_counts())

Salesperson:
Salesperson
Jack      40
Jeremy    40
Lou       40
Name: count, dtype: int64

Vendor_Type:
Vendor_Type
Supermarket    41
Restaurant     37
Pharmacy       31
Event          11
Name: count, dtype: int64

City:
City
Lagos            62
Abuja            22
Port Harcourt    18
Ibadan           11
Kano              7
Name: count, dtype: int64

Product_Type:
Product_Type
GMV        84
Ads        21
Events     11
Voucher     4
Name: count, dtype: int64

5 Technique 1 — Exploratory Data Analysis

5.1 Business Justification

Before advising a vendor partner or presenting to management, I need to understand the baseline shape of our sales data: where GMV is concentrated, who the outliers are, whether the data has quality issues, and how active weeks map to performance. This EDA directly replicates my weekly Monday review process, now conducted with formal statistical tools.

5.2 Missing Values

Code
library(naniar)
vis_miss(df)

Code
miss_var_summary(df)
# A tibble: 23 × 3
   variable       n_miss pct_miss
   <chr>           <int>    <num>
 1 vendor_id           0        0
 2 vendor_name         0        0
 3 salesperson         0        0
 4 vendor_type         0        0
 5 city                0        0
 6 product_type        0        0
 7 onboarded_date      0        0
 8 active_weeks        0        0
 9 total_orders        0        0
10 total_gmv           0        0
# ℹ 13 more rows
Code
import missingno as msno
import matplotlib.pyplot as plt

print("Missing values per column:")
Missing values per column:
Code
print(df.isnull().sum())
Vendor_ID         0
Vendor_Name       0
Salesperson       0
Vendor_Type       0
City              0
Product_Type      0
Onboarded_Date    0
Active_Weeks      0
Total_Orders      0
Total_GMV         0
Week_1            0
Week_2            0
Week_3            0
Week_4            0
Week_5            0
Week_6            0
Week_7            0
Week_8            0
Week_9            0
Week_10           0
Week_11           0
Week_12           0
Week_13           0
dtype: int64
Code
msno.matrix(df)
plt.title("Missing Value Matrix")
plt.show()

Finding: The dataset has zero missing values across all 23 columns. This is expected given the data originates from a structured internal system with mandatory fields.

5.3 Outlier Detection

Code
df |>
  select(total_gmv, total_orders, active_weeks) |>
  pivot_longer(everything()) |>
  ggplot(aes(x = name, y = value)) +
  geom_boxplot(fill = "#18a558", alpha = 0.6, outlier.colour = "#f97316") +
  facet_wrap(~name, scales = "free") +
  labs(title = "Boxplots of Key Numeric Variables", x = NULL, y = "Value") +
  theme_minimal()

Code
import seaborn as sns

fig, axes = plt.subplots(1, 3, figsize=(12, 5))
for ax, col in zip(axes, ["Total_GMV", "Total_Orders", "Active_Weeks"]):
    sns.boxplot(y=df[col], ax=ax, color="#18a558")
    ax.set_title(col)
plt.suptitle("Boxplots of Key Numeric Variables")
plt.tight_layout()
plt.show()

Finding (Data Quality Issue 1, Outliers in Total_GMV): Total_GMV has several high-value outliers above the upper boxplot fence of Q3 + 1.5 × IQR (about ₦2.43 million, using the quartiles in Section 4.3). These represent genuinely high-performing vendors (e.g., large restaurants or busy supermarkets) rather than data errors; cross-checking their order volumes confirmed this. They are retained in the dataset but noted as influential observations that could affect regression diagnostics.
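The boxplots flag outliers using Tukey's rule. A value is flagged if it lies more than 1.5 × IQR below the first quartile or above the third quartile. Below is a minimal pandas sketch of that rule. The helper name `iqr_outliers` is hypothetical, and the toy values are made up. Quartiles use pandas' default linear interpolation, which is why the `describe()` quartiles in Section 4.3 agree with the R summary.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = values.quantile([0.25, 0.75])  # linear interpolation (pandas default)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Toy data (not Chowdeck data): one extreme value at the end
toy = pd.Series([10, 12, 13, 14, 15, 16, 18, 200])
flags = iqr_outliers(toy)
print(toy[flags])
```

The expression `df.loc[iqr_outliers(df["Total_GMV"]), ["Vendor_Name", "Total_Orders", "Total_GMV"]]` would list the flagged vendors next to their order counts. That supports the cross-check described above.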

5.4 Distribution of the Outcome Variable

Code
library(patchwork)

p1 <- ggplot(df, aes(x = total_gmv)) +
  geom_histogram(bins = 30, fill = "#18a558", colour = "white") +
  labs(title = "Total GMV — Raw", x = "GMV (₦)", y = "Count") +
  theme_minimal()

p2 <- ggplot(df, aes(x = log(total_gmv))) +
  geom_histogram(bins = 30, fill = "#0f6e3a", colour = "white") +
  labs(title = "Total GMV — Log-Transformed", x = "log(GMV)", y = "Count") +
  theme_minimal()

p1 + p2

Code
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].hist(df["Total_GMV"], bins=30, color="#18a558", edgecolor="white")
axes[0].set_title("Total GMV — Raw")
axes[0].set_xlabel("GMV (₦)")

axes[1].hist(np.log(df["Total_GMV"]), bins=30, color="#0f6e3a", edgecolor="white")
axes[1].set_title("Total GMV — Log-Transformed")
axes[1].set_xlabel("log(GMV)")

plt.tight_layout()
plt.show()

Finding (Data Quality Issue 2, Right Skew): The raw GMV distribution is heavily right-skewed, consistent with a long tail of high-performing vendors. The mean (₦923,490) is roughly twice the median (₦455,277). The log-transformed distribution is approximately normal. Log GMV will therefore be used as the dependent variable in Section 9. OLS does not require the outcome itself to be normal, but the log scale brings the residuals closer to normality and reduces the leverage of the largest vendors.
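The skew can be quantified rather than judged by eye. Below is a minimal NumPy sketch of the moment coefficient of skewness (g1), applied to made-up values that double at each step. The raw values are strongly right-skewed, while their logs are evenly spaced and therefore symmetric. The helper name `sample_skewness` is hypothetical. On the report's data it would be applied to `df["Total_GMV"]` and `np.log(df["Total_GMV"])`. The log is defined for every vendor because the minimum GMV is ₦28,674.

```python
import numpy as np


def sample_skewness(values) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (biased form).

    Positive values indicate a long right tail; 0 indicates symmetry.
    """
    x = np.asarray(values, dtype=float)
    dev = x - x.mean()
    return float(np.mean(dev ** 3) / np.mean(dev ** 2) ** 1.5)


# Made-up values that double at each step, mimicking a long right tail
toy = [2 ** k for k in range(1, 11)]  # 2, 4, ..., 1024
print(f"raw skew: {sample_skewness(toy):.2f}")
print(f"log skew: {sample_skewness(np.log(toy)):.2f}")
```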

5.5 GMV by Active Weeks

Code
df |>
  group_by(active_weeks) |>
  summarise(
    n = n(),
    median_gmv = median(total_gmv),
    mean_gmv   = mean(total_gmv)
  ) |>
  ggplot(aes(x = active_weeks, y = median_gmv)) +
  geom_col(fill = "#18a558") +
  geom_text(aes(label = scales::comma(median_gmv)), vjust = -0.5, size = 3) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Median GMV by Active Weeks",
       x = "Active Weeks", y = "Median GMV (₦)") +
  theme_minimal()

Code
summary = df.groupby("Active_Weeks")["Total_GMV"].median().reset_index()
plt.figure(figsize=(10, 5))
plt.bar(summary["Active_Weeks"], summary["Total_GMV"], color="#18a558")
plt.xlabel("Active Weeks")
plt.ylabel("Median GMV (₦)")
plt.title("Median GMV by Active Weeks")
plt.tight_layout()
plt.show()

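Business interpretation: Vendors with more active weeks generate substantially higher total GMV. Part of this gap is mechanical. Total GMV is summed over the weeks a vendor traded, so a vendor live for 13 weeks accumulates more than an otherwise identical vendor live for 5. Comparing GMV per active week separates this exposure effect from genuine differences in trading intensity. Either way, every additional week live adds GMV, which supports the Q2 strategy of prioritising early vendor activation.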
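Below is a minimal pandas sketch of the per-week normalisation, assuming the column names from Section 4.1. `gmv_per_active_week` and the `GMV_per_Week` column are hypothetical names, not part of the original analysis, and the toy numbers are made up.

```python
import pandas as pd


def gmv_per_active_week(df: pd.DataFrame) -> pd.DataFrame:
    """Median total GMV and median GMV per active week, by Active_Weeks.

    Hypothetical helper: divides Total_GMV by Active_Weeks so vendors
    onboarded at different points in Q1 are compared on a weekly basis.
    """
    out = df.assign(GMV_per_Week=df["Total_GMV"] / df["Active_Weeks"])
    return (
        out.groupby("Active_Weeks")[["Total_GMV", "GMV_per_Week"]]
        .median()
        .reset_index()
    )


# Toy example: both vendors trade ₦10,000 per week,
# but the earlier-onboarded one accumulates more total GMV
toy = pd.DataFrame({"Active_Weeks": [13, 5], "Total_GMV": [130_000, 50_000]})
print(gmv_per_active_week(toy))
```

On the report's data, `gmv_per_active_week(df)` would return one row per Active_Weeks value. A roughly flat GMV_per_Week column would suggest that the chart above mostly reflects exposure. A rising column would suggest that earlier cohorts also trade more intensively.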


6 Technique 2 — Data Visualisation

6.1 Business Justification

Chowdeck’s weekly sales reviews require clear visual summaries that can be shared with team leads and used in vendor partner conversations. The five visualisations below tell a coherent story: GMV is unevenly distributed across vendors, concentrated by volume in Lagos, and strongly tied to onboarding timing.

6.2 Plot 1 — GMV Distribution by Vendor Type

Code
ggplot(df, aes(x = reorder(vendor_type, total_gmv, median), y = total_gmv, fill = vendor_type)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  scale_y_log10(labels = scales::comma) +
  scale_fill_manual(values = c("#18a558","#0f6e3a","#f97316","#e6f5ee")) +
  coord_flip() +
  labs(title = "GMV Distribution by Vendor Type",
       subtitle = "Log scale; ordered by median GMV",
       x = NULL, y = "Total GMV (₦, log scale)") +
  theme_minimal() +
  theme(legend.position = "none")

Code
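order = df.groupby("Vendor_Type", observed=True)["Total_GMV"].median().sort_values().index
plt.figure(figsize=(10, 5))
# hue + legend=False assigns one colour per vendor type without the seaborn 0.13
# deprecation warning for passing a palette without a hue variable
sns.boxplot(data=df, y="Vendor_Type", x="Total_GMV", order=order, hue="Vendor_Type",
            palette=["#18a558","#0f6e3a","#f97316","#e6f5ee"], log_scale=True, legend=False)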
plt.title("GMV Distribution by Vendor Type (log scale)")
plt.xlabel("Total GMV (₦)")
plt.ylabel(None)
plt.tight_layout()
plt.show()

6.3 Plot 2 — GMV by City

Code
df |>
  group_by(city) |>
  summarise(total = sum(total_gmv), n = n()) |>
  ggplot(aes(x = reorder(city, total), y = total, fill = city)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = paste0("n=", n)), hjust = -0.2, size = 3.5) +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("#18a558","#0f6e3a","#2ecc71","#f97316","#e6f5ee")) +
  coord_flip() +
  labs(title = "Total Q1 GMV by City", x = NULL, y = "Total GMV (₦)") +
  theme_minimal()

Code
city_summary = df.groupby("City")["Total_GMV"].sum().sort_values()
colors = ["#e6f5ee","#18a558","#0f6e3a","#2ecc71","#f97316"]
plt.figure(figsize=(10, 5))
city_summary.plot(kind="barh", color=colors)
plt.title("Total Q1 GMV by City")
plt.xlabel("Total GMV (₦)")
plt.tight_layout()
plt.show()

6.4 Plot 3 — Weekly GMV Trend Across the Quarter

Code
week_cols <- paste0("week_", 1:13)

df |>
  select(all_of(week_cols)) |>
  summarise(across(everything(), sum)) |>
  pivot_longer(everything(), names_to = "week", values_to = "gmv") |>
  mutate(week_num = as.integer(str_extract(week, "\\d+"))) |>
  ggplot(aes(x = week_num, y = gmv)) +
  geom_line(colour = "#18a558", linewidth = 1.4) +
  geom_point(size = 3.5, colour = "#0f6e3a", fill = "#18a558", shape = 21, stroke = 1.5) +
  scale_y_continuous(labels = scales::comma) +
  scale_x_continuous(breaks = 1:13, labels = paste0("W", 1:13)) +
  labs(title = "Total Weekly GMV Across Q1 2025",
       x = "Week", y = "Total GMV (₦)") +
  theme_minimal()

Code
week_cols = [f"Week_{i}" for i in range(1, 14)]
weekly_totals = df[week_cols].sum()
plt.figure(figsize=(12, 5))
plt.plot(range(1, 14), weekly_totals.values, marker="o", color="#18a558",
         markerfacecolor="#0f6e3a", markersize=8, linewidth=2.5)
plt.xticks(range(1, 14), [f"W{i}" for i in range(1, 14)])
plt.title("Total Weekly GMV Across Q1 2025")
plt.xlabel("Week")
plt.ylabel("Total GMV (₦)")
plt.tight_layout()
plt.show()

6.5 Plot 4 — GMV by Salesperson and Product Type

Code
df |>
  group_by(salesperson, product_type) |>
  summarise(total_gmv = sum(total_gmv), .groups = "drop") |>
  ggplot(aes(x = salesperson, y = total_gmv, fill = product_type)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = c("#18a558","#0f6e3a","#f97316","#fde68a")) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Q1 GMV by Salesperson and Product Type",
       x = "Salesperson", y = "Total GMV (₦)", fill = "Product Type") +
  theme_minimal()

Code
pivot = df.groupby(["Salesperson", "Product_Type"])["Total_GMV"].sum().unstack(fill_value=0)
pivot.plot(kind="bar", stacked=True, figsize=(10, 6),
           color=["#18a558","#0f6e3a","#f97316","#fde68a"])
plt.title("Q1 GMV by Salesperson and Product Type")
plt.ylabel("Total GMV (₦)")
plt.xlabel("Salesperson")
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

6.6 Plot 5 — GMV vs Orders: Scatter by Vendor Type

Code
ggplot(df, aes(x = total_orders, y = total_gmv, colour = vendor_type)) +
  geom_point(alpha = 0.75, size = 2.5) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.9) +
  scale_colour_manual(values = c("#18a558","#0f6e3a","#f97316","#888")) +
  scale_x_log10(labels = scales::comma) +
  scale_y_log10(labels = scales::comma) +
  labs(title = "Total GMV vs Total Orders by Vendor Type",
       subtitle = "Both axes on log scale",
       x = "Total Orders (log scale)", y = "Total GMV (₦, log scale)",
       colour = "Vendor Type") +
  theme_minimal()

Code
colors = {"Restaurant":"#18a558","Supermarket":"#0f6e3a","Pharmacy":"#f97316","Event":"#888"}
plt.figure(figsize=(10, 6))
for vtype in df["Vendor_Type"].cat.categories:
    subset = df[df["Vendor_Type"] == vtype]
    plt.scatter(subset["Total_Orders"], subset["Total_GMV"],
                label=vtype, alpha=0.75, color=colors.get(vtype, "#999"))
plt.xscale("log")
plt.yscale("log")
plt.title("Total GMV vs Total Orders by Vendor Type (log-log scale)")
plt.xlabel("Total Orders")
plt.ylabel("Total GMV (₦)")
plt.legend()
plt.tight_layout()
plt.show()

Visual narrative summary: The five charts collectively tell one story. Total GMV on Chowdeck is concentrated in Lagos, which also hosts the largest share of vendors (62 of 120). Restaurant vendors show the highest typical GMV, although Section 7 finds that differences between vendor types are not statistically significant. Weekly GMV rose through the quarter as new cohorts were onboarded, and GMV tracks order volume closely on the log-log scatter. Early-onboarded vendors account for the bulk of team GMV, underscoring the value of a front-loaded pipeline strategy.


7 Technique 3 — Hypothesis Testing

7.1 Business Justification

The sales team regularly debates whether some vendor types or cities perform materially better than others. These debates drive resource allocation decisions. Hypothesis testing replaces opinion with a statistically grounded answer, informing which vertical or geography to prioritise in Q2.

7.2 Hypothesis 1 — Does GMV differ significantly across Vendor Types?

H₀: Mean log GMV (equivalently, geometric-mean GMV) is the same across all vendor types (Restaurant, Supermarket, Pharmacy, Event). H₁: At least one vendor type has a different mean log GMV.

Test: One-way ANOVA on log GMV, followed by Tukey HSD post-hoc comparisons. The log transformation brings the residuals closer to normality, and Levene's test checks the equal-variance assumption.

Code
library(car)
library(emmeans)
library(effectsize)

df <- df |> mutate(log_gmv = log(total_gmv))

leveneTest(log_gmv ~ vendor_type, data = df)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   3  1.4209 0.2402
      116               
Code
model_aov <- aov(log_gmv ~ vendor_type, data = df)
summary(model_aov)
             Df Sum Sq Mean Sq F value Pr(>F)
vendor_type   3   4.68   1.561   1.172  0.324
Residuals   116 154.57   1.333               
Code
emmeans(model_aov, pairwise ~ vendor_type, adjust = "tukey")
$emmeans
 vendor_type emmean    SE  df lower.CL upper.CL
 Event         13.1 0.348 116     12.4     13.8
 Pharmacy      12.8 0.207 116     12.4     13.2
 Restaurant    13.3 0.190 116     13.0     13.7
 Supermarket   13.0 0.180 116     12.6     13.3

Confidence level used: 0.95 

$contrasts
 contrast                 estimate    SE  df t.ratio p.value
 Event - Pharmacy            0.240 0.405 116   0.593  0.9339
 Event - Restaurant         -0.252 0.396 116  -0.637  0.9200
 Event - Supermarket         0.123 0.392 116   0.314  0.9893
 Pharmacy - Restaurant      -0.493 0.281 116  -1.753  0.3014
 Pharmacy - Supermarket     -0.117 0.275 116  -0.427  0.9737
 Restaurant - Supermarket    0.375 0.262 116   1.434  0.4810

P value adjustment: tukey method for comparing a family of 4 estimates 
Code
eta_squared(model_aov)
# Effect Size for ANOVA

Parameter   | Eta2 |       95% CI
---------------------------------
vendor_type | 0.03 | [0.00, 1.00]

- One-sided CIs: upper bound fixed at [1.00].
Code
from scipy import stats
import pingouin as pg

df["log_gmv"] = np.log(df["Total_GMV"])

groups = [grp["log_gmv"].values for _, grp in df.groupby("Vendor_Type")]
f_stat, p_val = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_val:.4f}")
ANOVA: F = 1.172, p = 0.3237
Code
aov_result = pg.anova(data=df, dv="log_gmv", between="Vendor_Type", detailed=True)
print(aov_result)
        Source          SS   DF        MS         F    p-unc      np2
0  Vendor_Type    4.683547    3  1.561182  1.171626  0.32369  0.02941
1       Within  154.569094  116  1.332492       NaN      NaN      NaN
Code
tukey = pg.pairwise_tukey(data=df, dv="log_gmv", between="Vendor_Type")
print(tukey)
            A            B    mean(A)  ...         T   p-tukey    hedges
0       Event     Pharmacy  13.079639  ...  0.593285  0.933929  0.225374
1       Event   Restaurant  13.079639  ... -0.636551  0.919992 -0.210012
2       Event  Supermarket  13.079639  ...  0.313642  0.989259  0.110431
3    Pharmacy   Restaurant  12.839290  ... -1.752951  0.301424 -0.407141
4    Pharmacy  Supermarket  12.839290  ... -0.427358  0.973685 -0.102256
5  Restaurant  Supermarket  13.331981  ...  1.433726  0.481049  0.307761

[6 rows x 9 columns]

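Interpretation: Levene's test does not reject equal variances (F = 1.42, p = 0.24), so the ANOVA's homogeneity assumption is reasonable. The ANOVA itself is not significant (F(3, 116) = 1.17, p = 0.32), and vendor type explains only about 3% of the variance in log GMV (η² = 0.03). We therefore fail to reject H₀. None of the Tukey pairwise contrasts is significant either. The largest gap, Restaurant versus Pharmacy, corresponds to a geometric-mean GMV about 1.6 times higher for restaurants (exp(0.493) ≈ 1.64), but its adjusted p-value is 0.30. Practically, vendor category alone does not justify treating one vertical as more valuable than another. With only 11 Event vendors, modest differences may still exist but cannot be detected in this sample.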
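The headline statistics can be checked by hand from the sums of squares in the ANOVA table. F is the ratio of the between-group to the within-group mean square. η² is the between-group share of the total sum of squares. In a one-way design, η² and partial η² coincide, which is why pingouin's `np2` agrees with effectsize's `Eta2`. Below is a minimal sketch; the helper name `anova_from_ss` is hypothetical.

```python
def anova_from_ss(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F statistic and eta-squared from sums of squares.

    F       = (SS_between / df_between) / (SS_within / df_within)
    eta^2   = SS_between / (SS_between + SS_within)
    """
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Sums of squares and degrees of freedom from the pingouin table above
f_stat, eta_sq = anova_from_ss(4.683547, 3, 154.569094, 116)
print(f"F = {f_stat:.3f}, eta^2 = {eta_sq:.3f}")  # F = 1.172, eta^2 = 0.029
```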

7.3 Hypothesis 2 — Does GMV differ significantly across Cities?

H₀: The distribution of GMV is the same across all five cities. H₁: At least one city’s GMV distribution is systematically higher or lower than the others.

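Test: Kruskal-Wallis test, a rank-based, non-parametric alternative to one-way ANOVA. It is used here because city group sizes are highly unequal (Kano has only 7 vendors), which makes ANOVA more sensitive to violations of its normality and equal-variance assumptions.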
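Because the Kruskal-Wallis statistic depends only on the ranks of the observations, and the log is strictly increasing, running the test on log GMV or on raw GMV gives exactly the same H and p-value; the log transform is harmless here but not required. The sketch below computes H from pooled ranks (omitting the tie correction that `scipy.stats.kruskal` and R’s `kruskal.test` apply) and checks the invariance on hypothetical values; the helper name and the toy data are illustrative only.

```python
import numpy as np
import pandas as pd


def kruskal_h(values, labels):
    """Kruskal-Wallis H statistic computed from pooled ranks (no tie correction)."""
    frame = pd.DataFrame({"value": values, "group": labels})
    frame["rank"] = frame["value"].rank()  # average ranks across the pooled sample
    n = len(frame)
    by_group = frame.groupby("group")["rank"].agg(["sum", "count"])
    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    return 12 / (n * (n + 1)) * (by_group["sum"] ** 2 / by_group["count"]).sum() - 3 * (n + 1)


# Hypothetical GMV values for three cities; not taken from the Chowdeck data
gmv = np.array([180_000, 420_000, 95_000, 610_000, 250_000, 1_300_000, 330_000])
city = ["Lagos", "Lagos", "Kano", "Abuja", "Kano", "Lagos", "Abuja"]
h_raw = kruskal_h(gmv, city)
h_log = kruskal_h(np.log(gmv), city)
print(f"H on raw GMV = {h_raw:.4f}, H on log GMV = {h_log:.4f}")
```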

Code
kruskal.test(log_gmv ~ city, data = df)

    Kruskal-Wallis rank sum test

data:  log_gmv by city
Kruskal-Wallis chi-squared = 3.6856, df = 4, p-value = 0.4502
Code
ggplot(df, aes(x = reorder(city, log_gmv, median), y = log_gmv, fill = city)) +
  geom_boxplot(alpha = 0.7, show.legend = FALSE) +
  scale_fill_manual(values = c("#18a558","#0f6e3a","#2ecc71","#f97316","#e6f5ee")) +
  coord_flip() +
  labs(title = "Log GMV Distribution by City", x = NULL, y = "log(GMV)") +
  theme_minimal()

Code
city_groups = [grp["log_gmv"].values for _, grp in df.groupby("City")]
h_stat, p_val = stats.kruskal(*city_groups)
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {p_val:.4f}")
Kruskal-Wallis: H = 3.686, p = 0.4502
Code
df.boxplot(column="log_gmv", by="City", figsize=(10, 5))
plt.suptitle("")
plt.title("Log GMV Distribution by City")
plt.ylabel("log(GMV)")
plt.tight_layout()
plt.show()

Interpretation: The Kruskal-Wallis test gives H = 3.69 (df = 4), p = 0.450, so we fail to reject H₀: there is no significant evidence that the GMV of a typical vendor differs by city. Because the test compares per-vendor GMV rather than city totals, this suggests that Lagos’s larger share of total GMV is driven mainly by its larger number of vendors, not by Lagos vendors trading in a fundamentally different way. With only 7 vendors in Kano, however, the test has limited power to detect modest city effects.


8 Technique 4 — Correlation Analysis

8.1 Business Justification

Before building a regression model, I need to understand which variables are linearly related to GMV, and whether any predictors are so strongly correlated with each other that multicollinearity could distort the regression. Correlation analysis also tells me which operational levers — orders, active weeks, onboarding timing — are most closely tied to revenue.
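A simple way to act on the multicollinearity concern is to list predictor pairs whose absolute correlation exceeds a cut-off. The sketch below assumes a threshold of 0.8, a common rule of thumb rather than a value used elsewhere in this report, and a hypothetical helper name and toy data. Pairwise screening only catches two-variable redundancy; collinearity involving several variables or overlapping categorical dummies needs a check on the full design matrix instead.

```python
from itertools import combinations

import pandas as pd


def high_corr_pairs(data: pd.DataFrame, threshold: float = 0.8):
    """List predictor pairs whose absolute Pearson correlation is at least `threshold`."""
    corr = data.corr(method="pearson")
    pairs = []
    for a, b in combinations(corr.columns, 2):  # each unordered pair once
        r = float(corr.loc[a, b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    # Strongest correlations first
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


# Hypothetical toy predictors: "orders_x2" is an exact linear rescaling of "orders"
toy = pd.DataFrame({
    "orders": [1, 2, 3, 4, 5],
    "orders_x2": [3, 5, 7, 9, 11],
    "other": [2, -1, 0, 1, -2],
})
flagged = high_corr_pairs(toy)
print(flagged)
```

On the real data, the same helper could be applied to `num_df` before the regression is fitted.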

8.2 Correlation Matrix

Code
library(corrplot)

numeric_vars <- df |>
  select(log_gmv, total_orders, active_weeks, starts_with("week_")) |>
  rename_with(~ str_replace(.x, "week_", "W"))

cor_matrix <- cor(numeric_vars, method = "pearson", use = "complete.obs")

corrplot(cor_matrix, method = "color", type = "lower",
         tl.cex = 0.7, number.cex = 0.6,
         addCoef.col = "black",
         col = colorRampPalette(c("#f97316", "white", "#18a558"))(200),
         title = "Pearson Correlation Matrix", mar = c(0,0,2,0))

Code
from matplotlib.colors import LinearSegmentedColormap

week_cols = [f"Week_{i}" for i in range(1, 14)]
num_df = df[["log_gmv", "Total_Orders", "Active_Weeks"] + week_cols].copy()
corr_matrix = num_df.corr(method="pearson")

cd_cmap = LinearSegmentedColormap.from_list("chowdeck", ["#f97316", "white", "#18a558"])

plt.figure(figsize=(16, 12))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, mask=mask, annot=True, fmt=".2f", cmap=cd_cmap,
            center=0, linewidths=0.5, annot_kws={"size": 7})
plt.title("Pearson Correlation Matrix")
plt.tight_layout()
plt.show()

8.3 Spearman Correlation (Robustness Check)

Code
cor_spearman <- cor(
  df |> select(total_gmv, total_orders, active_weeks),
  method = "spearman"
)
kable(round(cor_spearman, 3), caption = "Spearman Correlation (robust to outliers)") |>
  kable_styling(bootstrap_options = "striped")
Spearman Correlation (robust to outliers)

             | total_gmv | total_orders | active_weeks
-------------|-----------|--------------|-------------
total_gmv    |     1.000 |        0.928 |        0.243
total_orders |     0.928 |        1.000 |        0.005
active_weeks |     0.243 |        0.005 |        1.000
Code
spearman = df[["Total_GMV", "Total_Orders", "Active_Weeks"]].corr(method="spearman")
print("Spearman Correlation:")
Spearman Correlation:
Code
print(spearman.round(3))
              Total_GMV  Total_Orders  Active_Weeks
Total_GMV         1.000         0.928         0.243
Total_Orders      0.928         1.000         0.005
Active_Weeks      0.243         0.005         1.000

8.4 Key Correlations and Business Implications

Correlation 1 (Total Orders vs log GMV): This is by far the stronger of the two operational correlates of GMV (Spearman ρ = 0.928, versus 0.243 for active weeks; because Spearman uses ranks, the value is the same for raw and log GMV). More orders directly mean more revenue. Practically, the single most impactful lever I can pull when managing a vendor is driving order frequency through promotions, featured listings, or improved menu quality.

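Correlation 2 (Active Weeks vs log GMV): Vendors with more active weeks generate more GMV, although the association is much weaker (Spearman ρ = 0.243). One might expect this to be purely mechanical (more weeks, more opportunity to take orders), but active weeks are essentially uncorrelated with total orders (ρ = 0.005), so the extra GMV does not come simply from taking more orders; it is more consistent with longer-active vendors achieving a higher value per order. The business implication, subject to the causal caveats in Section 11, is that earlier onboarding is associated with higher GMV.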

Correlation 3 (Inter-week GMV stability): High inter-week correlations indicate that vendor performance is relatively stable from week to week: a vendor that performs well in Week 3 tends to perform well in Week 10. This supports using a vendor’s early weeks as a leading indicator of full-quarter performance, which is useful for mid-quarter intervention decisions.
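The leading-indicator idea can be checked directly by correlating each vendor’s GMV over its first few weeks with its full-quarter GMV. A minimal sketch, assuming the `Week_1` to `Week_13` and `Total_GMV` columns used in the Python code above; the function name and the four-week window are illustrative choices, and the toy data is hypothetical. One caveat: vendors onboarded after Week 1 record zero early-week GMV, so on the real data it may be more informative to restrict the check to vendors live from Week 1, or to use each vendor’s first four live weeks.

```python
import numpy as np
import pandas as pd


def early_week_signal(data: pd.DataFrame, n_early: int = 4) -> float:
    """Spearman correlation between GMV in the first `n_early` weeks and Total_GMV."""
    early_cols = [f"Week_{i}" for i in range(1, n_early + 1)]
    early_gmv = data[early_cols].sum(axis=1)
    # Spearman correlation = Pearson correlation of the ranks (average ranks for ties)
    return float(early_gmv.rank().corr(data["Total_GMV"].rank()))


# Hypothetical toy data: 5 vendors x 13 weeks with constant weekly GMV per vendor
week_cols = [f"Week_{i}" for i in range(1, 14)]
weeks = pd.DataFrame(np.outer(np.arange(1, 6), np.full(13, 10_000)), columns=week_cols)
weeks["Total_GMV"] = weeks[week_cols].sum(axis=1)
rho = early_week_signal(weeks)
print(f"Spearman rho, first 4 weeks vs full quarter: {rho:.3f}")
```

On `df`, varying `n_early` (for example 2, 4, 6) would show how quickly early-week GMV becomes a reliable guide to the full quarter.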


9 Technique 5 — Linear Regression

9.1 Business Justification

Regression gives me a single model that quantifies the marginal contribution of each predictor to GMV, holding all others constant. This is directly applicable to two practical tasks: (1) forecasting expected GMV for a new vendor at onboarding, based on their characteristics; and (2) identifying which characteristics are most actionable for the team to influence.
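Task (1) needs one extra step, because the model predicts log GMV. Exponentiating a log-scale prediction estimates something closer to the median than the mean GMV; one common correction, which this report does not otherwise use, is Duan’s smearing estimator, which multiplies exp(prediction) by the mean of exp(residual). Under normally distributed errors, the residual standard error of 0.286 reported in the model fit statistics would imply a correction of exp(0.286²/2) ≈ 1.04, or about 4%. The sketch below assumes an OLS fit on log GMV; the function names and toy values are hypothetical.

```python
import numpy as np


def smearing_factor(residuals) -> float:
    """Duan's smearing factor: the mean of exp(residual) from a log-scale OLS fit."""
    return float(np.mean(np.exp(np.asarray(residuals, dtype=float))))


def forecast_gmv(log_predictions, residuals) -> np.ndarray:
    """Back-transform log-GMV predictions to GMV-scale (mean) forecasts."""
    return np.exp(np.asarray(log_predictions, dtype=float)) * smearing_factor(residuals)


# Toy residuals and prediction (hypothetical, not from the Chowdeck model)
toy_resid = np.array([np.log(2.0), -np.log(2.0)])
toy_forecast = forecast_gmv([np.log(100_000.0)], toy_resid)
print(f"Smearing factor: {smearing_factor(toy_resid):.3f}; forecast: {toy_forecast[0]:,.0f}")
```

With the statsmodels fit in Section 9.3, this would be called as `forecast_gmv(model.predict(new_vendors), model.resid)`, where `new_vendors` is a hypothetical DataFrame with the same columns as `df`, including `log_orders`.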

9.2 Model Specification

The dependent variable is log(Total_GMV). Predictors include: log(Total_Orders + 1), Active_Weeks, Vendor_Type (dummy-encoded, Restaurant as reference), City (dummy-encoded, Lagos as reference), Salesperson (dummy-encoded, Jeremy as reference), and Product_Type (dummy-encoded, GMV as reference). Because both GMV and orders enter on the log scale, the log_orders coefficient is approximately an elasticity.

9.3 Model Fitting

Code
library(broom)

df_model <- df |>
  mutate(
    log_orders   = log(total_orders + 1),
    vendor_type  = relevel(vendor_type, ref = "Restaurant"),
    city         = relevel(city, ref = "Lagos"),
    salesperson  = relevel(salesperson, ref = "Jeremy"),
    product_type = relevel(product_type, ref = "GMV")
  )

model <- lm(log_gmv ~ log_orders + active_weeks + vendor_type +
              city + salesperson + product_type, data = df_model)

tidy(model, conf.int = TRUE) |>
  kable(digits = 3, caption = "OLS Regression Results (Dependent variable: log GMV)") |>
  kable_styling(bootstrap_options = c("striped", "hover"))
OLS Regression Results (Dependent variable: log GMV)

term                   | estimate | std.error | statistic | p.value | conf.low | conf.high
-----------------------|----------|-----------|-----------|---------|----------|----------
(Intercept)            |    7.475 |     0.181 |    41.187 |   0.000 |    7.115 |     7.835
log_orders             |    0.932 |     0.025 |    37.526 |   0.000 |    0.882 |     0.981
active_weeks           |    0.114 |     0.010 |    11.113 |   0.000 |    0.094 |     0.134
vendor_typeEvent       |   -0.002 |     0.101 |    -0.016 |   0.988 |   -0.202 |     0.199
vendor_typePharmacy    |   -0.084 |     0.072 |    -1.166 |   0.246 |   -0.227 |     0.059
vendor_typeSupermarket |   -0.042 |     0.069 |    -0.607 |   0.545 |   -0.179 |     0.095
cityAbuja              |   -0.135 |     0.072 |    -1.867 |   0.065 |   -0.279 |     0.008
cityIbadan             |    0.150 |     0.097 |     1.548 |   0.125 |   -0.042 |     0.343
cityKano               |    0.020 |     0.120 |     0.163 |   0.871 |   -0.218 |     0.257
cityPort Harcourt      |   -0.035 |     0.078 |    -0.451 |   0.653 |   -0.189 |     0.119
salespersonJack        |    0.175 |     0.065 |     2.676 |   0.009 |    0.045 |     0.305
salespersonLou         |    0.053 |     0.067 |     0.797 |   0.427 |   -0.079 |     0.186
product_typeAds        |   -0.037 |     0.074 |    -0.505 |   0.614 |   -0.183 |     0.109
product_typeEvents     |       NA |        NA |        NA |      NA |       NA |        NA
product_typeVoucher    |    0.006 |     0.150 |     0.038 |   0.970 |   -0.292 |     0.303
Code
glance(model) |>
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs) |>
  kable(digits = 3, caption = "Model Fit Statistics") |>
  kable_styling(bootstrap_options = "striped")
Model Fit Statistics

r.squared | adj.r.squared | sigma | statistic | p.value | df | nobs
----------|---------------|-------|-----------|---------|----|-----
    0.946 |         0.939 | 0.286 |   141.944 |       0 | 13 |  120
Code
import statsmodels.formula.api as smf

df["log_orders"] = np.log(df["Total_Orders"] + 1)

formula = ("log_gmv ~ log_orders + Active_Weeks + "
           "C(Vendor_Type, Treatment('Restaurant')) + "
           "C(City, Treatment('Lagos')) + "
           "C(Salesperson, Treatment('Jeremy')) + "
           "C(Product_Type, Treatment('GMV'))")

model = smf.ols(formula, data=df).fit()
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                log_gmv   R-squared:                       0.946
Model:                            OLS   Adj. R-squared:                  0.939
Method:                 Least Squares   F-statistic:                     141.9
Date:                Fri, 15 May 2026   Prob (F-statistic):           9.75e-61
Time:                        02:07:52   Log-Likelihood:                -12.485
No. Observations:                 120   AIC:                             52.97
Df Residuals:                     106   BIC:                             91.99
Df Model:                          13                                         
Covariance Type:            nonrobust                                         
==========================================================================================================================
                                                             coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------------------------------------
Intercept                                                  7.4747      0.181     41.187      0.000       7.115       7.835
C(Vendor_Type, Treatment('Restaurant'))[T.Event]          -0.0008      0.051     -0.016      0.988      -0.101       0.100
C(Vendor_Type, Treatment('Restaurant'))[T.Pharmacy]       -0.0839      0.072     -1.166      0.246      -0.227       0.059
C(Vendor_Type, Treatment('Restaurant'))[T.Supermarket]    -0.0418      0.069     -0.607      0.545      -0.179       0.095
C(City, Treatment('Lagos'))[T.Abuja]                      -0.1353      0.072     -1.867      0.065      -0.279       0.008
C(City, Treatment('Lagos'))[T.Ibadan]                      0.1504      0.097      1.548      0.125      -0.042       0.343
C(City, Treatment('Lagos'))[T.Kano]                        0.0195      0.120      0.163      0.871      -0.218       0.257
C(City, Treatment('Lagos'))[T.Port Harcourt]              -0.0350      0.078     -0.451      0.653      -0.189       0.119
C(Salesperson, Treatment('Jeremy'))[T.Jack]                0.1753      0.065      2.676      0.009       0.045       0.305
C(Salesperson, Treatment('Jeremy'))[T.Lou]                 0.0533      0.067      0.797      0.427      -0.079       0.186
C(Product_Type, Treatment('GMV'))[T.Ads]                  -0.0372      0.074     -0.505      0.614      -0.183       0.109
C(Product_Type, Treatment('GMV'))[T.Events]               -0.0008      0.051     -0.016      0.988      -0.101       0.100
C(Product_Type, Treatment('GMV'))[T.Voucher]               0.0057      0.150      0.038      0.970      -0.292       0.303
log_orders                                                 0.9315      0.025     37.526      0.000       0.882       0.981
Active_Weeks                                               0.1139      0.010     11.113      0.000       0.094       0.134
==============================================================================
Omnibus:                        0.237   Durbin-Watson:                   2.049
Prob(Omnibus):                  0.888   Jarque-Bera (JB):                0.411
Skew:                          -0.034   Prob(JB):                        0.814
Kurtosis:                       2.722   Cond. No.                     4.87e+16
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.98e-30. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
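The statsmodels output above lists 15 coefficients but reports Df Model = 13, so the design matrix has rank 14: one column is an exact linear combination of the others, which is also why R returns NA for product_typeEvents. One way to locate such redundancy before fitting is to build the treatment-coded design matrix, compare its column count with its numerical rank, and look for dummy columns that are exact copies of each other. The sketch below uses hypothetical toy data and a hypothetical helper name; on the real `df`, one would pass the `Vendor_Type` and `Product_Type` columns with the same reference levels as the model.

```python
import numpy as np
import pandas as pd


def treatment_design(data: pd.DataFrame, refs: dict) -> pd.DataFrame:
    """Intercept plus treatment-coded dummies; `refs` maps each column to its reference level."""
    parts = [pd.Series(1.0, index=data.index, name="Intercept")]
    for col, ref in refs.items():
        dummies = pd.get_dummies(data[col], prefix=col, dtype=float)
        parts.append(dummies.drop(columns=f"{col}_{ref}"))  # drop the reference level
    return pd.concat(parts, axis=1)


# Hypothetical toy data in which every Event vendor sells the Events product, and vice versa
toy = pd.DataFrame({
    "Vendor_Type": ["Restaurant", "Event", "Pharmacy", "Event", "Restaurant", "Pharmacy"],
    "Product_Type": ["GMV", "Events", "Ads", "Events", "Ads", "GMV"],
})
X = treatment_design(toy, {"Vendor_Type": "Restaurant", "Product_Type": "GMV"})
rank = int(np.linalg.matrix_rank(X.to_numpy()))
# Dummy columns that are exact copies of each other are perfectly collinear
dup_mask = X.T.duplicated(keep=False)
duplicated = dup_mask[dup_mask].index.tolist()
print(pd.crosstab(toy["Vendor_Type"], toy["Product_Type"]))
print(f"{X.shape[1]} columns, rank {rank}; identical columns: {duplicated}")
```

A rank below the column count means at least one coefficient is not identified; the cross-tabulation then shows which categories coincide, so one of the two variables can be dropped or merged.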

9.4 Diagnostic Plots

Code
library(ggfortify)
autoplot(model, which = 1:4, ncol = 2, label.size = 3) +
  theme_minimal()

Code
from scipy.stats import probplot

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

fitted    = model.fittedvalues
residuals = model.resid

axes[0,0].scatter(fitted, residuals, alpha=0.5, color="#18a558")
axes[0,0].axhline(0, color="#f97316", linestyle="--")
axes[0,0].set_title("Residuals vs Fitted")
axes[0,0].set_xlabel("Fitted values")
axes[0,0].set_ylabel("Residuals")

_ = probplot(residuals, plot=axes[0,1])  # assign the returned Q-Q coordinates to "_" so they are not echoed
axes[0,1].set_title("Normal Q-Q")

axes[1,0].scatter(fitted, np.sqrt(np.abs(residuals)), alpha=0.5, color="#18a558")
axes[1,0].set_title("Scale-Location")
axes[1,0].set_xlabel("Fitted values")
axes[1,0].set_ylabel("sqrt(|Residuals|)")

axes[1,1].hist(residuals, bins=25, color="#18a558", edgecolor="white")
axes[1,1].set_title("Residuals Distribution")
axes[1,1].set_xlabel("Residuals")

plt.tight_layout()
plt.show()

9.5 Coefficient Interpretation

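Because the dependent variable is log(GMV), each coefficient translates into a percentage effect. For a one-unit change in a predictor (one extra active week, or belonging to a category rather than the reference), GMV changes by (exp(β) - 1) × 100%; for log(Total_Orders + 1), a 10% rise in orders implies a (1.1^β - 1) × 100% change in GMV. One caveat applies before reading the table: R returns NA for product_typeEvents, and statsmodels reports identical estimates for Vendor_Type Event and Product_Type Events together with a condition number of 4.87e+16. Both outputs indicate that the Events product type appears to be perfectly collinear with the Event vendor type, so the model cannot separate their effects. Neither is significant, so the conclusions below are unaffected, but one of the two dummies should be dropped in future versions of the model.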

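Predictor                | Coefficient (β) | p-value | Plain-language interpretation (holding other predictors constant)
-------------------------|-----------------|---------|------------------------------------------------------------------
log(Total_Orders)        |          0.9315 |  <0.001 | A 10% increase in orders is associated with about a 9.3% increase in GMV
Active_Weeks             |          0.1139 |  <0.001 | Each additional active week is associated with about 12.1% higher GMV
Vendor_Type: Supermarket |         -0.0418 |   0.545 | Supermarket vendors generate about 4.1% less GMV than restaurants (not significant)
City: Abuja              |         -0.1353 |   0.065 | Abuja vendors generate about 12.7% less GMV than Lagos vendors (not significant at the 5% level)
Product_Type: Ads        |         -0.0372 |   0.614 | Ad-product vendors generate about 3.7% less GMV than GMV-product vendors (not significant)
Salesperson: Jack        |          0.1753 |   0.009 | Vendors managed by Jack generate about 19.2% more GMV than those managed by Jeremy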
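The percentage conversions in the table follow from the log-linear form of the model. The hypothetical helpers below apply (exp(β) - 1) × 100 for a one-unit change and (1.1^β - 1) × 100 for a 10% rise in a log-transformed predictor, using the coefficients as printed in the statsmodels summary.

```python
import numpy as np


def pct_change_unit(beta: float) -> float:
    """% change in GMV for a one-unit change in a predictor of log(GMV)."""
    return float((np.exp(beta) - 1) * 100)


def pct_change_elasticity(beta: float, pct_increase: float = 10.0) -> float:
    """% change in GMV for a pct_increase% rise in a log-transformed predictor."""
    return float(((1 + pct_increase / 100) ** beta - 1) * 100)


# Coefficients copied from the statsmodels summary in Section 9.3
unit_coefs = {
    "Active_Weeks": 0.1139,
    "Vendor_Type: Supermarket": -0.0418,
    "City: Abuja": -0.1353,
    "Product_Type: Ads": -0.0372,
    "Salesperson: Jack": 0.1753,
}
for name, beta in unit_coefs.items():
    print(f"{name:<26}{pct_change_unit(beta):+6.1f}%")
print(f"{'log(Total_Orders), +10%':<26}{pct_change_elasticity(0.9315):+6.1f}%")
```

Using the exact conversions rather than reading β as a percentage matters most for the larger coefficients: for Salesperson: Jack, β = 0.175 corresponds to roughly 19%, not 17.5%.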

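Manager-ready summary: The model explains about 95% of the variation in log GMV (R² = 0.946). Order volume is the dominant driver: holding other factors constant, a 10% increase in orders is associated with about 9.3% more GMV, and each additional active week with about 12% more GMV, while vendor type, city and product type add no statistically significant explanatory power. The practical action is to concentrate effort on earlier onboarding and order activation rather than on targeting particular vendor categories or cities; the significant salesperson effect (Jack’s vendors generate about 19% more GMV than Jeremy’s, p = 0.009) is worth investigating, but it may reflect differences in the vendor portfolios each salesperson manages rather than sales technique.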


10 Integrated Findings

The five analyses converge on a coherent story about what drives vendor GMV on Chowdeck.

EDA established that GMV is right-skewed and that active weeks are strongly associated with performance — immediately suggesting that onboarding timing is a key operational lever.

Visualisation showed that Lagos dominates total GMV volume, restaurant and event vendors outperform pharmacies and supermarkets at the top end, and weekly GMV trended upward across the quarter as more vendors came online.

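Hypothesis testing found no statistically significant differences in log GMV across vendor types (ANOVA, F = 1.17, p = 0.324, η² ≈ 0.03) or across cities (Kruskal-Wallis, H = 3.69, p = 0.450). The category and city differences visible in the charts are therefore consistent with chance variation at the per-vendor level.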

Correlation analysis showed that total orders is by far the strongest correlate of GMV (Spearman ρ = 0.928), with active weeks a distant second (ρ = 0.243), indicating that order activation, not just onboarding, is the main revenue lever.

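Regression explained about 95% of the variation in log GMV (R² = 0.946, adjusted R² = 0.939). The dominant predictors were log orders (β = 0.932, p < 0.001) and active weeks (β = 0.114, p < 0.001); salesperson was the only categorical predictor with a significant effect (Jack vs Jeremy, β = 0.175, p = 0.009), while vendor type, city and product type were not significant once orders and active weeks were controlled for.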

Single integrated recommendation: The Chowdeck sales team should adopt a two-part Q2 strategy: (1) onboard vendors earlier in the quarter to maximise active weeks, since each additional active week is associated with about 12% higher GMV after controlling for orders; and (2) focus account management effort on order activation (promotions, featured listings, menu optimisation) in the first four weeks after a vendor goes live, since early-week performance is a useful leading indicator of full-quarter GMV and a 10% lift in orders is associated with roughly 9.3% more GMV.


11 Limitations & Further Work

Sample size: 120 vendors is sufficient for this analysis but limits the power of subgroup tests — Kano has only 7 vendors. A Q2 dataset with the same structure would double the sample and allow more robust city-level inference.

Omitted variables: The dataset does not include variables such as menu size, average order value, vendor rating, or marketing spend — all of which likely influence GMV. Including these in the regression would improve explanatory power and reduce omitted variable bias.

Causal inference: Correlation between orders and GMV is not causation — both may be driven by a third factor such as restaurant quality or marketing budget. A designed experiment varying promotional intensity across matched vendor pairs would allow stronger causal claims.

Time series dimension: The 13 weeks of weekly GMV data were aggregated to vendor level for this analysis. A panel data model (fixed or random effects) would use the full weekly structure, controlling for vendor-level heterogeneity and potentially identifying seasonal effects within the quarter.

Data generation: The dataset originates from internal operational records. A future study should link this data to Chowdeck’s customer-level order data to understand the demand-side drivers of vendor performance.


12 References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048

McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Onuoha, J. (2026). Chowdeck Q1 2025 vendor sales performance data [Dataset]. Collected from Chowdeck Commercial Sales Team, Lagos, Nigeria. Data available on request from the author.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Van Rossum, G., & Drake, F. L. (2009). Python 3 reference manual. CreateSpace.

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686


13 Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with structuring the Quarto document template, generating code scaffolding for R and Python chunks, and suggesting appropriate statistical tests given the data structure. All analytical decisions — the choice of Case Study 1, the selection of log GMV as the outcome variable, the decision to use Kruskal-Wallis for the city hypothesis (given unequal group sizes), the formulation of both hypotheses, and the interpretation of all outputs — were made independently by the author. The regression coefficient interpretations, business recommendations, and integrated findings section reflect the author’s own judgement and are not AI-generated conclusions. All code was reviewed, tested, and understood by the author prior to submission.