What Drives Vendor GMV on Chowdeck? An Exploratory and Inferential Analysis of Q1 2025 Sales Data
1 Executive Summary
Chowdeck is a Nigerian food and drink delivery platform operating across Lagos, Abuja, Port Harcourt, Ibadan, and Kano. This report analyses 120 vendor records collected from Chowdeck’s internal sales operations system covering Q1 2025 (January 1 – March 29, 2025). The dataset captures vendor-level Gross Merchandise Value (GMV), order volumes, product types, vendor categories, salesperson assignments, onboarding dates, and weekly trading performance across 13 weeks.
The central business question is: which vendor characteristics and operational factors most strongly predict GMV performance? Five analytical techniques — Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression — are applied to answer this question from multiple angles.
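Key findings reveal that GMV does not differ significantly across vendor types (one-way ANOVA on log GMV, F(3, 116) = 1.17, p = 0.324) or across cities (Kruskal-Wallis, H = 3.69, p = 0.450); that vendors onboarded earlier in the quarter tend to generate more GMV, although the rank correlation between active weeks and GMV is modest (Spearman ρ = 0.24); and that total orders and active weeks are the strongest predictors of GMV in a regression framework, with total orders by far the strongest correlate (Spearman ρ = 0.93). The integrated recommendation is that the sales team should prioritise early onboarding and order volume activation as the primary levers for GMV growth in Q2 2025.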
2 Professional Disclosure
2.1 Role and Organisation
I am a salesperson on Chowdeck’s three-person commercial sales team. My responsibilities include onboarding new vendor partners, managing existing vendor relationships, driving GMV through featured listings, advertisements, and promotional events, and building the supplier pipeline for Chowstore, Chowdeck’s new Lagos-based online supermarket product.
2.2 Technique Relevance
Exploratory Data Analysis (EDA): I work with vendor sales data weekly to identify underperforming accounts and flag accounts that need intervention. EDA is the first step I take every week before any decision — understanding the shape, spread, and anomalies in GMV data directly informs which vendors I prioritise for outreach.
Data Visualisation: Chowdeck’s commercial conversations — both internal (weekly team reviews, manager check-ins) and external (vendor pitches, partner decks) — rely heavily on visual storytelling. Being able to translate a GMV trend into a chart that a non-technical vendor partner or line manager immediately understands is a core job skill.
Hypothesis Testing: A recurring question on the sales team is whether certain vendor types, cities, or product formats generate materially different GMV. Hypothesis testing gives a statistically grounded answer to these questions, moving team strategy decisions from intuition to evidence.
Correlation Analysis: Understanding which variables move together — orders and GMV, active weeks and GMV, vendor type and performance — helps me identify the highest-leverage inputs to focus on. Correlation analysis is directly applicable to building a vendor scoring model that the team uses informally today.
Linear Regression: Regression allows me to quantify the marginal contribution of each vendor characteristic to GMV. A regression model with interpretable coefficients gives me a practical tool for forecasting expected GMV when onboarding a new vendor, which is a core input to prioritisation decisions.
3 Data Collection & Sampling
3.1 Source and Collection Method
The dataset was extracted from Chowdeck’s internal sales tracking system, which records vendor-level performance data on a weekly basis. The data was compiled and structured by the sales team as part of standard Q1 reporting operations. Variables were verified against the platform’s vendor management records.
3.2 Dataset Overview
| Attribute | Detail |
|---|---|
| Dataset name | Chowdeck Q1 2025 Vendor Sales Data |
| Collection period | January 1, 2025 – March 29, 2025 (13 weeks) |
| Unit of observation | Individual vendor |
| Total observations | 120 vendors |
| Variables | 23 (see Section 4) |
| Geographic scope | Lagos, Abuja, Port Harcourt, Ibadan, Kano |
3.3 Sampling Frame
The sampling frame is the full population of vendor accounts managed by the Chowdeck sales team during Q1 2025. The three-person team (Jeremy, Jack, Lou) each managed 40 vendors. No random sampling was applied: this is a census of all active vendor accounts for the quarter, so there is no sampling error within the frame. The results should not, however, be generalised to vendors outside the sales team’s portfolio.
3.4 Onboarding Timeline
Not all vendors were active from day one. Twenty vendors were onboarded at the start of Q1 (January 1). The remaining 100 vendors were progressively onboarded through the quarter across eight subsequent cohorts (January 8 through February 26), reflecting the sales team’s live pipeline-building activity.
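The onboarding cohorts map directly onto the Active_Weeks variable. The following is a minimal sketch, assuming that weeks are counted in seven-day blocks from January 1 and that a vendor counts as active from its onboarding week through week 13. That rule is an assumption that happens to match the rows shown in Section 4.1; it is not a documented rule of the sales system, and the function name `expected_active_weeks` is illustrative.

```python
from datetime import date

QUARTER_START = date(2025, 1, 1)
WEEKS_IN_QUARTER = 13


def expected_active_weeks(onboarded: date) -> int:
    """Weeks from the onboarding week through week 13 (assumed rule)."""
    weeks_elapsed = (onboarded - QUARTER_START).days // 7
    return WEEKS_IN_QUARTER - weeks_elapsed


# First and last onboarding cohorts described above
print(expected_active_weeks(date(2025, 1, 1)))   # 13
print(expected_active_weeks(date(2025, 2, 26)))  # 5
```

Under this assumption the first cohort has 13 active weeks and the last has 5, which matches the range of Active_Weeks reported in Section 4.3.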
3.5 Ethical Notes
All vendor names have been anonymised using codenames (e.g., Vendor_001) to comply with Chowdeck’s data-sharing policy. No personally identifiable information is published. Salesperson names have been replaced with code names (Jeremy, Jack, Lou). The data is used solely for academic analysis under the author’s own access rights as a Chowdeck employee.
Data citation: Onuoha, J. (2026). Chowdeck Q1 2025 vendor sales performance data [Dataset]. Collected from Chowdeck Commercial Sales Team, Lagos, Nigeria. Data available on request from the author.
4 Data Description
4.1 Loading the Data
Code
library(tidyverse)
library(skimr)
library(janitor)
library(knitr)
library(kableExtra)
df <- read_csv("chowdeck_q1_sales_data.csv") |>
  clean_names() |>
  mutate(
    onboarded_date = as.Date(onboarded_date),
    salesperson = as.factor(salesperson),
    vendor_type = as.factor(vendor_type),
    city = as.factor(city),
    product_type = as.factor(product_type)
  )
glimpse(df)
Rows: 120
Columns: 23
$ vendor_id <dbl> 100000, 100001, 100002, 100003, 100004, 100005, 100006,…
$ vendor_name <chr> "Vendor_045", "Vendor_048", "Vendor_005", "Vendor_056",…
$ salesperson <fct> Jeremy, Jack, Lou, Jeremy, Jack, Lou, Jeremy, Jack, Lou…
$ vendor_type <fct> Event, Supermarket, Supermarket, Restaurant, Pharmacy, …
$ city <fct> Abuja, Port Harcourt, Lagos, Kano, Lagos, Lagos, Ibadan…
$ product_type <fct> Events, GMV, GMV, GMV, GMV, GMV, GMV, GMV, GMV, Ads, GM…
$ onboarded_date <date> 2025-01-15, 2025-02-12, 2025-02-05, 2025-01-15, 2025-0…
$ active_weeks <dbl> 11, 7, 8, 11, 9, 7, 6, 6, 7, 9, 10, 7, 10, 10, 11, 13, …
$ total_orders <dbl> 106, 32, 97, 1512, 66, 59, 26, 28, 423, 33, 199, 8, 334…
$ total_gmv <dbl> 456597, 77816, 302908, 6465846, 223174, 134956, 82103, …
$ week_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ week_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2377…
$ week_3 <dbl> 32788, 0, 0, 557282, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1150…
$ week_4 <dbl> 48222, 0, 0, 688851, 0, 0, 0, 0, 0, 0, 138526, 0, 77313…
$ week_5 <dbl> 29422, 0, 0, 429074, 31077, 0, 0, 0, 0, 21607, 132214, …
$ week_6 <dbl> 46709, 0, 32259, 480979, 23191, 0, 0, 0, 0, 15504, 1143…
$ week_7 <dbl> 42264, 13907, 41631, 673388, 25196, 20538, 0, 0, 166692…
$ week_8 <dbl> 29658, 11613, 33978, 605177, 27460, 26246, 12956, 15872…
$ week_9 <dbl> 39980, 10199, 33579, 679833, 28152, 19994, 16273, 13969…
$ week_10 <dbl> 51538, 7506, 49641, 623911, 25404, 15611, 18170, 13619,…
$ week_11 <dbl> 50543, 9793, 35527, 608379, 17987, 16361, 7793, 14182, …
$ week_12 <dbl> 39166, 12515, 41274, 617542, 21987, 17032, 11358, 16521…
$ week_13 <dbl> 46307, 12283, 35019, 501430, 22720, 19174, 15553, 16489…
Code
import pandas as pd
import numpy as np
df = pd.read_csv("chowdeck_q1_sales_data.csv", parse_dates=["Onboarded_Date"])
for col in ["Salesperson", "Vendor_Type", "City", "Product_Type"]:
    df[col] = df[col].astype("category")
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Vendor_ID 120 non-null int64
1 Vendor_Name 120 non-null object
2 Salesperson 120 non-null category
3 Vendor_Type 120 non-null category
4 City 120 non-null category
5 Product_Type 120 non-null category
6 Onboarded_Date 120 non-null datetime64[ns]
7 Active_Weeks 120 non-null int64
8 Total_Orders 120 non-null int64
9 Total_GMV 120 non-null int64
10 Week_1 120 non-null int64
11 Week_2 120 non-null int64
12 Week_3 120 non-null int64
13 Week_4 120 non-null int64
14 Week_5 120 non-null int64
15 Week_6 120 non-null int64
16 Week_7 120 non-null int64
17 Week_8 120 non-null int64
18 Week_9 120 non-null int64
19 Week_10 120 non-null int64
20 Week_11 120 non-null int64
21 Week_12 120 non-null int64
22 Week_13 120 non-null int64
dtypes: category(4), datetime64[ns](1), int64(17), object(1)
memory usage: 19.1+ KB
None
Code
print(df.head())
Vendor_ID Vendor_Name Salesperson ... Week_11 Week_12 Week_13
0 100000 Vendor_045 Jeremy ... 50543 39166 46307
1 100001 Vendor_048 Jack ... 9793 12515 12283
2 100002 Vendor_005 Lou ... 35527 41274 35019
3 100003 Vendor_056 Jeremy ... 608379 617542 501430
4 100004 Vendor_027 Jack ... 17987 21987 22720
[5 rows x 23 columns]
4.2 Variable Dictionary
| Variable | Type | Description |
|---|---|---|
| Vendor_ID | Numeric (ID) | Unique vendor identifier |
| Vendor_Name | Text | Anonymised vendor codename |
| Salesperson | Categorical | Sales team member responsible (Jeremy / Jack / Lou) |
| Vendor_Type | Categorical | Vendor business category (Restaurant / Supermarket / Pharmacy / Event) |
| City | Categorical | Vendor city (Lagos / Abuja / Port Harcourt / Ibadan / Kano) |
| Product_Type | Categorical | Chowdeck product used (GMV / Ads / Events / Voucher) |
| Onboarded_Date | Date | Date vendor went live on Chowdeck in Q1 2025 |
| Active_Weeks | Numeric | Number of weeks vendor traded during the quarter |
| Total_Orders | Numeric | Cumulative orders placed through the vendor in Q1 |
| Total_GMV | Numeric (₦) | Gross Merchandise Value generated in Q1 (primary outcome) |
| Week_1-Week_13 | Numeric (₦) | Weekly GMV for each of the 13 weeks of Q1 2025 |
4.3 Summary Statistics
Code
df |>
  select(active_weeks, total_orders, total_gmv) |>
  skim()
| Name | select(df, active_weeks, … |
|---|---|
| Number of rows | 120 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| active_weeks | 0 | 1 | 9.34 | 2.68 | 5 | 7.0 | 9.5 | 12.00 | 13 | ▆▆▃▆▇ |
| total_orders | 0 | 1 | 239.92 | 318.91 | 8 | 53.5 | 107.0 | 271.75 | 1726 | ▇▁▁▁▁ |
| total_gmv | 0 | 1 | 923489.92 | 1450582.35 | 28674 | 195853.5 | 455276.5 | 1087548.75 | 11621850 | ▇▁▁▁▁ |
Code
df[["Active_Weeks", "Total_Orders", "Total_GMV"]].describe().round(0)
Active_Weeks Total_Orders Total_GMV
count 120.0 120.0 120.0
mean 9.0 240.0 923490.0
std 3.0 319.0 1450582.0
min 5.0 8.0 28674.0
25% 7.0 54.0 195854.0
50% 10.0 107.0 455276.0
75% 12.0 272.0 1087549.0
max 13.0 1726.0 11621850.0
4.4 Categorical Variable Distributions
Code
df |>
  select(salesperson, vendor_type, city, product_type) |>
  map(tabyl) |>
  map(~ kable(.x, digits = 1) |> kable_styling(bootstrap_options = "striped"))
$salesperson
| salesperson | n | percent |
|---|---|---|
| Jack | 40 | 0.3 |
| Jeremy | 40 | 0.3 |
| Lou | 40 | 0.3 |
$vendor_type
| vendor_type | n | percent |
|---|---|---|
| Event | 11 | 0.1 |
| Pharmacy | 31 | 0.3 |
| Restaurant | 37 | 0.3 |
| Supermarket | 41 | 0.3 |
$city
| city | n | percent |
|---|---|---|
| Abuja | 22 | 0.2 |
| Ibadan | 11 | 0.1 |
| Kano | 7 | 0.1 |
| Lagos | 62 | 0.5 |
| Port Harcourt | 18 | 0.1 |
$product_type
| product_type | n | percent |
|---|---|---|
| Ads | 21 | 0.2 |
| Events | 11 | 0.1 |
| GMV | 84 | 0.7 |
| Voucher | 4 | 0.0 |
Code
for col in ["Salesperson", "Vendor_Type", "City", "Product_Type"]:
    print(f"\n{col}:")
    print(df[col].value_counts())
Salesperson:
Salesperson
Jack 40
Jeremy 40
Lou 40
Name: count, dtype: int64
Vendor_Type:
Vendor_Type
Supermarket 41
Restaurant 37
Pharmacy 31
Event 11
Name: count, dtype: int64
City:
City
Lagos 62
Abuja 22
Port Harcourt 18
Ibadan 11
Kano 7
Name: count, dtype: int64
Product_Type:
Product_Type
GMV 84
Ads 21
Events 11
Voucher 4
Name: count, dtype: int64
5 Technique 1 — Exploratory Data Analysis
5.1 Business Justification
Before advising a vendor partner or presenting to management, I need to understand the baseline shape of our sales data: where GMV is concentrated, who the outliers are, whether the data has quality issues, and how active weeks map to performance. This EDA directly replicates my weekly Monday review process, now conducted with formal statistical tools.
5.2 Missing Values
Code
library(naniar)
vis_miss(df)
Code
miss_var_summary(df)
# A tibble: 23 × 3
variable n_miss pct_miss
<chr> <int> <num>
1 vendor_id 0 0
2 vendor_name 0 0
3 salesperson 0 0
4 vendor_type 0 0
5 city 0 0
6 product_type 0 0
7 onboarded_date 0 0
8 active_weeks 0 0
9 total_orders 0 0
10 total_gmv 0 0
# ℹ 13 more rows
Code
import missingno as msno
import matplotlib.pyplot as plt
print("Missing values per column:")
Missing values per column:
Code
print(df.isnull().sum())
Vendor_ID 0
Vendor_Name 0
Salesperson 0
Vendor_Type 0
City 0
Product_Type 0
Onboarded_Date 0
Active_Weeks 0
Total_Orders 0
Total_GMV 0
Week_1 0
Week_2 0
Week_3 0
Week_4 0
Week_5 0
Week_6 0
Week_7 0
Week_8 0
Week_9 0
Week_10 0
Week_11 0
Week_12 0
Week_13 0
dtype: int64
Code
msno.matrix(df)
plt.title("Missing Value Matrix")
plt.show()
Finding: The dataset has zero missing values across all 23 columns. This is expected given the data originates from a structured internal system with mandatory fields.
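Completeness is only one aspect of data quality. Internal consistency can be checked as well. The sketch below uses two rows transcribed from the Section 4.1 output to test two hypothetical integrity rules: that the 13 weekly GMV values sum to Total_GMV, and that the number of non-zero weeks equals Active_Weeks. Both rules are assumptions about how the fields relate, and the helper name `check_vendor` is illustrative.

```python
# Two rows transcribed from the glimpse() output in Section 4.1
vendors = {
    "Vendor_045": {
        "active_weeks": 11,
        "total_gmv": 456597,
        "weekly": [0, 0, 32788, 48222, 29422, 46709, 42264,
                   29658, 39980, 51538, 50543, 39166, 46307],
    },
    "Vendor_048": {
        "active_weeks": 7,
        "total_gmv": 77816,
        "weekly": [0, 0, 0, 0, 0, 0, 13907,
                   11613, 10199, 7506, 9793, 12515, 12283],
    },
}


def check_vendor(row):
    """Return (weekly sum equals total, non-zero weeks equal active_weeks)."""
    gmv_ok = sum(row["weekly"]) == row["total_gmv"]
    weeks_ok = sum(1 for w in row["weekly"] if w > 0) == row["active_weeks"]
    return gmv_ok, weeks_ok


for name, row in vendors.items():
    print(name, check_vendor(row))
```

Both rows pass both checks. Running the same rules across all 120 vendors would be a natural extension of the missing-value check.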
5.3 Outlier Detection
Code
df |>
  select(total_gmv, total_orders, active_weeks) |>
  pivot_longer(everything()) |>
  ggplot(aes(x = name, y = value)) +
  geom_boxplot(fill = "#18a558", alpha = 0.6, outlier.colour = "#f97316") +
  facet_wrap(~name, scales = "free") +
  labs(title = "Boxplots of Key Numeric Variables", x = NULL, y = "Value") +
  theme_minimal()
Code
import seaborn as sns
fig, axes = plt.subplots(1, 3, figsize=(12, 5))
for ax, col in zip(axes, ["Total_GMV", "Total_Orders", "Active_Weeks"]):
    sns.boxplot(y=df[col], ax=ax, color="#18a558")
    ax.set_title(col)
plt.suptitle("Boxplots of Key Numeric Variables")
plt.tight_layout()
plt.show()
Finding (Data Quality Issue 1, outliers in Total_GMV): Total_GMV shows several high-value outliers well above the upper whisker of the boxplot (1.5 × IQR above the third quartile). These represent genuinely high-performing vendors (e.g., large restaurants or busy supermarkets) rather than data errors, as confirmed by cross-checking their order volumes. They are retained in the dataset but noted as influential observations that could affect regression diagnostics.
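The boxplot outlier rule can be made explicit with a short worked calculation. This sketch applies the conventional 1.5 × IQR fences to the Total_GMV quartiles reported in Section 4.3. The plotting libraries may compute quartiles slightly differently, so the fence values are indicative rather than exact reproductions of the whiskers.

```python
# Quartiles of total_gmv as reported in the Section 4.3 summary table
q1, q3 = 195853.5, 1087548.75

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(gmv):
    """Flag values outside the 1.5 x IQR fences."""
    return gmv < lower_fence or gmv > upper_fence


print(f"IQR = {iqr:,.2f}")
print(f"Upper fence = {upper_fence:,.2f}")
print(is_outlier(11_621_850))  # maximum Total_GMV from Section 4.3
print(is_outlier(6_465_846))   # Vendor_056 from Section 4.1
```

The lower fence is negative, so only high-value vendors can be flagged, which is consistent with the one-sided pattern of outliers in the boxplot.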
5.4 Distribution of the Outcome Variable
Code
library(patchwork)
p1 <- ggplot(df, aes(x = total_gmv)) +
geom_histogram(bins = 30, fill = "#18a558", colour = "white") +
labs(title = "Total GMV — Raw", x = "GMV (₦)", y = "Count") +
theme_minimal()
p2 <- ggplot(df, aes(x = log(total_gmv))) +
geom_histogram(bins = 30, fill = "#0f6e3a", colour = "white") +
labs(title = "Total GMV — Log-Transformed", x = "log(GMV)", y = "Count") +
theme_minimal()
p1 + p2
Code
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].hist(df["Total_GMV"], bins=30, color="#18a558", edgecolor="white")
axes[0].set_title("Total GMV — Raw")
axes[0].set_xlabel("GMV (₦)")
axes[1].hist(np.log(df["Total_GMV"]), bins=30, color="#0f6e3a", edgecolor="white")
axes[1].set_title("Total GMV — Log-Transformed")
axes[1].set_xlabel("log(GMV)")
plt.tight_layout()
plt.show()
Finding (Data Quality Issue 2, right skew): The raw GMV distribution is heavily right-skewed, consistent with a long tail of high-performing vendors. The log-transformed distribution is approximately symmetric and much closer to normal. For that reason, log GMV is used as the dependent variable in Section 9, which brings the model closer to the OLS assumptions of normally distributed, homoscedastic residuals.
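The effect of the log transformation on skewness can be checked numerically from the quartiles already reported in Section 4.3. The sketch below uses the quartile (Bowley) skewness coefficient, which is 0 for a symmetric distribution and positive for right skew. It is a rough check rather than a formal normality test: taking the log of interpolated quartiles only approximates the quartiles of log GMV.

```python
import math

# Quartiles of total_gmv from the Section 4.3 summary table
q1, q2, q3 = 195853.5, 455276.5, 1087548.75


def bowley_skew(lo, mid, hi):
    """Quartile skewness: 0 if symmetric, positive if right-skewed."""
    return (hi + lo - 2 * mid) / (hi - lo)


raw_skew = bowley_skew(q1, q2, q3)
# Approximation: log of the reported quartiles stands in for the
# quartiles of log(GMV).
log_skew = bowley_skew(math.log(q1), math.log(q2), math.log(q3))

print(f"raw: {raw_skew:.3f}, log: {log_skew:.3f}")
```

The raw coefficient is clearly positive (roughly 0.4) while the log-scale coefficient is close to zero, which agrees with the visual impression from the two histograms.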
5.5 GMV by Active Weeks
Code
df |>
  group_by(active_weeks) |>
  summarise(
    n = n(),
    median_gmv = median(total_gmv),
    mean_gmv = mean(total_gmv)
  ) |>
  ggplot(aes(x = active_weeks, y = median_gmv)) +
  geom_col(fill = "#18a558") +
  geom_text(aes(label = scales::comma(median_gmv)), vjust = -0.5, size = 3) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Median GMV by Active Weeks",
       x = "Active Weeks", y = "Median GMV (₦)") +
  theme_minimal()
Code
summary = df.groupby("Active_Weeks")["Total_GMV"].median().reset_index()
plt.figure(figsize=(10, 5))
plt.bar(summary["Active_Weeks"], summary["Total_GMV"], color="#18a558")
plt.xlabel("Active Weeks")
plt.ylabel("Median GMV (₦)")
plt.title("Median GMV by Active Weeks")
plt.tight_layout()
plt.show()
Business interpretation: Vendors with more active weeks tend to generate higher median GMV, which is consistent with early onboarding being a driver of quarterly GMV. Part of this effect is mechanical, since more trading weeks mean more opportunity to sell. The pattern supports the Q2 strategy of prioritising early vendor activation.
6 Technique 2 — Data Visualisation
6.1 Business Justification
Chowdeck’s weekly sales reviews require clear visual summaries that can be shared with team leads and used in vendor partner conversations. The five visualisations below tell a coherent story: GMV is unevenly distributed, concentrated in specific vendor types and cities, and strongly tied to onboarding timing.
6.2 Plot 1 — GMV Distribution by Vendor Type
Code
ggplot(df, aes(x = reorder(vendor_type, total_gmv, median), y = total_gmv, fill = vendor_type)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  scale_y_log10(labels = scales::comma) +
  scale_fill_manual(values = c("#18a558","#0f6e3a","#f97316","#e6f5ee")) +
  coord_flip() +
  labs(title = "GMV Distribution by Vendor Type",
       subtitle = "Log scale; ordered by median GMV",
       x = NULL, y = "Total GMV (₦, log scale)") +
  theme_minimal() +
  theme(legend.position = "none")
Code
order = df.groupby("Vendor_Type")["Total_GMV"].median().sort_values().index
plt.figure(figsize=(10, 5))
sns.boxplot(data=df, y="Vendor_Type", x="Total_GMV", order=order,
            palette=["#18a558","#0f6e3a","#f97316","#e6f5ee"], log_scale=True)
plt.title("GMV Distribution by Vendor Type (log scale)")
plt.xlabel("Total GMV (₦)")
plt.ylabel(None)
plt.tight_layout()
plt.show()
6.3 Plot 2 — GMV by City
Code
df |>
  group_by(city) |>
  summarise(total = sum(total_gmv), n = n()) |>
  ggplot(aes(x = reorder(city, total), y = total, fill = city)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = paste0("n=", n)), hjust = -0.2, size = 3.5) +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("#18a558","#0f6e3a","#2ecc71","#f97316","#e6f5ee")) +
  coord_flip() +
  labs(title = "Total Q1 GMV by City", x = NULL, y = "Total GMV (₦)") +
  theme_minimal()
Code
city_summary = df.groupby("City")["Total_GMV"].sum().sort_values()
colors = ["#e6f5ee","#18a558","#0f6e3a","#2ecc71","#f97316"]
plt.figure(figsize=(10, 5))
city_summary.plot(kind="barh", color=colors)
plt.title("Total Q1 GMV by City")
plt.xlabel("Total GMV (₦)")
plt.tight_layout()
plt.show()
6.4 Plot 3 — Weekly GMV Trend Across the Quarter
Code
week_cols <- paste0("week_", 1:13)
df |>
  select(all_of(week_cols)) |>
  summarise(across(everything(), sum)) |>
  pivot_longer(everything(), names_to = "week", values_to = "gmv") |>
  mutate(week_num = as.integer(str_extract(week, "\\d+"))) |>
  ggplot(aes(x = week_num, y = gmv)) +
  geom_line(colour = "#18a558", linewidth = 1.4) +
  geom_point(size = 3.5, colour = "#0f6e3a", fill = "#18a558", shape = 21, stroke = 1.5) +
  scale_y_continuous(labels = scales::comma) +
  scale_x_continuous(breaks = 1:13, labels = paste0("W", 1:13)) +
  labs(title = "Total Weekly GMV Across Q1 2025",
       x = "Week", y = "Total GMV (₦)") +
  theme_minimal()
Code
week_cols = [f"Week_{i}" for i in range(1, 14)]
weekly_totals = df[week_cols].sum()
plt.figure(figsize=(12, 5))
plt.plot(range(1, 14), weekly_totals.values, marker="o", color="#18a558",
         markerfacecolor="#0f6e3a", markersize=8, linewidth=2.5)
plt.xticks(range(1, 14), [f"W{i}" for i in range(1, 14)])
plt.title("Total Weekly GMV Across Q1 2025")
plt.xlabel("Week")
plt.ylabel("Total GMV (₦)")
plt.tight_layout()
plt.show()
6.5 Plot 4 — GMV by Salesperson and Product Type
Code
df |>
  group_by(salesperson, product_type) |>
  summarise(total_gmv = sum(total_gmv), .groups = "drop") |>
  ggplot(aes(x = salesperson, y = total_gmv, fill = product_type)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = c("#18a558","#0f6e3a","#f97316","#fde68a")) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Q1 GMV by Salesperson and Product Type",
       x = "Salesperson", y = "Total GMV (₦)", fill = "Product Type") +
  theme_minimal()
Code
pivot = df.groupby(["Salesperson", "Product_Type"])["Total_GMV"].sum().unstack(fill_value=0)
pivot.plot(kind="bar", stacked=True, figsize=(10, 6),
           color=["#18a558","#0f6e3a","#f97316","#fde68a"])
plt.title("Q1 GMV by Salesperson and Product Type")
plt.ylabel("Total GMV (₦)")
plt.xlabel("Salesperson")
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
6.6 Plot 5 — GMV vs Orders: Scatter by Vendor Type
Code
ggplot(df, aes(x = total_orders, y = total_gmv, colour = vendor_type)) +
  geom_point(alpha = 0.75, size = 2.5) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.9) +
  scale_colour_manual(values = c("#18a558","#0f6e3a","#f97316","#888")) +
  scale_x_log10(labels = scales::comma) +
  scale_y_log10(labels = scales::comma) +
  labs(title = "Total GMV vs Total Orders by Vendor Type",
       subtitle = "Both axes on log scale",
       x = "Total Orders (log scale)", y = "Total GMV (₦, log scale)",
       colour = "Vendor Type") +
  theme_minimal()
Code
colors = {"Restaurant":"#18a558","Supermarket":"#0f6e3a","Pharmacy":"#f97316","Event":"#888"}
plt.figure(figsize=(10, 6))
for vtype in df["Vendor_Type"].cat.categories:
    subset = df[df["Vendor_Type"] == vtype]
    plt.scatter(subset["Total_Orders"], subset["Total_GMV"],
                label=vtype, alpha=0.75, color=colors.get(vtype, "#999"))
plt.xscale("log")
plt.yscale("log")
plt.title("Total GMV vs Total Orders by Vendor Type (log-log scale)")
plt.xlabel("Total Orders")
plt.ylabel("Total GMV (₦)")
plt.legend()
plt.tight_layout()
plt.show()
Visual narrative summary: The five charts collectively tell one story: GMV on Chowdeck is concentrated in Lagos, driven disproportionately by restaurant and event vendors, grew week-on-week as more vendors were onboarded, and is most predictable when order volume is high. Early-onboarded vendors account for the bulk of team GMV, underscoring the value of a front-loaded pipeline strategy.
7 Technique 3 — Hypothesis Testing
7.1 Business Justification
The sales team regularly debates whether some vendor types or cities perform materially better than others. These debates drive resource allocation decisions. Hypothesis testing replaces opinion with a statistically grounded answer, informing which vertical or geography to prioritise in Q2.
7.2 Hypothesis 1 — Does GMV differ significantly across Vendor Types?
H₀: Mean GMV is the same across all vendor types (Restaurant, Supermarket, Pharmacy, Event). H₁: At least one vendor type has a significantly different mean GMV.
Test: One-way ANOVA (followed by Tukey HSD post-hoc if significant). Log GMV is used to satisfy the normality assumption.
Code
library(car)
library(emmeans)
library(effectsize)
df <- df |> mutate(log_gmv = log(total_gmv))
leveneTest(log_gmv ~ vendor_type, data = df)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 1.4209 0.2402
116
Code
model_aov <- aov(log_gmv ~ vendor_type, data = df)
summary(model_aov)
Df Sum Sq Mean Sq F value Pr(>F)
vendor_type 3 4.68 1.561 1.172 0.324
Residuals 116 154.57 1.333
Code
emmeans(model_aov, pairwise ~ vendor_type, adjust = "tukey")
$emmeans
vendor_type emmean SE df lower.CL upper.CL
Event 13.1 0.348 116 12.4 13.8
Pharmacy 12.8 0.207 116 12.4 13.2
Restaurant 13.3 0.190 116 13.0 13.7
Supermarket 13.0 0.180 116 12.6 13.3
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Event - Pharmacy 0.240 0.405 116 0.593 0.9339
Event - Restaurant -0.252 0.396 116 -0.637 0.9200
Event - Supermarket 0.123 0.392 116 0.314 0.9893
Pharmacy - Restaurant -0.493 0.281 116 -1.753 0.3014
Pharmacy - Supermarket -0.117 0.275 116 -0.427 0.9737
Restaurant - Supermarket 0.375 0.262 116 1.434 0.4810
P value adjustment: tukey method for comparing a family of 4 estimates
Code
eta_squared(model_aov)
# Effect Size for ANOVA
Parameter | Eta2 | 95% CI
---------------------------------
vendor_type | 0.03 | [0.00, 1.00]
- One-sided CIs: upper bound fixed at [1.00].
Code
from scipy import stats
import pingouin as pg
df["log_gmv"] = np.log(df["Total_GMV"])
groups = [grp["log_gmv"].values for _, grp in df.groupby("Vendor_Type")]
f_stat, p_val = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_val:.4f}")
ANOVA: F = 1.172, p = 0.3237
Code
aov_result = pg.anova(data=df, dv="log_gmv", between="Vendor_Type", detailed=True)
print(aov_result)
Source SS DF MS F p-unc np2
0 Vendor_Type 4.683547 3 1.561182 1.171626 0.32369 0.02941
1 Within 154.569094 116 1.332492 NaN NaN NaN
Code
tukey = pg.pairwise_tukey(data=df, dv="log_gmv", between="Vendor_Type")
print(tukey)
A B mean(A) ... T p-tukey hedges
0 Event Pharmacy 13.079639 ... 0.593285 0.933929 0.225374
1 Event Restaurant 13.079639 ... -0.636551 0.919992 -0.210012
2 Event Supermarket 13.079639 ... 0.313642 0.989259 0.110431
3 Pharmacy Restaurant 12.839290 ... -1.752951 0.301424 -0.407141
4 Pharmacy Supermarket 12.839290 ... -0.427358 0.973685 -0.102256
5 Restaurant Supermarket 13.331981 ... 1.433726 0.481049 0.307761
[6 rows x 9 columns]
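Interpretation: Levene's test gives no evidence against homogeneity of variance (F = 1.42, p = 0.240), so the equal-variance assumption of the ANOVA is reasonable. The ANOVA itself is not significant (F(3, 116) = 1.17, p = 0.324), so we fail to reject H₀: there is no statistical evidence that mean log GMV differs across vendor types. The effect size is small (η² = 0.03, meaning vendor type explains about 3% of the variance in log GMV), and none of the six Tukey pairwise contrasts is significant (the smallest adjusted p-value is 0.301, for Pharmacy vs Restaurant). Practically, the Q1 data give the sales team no statistical basis for treating one vendor category as more valuable than another. The Event group is small (n = 11), so the test has limited power to detect moderate differences involving that group.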
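The F statistic and effect size in the ANOVA output above can be reproduced by hand from the sums of squares, which makes clear what each number measures. This is a worked check using the values printed by `pg.anova`, not a replacement for the library call.

```python
# Sums of squares and degrees of freedom from the ANOVA table above
ss_between, df_between = 4.683547, 3
ss_within, df_within = 154.569094, 116

# Mean squares: variance attributable to vendor type vs residual variance
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the two mean squares
f_stat = ms_between / ms_within

# Eta squared is the share of total variance explained by vendor type
eta_squared = ss_between / (ss_between + ss_within)

print(f"F = {f_stat:.3f}")
print(f"eta squared = {eta_squared:.4f}")
```

An F close to 1 means that the variation between vendor types is about the same size as the variation within them, which is why the test is far from significant.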
7.3 Hypothesis 2 — Does GMV differ significantly across Cities?
H₀: Mean GMV is the same across all five cities. H₁: At least one city has a significantly different mean GMV.
Test: Kruskal-Wallis test (non-parametric alternative to ANOVA, used because city group sizes are highly unequal — Kano has only 7 vendors).
Code
kruskal.test(log_gmv ~ city, data = df)
Kruskal-Wallis rank sum test
data: log_gmv by city
Kruskal-Wallis chi-squared = 3.6856, df = 4, p-value = 0.4502
Code
ggplot(df, aes(x = reorder(city, log_gmv, median), y = log_gmv, fill = city)) +
  geom_boxplot(alpha = 0.7, show.legend = FALSE) +
  scale_fill_manual(values = c("#18a558","#0f6e3a","#2ecc71","#f97316","#e6f5ee")) +
  coord_flip() +
  labs(title = "Log GMV Distribution by City", x = NULL, y = "log(GMV)") +
  theme_minimal()
Code
city_groups = [grp["log_gmv"].values for _, grp in df.groupby("City")]
h_stat, p_val = stats.kruskal(*city_groups)
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {p_val:.4f}")
Kruskal-Wallis: H = 3.686, p = 0.4502
Code
df.boxplot(column="log_gmv", by="City", figsize=(10, 5))
plt.suptitle("")
plt.title("Log GMV Distribution by City")
plt.ylabel("log(GMV)")
plt.tight_layout()
plt.show()
Interpretation: The Kruskal-Wallis test is not significant (H = 3.69, df = 4, p = 0.450), so we fail to reject H₀: there is no statistical evidence that the distribution of GMV differs across the five cities. Lagos contributes the largest share of total GMV because it has the most vendors (62 of 120), not because a typical Lagos vendor trades at a higher level than a typical vendor elsewhere. The smaller city groups (Kano n = 7, Ibadan n = 11) limit the power of the test, so the question is worth revisiting as those markets grow.
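The p-value reported for the Kruskal-Wallis test comes from comparing H with a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups. For an even number of degrees of freedom the upper-tail probability has a simple closed form, so the reported value can be checked without a statistics library. The helper name `chi2_sf_even_df` is illustrative.

```python
import math

h_stat = 3.6856  # Kruskal-Wallis statistic reported above
df_chi2 = 4      # five cities minus one


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


p_value = chi2_sf_even_df(h_stat, df_chi2)
print(f"p = {p_value:.4f}")
```

The result agrees with the p-value printed by both `kruskal.test` and `stats.kruskal` to four decimal places.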
8 Technique 4 — Correlation Analysis
8.1 Business Justification
Before building a regression model, I need to understand which variables are linearly related to GMV, and whether any predictors are so strongly correlated with each other that multicollinearity could distort the regression. Correlation analysis also tells me which operational levers — orders, active weeks, onboarding timing — are most closely tied to revenue.
8.2 Correlation Matrix
Code
library(corrplot)
numeric_vars <- df |>
select(log_gmv, total_orders, active_weeks, starts_with("week_")) |>
rename_with(~ str_replace(.x, "week_", "W"))
cor_matrix <- cor(numeric_vars, method = "pearson", use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "lower",
tl.cex = 0.7, number.cex = 0.6,
addCoef.col = "black",
col = colorRampPalette(c("#f97316", "white", "#18a558"))(200),
title = "Pearson Correlation Matrix", mar = c(0,0,2,0))
Code
from matplotlib.colors import LinearSegmentedColormap
week_cols = [f"Week_{i}" for i in range(1, 14)]
num_df = df[["log_gmv", "Total_Orders", "Active_Weeks"] + week_cols].copy()
corr_matrix = num_df.corr(method="pearson")
cd_cmap = LinearSegmentedColormap.from_list("chowdeck", ["#f97316", "white", "#18a558"])
plt.figure(figsize=(16, 12))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, mask=mask, annot=True, fmt=".2f", cmap=cd_cmap,
center=0, linewidths=0.5, annot_kws={"size": 7})
plt.title("Pearson Correlation Matrix")
plt.tight_layout()
plt.show()
8.3 Spearman Correlation (Robustness Check)
Code
cor_spearman <- cor(
df |> select(total_gmv, total_orders, active_weeks),
method = "spearman"
)
kable(round(cor_spearman, 3), caption = "Spearman Correlation (robust to outliers)") |>
kable_styling(bootstrap_options = "striped")
| | total_gmv | total_orders | active_weeks |
|---|---|---|---|
| total_gmv | 1.000 | 0.928 | 0.243 |
| total_orders | 0.928 | 1.000 | 0.005 |
| active_weeks | 0.243 | 0.005 | 1.000 |
Code
spearman = df[["Total_GMV", "Total_Orders", "Active_Weeks"]].corr(method="spearman")
print("Spearman Correlation:")
Spearman Correlation:
Code
print(spearman.round(3))
              Total_GMV  Total_Orders  Active_Weeks
Total_GMV 1.000 0.928 0.243
Total_Orders 0.928 1.000 0.005
Active_Weeks 0.243 0.005 1.000
8.4 Key Correlations and Business Implications
Correlation 1 (Total Orders vs log GMV): This is the strongest relationship among the three aggregate variables (Spearman ρ = 0.928). Because Spearman's coefficient depends only on ranks, the same value applies to GMV and to log GMV. More orders directly mean more revenue, so the single most impactful lever I can pull when managing a vendor is driving order frequency through promotions, featured listings, or improving menu quality.
Correlation 2 (Active Weeks vs log GMV): Vendors with more active weeks tend to generate more GMV, although the rank correlation is modest (Spearman ρ = 0.243). This is partly mechanical (more weeks = more opportunity), but it also reflects the fact that earlier-onboarded vendors have had time to build customer awareness and repeat orderers. Active weeks are almost uncorrelated with total orders (ρ = 0.005), so the two predictors carry largely separate information and are unlikely to be collinear with each other in the regression. The business implication is clear: onboard earlier, grow more.
Correlation 3 (Inter-week GMV stability): High inter-week correlations indicate that vendor performance is relatively stable week-to-week. A vendor that performs well in Week 3 tends to perform well in Week 10. This supports using a vendor's early weeks as a leading indicator of full-quarter performance, which is useful for mid-quarter intervention decisions.
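To judge whether a Spearman coefficient of a given size is distinguishable from zero with 120 vendors, a rough significance check can be sketched as follows. It uses the Fisher z-transform with a variance of 1.06 / (n - 3), which is a large-sample normal approximation and an assumption introduced here; the report's own analysis does not compute these p-values, and the function name is hypothetical.

```python
import math


def spearman_p_value(rho: float, n: int) -> float:
    """Approximate two-sided p-value for a Spearman coefficient.

    Fisher z-transform with variance 1.06 / (n - 3); exact or
    permutation-based p-values may differ slightly.
    """
    z = math.atanh(rho) * math.sqrt((n - 3) / 1.06)
    return math.erfc(abs(z) / math.sqrt(2))


n_vendors = 120
pairs = [("GMV vs active weeks", 0.243), ("orders vs active weeks", 0.005)]
for label, rho in pairs:
    p = spearman_p_value(rho, n_vendors)
    print(f"{label}: rho = {rho:.3f}, approx p = {p:.3f}")
```

Under this approximation the GMV and active weeks association is unlikely to be a chance artefact despite its modest size, whereas the near-zero orders and active weeks coefficient is indistinguishable from no association.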
9 Technique 5 — Linear Regression
9.1 Business Justification
Regression gives me a single model that quantifies the marginal contribution of each predictor to GMV, holding all others constant. This is directly applicable to two practical tasks: (1) forecasting expected GMV for a new vendor at onboarding, based on their characteristics; and (2) identifying which characteristics are most actionable for the team to influence.
9.2 Model Specification
The dependent variable is log(Total_GMV). Predictors include: Total_Orders (log-transformed), Active_Weeks, Vendor_Type (dummy-encoded, Restaurant as reference), City (dummy-encoded, Lagos as reference), Salesperson (dummy-encoded, Jeremy as reference), and Product_Type (dummy-encoded, GMV as reference).
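Written out, the specification corresponds to the equation below. This is a sketch: the symbols β, γ, δ, θ, λ and the indicator notation are introduced here for illustration, each sum runs over the non-reference levels of the factor, and the "+ 1" inside the log of orders follows the model-fitting code.

```latex
\log(\mathrm{GMV}_i) = \beta_0
  + \beta_1 \log(\mathrm{Orders}_i + 1)
  + \beta_2\,\mathrm{ActiveWeeks}_i
  + \sum_{v} \gamma_v\,\mathbb{1}[\mathrm{VendorType}_i = v]
  + \sum_{c} \delta_c\,\mathbb{1}[\mathrm{City}_i = c]
  + \sum_{s} \theta_s\,\mathbb{1}[\mathrm{Salesperson}_i = s]
  + \sum_{p} \lambda_p\,\mathbb{1}[\mathrm{ProductType}_i = p]
  + \varepsilon_i
```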
9.3 Model Fitting
Code
library(broom)
df_model <- df |>
mutate(
log_orders = log(total_orders + 1),
vendor_type = relevel(vendor_type, ref = "Restaurant"),
city = relevel(city, ref = "Lagos"),
salesperson = relevel(salesperson, ref = "Jeremy"),
product_type = relevel(product_type, ref = "GMV")
)
model <- lm(log_gmv ~ log_orders + active_weeks + vendor_type +
city + salesperson + product_type, data = df_model)
tidy(model, conf.int = TRUE) |>
kable(digits = 3, caption = "OLS Regression Results (Dependent variable: log GMV)") |>
kable_styling(bootstrap_options = c("striped", "hover"))
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 7.475 | 0.181 | 41.187 | 0.000 | 7.115 | 7.835 |
| log_orders | 0.932 | 0.025 | 37.526 | 0.000 | 0.882 | 0.981 |
| active_weeks | 0.114 | 0.010 | 11.113 | 0.000 | 0.094 | 0.134 |
| vendor_typeEvent | -0.002 | 0.101 | -0.016 | 0.988 | -0.202 | 0.199 |
| vendor_typePharmacy | -0.084 | 0.072 | -1.166 | 0.246 | -0.227 | 0.059 |
| vendor_typeSupermarket | -0.042 | 0.069 | -0.607 | 0.545 | -0.179 | 0.095 |
| cityAbuja | -0.135 | 0.072 | -1.867 | 0.065 | -0.279 | 0.008 |
| cityIbadan | 0.150 | 0.097 | 1.548 | 0.125 | -0.042 | 0.343 |
| cityKano | 0.020 | 0.120 | 0.163 | 0.871 | -0.218 | 0.257 |
| cityPort Harcourt | -0.035 | 0.078 | -0.451 | 0.653 | -0.189 | 0.119 |
| salespersonJack | 0.175 | 0.065 | 2.676 | 0.009 | 0.045 | 0.305 |
| salespersonLou | 0.053 | 0.067 | 0.797 | 0.427 | -0.079 | 0.186 |
| product_typeAds | -0.037 | 0.074 | -0.505 | 0.614 | -0.183 | 0.109 |
| product_typeEvents | NA | NA | NA | NA | NA | NA |
| product_typeVoucher | 0.006 | 0.150 | 0.038 | 0.970 | -0.292 | 0.303 |
Code
glance(model) |>
select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs) |>
kable(digits = 3, caption = "Model Fit Statistics") |>
kable_styling(bootstrap_options = "striped")
| r.squared | adj.r.squared | sigma | statistic | p.value | df | nobs |
|---|---|---|---|---|---|---|
| 0.946 | 0.939 | 0.286 | 141.944 | 0 | 13 | 120 |
Code
import statsmodels.formula.api as smf
df["log_orders"] = np.log(df["Total_Orders"] + 1)
formula = ("log_gmv ~ log_orders + Active_Weeks + "
"C(Vendor_Type, Treatment('Restaurant')) + "
"C(City, Treatment('Lagos')) + "
"C(Salesperson, Treatment('Jeremy')) + "
"C(Product_Type, Treatment('GMV'))")
model = smf.ols(formula, data=df).fit()
print(model.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable: log_gmv R-squared: 0.946
Model: OLS Adj. R-squared: 0.939
Method: Least Squares F-statistic: 141.9
Date: Fri, 15 May 2026 Prob (F-statistic): 9.75e-61
Time: 02:07:52 Log-Likelihood: -12.485
No. Observations: 120 AIC: 52.97
Df Residuals: 106 BIC: 91.99
Df Model: 13
Covariance Type: nonrobust
==========================================================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------------------------------------------
Intercept 7.4747 0.181 41.187 0.000 7.115 7.835
C(Vendor_Type, Treatment('Restaurant'))[T.Event] -0.0008 0.051 -0.016 0.988 -0.101 0.100
C(Vendor_Type, Treatment('Restaurant'))[T.Pharmacy] -0.0839 0.072 -1.166 0.246 -0.227 0.059
C(Vendor_Type, Treatment('Restaurant'))[T.Supermarket] -0.0418 0.069 -0.607 0.545 -0.179 0.095
C(City, Treatment('Lagos'))[T.Abuja] -0.1353 0.072 -1.867 0.065 -0.279 0.008
C(City, Treatment('Lagos'))[T.Ibadan] 0.1504 0.097 1.548 0.125 -0.042 0.343
C(City, Treatment('Lagos'))[T.Kano] 0.0195 0.120 0.163 0.871 -0.218 0.257
C(City, Treatment('Lagos'))[T.Port Harcourt] -0.0350 0.078 -0.451 0.653 -0.189 0.119
C(Salesperson, Treatment('Jeremy'))[T.Jack] 0.1753 0.065 2.676 0.009 0.045 0.305
C(Salesperson, Treatment('Jeremy'))[T.Lou] 0.0533 0.067 0.797 0.427 -0.079 0.186
C(Product_Type, Treatment('GMV'))[T.Ads] -0.0372 0.074 -0.505 0.614 -0.183 0.109
C(Product_Type, Treatment('GMV'))[T.Events] -0.0008 0.051 -0.016 0.988 -0.101 0.100
C(Product_Type, Treatment('GMV'))[T.Voucher] 0.0057 0.150 0.038 0.970 -0.292 0.303
log_orders 0.9315 0.025 37.526 0.000 0.882 0.981
Active_Weeks 0.1139 0.010 11.113 0.000 0.094 0.134
==============================================================================
Omnibus: 0.237 Durbin-Watson: 2.049
Prob(Omnibus): 0.888 Jarque-Bera (JB): 0.411
Skew: -0.034 Prob(JB): 0.814
Kurtosis: 2.722 Cond. No. 4.87e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.98e-30. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
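The singular-matrix warning, the NA for product_typeEvents in the R table, and the identical coefficients on the two Event(s) terms in the statsmodels table are what one would expect if the Vendor_Type "Event" dummy and the Product_Type "Events" dummy are identical columns. That reading is an inference from the output, not something confirmed elsewhere in the report. The sketch below uses synthetic data (all names hypothetical) to show the behaviour: a pseudo-inverse solution, which is the default fitting method in statsmodels OLS, splits the effect equally across two identical columns, whereas dropping one of them recovers the full effect in a single coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
event = (rng.random(n) < 0.2).astype(float)  # a 0/1 dummy
y = 1.0 + 0.5 * x - 0.3 * event + rng.normal(scale=0.1, size=n)

# Design matrix with the same dummy entered twice (perfect collinearity).
X_full = np.column_stack([np.ones(n), x, event, event])
# Design matrix with the duplicate dropped.
X_reduced = np.column_stack([np.ones(n), x, event])

beta_full = np.linalg.pinv(X_full) @ y
beta_reduced, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)

print("duplicated dummies:", beta_full[2].round(4), beta_full[3].round(4))
print("single dummy:      ", beta_reduced[2].round(4))
```

In the sketch, the two duplicated coefficients are equal and sum to the single-dummy estimate, mirroring the roughly -0.0008 + -0.0008 versus -0.002 pattern between the Python and R tables.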
9.4 Diagnostic Plots
Code
library(ggfortify)
autoplot(model, which = 1:4, ncol = 2, label.size = 3) +
theme_minimal()
Code
from scipy.stats import probplot
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fitted = model.fittedvalues
residuals = model.resid
axes[0,0].scatter(fitted, residuals, alpha=0.5, color="#18a558")
axes[0,0].axhline(0, color="#f97316", linestyle="--")
axes[0,0].set_title("Residuals vs Fitted")
axes[0,0].set_xlabel("Fitted values")
axes[0,0].set_ylabel("Residuals")
probplot(residuals, plot=axes[0,1])  # Q-Q line fit returned by probplot: slope 0.273, intercept ~0, r = 0.997
axes[0,1].set_title("Normal Q-Q")
axes[1,0].scatter(fitted, np.sqrt(np.abs(residuals)), alpha=0.5, color="#18a558")
axes[1,0].set_title("Scale-Location")
axes[1,0].set_xlabel("Fitted values")
axes[1,1].hist(residuals, bins=25, color="#18a558", edgecolor="white")
axes[1,1].set_title("Residuals Distribution")
axes[1,1].set_xlabel("Residuals")
plt.tight_layout()
plt.show()
9.5 Coefficient Interpretation
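The table below reports coefficients from the OLS output in Section 9.3. Each interpretation holds the other predictors constant.
| Predictor | Coefficient (β) | Plain-language interpretation |
|---|---|---|
| log(Total_Orders) | 0.932 | A 10% increase in orders is associated with roughly a 9.3% increase in GMV (1.10^0.932 - 1); p < 0.001 |
| Active_Weeks | 0.114 | Each additional active week is associated with about 12.1% higher GMV (exp(0.114) - 1); p < 0.001 |
| Vendor_Type: Supermarket | -0.042 | Supermarket vendors generate about 4.1% less GMV than restaurants, but the difference is not statistically significant (p = 0.545) |
| City: Abuja | -0.135 | Abuja vendors generate about 12.7% less GMV than Lagos vendors, which is not significant at the 5% level (p = 0.065) |
| Product_Type: Ads | -0.037 | Ad-product vendors generate about 3.7% less GMV than GMV-product vendors, but the difference is not statistically significant (p = 0.614) |
Manager-ready summary: The model explains about 95% of the variation in log GMV (R² = 0.946, adjusted R² = 0.939), and two operational factors do most of the work: order volume and active weeks. Holding everything else constant, 10% more orders goes with roughly 9% more GMV and each extra active week goes with roughly 12% more GMV, while the vendor type, city, and product type coefficients are not statistically significant once those two factors are accounted for. The only other significant term is the salesperson effect for Jack (β = 0.175, p = 0.009, about 19% higher GMV than Jeremy's vendors at the same orders and active weeks), which is worth investigating before drawing conclusions. The finding supports prioritising early onboarding and order activation.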
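The percentage readings of log-linear coefficients can be reproduced with a short calculation. The sketch below takes coefficient values from the statsmodels output in Section 9.3; the helper names are hypothetical. Note that the orders predictor is log(Total_Orders + 1), so the "10% more orders" reading is an approximation that is close for vendors with many orders.

```python
import math


def pct_change_per_unit(beta: float) -> float:
    """Percent change in GMV for a one-unit increase in a predictor
    when the outcome is log GMV."""
    return (math.exp(beta) - 1) * 100


def pct_change_for_log_predictor(beta: float, pct_increase: float) -> float:
    """Percent change in GMV when a log-transformed predictor rises
    by pct_increase percent."""
    return ((1 + pct_increase / 100) ** beta - 1) * 100


# Coefficients copied from the regression output (not re-estimated here).
coefs = {
    "Active_Weeks": 0.1139,
    "City: Abuja": -0.1353,
    "Salesperson: Jack": 0.1753,
}
for name, beta in coefs.items():
    print(f"{name}: {pct_change_per_unit(beta):+.1f}% GMV")

orders_effect = pct_change_for_log_predictor(0.9315, 10)
print(f"10% more orders: {orders_effect:+.1f}% GMV")
```

Using exp(β) - 1 rather than reading β directly as a percentage matters most for larger coefficients, where the two diverge noticeably.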
10 Integrated Findings
The five analyses converge on a coherent story about what drives vendor GMV on Chowdeck.
EDA established that GMV is right-skewed and that active weeks are strongly associated with performance — immediately suggesting that onboarding timing is a key operational lever.
Visualisation showed that Lagos dominates total GMV volume, restaurant and event vendors outperform pharmacies and supermarkets at the top end, and weekly GMV trended upward across the quarter as more vendors came online.
Hypothesis testing assessed whether vendor type and city differences in GMV are statistically significant. For city, the Kruskal-Wallis test did not reject the null hypothesis (H = 3.686, df = 4, p = 0.4502), so there is no evidence that log GMV distributions differ across the five cities. The vendor type result is reported in the hypothesis testing section.
Correlation analysis revealed that total orders is the strongest correlate of GMV (Spearman ρ = 0.928), followed at a distance by active weeks (ρ = 0.243). This points to order activation, not just onboarding, as the factor most closely tied to revenue.
Regression quantified the marginal contribution of each variable. The model explains 94.6% of the variance in log GMV (R² = 0.946, adjusted R² = 0.939), and the significant predictors are log orders (β = 0.932, p < 0.001), active weeks (β = 0.114, p < 0.001), and the salesperson term for Jack (β = 0.175, p = 0.009).
Single integrated recommendation: The Chowdeck sales team should adopt a two-part Q2 strategy: (1) onboard vendors earlier in the quarter to maximise active weeks, and (2) focus account management effort on order activation (promotions, featured listings, menu optimisation) in the first four weeks after a vendor goes live. In the regression, each additional active week is associated with about 12% higher GMV and a 10% increase in orders with about 9% higher GMV, holding other factors constant, so both levers have a quantifiable link to revenue.
11 Limitations & Further Work
Sample size: 120 vendors is sufficient for this analysis but limits the power of subgroup tests — Kano has only 7 vendors. A Q2 dataset with the same structure would double the sample and allow more robust city-level inference.
Omitted variables: The dataset does not include variables such as menu size, average order value, vendor rating, or marketing spend — all of which likely influence GMV. Including these in the regression would improve explanatory power and reduce omitted variable bias.
Causal inference: Correlation between orders and GMV is not causation — both may be driven by a third factor such as restaurant quality or marketing budget. A designed experiment varying promotional intensity across matched vendor pairs would allow stronger causal claims.
Time series dimension: The 13 weeks of weekly GMV data were aggregated to vendor level for this analysis. A panel data model (fixed or random effects) would use the full weekly structure, controlling for vendor-level heterogeneity and potentially identifying seasonal effects within the quarter.
Data generation: The dataset originates from internal operational records. A future study should link this data to Chowdeck’s customer-level order data to understand the demand-side drivers of vendor performance.
12 References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online
Onuoha, J. (2026). Chowdeck Q1 2025 vendor sales performance data [Dataset]. Collected from Chowdeck Commercial Sales Team, Lagos, Nigeria. Data available on request from the author.
R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/
Van Rossum, G., & Drake, F. L. (2009). Python 3 reference manual. CreateSpace.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
13 Appendix: AI Usage Statement
Claude (Anthropic) was used to assist with structuring the Quarto document template, generating code scaffolding for R and Python chunks, and suggesting appropriate statistical tests given the data structure. All analytical decisions — the choice of Case Study 1, the selection of log GMV as the outcome variable, the decision to use Kruskal-Wallis for the city hypothesis (given unequal group sizes), the formulation of both hypotheses, and the interpretation of all outputs — were made independently by the author. The regression coefficient interpretations, business recommendations, and integrated findings section reflect the author’s own judgement and are not AI-generated conclusions. All code was reviewed, tested, and understood by the author prior to submission.