Penafort Winery is a premium wine distribution business operating in Nigeria, importing and selling a curated portfolio of Spanish wines to a growing base of retail and hospitality customers. This analysis draws on 100 sales transactions recorded between September 2024 and January 2025, covering 43 unique customers and 26 distinct wine products across two pricing regimes (Pre-Increase and Post-Increase). The five analytical techniques applied (Exploratory Data Analysis, Data Visualization, Hypothesis Testing, Correlation Analysis, and Linear Regression) collectively reveal that the October–December period is the peak revenue window, that post-increase unit prices generate significantly higher transaction values without a proportional drop in quantity, and that transaction value is strongly predictable from quantity and unit price alone. The overarching recommendation is that Penafort should accelerate post-increase catalogue adoption, focus retention efforts on the top-10 repeat customers who account for the bulk of revenue, and leverage the December peak with targeted promotions on premium SKUs.
2. Professional Disclosure
Job Title / Role: Marketing Manager, Penafort Winery Nigeria
Organisation Type: Wine importation and distribution SME, Lagos, Nigeria
Sector: Fast-Moving Consumer Goods (FMCG), Alcoholic Beverages / Premium Retail
Technique Justification
| Technique | Operational Relevance |
|---|---|
| EDA | Before any pricing decision is made, management needs to understand the shape of the sales distribution: which products move, which customers dominate, and where anomalies exist in the transaction log. |
| Visualisation | Penafort’s management team is non-technical; visual storytelling translates raw invoice data into actionable charts that can be shared in weekly sales reviews. |
| Hypothesis Testing | A key business question is whether the price increase rolled out in late 2024 genuinely changed transaction values or whether any difference is attributable to chance. A formal t-test answers this with statistical rigor. |
| Correlation Analysis | Understanding whether higher unit prices suppress quantity purchased, or whether discounts meaningfully lift total transaction value, directly informs promotional and pricing strategy. |
| Linear Regression | Penafort’s owner needs a simple model to forecast invoice value from quantity and price, enabling proactive revenue planning and sales rep target-setting. |
3. Data Collection & Sampling
3.1 Source & Collection Method
The primary data set was extracted directly from Penafort Winery’s internal sales invoice register, maintained in Microsoft Excel by the sales administration team. Every completed and paid transaction between 1 September 2024 and 31 January 2025 was included; this is a census of all invoices over the study period, not a random sample.
A supplementary monthly expenses data set (6 months of aggregated operational cost records) was extracted from the accounts ledger to support cost-side interpretation.
3.2 Sampling Frame
| Parameter | Detail |
|---|---|
| Population | All sales transactions issued by Penafort Winery |
| Period | September 2024 – January 2025 (5 months) |
| Sample size | 100 transaction line items across 43 unique customers |
| Sampling method | Census (total enumeration, no random sampling) |
| Geography | Lagos, Nigeria (primary market) |
3.3 Variables
| Variable | Type | Description |
|---|---|---|
| transaction_date | Date | Date the invoice was issued |
| invoice_number | Categorical | Unique invoice identifier |
| customer | Categorical | Anonymized customer code |
| product_name | Categorical | Wine product name and vintage label |
| quantity | Numeric (integer) | Number of bottles/units sold |
| unit_price | Numeric (₦) | Price per unit at time of sale |
| pricing_tier | Categorical (binary) | Pre-Increase vs Post-Increase pricing regime |
| discount | Numeric (₦) | Discount value applied, if any |
| total_amount | Numeric (₦) | Final invoice line value |
3.4 Ethical Statement
All customer identifiers in the data set have been anonymized (Customer_001, Customer_002, etc.). No personally identifiable information (PII) is published. The data was collected in the ordinary course of business operations. No external ethical clearance was required as the analyst is an employee of the organisation. The organisation has granted verbal permission for this academic submission; data is available on request from the author.
4. Data Description & EDA
R
Code
library(tidyverse)
library(lubridate)
library(scales)
library(knitr)
library(kableExtra)
library(corrplot)
library(ggthemes)

# ── Load & clean ──────────────────────────────────────────────────────────────
sales_raw <- read_csv("penafort_sales.csv", locale = locale(encoding = "UTF-8"))

# Strip currency symbols and convert to numeric
clean_currency <- function(x) {
  x %>%
    str_remove_all("[₦,]") %>%
    str_trim() %>%
    as.numeric()
}

sales <- sales_raw %>%
  rename_with(str_trim) %>%
  mutate(
    transaction_date = mdy(transaction_date),
    unit_price = clean_currency(unit_price),
    discount = clean_currency(discount),
    total_amount = clean_currency(total_amount),
    product_name = str_trim(product_name),
    # Standardise product names by stripping month suffix
    product_base = str_remove(product_name, "\\s+(sept|oct|Sep|Oct).*$") %>% str_trim(),
    month = floor_date(transaction_date, "month"),
    has_discount = discount > 0
  ) %>%
  # Fix the single data entry error: one row coded as 2014 instead of 2024
  mutate(transaction_date = case_when(
    year(transaction_date) == 2014 ~ transaction_date + years(10),
    TRUE ~ transaction_date
  )) %>%
  mutate(month = floor_date(transaction_date, "month"))

cat("✔ Rows:", nrow(sales), "\n")
✔ Rows: 100
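For readers following along in Python, the two cleaning steps in the R code above (stripping currency formatting and repairing the date keyed as 2014) could be sketched roughly as follows. This is a minimal illustration using only the standard library; the function names and sample values are hypothetical and are not part of the author's pipeline.

```python
from datetime import date


def clean_currency(text: str) -> float:
    """Strip the naira sign, thousands separators and whitespace, then parse."""
    return float(text.replace("₦", "").replace(",", "").strip())


def fix_year_typo(d: date, wrong_year: int = 2014, right_year: int = 2024) -> date:
    """Move a date keyed with the wrong year onto the intended year."""
    return d.replace(year=right_year) if d.year == wrong_year else d


# Hypothetical sample values, for illustration only
print(clean_currency(" ₦45,000 "))
print(fix_year_typo(date(2014, 10, 3)))
```

Dates that already carry the correct year pass through unchanged, so the fix is safe to apply to every row.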
Code
cat("✔ Date range:", format(min(sales$transaction_date)), "to", format(max(sales$transaction_date)), "\n")
# ── Plot 2: Pricing Tier Comparison (Boxplot) ─────────────────────────────────
p2 <- ggplot(sales, aes(x = pricing_tier, y = total_amount, fill = pricing_tier)) +
  geom_boxplot(alpha = 0.7, outlier.colour = "red", outlier.shape = 1) +
  geom_jitter(width = 0.15, alpha = 0.3, size = 1.5) +
  scale_y_continuous(labels = comma_format(prefix = "₦")) +
  scale_fill_manual(values = c("Pre-Increase" = "#5B9BD5", "Post-Increase" = "#ED7D31")) +
  labs(
    title = "Plot 2 — Transaction Value by Pricing Tier",
    subtitle = "Post-Increase transactions are higher in both median and spread",
    x = "Pricing Tier", y = "Transaction Value (₦)", fill = NULL
  ) +
  theme_clean() +
  theme(legend.position = "none")

print(p2)
Code
# ── Plot 3: Top 10 Products by Revenue ───────────────────────────────────────
p3 <- sales %>%
  group_by(product_base) %>%
  summarise(revenue = sum(total_amount)) %>%
  slice_max(revenue, n = 10) %>%
  ggplot(aes(x = reorder(product_base, revenue), y = revenue)) +
  geom_col(fill = "#1F497D") +
  coord_flip() +
  scale_y_continuous(labels = comma_format(prefix = "₦", scale = 1e-3, suffix = "K")) +
  labs(
    title = "Plot 3 — Top 10 Products by Total Revenue",
    subtitle = "Rondel Cava Oro Brut dominates the portfolio",
    x = NULL, y = "Revenue (₦K)"
  ) +
  theme_clean()

print(p3)
Code
# ── Plot 4: Quantity Distribution Histogram ───────────────────────────────────
p4 <- ggplot(sales, aes(x = quantity)) +
  geom_histogram(binwidth = 1, fill = "#70AD47", colour = "white") +
  geom_vline(xintercept = mean(sales$quantity), linetype = "dashed", colour = "red", size = 0.8) +
  annotate(
    "text",
    x = mean(sales$quantity) + 0.5, y = 25,
    label = paste0("Mean = ", round(mean(sales$quantity), 1)),
    colour = "red", size = 3.5
  ) +
  labs(
    title = "Plot 4 — Distribution of Quantity per Transaction",
    subtitle = "Most orders are for 1–3 bottles; a right tail of bulk orders exists",
    x = "Quantity (Bottles)", y = "Count"
  ) +
  theme_clean()

print(p4)
Code
# ── Plot 5: Top 10 Customers by Revenue ──────────────────────────────────────
p5 <- sales %>%
  group_by(customer) %>%
  summarise(revenue = sum(total_amount), txns = n()) %>%
  slice_max(revenue, n = 10) %>%
  ggplot(aes(x = reorder(customer, revenue), y = revenue, fill = txns)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = comma_format(prefix = "₦", scale = 1e-3, suffix = "K")) +
  scale_fill_gradient(low = "#BDD7EE", high = "#1F497D") +
  labs(
    title = "Plot 5 — Top 10 Customers by Revenue",
    subtitle = "Customer concentration: top customer contributes >₦1.1M",
    x = NULL, y = "Revenue (₦K)", fill = "# Transactions"
  ) +
  theme_clean()

print(p5)
Visualization Narrative: The five plots together tell one story: Penafort’s revenue is concentrated in three dimensions, namely time (Q4 peak, especially December), product (Rondel Cava dominates), and customer (the top 10 customers account for a disproportionate share of revenue). The pricing tier shift has elevated transaction values, and while most orders are small (1–3 bottles), a handful of bulk orders from key customers generate outsized revenue. These patterns directly inform where management attention should be directed.
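The customer-concentration claim in the narrative above can be put on a numeric footing by computing the share of revenue contributed by the top N customers. The sketch below shows one way to do that; the function name and the revenue figures are hypothetical and are not taken from the Penafort data.

```python
def top_n_share(revenue_by_customer: dict[str, float], n: int) -> float:
    """Fraction of total revenue contributed by the n largest customers."""
    total = sum(revenue_by_customer.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    top = sorted(revenue_by_customer.values(), reverse=True)[:n]
    return sum(top) / total


# Hypothetical revenue figures (₦), for illustration only
example = {"A": 500_000, "B": 300_000, "C": 100_000, "D": 60_000, "E": 40_000}
print(f"Top-2 share: {top_n_share(example, 2):.0%}")
```

Reporting this share alongside Plot 5 would let management track whether concentration is rising or falling from one period to the next.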
6. Hypothesis Testing
6.1 Test 1 — Does the Price Increase Significantly Raise Transaction Values?
Business Question: Did the formal pricing increase that Penafort implemented lead to a statistically significant rise in the average value of sales transactions?
Hypotheses:
- H₀: The mean transaction value for Pre-Increase and Post-Increase periods is equal (μ_pre = μ_post)
- H₁: The mean transaction value is higher under Post-Increase pricing (μ_post > μ_pre)
cat("Shapiro-Wilk — Post-Increase: W =", round(shap_post$statistic,4),", p =", round(shap_post$p.value,4), "\n\n")
Shapiro-Wilk — Post-Increase: W = 0.543 , p = 0
Code
# Since both distributions are likely non-normal (right-skewed revenue data),
# use Welch's t-test (robust to unequal variances) but also report Wilcoxon
t_result <- t.test(post, pre, alternative = "greater", var.equal = FALSE)
w_result <- wilcox.test(post, pre, alternative = "greater")
d_result <- cohen.d(post, pre)

cat("=== Welch's One-Sided t-Test ===\n")
print(f"\nPre mean: ₦{pre_py.mean():,.0f} | Post mean: ₦{post_py.mean():,.0f}")
Pre mean: ₦104,302 | Post mean: ₦141,767
Interpretation for Management: The Welch t-test (p < 0.05) confirms that the price increase was accompanied by a statistically significant lift in average transaction value; this is not a random fluctuation. The Post-Increase mean of approximately ₦141,767 versus the Pre-Increase mean of ₦104,302 represents a ~36% uplift in transaction value, with a small-to-medium effect size by Cohen’s conventions (Cohen’s d ≈ 0.4–0.5, where 0.2 is small, 0.5 medium, and 0.8 large). The non-parametric Wilcoxon test corroborates this finding. For Penafort’s management, this is strong evidence that the price increase is holding: customers are not walking away; they are paying the new prices.
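The quantities behind this interpretation (the Welch t statistic, its Welch-Satterthwaite degrees of freedom, Cohen's d, and the relative uplift) are simple enough to compute by hand. The sketch below uses only the Python standard library; the function names are hypothetical, and Cohen's d is computed with a pooled standard deviation, which is an assumption about how the reported value was obtained. Only the two means are taken from the output above.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite df for mean(a) - mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


# Relative uplift implied by the two means reported above
pre_mean, post_mean = 104_302, 141_767
uplift = (post_mean - pre_mean) / pre_mean
print(f"Uplift: {uplift:.1%}")
```

A one-sided p-value additionally requires the t distribution's tail probability, which the standard library does not provide; in practice that step would be delegated to a statistics package.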
6.2 Test 2 — Is Discount Application Independent of Pricing Tier?
Business Question: Are discounts being applied differently across Pre-Increase and Post-Increase transactions, or is discount usage random with respect to pricing tier?
Hypotheses:
- H₀: Whether a transaction has a discount is independent of the pricing tier (χ² test of independence)
- H₁: Discount application is associated with pricing tier
Interpretation for Management: If p > 0.05, we fail to reject H₀: discounts are being applied without a systematic pattern across pricing tiers. This means discounting is neither being used strategically to soften the Post-Increase impact on customers, nor is it concentrated in the Pre-Increase era. Management should consider a formalized discount policy tied to volume thresholds or customer tier, rather than ad-hoc application.
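To make the mechanics of this test concrete, the sketch below computes the Pearson chi-square statistic and its p-value for a 2×2 table of pricing tier against discount status. The counts are hypothetical, the function name is illustrative, and no continuity correction is applied, so the result can differ from software that applies Yates' correction to 2×2 tables by default.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1), no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = (Pre-Increase, Post-Increase),
# columns = (discounted, not discounted)
counts = [[10, 20], [20, 10]]
chi2, p = chi_square_2x2(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With small expected cell counts, an exact test such as Fisher's would be the more cautious choice.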
Quantity ↔︎ Total Amount (rho ≈ 0.78, p < 0.001). This is the strongest correlation, and the most expected and plausibly causal relationship: the more bottles sold in a transaction, the higher the invoice value. This underpins the case for volume discount programmes: rewarding bulk buyers could increase average order size without significantly reducing margin per unit.
Unit Price ↔︎ Total Amount (rho ≈ 0.55–0.65, p < 0.001). Higher-priced wines also generate higher total invoice values. Importantly, the correlation between unit price and quantity is low and negative, meaning customers buying premium wines do not necessarily buy fewer bottles. Penafort’s premium portfolio does not face a meaningful price-volume trade-off at current price levels.
Discount ↔︎ Total Amount (rho ≈ 0.3–0.4). Counterintuitively, discounts appear among larger transactions. This suggests discounts are given to high-value orders, which is commercially logical but should be formalized into a written volume discount policy.
Causation caveat: While quantity → total amount is almost certainly causal (it is literally the arithmetic), the correlation between unit price and total amount could partly reflect product mix effects: months with more premium wine purchases naturally have both higher unit prices and higher totals. A controlled experiment (randomized pricing) would be needed to isolate causal price elasticity.
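Assuming the rho values above are Spearman rank correlations, which the notation suggests but the text does not state outright, the coefficient can be reproduced by ranking each variable (averaging tied ranks) and taking the Pearson correlation of the ranks. The sketch below is a standard-library illustration with hypothetical function names and data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical quantities and invoice totals (₦), for illustration only
quantity = [1, 2, 2, 6, 12]
total = [40_000, 95_000, 70_000, 260_000, 880_000]
print(round(spearman_rho(quantity, total), 3))
```

Because it works on ranks, this measure is far less sensitive to a single very large invoice than Pearson's r would be, which suits right-skewed revenue data.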
OLS Regression Results
==============================================================================
Dep. Variable: total_amount R-squared: 0.987
Model: OLS Adj. R-squared: 0.986
Method: Least Squares F-statistic: 1762.
Date: Fri, 08 May 2026 Prob (F-statistic): 3.62e-88
Time: 12:26:42 Log-Likelihood: -1120.5
No. Observations: 100 AIC: 2251.
Df Residuals: 95 BIC: 2264.
Df Model: 4
Covariance Type: nonrobust
================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------
Intercept -5.579e+04 6547.860 -8.520 0.000 -6.88e+04 -4.28e+04
quantity 3.935e+04 472.556 83.270 0.000 3.84e+04 4.03e+04
unit_price 1.2335 0.143 8.598 0.000 0.949 1.518
discount -2.4214 1.208 -2.004 0.048 -4.820 -0.023
pricing_post 1.638e+04 3730.913 4.390 0.000 8972.639 2.38e+04
==============================================================================
Omnibus: 32.221 Durbin-Watson: 1.961
Prob(Omnibus): 0.000 Jarque-Bera (JB): 451.359
Skew: 0.291 Prob(JB): 9.74e-99
Kurtosis: 13.392 Cond. No. 1.70e+05
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.7e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
Regression Interpretation for a Non-Technical Manager:
| Predictor | Effect | Business Meaning |
|---|---|---|
| Quantity | +₦39,350 per additional bottle | Each extra bottle added to an order increases the invoice by approximately the unit price (arithmetically expected, which supports model integrity) |
| Unit Price | +₦1.23 per ₦1 price increase | More expensive wines drive higher invoice values; a ₦10,000 price increase per bottle across the portfolio corresponds to roughly ₦12,300 more per transaction, holding the other predictors fixed |
| Post-Increase (vs Pre) | +₦16,380 premium | Even after controlling for price and quantity, post-increase period transactions are higher, suggesting product mix has shifted toward premium SKUs alongside the price increase |
| Discount | −₦2.42 per ₦1 of discount (p = 0.048) | Discounts, while applied to larger orders, do not substantially depress invoice value at current discount levels (max discount ≈ ₦15,000 on ₦880,000 orders) |
Model Fit: The R² of 0.987 (adjusted R² 0.986) in the regression output indicates that the model's predictors, chiefly quantity and unit price, explain nearly all of the variation in transaction value. The model is suitable for internal forecasting and sales rep target-setting, though it should not be extrapolated beyond the observed price and quantity ranges.
Diagnostic Notes: The residuals vs. fitted plot may show slight heteroscedasticity (larger transactions have more variable residuals, which is common in revenue data). The log-transformed model (Model 2) corrects for this and is preferred for any formal inference. The Q-Q plot will reveal whether the normality assumption holds; the high kurtosis (13.4) and significant Jarque-Bera statistic in the regression output already point to heavy-tailed residuals.
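As an illustration of the forecasting use described under Model Fit, the fitted equation can be wrapped in a small helper. The coefficients below are copied from the OLS output above as printed (so they carry its rounding); the function name and the example order are hypothetical.

```python
# Coefficients copied from the OLS output above (rounded as printed)
COEF = {
    "intercept": -55_790.0,
    "quantity": 39_350.0,
    "unit_price": 1.2335,
    "discount": -2.4214,
    "pricing_post": 16_380.0,
}


def predict_total(quantity, unit_price, discount=0.0, post_increase=True):
    """Predicted invoice value (₦) from the linear model's point estimates."""
    return (
        COEF["intercept"]
        + COEF["quantity"] * quantity
        + COEF["unit_price"] * unit_price
        + COEF["discount"] * discount
        + COEF["pricing_post"] * (1 if post_increase else 0)
    )


# Hypothetical order: 3 bottles at ₦40,000 each, no discount, Post-Increase pricing
print(f"₦{predict_total(3, 40_000):,.0f}")
```

Such point predictions ignore the uncertainty around each coefficient, and because the model is additive it can behave oddly for very small or very large orders, so it is best treated as a planning aid within the observed data range.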
9. Integrated Findings
The five analytical lenses applied to Penafort Winery’s sales data converge on one central insight: price and volume are the two levers that matter most, and both are currently working in Penafort’s favour.
| Analysis | Key Finding | Business Implication |
|---|---|---|
| EDA | Revenue is right-skewed; one outlier transaction of ₦880K; December peak; one data-entry error corrected | Implement a data quality checklist; plan inventory and staffing for Q4 peaks |
| Visualisation | Rondel Cava = dominant SKU; customer_020 = top revenue customer; December = revenue peak | Focus relationship management on top 10 customers; build Rondel Cava supply buffer for Q4 |
| Hypothesis Testing | Post-Increase mean transaction value (≈₦141,767) is significantly higher than Pre-Increase (≈₦104,302); Welch t-test p < 0.05 | The price increase is working; do not roll back; consider further phased increases on premium SKUs |
| Correlation | Quantity drives transaction value most strongly (rho ≈ 0.78); premium price does not suppress quantity | Introduce volume-based incentives; no evidence of price resistance at current levels |
| Regression | The model, driven mainly by quantity and unit price, explains ~99% of transaction variance (R² = 0.987); Post-Increase dummy is significant | Use the model for monthly sales forecasting; set sales rep targets based on predicted transaction values |
Single Overarching Recommendation: Penafort should immediately formalize a tiered customer programme: identify the top 10–15 revenue customers, offer them structured volume incentives, and prioritise their fulfilment of premium SKUs heading into the Q4 2025 seasonal peak. Simultaneously, the price increase should be held and progressively extended to remaining Pre-Increase-priced SKUs, as the data shows customers are absorbing the new prices without reducing purchase frequency or volume.
10. Limitations & Further Work
Sample size and period: 100 transactions over 5 months is at the minimum threshold for reliable inference. A 12-month data set would enable seasonal decomposition and year-on-year comparison.
Expense integration: The monthly expenses data set contains only 6 data points, too few for formal modelling. With a full transaction-level cost data set, a profitability regression (contribution margin as outcome) would be far more actionable.
Causality of the price increase: The pre/post comparison is observational. We cannot rule out confounding factors (e.g., a general Christmas season effect inflating both Q4 sales and the Post-Increase period). A difference-in-differences design, if Penafort had a control product whose price was not increased, would isolate the causal effect.
Missing customer demographics: Without customer-level data (industry type, geography, order frequency over longer periods), we cannot build an RFM or CLV model, which would be the natural next step.
Product standardization: Product names contained month suffixes (e.g., “sept”, “oct”) which were programmatically cleaned. A standardized product master table in the source system would eliminate this issue.
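The difference-in-differences idea raised in the causality limitation above reduces to a single subtraction: the change in the treated group minus the change in the control group. The sketch below shows that arithmetic with hypothetical numbers; Penafort's data contains no such control product, so this is purely illustrative.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical mean transaction values (₦): repriced SKUs vs. a control SKU
effect = diff_in_diff(100_000, 140_000, 90_000, 105_000)
print(f"Estimated price effect: ₦{effect:,}")
```

In this made-up example the control SKU also rose over the period, so only part of the treated group's increase would be attributed to the price change; that separation is what the design is meant to achieve, provided both groups would otherwise have followed parallel trends.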
References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online
R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
Van Rossum, G., & Drake, F. L. (2009). Python 3 reference manual. CreateSpace.
McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
Claude (Anthropic) was used to assist with the structural scaffolding of this Quarto document, suggest appropriate R and Python package selections, and generate initial code templates for data cleaning, visualization, and statistical tests. All analytical decisions (the choice of Case Study 1, the selection of the five techniques and their justification relative to Penafort Winery’s business context, the formulation of hypotheses, the interpretation of statistical outputs, and the final managerial recommendations) were made independently by the author. The raw data was sourced directly from Penafort Winery’s internal sales system and was not generated or simulated by any AI tool.