Shrink Films Pricing Analytics Showcase

Executive-Educational Template for Price Variance Decomposition

Author

Liveo Research

1 Executive Purpose

This showcase demonstrates a complete data analysis pipeline for shrink films price variance decomposition. The recurring business question is:

How much of net price variability is driven by structural internal drivers (country, product segment, customer segment, utilization) versus external shocks (COVID and Iran conflict freight disruption)?

The document is designed for two audiences at once:

  • Decision-makers who need clear pricing insights and business implications.
  • Analysts who need a reproducible, statistically rigorous workflow in Quarto and R.

2 1. Data Analysis Pipeline

2.1 1.1 Standard lifecycle reference

The project follows the classic data science lifecycle:

Import -> Tidy -> Transform -> Model -> Visualize -> Communicate

Data analysis pipeline

2.2 1.2 Tidyverse workflow in business terms

  • Import: load raw transactional data and enrich with event flags.
  • Tidy: enforce one observation per transaction row, consistent variable types.
  • Transform: build inflation, seasonality, and event indicators.
  • Model: decompose variance via ANOVA, mixed-effects, and regularized regression.
  • Visualize: show event windows, dispersion, and driver contributions.
  • Communicate: translate statistical findings into pricing governance actions.

3 2. Data Generation for the Showcase

3.1 2.1 Design assumptions

The synthetic data is generated to mimic a realistic shrink films market:

  • Daily transactions from 2016-01-01 to 2026-04-27.
  • Multiple entries per day.
  • Baseline annual inflation around 2%.
  • COVID shock (Q2 2020 to end 2022): medium positive price shock and utilization volatility.
  • Iran conflict freight shock (last phase from 2026-02-28 to 2026-04-27, with prior buildup): high freight and price pressure, stronger for Global customers.
  • Year-end seasonality and strong utilization-driven price pressure.
Show code
set.seed(240426)

start_date <- as.Date("2016-01-01")
end_date <- as.Date("2026-04-27")
dates <- seq.Date(start_date, end_date, by = "day")

countries <- c("Germany", "France", "Italy", "Spain", "Poland", "Turkey", "UAE", "USA")
product_segments <- c("Standard", "Barrier", "High-Shrink", "Eco")
customer_segments <- c("Local", "Regional", "Global")

country_effect <- c(
  Germany = 1.14,
  France = 1.08,
  Italy = 1.00,
  Spain = 0.92,
  Poland = 0.84,
  Turkey = 0.78,
  UAE = 1.07,
  USA = 1.20
)

country_shock_sensitivity <- c(
  Germany = 1.03,
  France = 1.01,
  Italy = 1.00,
  Spain = 0.98,
  Poland = 0.95,
  Turkey = 0.93,
  UAE = 1.05,
  USA = 1.08
)

product_effect <- c(
  Standard = 0.90,
  Barrier = 1.10,
  `High-Shrink` = 1.20,
  Eco = 1.05
)

customer_effect <- c(
  Local = 0.95,
  Regional = 1.00,
  Global = 1.07
)

rows_per_day <- sample(2:4, length(dates), replace = TRUE, prob = c(0.40, 0.40, 0.20))
Date <- rep(dates, rows_per_day)
n <- length(Date)

Country <- sample(countries, n, replace = TRUE, prob = c(0.15, 0.12, 0.10, 0.10, 0.12, 0.11, 0.10, 0.20))
Product_Segment <- sample(product_segments, n, replace = TRUE, prob = c(0.38, 0.24, 0.20, 0.18))
Customer_Segment <- sample(customer_segments, n, replace = TRUE, prob = c(0.46, 0.34, 0.20))

year_num <- as.integer(format(Date, "%Y"))
month_num <- as.integer(format(Date, "%m"))

base_price <- 145
inflation_factor <- 1.02^(as.numeric(Date - as.Date("2016-01-01")) / 365.25)
seasonality <- 1 + 0.018 * sin(2 * pi * month_num / 12) + ifelse(month_num %in% c(11, 12), 0.030, 0)

covid_window <- Date >= as.Date("2020-04-01") & Date <= as.Date("2022-12-31")
covid_shock <- rep(1, n)
covid_shock[covid_window] <- pmin(1.20, pmax(1.14, rnorm(sum(covid_window), mean = 1.165, sd = 0.014)))

iran_prewindow <- Date >= as.Date("2024-10-01") & Date < as.Date("2026-02-28")
iran_window <- Date >= as.Date("2026-02-28") & Date <= as.Date("2026-04-27")

freight_prebuildup <- rep(1, n)
if (sum(iran_prewindow) > 0) {
  d <- as.numeric(Date[iran_prewindow] - as.Date("2024-10-01"))
  freight_prebuildup[iran_prewindow] <- 1.05 + 0.18 * (d / max(d))
}

freight_conflict <- rep(1, n)
if (sum(iran_window) > 0) {
  dd <- as.numeric(Date[iran_window] - as.Date("2026-02-28"))
  freight_conflict[iran_window] <- 1.54 - 0.0022 * dd
}

freight_covid <- rep(1, n)
freight_covid[covid_window] <- pmin(1.16, pmax(1.09, rnorm(sum(covid_window), mean = 1.125, sd = 0.015)))

freight_noise <- rnorm(n, mean = 0, sd = 3.8)
Freight_Cost_Index <- 100 * (1 + 0.010 * (year_num - 2016)) *
  (1 + 0.012 * sin(2 * pi * month_num / 12)) *
  freight_covid * freight_prebuildup * freight_conflict + freight_noise
Freight_Cost_Index <- pmax(70, Freight_Cost_Index)

util_base <- 0.76 + 0.06 * sin(2 * pi * (month_num - 2) / 12)
util_noise <- rnorm(n, mean = 0, sd = 0.035)
util_noise[covid_window] <- rnorm(sum(covid_window), mean = 0, sd = 0.09)
Factory_Utilisation <- util_base + util_noise
Factory_Utilisation <- pmax(0.52, pmin(0.98, Factory_Utilisation))

country_mult <- country_effect[Country]
country_sensitivity <- country_shock_sensitivity[Country]
product_mult <- product_effect[Product_Segment]
customer_mult <- customer_effect[Customer_Segment]

global_multiplier <- ifelse(Customer_Segment == "Global", 1.30, ifelse(Customer_Segment == "Regional", 1.12, 1.00))
freight_pressure <- (Freight_Cost_Index - 100) / 100

event_price_impulse <- 1 +
  ifelse(covid_window, 0.04 * country_sensitivity, 0) +
  ifelse(iran_prewindow, 0.03 * country_sensitivity, 0) +
  ifelse(iran_window, 0.12 * global_multiplier * country_sensitivity, 0)

util_pressure <- 1 + 0.90 * (Factory_Utilisation - 0.76)
noise <- rnorm(n, mean = 0, sd = 4.8)

Net_Price <- base_price * inflation_factor * seasonality * country_mult * product_mult * customer_mult *
  (1 + 0.14 * freight_pressure * global_multiplier * country_sensitivity) * util_pressure * covid_shock * event_price_impulse + noise

pricing_data_v2 <- data.frame(
  Date = Date,
  Country = Country,
  Product_Segment = Product_Segment,
  Customer_Segment = Customer_Segment,
  Factory_Utilisation = round(Factory_Utilisation, 3),
  Freight_Cost_Index = round(Freight_Cost_Index, 2),
  Net_Price = round(pmax(95, Net_Price), 2),
  stringsAsFactors = FALSE
)

write.csv(pricing_data_v2, "pricing_data_v2.csv", row.names = FALSE)

if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(pricing_data_v2, "pricing_data_v2.xlsx")
} else if (requireNamespace("openxlsx", quietly = TRUE)) {
  openxlsx::write.xlsx(pricing_data_v2, "pricing_data_v2.xlsx", overwrite = TRUE)
}

save(pricing_data_v2, file = "pricing_data_v2.RData")

head(pricing_data_v2)
        Date Country Product_Segment Customer_Segment Factory_Utilisation
1 2016-01-01 Germany        Standard         Regional               0.760
2 2016-01-01 Germany        Standard         Regional               0.704
3 2016-01-01 Germany         Barrier         Regional               0.744
4 2016-01-02     USA     High-Shrink         Regional               0.744
5 2016-01-02  Poland             Eco         Regional               0.733
6 2016-01-02     UAE     High-Shrink         Regional               0.750
  Freight_Cost_Index Net_Price
1             103.01    157.74
2              98.37    140.89
3             102.68    181.83
4              99.11    211.63
5             103.57    117.38
6             102.98    182.03

4 3. Descriptive Statistics: Understanding Dispersion

4.1 3.1 Arithmetic mean and Sample variance

\mu = \frac{1}{n}\sum_{i=1}^{n} x_i

Where:

  • \mu: population mean net price (or any selected metric).
  • n: number of observations.
  • x_i: observed value for transaction i.

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

Where:

  • s^2: sample variance, the dispersion around sample mean.
  • \bar{x}: sample mean.
  • x_i: observed transaction value.
  • n: sample size.

Business interpretation: high variance signals unstable pricing behavior and likely stronger influence of macro-events or inconsistent pricing discipline.

4.2 3.2 Descriptive Statistics

Show code
library(dplyr)
library(ggplot2)

group_stats <- pricing_data %>%
  group_by(Country, Product_Segment, Customer_Segment) %>%
  summarise(
    n = n(),
    mean_net_price = mean(Net_Price),
    sd_net_price = sd(Net_Price),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_net_price))

group_stats %>% dplyr::slice_head(n = 20)
# A tibble: 20 × 6
   Country Product_Segment Customer_Segment     n mean_net_price sd_net_price
   <chr>   <chr>           <chr>            <int>          <dbl>        <dbl>
 1 USA     High-Shrink     Global              84           266.         40.7
 2 Germany High-Shrink     Global              59           257.         41.1
 3 USA     High-Shrink     Regional           151           254.         37.9
 4 USA     High-Shrink     Local              187           245.         36.4
 5 USA     Barrier         Global             104           244.         33.4
 6 USA     Eco             Global              94           240.         38.5
 7 Germany High-Shrink     Regional           109           238.         31.2
 8 USA     Barrier         Regional           175           236.         35.9
 9 Germany Barrier         Global              74           235.         30.7
10 UAE     High-Shrink     Global              43           234.         29.9
11 France  High-Shrink     Global              51           231.         33.5
12 Germany High-Shrink     Local              133           228.         32.0
13 Italy   High-Shrink     Global              41           226.         44.4
14 France  High-Shrink     Regional            81           226.         31.4
15 UAE     High-Shrink     Regional            72           224.         35.5
16 UAE     Barrier         Global              46           223.         34.1
17 Germany Eco             Global              60           219.         31.1
18 Germany Barrier         Regional           132           219.         32.2
19 USA     Barrier         Local              228           218.         31.1
20 USA     Eco             Regional           126           218.         29.5
Show code
boxplot_data <- pricing_data %>%
  mutate(
    Segment_Key = paste(Product_Segment, Customer_Segment, sep = " | ")
  )

ggplot(boxplot_data, aes(x = Country, y = Net_Price, fill = Customer_Segment)) +
  geom_boxplot(outlier.alpha = 0.15, alpha = 0.80) +
  facet_wrap(~ Product_Segment, ncol = 2) +
  labs(
    title = "Grouped net price dispersion by country, product, and customer segment",
    subtitle = "Boxplots show distributional spread across the core decomposition groups",
    x = "Country",
    y = "Net price",
    fill = "Customer segment"
  ) +
  theme_minimal(base_size = 12)

4.3 3.3 ANOVA for initial decomposition

Show code
anova_model <- aov(
  Net_Price ~ Country + Product_Segment + Customer_Segment + factor(Month) + Factory_Utilisation + Covid_Shock + Iran_Shock,
  data = pricing_data
)

summary(anova_model)
                       Df  Sum Sq Mean Sq F value Pr(>F)    
Country                 7 7364636 1052091  4402.7 <2e-16 ***
Product_Segment         3 4234520 1411507  5906.8 <2e-16 ***
Customer_Segment        2  704448  352224  1474.0 <2e-16 ***
factor(Month)          11  631132   57376   240.1 <2e-16 ***
Factory_Utilisation     1  974352  974352  4077.4 <2e-16 ***
Covid_Shock             1 3262415 3262415 13652.4 <2e-16 ***
Iran_Shock              1  681995  681995  2854.0 <2e-16 ***
Residuals           10537 2517951     239                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  ggstatsplot::ggbetweenstats(
    data = pricing_data,
    x = Country,
    y = Net_Price,
    type = "parametric",
    pairwise.comparisons = FALSE,
    title = "ANOVA visualization for country-level net price differences",
    xlab = "Country",
    ylab = "Net price"
  )
} else {
  message("Package 'ggstatsplot' not installed. Install it to render ANOVA visualization with ggstatsplot.")
}

4.4 3.4 Multi-visualization over time with ggplot2

Show code
library(ggplot2)

pricing_monthly <- pricing_data %>%
  mutate(Month_Start = as.Date(format(Date, "%Y-%m-01"))) %>%
  group_by(Month_Start) %>%
  summarise(
    Net_Price = mean(Net_Price),
    Freight_Cost_Index = mean(Freight_Cost_Index),
    Factory_Utilisation = mean(Factory_Utilisation),
    Covid_Shock = max(Covid_Shock),
    Iran_Shock = max(Iran_Shock),
    .groups = "drop"
  )

ggplot(pricing_monthly, aes(Month_Start, Net_Price)) +
  annotate("rect", xmin = as.Date("2020-04-01"), xmax = as.Date("2022-12-31"), ymin = -Inf, ymax = Inf, alpha = 0.12, fill = "#f6c794") +
  annotate("rect", xmin = as.Date("2026-02-01"), xmax = as.Date("2026-04-30"), ymin = -Inf, ymax = Inf, alpha = 0.18, fill = "#e57373") +
  geom_line(color = "#1f4e5f", linewidth = 0.8) +
  geom_smooth(method = "loess", se = FALSE, color = "#8c3b2a", linewidth = 0.9) +
  labs(
    title = "Monthly net price trend with macro-event windows",
    subtitle = "Orange: COVID shock, Red: Iran-related freight shock",
    x = "Month",
    y = "Average net price"
  ) +
  theme_minimal(base_size = 12)

Show code
driver_long <- pricing_monthly %>%
  transmute(
    Month_Start,
    Freight_Cost_Index,
    Factory_Utilisation = Factory_Utilisation * 100
  ) %>%
  tidyr::pivot_longer(
    cols = c(Freight_Cost_Index, Factory_Utilisation),
    names_to = "Driver",
    values_to = "Value"
  )

ggplot(driver_long, aes(Month_Start, Value, color = Driver)) +
  geom_line(linewidth = 0.8) +
  facet_wrap(~ Driver, scales = "free_y", ncol = 1) +
  scale_color_manual(values = c("Freight_Cost_Index" = "#8c3b2a", "Factory_Utilisation" = "#1f4e5f")) +
  labs(
    title = "Core operational drivers over time",
    subtitle = "Factory utilization is shown as percent",
    x = "Month",
    y = "Value"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

Show code
country_monthly <- pricing_data %>%
  mutate(Month_Start = as.Date(format(Date, "%Y-%m-01"))) %>%
  group_by(Country, Month_Start) %>%
  summarise(Net_Price = mean(Net_Price), .groups = "drop")

top_countries <- country_monthly %>%
  group_by(Country) %>%
  summarise(avg_price = mean(Net_Price), .groups = "drop") %>%
  arrange(desc(avg_price)) %>%
  slice_head(n = 4) %>%
  pull(Country)

ggplot(filter(country_monthly, Country %in% top_countries), aes(Month_Start, Net_Price, color = Country)) +
  geom_line(linewidth = 0.7) +
  labs(
    title = "Top-country price trajectories",
    subtitle = "Countries with the highest average prices",
    x = "Month",
    y = "Average net price"
  ) +
  theme_minimal(base_size = 12)

Show code
event_summary <- pricing_data %>%
  mutate(
    Event_Period = dplyr::case_when(
      Date < as.Date("2020-04-01") ~ "Pre-COVID",
      Date >= as.Date("2020-04-01") & Date <= as.Date("2022-12-31") ~ "COVID",
      Date >= as.Date("2026-02-28") & Date <= as.Date("2026-04-27") ~ "Iran Window",
      TRUE ~ "Other"
    )
  ) %>%
  filter(Event_Period %in% c("Pre-COVID", "COVID", "Iran Window")) %>%
  group_by(Event_Period) %>%
  summarise(
    Mean_Net_Price = mean(Net_Price),
    Mean_Freight_Cost_Index = mean(Freight_Cost_Index),
    .groups = "drop"
  )

pre_price <- event_summary$Mean_Net_Price[event_summary$Event_Period == "Pre-COVID"]
covid_price <- event_summary$Mean_Net_Price[event_summary$Event_Period == "COVID"]
iran_price <- event_summary$Mean_Net_Price[event_summary$Event_Period == "Iran Window"]

event_summary <- event_summary %>%
  mutate(
    Uplift_vs_Pre_COVID_pct = round(100 * (Mean_Net_Price / pre_price - 1), 1),
    Iran_vs_COVID_pct = round(100 * (iran_price / covid_price - 1), 1)
  )

event_summary
# A tibble: 3 × 5
  Event_Period Mean_Net_Price Mean_Freight_Cost_Index Uplift_vs_Pre_COVID_pct
  <chr>                 <dbl>                   <dbl>                   <dbl>
1 COVID                  213.                    118.                    32.5
2 Iran Window            241.                    165.                    50.5
3 Pre-COVID              160.                    102.                     0  
# ℹ 1 more variable: Iran_vs_COVID_pct <dbl>

5 4. Inferential Statistics: Mixed-Effects Models (Hierarchical Data)

5.1 4.1 Model rationale

Pricing data is naturally nested: transactions are grouped by countries and customer segments. A mixed-effects model captures shared market structure plus local deviations.

5.2 4.2 Mixed-effects formula

y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \epsilon_{ij}

Where:

  • y_{ij}: net price for transaction i in country/group j.
  • \beta_0: global intercept.
  • \beta_1: slope for predictor x_{ij} (for example factory utilization or freight index).
  • x_{ij}: predictor value for observation i in group j.
  • u_j: random effect for group j (country-specific deviation).
  • \epsilon_{ij}: residual error for observation i within group j.

5.3 4.3 Classical hierarchical model formulation (random intercept and random slope)

Following classical multilevel notation from hierarchical model teaching material, the two-level formulation is:

\text{Level 1: } Y_{ij} = \beta_{0j} + \beta_{1j}X_{ij} + \epsilon_{ij}

\text{Level 2: } \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j}

Combined model:

Y_{ij} = (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j})X_{ij} + \epsilon_{ij}

Where:

  • Y_{ij}: outcome for transaction i in country j.
  • X_{ij}: level-1 predictor (for example freight pressure at transaction level).
  • \beta_{0j}: country-specific intercept.
  • \beta_{1j}: country-specific slope.
  • \gamma_{00}: grand mean intercept across countries.
  • \gamma_{10}: average slope across countries.
  • u_{0j}: random intercept deviation for country j.
  • u_{1j}: random slope deviation for country j.
  • \epsilon_{ij}: level-1 residual.

5.4 4.4 Intraclass correlation coefficient (ICC)

ICC = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}

Where:

  • \sigma_u^2: between-country variance component.
  • \sigma_e^2: within-country residual variance component.
  • ICC: proportion of total variance attributable to country-level clustering.
Show code
if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)
  baseline_ri_model <- lmer(Net_Price ~ 1 + (1 | Country), data = pricing_data)

  vc <- as.data.frame(VarCorr(baseline_ri_model))
  sigma_u2 <- vc$vcov[vc$grp == "Country"]
  sigma_e2 <- vc$vcov[vc$grp == "Residual"]
  icc_value <- sigma_u2 / (sigma_u2 + sigma_e2)

  cat("Baseline random-intercept ICC (Country):", round(icc_value, 3), "\n")

  mixed_model <- lmer(
    Net_Price ~ Product_Segment + Customer_Segment + Factory_Utilisation + Freight_Cost_Index + Covid_Shock + Iran_Shock + (1 | Country),
    data = pricing_data
  )

  list(
    Baseline_Random_Intercept_Model = summary(baseline_ri_model),
    Mixed_Model = summary(mixed_model)
  )

  random_intercepts <- ranef(baseline_ri_model)$Country
  random_intercepts$Country <- rownames(random_intercepts)
  names(random_intercepts)[1] <- "Random_Intercept"

  ggplot(random_intercepts, aes(reorder(Country, Random_Intercept), Random_Intercept)) +
    geom_col(fill = "#1f4e5f", alpha = 0.85) +
    geom_hline(yintercept = 0, color = "#555555", linewidth = 0.6) +
    coord_flip() +
    labs(
      title = "Country-level random intercepts from baseline model",
      subtitle = paste0("Higher bars indicate higher baseline price levels; ICC = ", round(icc_value, 3)),
      x = "Country",
      y = "Random intercept"
    ) +
    theme_minimal(base_size = 12)
} else {
  message("Package 'lme4' not installed. Install it to run mixed-effects modeling.")
}
Baseline random-intercept ICC (Country): 0.37 

6 5. Autoregressive Models for Temporal Dependence

6.1 5.1 AR(p) process

y_t = c + \sum_{j=1}^{p} \phi_j y_{t-j} + \eta_t

Where:

  • y_t: monthly average net price at time t.
  • c: intercept term.
  • p: autoregressive order.
  • \phi_j: autoregressive coefficient for lag j.
  • y_{t-j}: lagged monthly net price at lag j.
  • \eta_t: innovation shock at time t.

6.2 5.2 Autocorrelation and best-fitting AR order

Show code
ts_price <- pricing_monthly$Net_Price
acf_obj <- acf(ts_price, plot = FALSE, lag.max = 24)
acf_data <- data.frame(
  lag = as.numeric(acf_obj$lag),
  acf = as.numeric(acf_obj$acf)
) %>%
  dplyr::filter(lag > 0)

ggplot(acf_data, aes(x = lag, y = acf)) +
  geom_col(fill = "#1f4e5f", alpha = 0.85) +
  geom_hline(yintercept = c(-1.96 / sqrt(length(ts_price)), 1.96 / sqrt(length(ts_price))),
             linetype = "dashed", color = "#8c3b2a") +
  labs(
    title = "Autocorrelation function of monthly net price",
    subtitle = "Dashed lines indicate approximate significance bounds",
    x = "Lag",
    y = "ACF"
  ) +
  theme_minimal(base_size = 12)

Show code
candidate_orders <- 1:6
ar_fits <- lapply(candidate_orders, function(p) arima(ts_price, order = c(p, 0, 0)))
aic_table <- data.frame(
  AR_Order = candidate_orders,
  AIC = sapply(ar_fits, AIC)
) %>%
  arrange(AIC)

best_p <- aic_table$AR_Order[1]
best_ar_model <- ar_fits[[which(candidate_orders == best_p)]]

pricing_monthly$AR_Best_Fitted <- as.numeric(ts_price - residuals(best_ar_model))

aic_table
  AR_Order      AIC
1        1 904.8025
2        3 905.7617
3        2 906.4930
4        4 907.5448
5        6 907.8233
6        5 909.4950
Show code
fit_long <- pricing_monthly %>%
  select(Month_Start, Net_Price, AR_Best_Fitted) %>%
  tidyr::pivot_longer(cols = c(Net_Price, AR_Best_Fitted), names_to = "Series", values_to = "Value")

ggplot(fit_long, aes(Month_Start, Value, color = Series)) +
  geom_line(linewidth = 0.8) +
  scale_color_manual(values = c("Net_Price" = "#222222", "AR_Best_Fitted" = "#1f4e5f")) +
  labs(
    title = "Observed vs autoregressive fitted series",
    subtitle = paste0("Best-fitting AR model selected by minimum AIC (AR(", best_p, "))"),
    x = "Month",
    y = "Average net price",
    color = "Series"
  ) +
  theme_minimal(base_size = 12)

7 6. Regularized Regression (High-Dimensional Pricing Features)

7.1 6.1 Lasso objective

\min_{\beta} \left\{ \sum_{i=1}^{n} \left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}

Where:

  • y_i: net price outcome for observation i.
  • x_{ij}: predictor j for observation i.
  • \beta_0: intercept.
  • \beta_j: coefficient for predictor j.
  • p: number of predictors.
  • n: number of observations.
  • \lambda: penalty strength controlling shrinkage.

Interpretation: as \lambda increases, weak predictors are shrunk toward zero; many become exactly zero, enabling feature selection.

Show code
if (requireNamespace("glmnet", quietly = TRUE)) {
  library(glmnet)

  x <- model.matrix(
    Net_Price ~ Country + Product_Segment + Customer_Segment + Factory_Utilisation + Freight_Cost_Index + Covid_Shock + Iran_Shock + factor(Month),
    data = pricing_data
  )[, -1]
  y <- pricing_data$Net_Price

  cv_fit <- cv.glmnet(x, y, alpha = 1)
  coef(cv_fit, s = "lambda.min")
} else {
  message("Package 'glmnet' not installed. Install it to run Lasso regression.")
}
28 x 1 sparse Matrix of class "dgCMatrix"
                            lambda.min
(Intercept)                -68.7855830
CountryGermany              11.8536240
CountryItaly               -13.6072746
CountryPoland              -42.2143906
CountrySpain               -28.2700746
CountryTurkey              -52.4508707
CountryUAE                  -0.4157159
CountryUSA                  23.3854944
Product_SegmentEco          -8.5203468
Product_SegmentHigh-Shrink  17.6051704
Product_SegmentStandard    -35.3002368
Customer_SegmentLocal      -22.8815748
Customer_SegmentRegional   -13.1393879
Factory_Utilisation        177.7144447
Freight_Cost_Index           1.3097526
Covid_Shock                 26.8016876
Iran_Shock                  -9.4898765
factor(Month)2               1.2696899
factor(Month)3               0.5271629
factor(Month)4               0.4825060
factor(Month)5              -0.4425473
factor(Month)6              -0.7547919
factor(Month)7              -1.3282704
factor(Month)8              -1.1349208
factor(Month)9              -1.1971985
factor(Month)10             -0.8471749
factor(Month)11              5.8489051
factor(Month)12              6.3383577

8 7. Clustering Procedure

8.1 7.1 Objective function (K-means)

J = \sum_{j=1}^{k} \sum_{i \in S_j} \|x_i - \mu_j\|^2

Where:

  • J: total within-cluster sum of squares.
  • k: number of clusters.
  • S_j: set of observations assigned to cluster j.
  • x_i: feature vector for observation i.
  • \mu_j: centroid of cluster j.

Interpretation: K-means minimizes the distance between observations and their assigned centroid, uncovering hidden behavioral segments in pricing dynamics.

Show code
cluster_data <- pricing_data %>%
  transmute(
    Net_Price,
    Factory_Utilisation,
    Freight_Cost_Index,
    Covid_Shock,
    Iran_Shock
  )

set.seed(240426)
cluster_data <- cluster_data %>%
  dplyr::slice_sample(n = min(2500, nrow(cluster_data)))

if (requireNamespace("mlr3", quietly = TRUE) && requireNamespace("mlr3cluster", quietly = TRUE)) {
  library(mlr3)
  library(mlr3cluster)

  tsk <- as_task_clust(cluster_data)
  k <- 5

  kmeans_learner <- lrn("clust.kmeans", centers = k, nstart = 25)
  kmeans_learner$train(tsk)
  kmeans_pred <- kmeans_learner$predict(tsk)

  print(table(kmeans_pred$partition))

  pca_fit <- prcomp(scale(cluster_data), center = TRUE, scale. = TRUE)
  pca_plot_data <- data.frame(
    PC1 = pca_fit$x[, 1],
    PC2 = pca_fit$x[, 2],
    Cluster = factor(kmeans_pred$partition)
  )

  print(
    ggplot(pca_plot_data, aes(PC1, PC2, color = Cluster)) +
      geom_point(alpha = 0.55, size = 1.6) +
      labs(
        title = "K-means clustering (mlr3) on pricing behavior",
        subtitle = "PCA projection of transaction-level feature space",
        x = "PC1",
        y = "PC2"
      ) +
      theme_minimal(base_size = 12)
  )

  agnes_learner <- lrn("clust.agnes", stand = TRUE, k = k, method = "ward")
  agnes_learner$train(tsk)
  plot(agnes_learner$model, hang = -1, which.plots = 2, main = "Hierarchical clustering dendrogram")
  rect.hclust(as.hclust(agnes_learner$model), k = k, border = "#8c3b2a")

  cluster_profiles <- cluster_data %>%
    mutate(Cluster = factor(kmeans_pred$partition)) %>%
    group_by(Cluster) %>%
    summarise(
      n = n(),
      mean_net_price = mean(Net_Price),
      mean_freight = mean(Freight_Cost_Index),
      mean_util = mean(Factory_Utilisation),
      .groups = "drop"
    )

  high_cluster <- cluster_profiles %>% arrange(desc(mean_net_price)) %>% slice(1)
  low_cluster <- cluster_profiles %>% arrange(mean_net_price) %>% slice(1)

  extreme_clusters <- bind_rows(
    high_cluster %>% mutate(Profile = "Highest-price cluster"),
    low_cluster %>% mutate(Profile = "Lowest-price cluster")
  ) %>%
    select(Profile, everything())

  print(cluster_profiles)
  print(extreme_clusters)
} else {
  message("Packages 'mlr3' and 'mlr3cluster' are required for this clustering workflow.")
}

  1   2   3   4   5 
496 416 703 741 144 

# A tibble: 5 × 5
  Cluster     n mean_net_price mean_freight mean_util
  <fct>   <int>          <dbl>        <dbl>     <dbl>
1 1         496           128.         104.     0.747
2 2         416           234.         117.     0.778
3 3         703           196.         112.     0.762
4 4         741           163.         109.     0.754
5 5         144           285.         125.     0.814
# A tibble: 2 × 6
  Profile               Cluster     n mean_net_price mean_freight mean_util
  <chr>                 <fct>   <int>          <dbl>        <dbl>     <dbl>
1 Highest-price cluster 5         144           285.         125.     0.814
2 Lowest-price cluster  1         496           128.         104.     0.747

9 8. Variance Decomposition and Executive Communication

9.1 8.1 External vs internal driver contribution

Show code
internal_model <- lm(Net_Price ~ Country + Product_Segment + Customer_Segment + Factory_Utilisation + factor(Month), data = pricing_data)
full_model <- lm(Net_Price ~ Country + Product_Segment + Customer_Segment + Factory_Utilisation + factor(Month) + Covid_Shock + Iran_Shock + Freight_Cost_Index, data = pricing_data)

r2_internal <- summary(internal_model)$r.squared
r2_full <- summary(full_model)$r.squared
r2_external_increment <- r2_full - r2_internal

data.frame(
  R2_Internal_Drivers = round(r2_internal, 3),
  R2_Total_Full_Model = round(r2_full, 3),
  R2_Increment_External_Events = round(r2_external_increment, 3)
)
  R2_Internal_Drivers R2_Total_Full_Model R2_Increment_External_Events
1               0.683               0.941                        0.258

9.2 8.2 Leadership-ready interpretation template

  • Country and segment structure explain persistent baseline pricing differences.
  • Factory utilization captures internal capacity pressure and margin stress.
  • COVID and freight conflict shocks explain abnormal, event-driven price variance.
  • Global customers receive stronger freight pass-through during disruption windows.
  • Governance implication: combine event-based surcharges with utilization-triggered pricing rules.

10 9. Conclusion

This report provides a practical, reproducible, and executive-ready template for shrink films pricing analytics. It links advanced methods to clear business decisions by separating structural price drivers from macro-event shocks, while preserving transparency through explicit formulas and interpretable model stages.