1 Part I — Theory Primer

1.1 1. Introduction

In marketing analytics, data are frequently hierarchical. Examples include transactions nested within stores, impressions nested within campaigns, or weekly KPIs tracked for the same customer over time. Traditional single-level regressions treat all observations as independent, which is not valid when measurements are repeated or clustered. This can lead to underestimated standard errors, overconfident p-values, and biased estimates.

Multilevel modeling (MLM)—also called hierarchical or mixed-effects modeling—handles this structure by modeling both fixed effects (population-average relationships) and random effects (cluster-specific deviations). MLM allows analysts to estimate an average campaign effect while accounting for variability across stores, customers, or regions.

1.1.1 1.1 What is Multilevel Modeling?

MLM is a regression framework that explicitly recognizes clustering. Consider these marketing settings:

Customers within regions. Customer spending is driven by individual factors (income, loyalty), but also by region-level characteristics (local competition, media costs).
Stores within cities. Store sales depend on weekly promotions and footfall (store-level), and also city-wide events or macro factors (city-level).
Weeks within campaign. Weekly conversions for the same campaign are serially related and share the same strategy, budget, and targeting choices.

MLM separates within-unit patterns (e.g., how a store’s sales change week-to-week with media) and between-unit differences (e.g., how stores differ in baseline sales or responsiveness).

1.1.2 1.2 When Do We Need MLM?

Observations are grouped (customers within regions; transactions within stores; weeks within customer).
Within-group observations are correlated (same store/week/customer tends to be similar).
You need to estimate both average relationships and unit-specific deviations.

Ignoring clustering violates the independence assumption, often producing inflated significance and misleading insights.

1.1.3 1.3 Key Concepts

1.1.3.1 Fixed vs. Random Effects

Fixed effects estimate the average relationship that applies across all units. Example: On average, each additional $1k of digital spend increases sales by 2.1 units.
Random effects capture unit-specific deviation from that average. Example: some stores are systematically above/below the average baseline (random intercepts) or more/less responsive to spend (random slopes).

Random Intercepts. Allow each store to have its own baseline sales level (e.g., location, size, local competition).
Random Slopes. Allow the effect of media spend on sales to differ by store (e.g., ad wear-in/wear-out, local creative fit).

Managerial interpretation. Random effects quantify heterogeneity. They reveal which stores/customers respond more to the same tactic and support targeted allocation (e.g., heavier budget where responsiveness is high).

1.1.3.2 Centering (Grand-Mean vs Group-Mean)

Grand-mean centering subtracts the overall mean from a predictor. The intercept is interpreted at an average level of spend. This is useful for cross-level interactions and more stable estimation.
Group-mean centering subtracts the unit’s own mean (e.g., a store’s weekly spend minus its average spend). This isolates within-store effects from between-store differences.

Marketing example. If a store usually spends $8k/week on social and spends $10k this week, group-mean centering evaluates the lift relative to that store’s usual level, not relative to other stores.

1.1.3.3 Covariance Structures

The random-effects covariance matrix reports the variance of random intercepts and slopes and their correlation.
- A positive intercept–slope correlation suggests high-baseline stores also respond more to media (rich-get-richer).
- A negative correlation suggests high-baseline stores respond less (diminishing returns).

1.1.3.4 Nested vs. Cross-Classified

Nested: each unit belongs to one higher-level unit (weeks within store).
Cross-classified: a unit belongs to multiple classifications (customers influenced by region × channel). Cross-classified MLMs can model both higher-level influences.

1.2 2. Conceptual Figures

All figures below are conceptual and use small, simulated points/lines purely for illustration.

1.2.1 2.1 Nested data structure: stores → weeks

# Draw a simple nested tree using base plotting (text & segments)
plot.new(); plot.window(xlim=c(0,10), ylim=c(0,10))
# Level 2: stores
text(2, 9, "Store 1", cex=1)
text(5, 9, "Store 2", cex=1)
text(8, 9, "Store 3", cex=1)
# Level 1: weeks per store
for (i in c(2,5,8)) {
  segments(i, 8.7, i, 6.5)
  for (w in 1:4) {
    y <- 6.5 - w*0.8
    segments(i, 6.5, i-0.8 + 0.5*w, y)
    text(i-0.8 + 0.5*w, y, paste0("Week ", w), cex=0.9)
  }
}
box()

Figure A. Nested data structure: weeks (Level-1) nested within stores (Level-2).

1.2.2 2.2 Random intercept vs random slope (two stores)

par(mfrow=c(1,2), mar=c(4,4,2,1))
# Left: random intercept, same slope
x <- 1:10
y1 <- 20 + 2*x + rnorm(10,0,1)
y2 <- 30 + 2*x + rnorm(10,0,1)
plot(x, y1, pch=16, xlab="Week", ylab="Sales (units)", main="Random Intercept")
abline(lm(y1~x), lwd=2)
points(x, y2, pch=1)
abline(lm(y2~x), lwd=2, lty=2)

# Right: random slope, similar intercepts
y3 <- 25 + 1*x + rnorm(10,0,1)
y4 <- 25 + 3*x + rnorm(10,0,1)
plot(x, y3, pch=16, xlab="Week", ylab="Sales (units)", main="Random Slope")
abline(lm(y3~x), lwd=2)
points(x, y4, pch=1)
abline(lm(y4~x), lwd=2, lty=2)

Figure B. Random intercept (left) vs random slope (right). Left: stores differ in baseline; Right: stores differ in responsiveness to time/media.

par(mfrow=c(1,1))

1.2.3 2.3 Centering illustration (grand-mean vs group-mean)

par(mfrow=c(1,2), mar=c(4,4,2,1))
set.seed(1)
# Simulate 2 stores with media spend
store <- rep(c("A","B"), each=10)
week  <- rep(1:10, 2)
media <- c(rnorm(10, 8, 1.2), rnorm(10, 12, 1.5))  # B spends more on average
sales <- 15 + 0.9*media + ifelse(store=="B", 5, 0) + rnorm(20,0,1.5)

# grand-mean centering
media_gmc <- media - mean(media)
plot(media_gmc, sales, pch=ifelse(store=="A",16,1), xlab="Media (grand-mean centered)", ylab="Sales")
abline(lm(sales~media_gmc), lwd=2)

# group-mean centering
media_gmc_group <- ave(media, store, FUN=function(z) z - mean(z))
plot(media_gmc_group, sales, pch=ifelse(store=="A",16,1), xlab="Media (group-mean centered)", ylab="Sales")
abline(lm(sales~media_gmc_group), lwd=2)

Figure C. Centering. Left: grand-mean centering (subtract overall mean). Right: group-mean centering (subtract store mean). Intercepts change meaning accordingly.

par(mfrow=c(1,1))

1.2.4 2.4 Random-effects covariance (conceptual matrix)

mat <- matrix(c(1.0, 0.4, 0.4, 0.8), nrow=2, byrow=TRUE,
              dimnames=list(c("Intercept","Slope(media)"), c("Intercept","Slope(media)")))
# display as a tile heatmap for intuition
library(ggplot2)
dfm <- as.data.frame(as.table(mat))
ggplot(dfm, aes(Var2, Var1, fill=Freq)) +
  geom_tile() + geom_text(aes(label=sprintf("%.2f", Freq))) +
  scale_fill_gradient(low="#E6F0FF", high="#2B6CB0") +
  labs(x="", y="", fill="Value") + theme_minimal()

Figure D. Conceptual random-effects covariance matrix: variances on the diagonal; off-diagonal entries indicate correlation (e.g., intercept–slope).

2 Part II — Applied Tutorial in R using Marketing Panel data

Prerequisite: Review Part I for conceptual definitions, especially random intercepts (Section 1.3), random slopes, centering, and covariance (Figures B–D).

2.1 3. Data and Structure

We use the marketing dataset from datarium (CRAN). It contains weekly media spend (YouTube, Facebook, Newspaper; in $k) and Sales. We reshape rows into a panel: 20 stores × 10 weeks = 200 observations (weeks nested within stores).

# install packages if needed
pkgs <- c("datarium","lme4","lmerTest","dplyr","ggplot2","performance","broom.mixed")
to_install <- pkgs[!sapply(pkgs, requireNamespace, quietly = TRUE)]
if(length(to_install)) install.packages(to_install)

library(datarium)
library(lme4); library(lmerTest)
library(dplyr); library(ggplot2)
library(performance); library(broom.mixed)

data("marketing", package = "datarium")
# Create store-week panel: 20 stores x 10 weeks = 200 rows
n_store <- 20; n_week <- 10
stopifnot(nrow(marketing) >= n_store * n_week)
panel <- marketing[1:(n_store*n_week), ] %>%
  mutate(store_id = factor(rep(1:n_store, each = n_week)),
         week     = rep(1:n_week, times = n_store))
glimpse(panel); summary(panel)

## Rows: 200
## Columns: 6
## $ youtube   <dbl> 276.12, 53.40, 20.64, 181.80, 216.96, 10.44, 69.00, 144.24, …
## $ facebook  <dbl> 45.36, 47.16, 55.08, 49.56, 12.96, 58.68, 39.36, 23.52, 2.52…
## $ newspaper <dbl> 83.04, 54.12, 83.16, 70.20, 70.08, 90.00, 28.20, 13.92, 1.20…
## $ sales     <dbl> 26.52, 12.48, 11.16, 22.20, 15.48, 8.64, 14.16, 15.84, 5.76,…
## $ store_id  <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ week      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10…

##     youtube          facebook       newspaper          sales      
##  Min.   :  0.84   Min.   : 0.00   Min.   :  0.36   Min.   : 1.92  
##  1st Qu.: 89.25   1st Qu.:11.97   1st Qu.: 15.30   1st Qu.:12.45  
##  Median :179.70   Median :27.48   Median : 30.90   Median :15.48  
##  Mean   :176.45   Mean   :27.92   Mean   : 36.66   Mean   :16.83  
##  3rd Qu.:262.59   3rd Qu.:43.83   3rd Qu.: 54.12   3rd Qu.:20.88  
##  Max.   :355.68   Max.   :59.52   Max.   :136.80   Max.   :32.40  
##                                                                   
##     store_id        week     
##  1      : 10   Min.   : 1.0  
##  2      : 10   1st Qu.: 3.0  
##  3      : 10   Median : 5.5  
##  4      : 10   Mean   : 5.5  
##  5      : 10   3rd Qu.: 8.0  
##  6      : 10   Max.   :10.0  
##  (Other):140

2.2 4. Descriptive Summaries and Figures

panel %>%
  summarise(
    n_obs   = n(),
    n_store = n_distinct(store_id),
    mean_sales = mean(sales), sd_sales = sd(sales),
    mean_youtube = mean(youtube), mean_facebook = mean(facebook), mean_newspaper = mean(newspaper)
  )

Figure 1. Mean sales by week across stores (±SE).

avg_by_week <- panel %>% group_by(week) %>%
  summarise(mean_sales = mean(sales), sd = sd(sales), n = n(), se = sd/sqrt(n), .groups="drop")

ggplot(avg_by_week, aes(week, mean_sales)) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = mean_sales - se, ymax = mean_sales + se), width = 0.15) +
  geom_line(linewidth = 0.8) +
  labs(x = "Week", y = "Mean sales (units)") +
  theme_minimal()

Figure 1. Mean sales by week across stores with standard-error bars.

Figure 2. Two example stores (baseline and trend differences).

two_ids <- levels(panel$store_id)[1:2]
mini <- dplyr::filter(panel, store_id %in% two_ids)
ggplot(mini, aes(week, sales, color = store_id, group = store_id)) +
  geom_point(size = 2) + geom_line(linewidth = 0.8) +
  labs(x = "Week", y = "Sales (units)") + theme_minimal() + guides(color="none")

Figure 2. Sales trajectories for two example stores, highlighting baseline and trend differences.

2.3 5. Baseline Model (OLS)

Syntax (recall Part I Section 1.3: fixed vs random). OLS treats all 200 rows as independent and does not include random effects.

m_OLS <- lm(sales ~ week + youtube + facebook + newspaper, data = panel)
summary(m_OLS)

## 
## Call:
## lm(formula = sales ~ week + youtube + facebook + newspaper, data = panel)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3620  -0.9588   0.3004   1.4047   3.4928 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.2893678  0.4716980   6.973 4.68e-11 ***
## week         0.0417883  0.0504827   0.828    0.409    
## youtube      0.0458158  0.0013974  32.787  < 2e-16 ***
## facebook     0.1876387  0.0086852  21.605  < 2e-16 ***
## newspaper   -0.0004013  0.0059258  -0.068    0.946    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.024 on 195 degrees of freedom
## Multiple R-squared:  0.8976, Adjusted R-squared:  0.8955 
## F-statistic: 427.2 on 4 and 195 DF,  p-value: < 2.2e-16

Interpretation (after output). OLS estimates population-average associations but ignores clustering by store. Standard errors are likely too small, overstating significance in repeated measures per store.

2.4 6. Random-Intercept Model (RI)

2.4.1 6.1 Model Syntax

Equation (random intercepts):
\[ \text{sales}_{it} = \beta_0 + \beta_1\,\text{week}_{it} + \beta_2\,\text{youtube}_{it} + \beta_3\,\text{facebook}_{it} + \beta_4\,\text{newspaper}_{it} + u_{0i} + \varepsilon_{it}, \] - $i$ = store, $t$ = week.
- $u_{0i}\sim\mathcal{N}(0,\sigma^2_{u0})$ captures baseline heterogeneity across stores.
- $\varepsilon_{it}\sim\mathcal{N}(0,\sigma^2)$ is within-store residual noise.

lme4 formula: sales ~ week + youtube + facebook + newspaper + (1 | store_id)
- (1 | store_id) adds a random intercept per store. (See Figure B in Part I.)

2.4.2 6.2 Fit the model

m_RI <- lmer(sales ~ week + youtube + facebook + newspaper + (1 | store_id), data = panel)
summary(m_RI)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: sales ~ week + youtube + facebook + newspaper + (1 | store_id)
##    Data: panel
## 
## REML criterion at convergence: 875.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.8465 -0.4620  0.1847  0.6942  1.6482 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  store_id (Intercept) 0.262    0.5118  
##  Residual             3.846    1.9610  
## Number of obs: 200, groups:  store_id, 20
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  3.307e+00  4.740e-01  1.857e+02   6.977 5.15e-11 ***
## week         4.128e-02  4.893e-02  1.767e+02   0.844    0.400    
## youtube      4.557e-02  1.384e-03  1.931e+02  32.920  < 2e-16 ***
## facebook     1.883e-01  8.574e-03  1.916e+02  21.963  < 2e-16 ***
## newspaper   -1.362e-04  5.860e-03  1.923e+02  -0.023    0.981    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) week   youtub facebk
## week      -0.587                     
## youtube   -0.494  0.046              
## facebook  -0.238 -0.127 -0.062       
## newspaper -0.327  0.132 -0.037 -0.360

2.4.3 6.3 Variance, ICC, and R²

vc_RI_df <- as.data.frame(VarCorr(m_RI)); vc_RI_df

icc_RI_obj <- performance::icc(m_RI)
icc_RI_val <- if("ICC_adjusted" %in% names(icc_RI_obj)) icc_RI_obj$ICC_adjusted else icc_RI_obj$ICC
r2_RI_vals <- performance::r2(m_RI); r2_RI_vals

## # R2 for Mixed Models
## 
##   Conditional R2: 0.902
##      Marginal R2: 0.895

2.4.4 6.4 Detailed Interpretation

What this model estimates. The RI model allows each store to start at its own baseline (random intercept) while estimating common slopes for Week and the three media channels. In other words, we separate structural store differences (e.g., location, size, loyal base) from tactical drivers (media and time).

Reading the fixed effects.
- The coefficient on YouTube (e.g., 0.046) means that, holding Facebook, Newspaper, and Week constant, a $1k increase in YouTube spend is associated with an average increase of that many sales units per week across stores. Interpret Facebook and Newspaper similarly.
- The coefficient on Week (e.g., 0.041) captures the average time trend — whether sales drift upward or downward across the 10 weeks, after controlling for media spend.

Reading the random effects (variance components).
- The random intercept variance for store_id (see VarCorr) quantifies between-store heterogeneity in baselines. A larger value indicates meaningful structural differences across locations; small values imply stores begin at similar levels once you control for media and week.
- The residual variance is within-store noise — week-to-week variation not explained by the predictors.

Intraclass correlation (ICC).
- The ICC (here ≈ 0.064) is the share of total variance attributable to between-store differences. For example, an ICC of 0.30 implies 30% of variation in sales is due to differences between stores themselves (baseline structure), while 70% is within-store week-to-week variation.

R² (marginal vs conditional).
- Marginal R² (from performance::r2) indicates the variance explained by fixed effects alone (Week + media).
- Conditional R² indicates variance explained by both fixed + random parts (i.e., after accounting for store baselines). A meaningful gap between conditional and marginal R² shows that modeling store baselines materially improves fit.

What should a marketer conclude?
- Stores differ systematically in their baseline potential. Evaluating campaign performance should therefore avoid comparing raw sales across stores without baseline adjustment.
- The fixed effects quantify average media productivity. These are the starting point for budget decisions before considering heterogeneous slopes.

Managerial insight. For reporting, present both baseline-adjusted store performance and the average media contributions. Use ICC to justify why “one-size-fits-all” conclusions can be misleading.

2.5 7. Random-Intercept + Random-Slope Model (RS)

2.5.1 7.1 Model Syntax (before code)

Equation (random slope on YouTube):
\[ \text{sales}_{it} = \beta_0 + \beta_1\,\text{week}_{it} + \beta_2\,\text{youtube}_{it} + \beta_3\,\text{facebook}_{it} + \beta_4\,\text{newspaper}_{it} + u_{0i} + u_{2i}\,\text{youtube}_{it} + \varepsilon_{it}. \] - $u_{0i}$ = random intercept; $u_{2i}$ = random slope for YouTube at store $i$.
- Random effects are jointly normal with covariance matrix $\begin{pmatrix}\sigma^2_{u0} & \sigma_{u0,u2}\\ \sigma_{u0,u2} & \sigma^2_{u2}\end{pmatrix}$.
- Intercept–slope correlation interprets whether high-baseline stores respond more/less to YouTube (Part I, Figure D).

lme4 formula: sales ~ week + youtube + facebook + newspaper + (1 + youtube | store_id)
- (1 + youtube | store_id) gives each store its own baseline and YouTube responsiveness (Part I, Figure B).

2.5.2 7.2 Fit the model

m_RS <- lmer(sales ~ week + youtube + facebook + newspaper + (1 + youtube | store_id), data = panel,
             control = lmerControl(optimizer = "bobyqa"))
summary(m_RS)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: sales ~ week + youtube + facebook + newspaper + (1 + youtube |  
##     store_id)
##    Data: panel
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 863.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.1605 -0.4898  0.0987  0.6592  2.0252 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  store_id (Intercept) 2.329e+00 1.525945      
##           youtube     3.973e-05 0.006303 -1.00
##  Residual             3.463e+00 1.860855      
## Number of obs: 200, groups:  store_id, 20
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.360623   0.563847  51.920582   5.960 2.24e-07 ***
## week          0.035352   0.048221 182.319218   0.733    0.464    
## youtube       0.044782   0.001928  20.318952  23.233 4.14e-16 ***
## facebook      0.199642   0.008397 191.630292  23.776  < 2e-16 ***
## newspaper    -0.002815   0.005718 191.950178  -0.492    0.623    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) week   youtub facebk
## week      -0.493                     
## youtube   -0.717  0.030              
## facebook  -0.172 -0.143 -0.048       
## newspaper -0.291  0.172 -0.012 -0.375
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

2.5.3 7.3 Model comparison & random-effects structure

anova(m_RI, m_RS)

r2_RS_vals <- performance::r2(m_RS); r2_RS_vals

## Random effect variances not available. Returned R2 does not account for random effects.

## # R2 for Mixed Models
## 
##   Conditional R2: NA
##      Marginal R2: 0.911

vc_RS_df <- as.data.frame(VarCorr(m_RS)); vc_RS_df

# Safe extraction of intercept–slope correlation
rho_RS <- tryCatch({
  submat <- VarCorr(m_RS)$store_id
  as.numeric(attr(submat, "correlation")["(Intercept)","youtube"])
}, error = function(e) NA_real_)
rho_RS

## [1] -1

2.5.4 7.4 Detailed Interpretation

What this model adds relative to RI.
The RS model keeps store-specific baselines and allows the effect of YouTube spend to vary by store. This recognizes that some stores will convert YouTube impressions more efficiently than others (creative fit, local competition, or audience mix).

Reading the fixed effects.
- The fixed YouTube coefficient (e.g., 0.045) is the average responsiveness across stores. It is not any one store’s effect; each store’s slope equals this average plus its random deviation.
- Fixed effects for Facebook and Newspaper remain population averages. If those channels also plausibly vary by store, you could extend the model to include additional random slopes — but only if justified by data and business logic.

Reading the random effects.
- The random slope variance for YouTube in VarCorr quantifies how much slopes differ across stores. A larger value indicates meaningful heterogeneity in channel effectiveness.
- The intercept–slope correlation (here ≈ -1) answers: Do high-baseline stores also respond more (positive), less (negative), or similarly (near zero) to YouTube?
- Positive correlation → success breeds success: already-strong stores gain more per $1k.
- Negative correlation → diminishing returns: weaker stores may get relatively more lift per $1k.

Model comparison (RI vs RS).
- The likelihood-ratio test and AIC compare whether adding random slopes significantly improves fit. If RS is favored, the data support store-specific YouTube responsiveness.
- Compare conditional R² between RI and RS. A higher value under RS indicates that allowing slope heterogeneity explains more variance.

Concrete reading of numbers.
- Suppose the random slope variance is NA. This means the store-to-store spread in YouTube effectiveness is non-trivial. If the average slope is 0.045, individual stores may be above or below that value depending on their random effect.
- If the intercept–slope correlation is negative, increasing spend in high-baseline stores may deliver smaller incremental gains than expected; consider testing creative or channel diversification there.

What should a marketer conclude?
- One budget curve does not fit all. YouTube effectiveness varies across stores; allocate incrementally where random slopes are higher.
- For low-slope stores, test message–market fit (creative variants, frequency caps) or channel mix before increasing spend.

Managerial insight (actionable plan).
1) Extract store-level random slopes with ranef(m_RS)$store_id[,'youtube'].
2) Rank stores by slope; create tiers (top, middle, bottom).
3) Shift budget toward the top tier for the next cycle; run experiments (A/B creative, audience) in the bottom tier.
4) Refit the model after the cycle; compare slope distributions to measure whether actions reduced heterogeneity.

2.6 8. Communication Figures

Figure 3. Population-average fitted line (RI) vs observed means.

pred_grid <- expand.grid(
  week = unique(panel$week),
  youtube = mean(panel$youtube),
  facebook = mean(panel$facebook),
  newspaper = mean(panel$newspaper)
)
pred_grid$pred <- predict(m_RI, newdata = pred_grid, re.form = NA)

ggplot() +
  geom_point(data = avg_by_week, aes(week, mean_sales), size = 2) +
  geom_line(data = avg_by_week, aes(week, mean_sales), alpha = 0.5, linewidth = 0.8) +
  geom_line(data = pred_grid, aes(week, pred), linewidth = 1) +
  labs(x = "Week", y = "Sales (units)") +
  theme_minimal()

Figure 3. Mean sales by week with the population-average fitted line from the RI model.

Figure 4. One store’s observed vs RS fitted trajectory.

one_id <- levels(panel$store_id)[1]
one_dat <- dplyr::filter(panel, store_id == one_id)
one_dat$pred <- predict(m_RS, newdata = one_dat)  # includes random effects
ggplot(one_dat, aes(week, sales)) +
  geom_point(size = 2) +
  geom_line(aes(y = pred), linewidth = 1) +
  labs(x = "Week", y = "Sales (units)") +
  theme_minimal()

Figure 4. Observed vs fitted sales trajectory for a single store under RS (random YouTube slope).

2.7 9. Interpretation Templates

# Precompute summary numbers for inline use
vc_RI_df <- as.data.frame(VarCorr(m_RI))
vc_RS_df <- as.data.frame(VarCorr(m_RS))
beta_youtube <- unname(fixef(m_RS)["youtube"])
var_intercept_RI <- vc_RI_df$vcov[vc_RI_df$grp=="store_id" & vc_RI_df$var1=="(Intercept)" & is.na(vc_RI_df$var2)]
var_slope_RS <- vc_RS_df$vcov[vc_RS_df$grp=="store_id" & vc_RS_df$var1=="youtube" & vc_RS_df$var2=="youtube"]
r2_RS_vals <- performance::r2(m_RS)

## Random effect variances not available. Returned R2 does not account for random effects.

R2m_RS <- unname(r2_RS_vals$R2_marginal)
R2c_RS <- unname(r2_RS_vals$R2_conditional)
icc_RI_obj <- performance::icc(m_RI)
icc_RI_val <- if("ICC_adjusted" %in% names(icc_RI_obj)) icc_RI_obj$ICC_adjusted else icc_RI_obj$ICC

Fixed effect (YouTube). “Holding other media constant, a $1k increase in YouTube spend is associated with an average increase of 0.045 units in weekly sales.”
Random intercept variance. “Baseline sales vary across stores (Var = 0.262). This supports store-specific benchmarking.”
Random slope variance (YouTube). “YouTube responsiveness differs across stores (Var = NA). Budget allocation should reflect this heterogeneity.”
Intercept–slope correlation. “Correlation = -1 (baseline vs YouTube responsiveness).”
ICC. “0.064 of variance is between stores; the remainder is within stores across weeks.”
R² (marginal/conditional). “Marginal R² = 0.911, Conditional R² = NA.”

2.8 10. Model Diagnostics and Managerial Implications

2.8.1 10.1 Diagnostic reasoning without plots

Convergence & singular fits. If you see messages about singular fit or near-zero variance components, the model is too complex for the data; simplify by removing unnecessary random slopes or centering predictors (Part I, Section 1.3).
Variance components sanity check. Random-effect variances should be non-negative and reasonably sized. Extremely small slope variance suggests little heterogeneity; extremely large values may indicate overfitting or multicollinearity among predictors.
ICC as a gating metric. If ICC is near zero, baseline differences are negligible and a random-intercept term may be unnecessary; if ICC is moderate/high, MLM is justified.
Centering to stabilize estimation. Grand-mean centering of media spend makes the intercept interpretable (sales at average spend) and often improves optimizer stability, especially in random-slope models.
Collinearity awareness. Media channels often co-move. Inspect pairwise correlations and variance inflation factors in a separate OLS to understand whether two channels are competing to explain the same variance. If multicollinearity is high, consider domain-driven variable selection or regularization (outside the scope of lme4).
Model parsimony principle. Start with RI, move to RS only when theory and diagnostics support heterogeneous responsiveness. Prefer the simplest model that answers the business question.

2.8.2 10.2 Turning estimates into marketing decisions

Budget reallocation. Use store-level random slopes to rank locations by return-to-spend; allocate incremental budget to the upper tier and hold/experiment in the lower tier.
Targeted experimentation. For stores with low slopes, change creative, audience, or frequency caps before adding budget. Measure whether slope dispersion shrinks after experimentation.
Baseline-aware KPIs. Report baseline-adjusted performance so stores aren’t penalized for structural disadvantages (footfall, location).
Cadence for refits. Re-estimate the model on a regular cadence (e.g., monthly) and track movement in random slopes as a KPI for media program health.
Communicating uncertainty. When sharing store-specific effects, emphasize that random effects are estimates with uncertainty; treat them as decision guides, not deterministic truths.

2.9 11. Practice Exercises

Compute and interpret ICC from the RI model for an executive audience.
Compare RI vs RS (random slope on YouTube). Should the team model heterogeneous responsiveness?
Extract store-level random slopes and list top 5 stores for incremental YouTube budget.
Grand-mean center media spends and refit RS; discuss interpretation changes.
Write a 3–5 sentence memo summarizing findings and proposed budget reallocation.

# 1) ICC
performance::icc(m_RI)

# 2) Model comparison
anova(m_RI, m_RS)

# 3) Random slopes per store
re_RS <- ranef(m_RS)$store_id
head(re_RS[order(re_RS$youtube, decreasing = TRUE), , drop = FALSE])

# 4) Centered predictors
panel_c <- panel %>% mutate(across(c(youtube, facebook, newspaper), ~ .x - mean(.x)))
m_RS_c <- lmer(sales ~ week + youtube + facebook + newspaper + (1 + youtube | store_id), data = panel_c)
summary(m_RS_c)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: sales ~ week + youtube + facebook + newspaper + (1 + youtube |  
##     store_id)
##    Data: panel_c
## 
## REML criterion at convergence: 864.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0834 -0.4964  0.1081  0.6570  2.0544 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  store_id (Intercept) 3.360e-01 0.579683      
##           youtube     4.324e-05 0.006576 -0.90
##  Residual             3.382e+00 1.839029      
## Number of obs: 200, groups:  store_id, 20
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  16.734673   0.321195  97.149648  52.101  < 2e-16 ***
## week          0.034851   0.047668 181.248214   0.731    0.466    
## youtube       0.044692   0.001970  17.814534  22.692 1.36e-14 ***
## facebook      0.200010   0.008362 192.234903  23.918  < 2e-16 ***
## newspaper    -0.002688   0.005693 191.103261  -0.472    0.637    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) week   youtub facebk
## week      -0.816                     
## youtube   -0.303  0.029              
## facebook   0.132 -0.144 -0.052       
## newspaper -0.144  0.173 -0.013 -0.375
## optimizer (nloptwrap) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 3.30756 (tol = 0.002, component 1)
## Model is nearly unidentifiable: very large eigenvalue
##  - Rescale variables?

2.10 12. Reproducibility

sessionInfo()

## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Australia/Melbourne
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] broom.mixed_0.2.9.6 performance_0.14.0  dplyr_1.1.4        
## [4] lmerTest_3.1-3      lme4_1.1-37         Matrix_1.7-2       
## [7] datarium_0.1.0      ggplot2_3.5.2      
## 
## loaded via a namespace (and not attached):
##  [1] future_1.34.0       tidyr_1.3.1         sass_0.4.9         
##  [4] generics_0.1.3      lattice_0.22-6      listenv_0.9.1      
##  [7] digest_0.6.37       magrittr_2.0.3      evaluate_1.0.3     
## [10] grid_4.4.2          fastmap_1.2.0       jsonlite_1.9.0     
## [13] backports_1.5.0     purrr_1.0.4         scales_1.3.0       
## [16] codetools_0.2-20    numDeriv_2016.8-1.1 jquerylib_0.1.4    
## [19] reformulas_0.4.0    Rdpack_2.6.2        cli_3.6.4          
## [22] rlang_1.1.5         rbibutils_2.3       parallelly_1.42.0  
## [25] munsell_0.5.1       splines_4.4.2       withr_3.0.2        
## [28] cachem_1.1.0        yaml_2.3.10         parallel_4.4.2     
## [31] tools_4.4.2         nloptr_2.1.1        minqa_1.2.8        
## [34] colorspace_2.1-1    forcats_1.0.0       globals_0.16.3     
## [37] boot_1.3-31         broom_1.0.7         vctrs_0.6.5        
## [40] R6_2.6.1            lifecycle_1.0.4     MASS_7.3-64        
## [43] furrr_0.3.1         insight_1.3.1       pkgconfig_2.0.3    
## [46] pillar_1.10.1       bslib_0.9.0         gtable_0.3.6       
## [49] glue_1.8.0          Rcpp_1.0.14         xfun_0.52          
## [52] tibble_3.2.1        tidyselect_1.2.1    rstudioapi_0.17.1  
## [55] knitr_1.49          farver_2.1.2        htmltools_0.5.8.1  
## [58] nlme_3.1-167        rmarkdown_2.29      labeling_0.4.3     
## [61] compiler_4.4.2

Multilevel Modeling for Marketing Analytics — Theory & Application (Expanded Teaching Version)

Aakash Mayur Shah

09 October 2025