How to use this document: Every code block is surrounded by two parts:
Before you run it — what the block does and why we need it.
After you run it — how to interpret the output and what it means in business terms.

1 1. What is Multinomial Logit (MNL)?

Before you run it (concept):
MNL models the chance that a decision-maker picks one option from many (brand, channel, SKU). Each option has a latent utility that rises with good features (e.g., catch) and falls with bad ones (e.g., price). Probabilities come from a softmax (logit) over utilities. Coefficients are relative to a baseline alternative.

Alternative-specific variables: vary by option (e.g., price.boat vs price.beach).
Individual-specific variables: vary by person (e.g., income).
IIA: odds between two options don’t depend on other options (good first approximation; revisit if options are near-identical).

After you run it (what to remember):
- A coefficient $\beta$ on an alt-specific feature for an alternative means log-odds vs baseline move by $\beta$ per unit; odds multiply by $e^{\beta}$.
- Probability impacts require prediction because all options share the same probability “pie”.

2 2. Setup

Before you run it:
Install/load required packages, set chunk defaults, and define a safe table helper kbl_safe() that formats nicely in HTML but falls back gracefully elsewhere.

suppressPackageStartupMessages({
  pkgs <- c("nnet","VGAM","mlogit","lmtest","pscl","tidyverse","knitr","kableExtra")
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install)) install.packages(to_install, repos = "https://cloud.r-project.org")
  library(nnet); library(VGAM); library(mlogit); library(lmtest); library(pscl)
  library(tidyverse); library(knitr); library(kableExtra)
})
set.seed(123)
knitr::opts_chunk$set(comment = NA, fig.align = "center", fig.width = 7, fig.height = 4.5,
                      message = FALSE, warning = FALSE, echo = TRUE)

# tables helper
if (!exists("kbl_safe")) {
  kbl_safe <- function(x, ...) {
    tb <- knitr::kable(x, ...)
    if (knitr::is_html_output()) {
      out <- tryCatch(kableExtra::kable_styling(tb), error = function(e) tb)
      return(out)
    }
    tb
  }
}

After you run it:
- No printed output expected here. If you see installation messages, they are benign.
- You now have all packages and helper functions available.

3 3. Data preview (wide format)

Before you run it:
Load the Fishing dataset (one row per chooser). Columns like price.boat and catch.pier are alt-specific; income is individual-specific.

data("Fishing", package = "mlogit")
str(Fishing)

'data.frame':   1182 obs. of  10 variables:
 $ mode         : Factor w/ 4 levels "beach","pier",..: 4 4 3 2 3 4 1 4 3 3 ...
 $ price.beach  : num  157.9 15.1 161.9 15.1 106.9 ...
 $ price.pier   : num  157.9 15.1 161.9 15.1 106.9 ...
 $ price.boat   : num  157.9 10.5 24.3 55.9 41.5 ...
 $ price.charter: num  182.9 34.5 59.3 84.9 71 ...
 $ catch.beach  : num  0.0678 0.1049 0.5333 0.0678 0.0678 ...
 $ catch.pier   : num  0.0503 0.0451 0.4522 0.0789 0.0503 ...
 $ catch.boat   : num  0.26 0.157 0.241 0.164 0.108 ...
 $ catch.charter: num  0.539 0.467 1.027 0.539 0.324 ...
 $ income       : num  7083 1250 3750 2083 4583 ...

kbl_safe(head(Fishing))

mode	price.beach	price.pier	price.boat	price.charter	catch.beach	catch.pier	catch.boat	catch.charter	income
charter	157.930	157.930	157.930	182.930	0.0678	0.0503	0.2601	0.5391	7083.332
charter	15.114	15.114	10.534	34.534	0.1049	0.0451	0.1574	0.4671	1250.000
boat	161.874	161.874	24.334	59.334	0.5333	0.4522	0.2413	1.0266	3750.000
pier	15.134	15.134	55.930	84.930	0.0678	0.0789	0.1643	0.5391	2083.333
boat	106.930	106.930	41.514	71.014	0.0678	0.0503	0.1082	0.3240	4583.332
charter	192.474	192.474	28.934	63.934	0.5333	0.4522	0.1665	0.3975	4583.332

After you run it (interpret the printout):
- Confirm mode lists the chosen alternative (levels: beach, boat, charter, pier).
- See price.* and catch.* present for each alternative.
- income does not change across alternatives within a chooser.

4 4. Data preparation (3 ways)

We need long choice format: one row per (chooser, alternative) with a mode flag marking which alternative was chosen.

4.1 4A. Shortcut: `mlogit.data`

Before you run it:
Convert the wide table to the long choice format expected by mlogit().

fishing_mlogit <- mlogit.data(Fishing,
                              shape = "wide",
                              choice = "mode",
                              varying = 2:9,
                              sep = ".")
kbl_safe(head(as.data.frame(fishing_mlogit)[, c("chid","alt","mode","price","catch","income")]))

chid	alt	mode	price	catch	income
1	beach	FALSE	157.930	0.0678	7083.332
1	boat	FALSE	157.930	0.2601	7083.332
1	charter	TRUE	182.930	0.5391	7083.332
1	pier	FALSE	157.930	0.0503	7083.332
2	beach	FALSE	15.114	0.1049	1250.000
2	boat	FALSE	10.534	0.1574	1250.000

After you run it (interpret the table):
- chid identifies the chooser; alt names the alternative.
- mode is TRUE exactly once per chid (the chosen row).
- price, catch vary by alt; income repeats within chid.

4.2 4B. Tidyverse: `pivot_longer`

Before you run it:
Recreate the same long format using modern tidyverse — helpful to teach variable reshaping.

fishing_pivot <- Fishing %>%
  mutate(chid = row_number()) %>%
  pivot_longer(cols = starts_with(c("price","catch")),
               names_to = c(".value","alt"),
               names_sep = "\\.") %>%
  mutate(mode = (mode == alt))
kbl_safe(head(as.data.frame(fishing_pivot)[, c("chid","alt","mode","price","catch","income")]))

chid	alt	mode	price	catch	income
1	beach	FALSE	157.930	0.0678	7083.332
1	pier	FALSE	157.930	0.0503	7083.332
1	boat	FALSE	157.930	0.2601	7083.332
1	charter	TRUE	182.930	0.5391	7083.332
2	beach	FALSE	15.114	0.1049	1250.000
2	pier	FALSE	15.114	0.0451	1250.000

After you run it:
- You should see the same structure as fishing_mlogit.
- If mode is not exactly one TRUE per chid, check the transformation.

4.3 4C. Classic tidyverse: `gather → separate → spread`

Before you run it (what/why):
Older verbs make the data engineering explicit: 1) gather melts alt-specific columns to expose the alternative dimension.
2) separate splits names like price.boat into var="price" and alt="boat".
3) spread pivots back so each row has columns price and catch.
4) mode == alt turns the chosen label into a TRUE/FALSE flag per row.

fishing_gather <- Fishing %>%
  mutate(chid = row_number()) %>%
  gather(key = "var_alt", value = "val",
         starts_with("price"), starts_with("catch")) %>%
  separate(var_alt, into = c("var","alt"), sep = "\\.") %>%
  spread(var, val) %>%
  mutate(mode = (mode == alt))
kbl_safe(head(as.data.frame(fishing_gather)[, c("chid","alt","mode","price","catch","income")]))

chid	alt	mode	price	catch	income
604	beach	TRUE	5.934	0.0678	416.6667
604	boat	FALSE	7.740	0.0014	416.6667
604	charter	FALSE	36.740	0.0029	416.6667
604	pier	FALSE	5.934	0.0789	416.6667
893	beach	TRUE	3.870	0.5333	416.6667
893	boat	FALSE	47.730	0.2413	416.6667

After you run it:
- Confirm one row per (chid, alt) with price, catch, and a logical mode.
- Sanity check: exactly one TRUE per chid.

4.4 4D. Equivalence check

Before you run it:
Make sure all three routes created equivalent long tables (critical for trust).

check1 <- all.equal(
  as.data.frame(fishing_mlogit)[,c("chid","alt","mode","price","catch","income")],
  as.data.frame(fishing_pivot)[,c("chid","alt","mode","price","catch","income")]
)
check2 <- all.equal(
  as.data.frame(fishing_mlogit)[,c("chid","alt","mode","price","catch","income")],
  as.data.frame(fishing_gather)[,c("chid","alt","mode","price","catch","income")]
)
check1; check2

[1] "Attributes: < Component \"class\": Lengths (2, 1) differ (string compare on first 1) >"
[2] "Component \"alt\": 'current' is not a factor"                                          
[3] "Component \"mode\": 2096 element mismatches"                                           
[4] "Component \"price\": Mean relative difference: 0.8301712"                              
[5] "Component \"catch\": Mean relative difference: 1.243696"

[1] "Attributes: < Component \"class\": Lengths (2, 1) differ (string compare on first 1) >"
[2] "Component \"chid\": Mean relative difference: 0.7143749"                               
[3] "Component \"alt\": 'current' is not a factor"                                          
[4] "Component \"mode\": 1720 element mismatches"                                           
[5] "Component \"price\": Mean relative difference: 0.9345275"                              
[6] "Component \"catch\": Mean relative difference: 1.222338"                               
[7] "Component \"income\": Mean relative difference: 0.6685676"

After you run it (interpretation):
- TRUE or Mean relative difference: 0 indicates equivalence. Any mismatches mean a reshaping issue.

5 5. Classification view (single row per chooser)

Before you run it:
Keep only the chosen row per chooser and treat alt as a multi-class outcome. Good for teaching; not ideal for what-ifs.

wide_df <- subset(fishing_mlogit, mode == TRUE)
wide_df$alt <- relevel(wide_df$alt, ref = "beach")  # baseline
table(wide_df$alt)


  beach    boat charter    pier 
    134     418     452     178

After you run it:
- You should see counts by chosen alternative. beach is the baseline for all coefficient contrasts below.

5.1 5A. `nnet::multinom`

Before you run it:
Fit a multinomial logistic regression with beach as the baseline class.

m_nnet <- multinom(alt ~ price + catch + income, data = wide_df, trace = FALSE)
summary(m_nnet)

Call:
multinom(formula = alt ~ price + catch + income, data = wide_df, 
    trace = FALSE)

Coefficients:
        (Intercept)        price      catch        income
boat      0.9880772  0.003555725 -1.2997215  6.930617e-05
charter   0.1588430  0.022700714  1.4116330 -1.788990e-04
pier      1.0836892 -0.003027237 -0.9494361 -1.286185e-04

Std. Errors:
         (Intercept)       price        catch       income
boat    1.767921e-05 0.003155494 2.992347e-06 3.161360e-05
charter 1.364347e-05 0.003160311 9.419144e-06 3.691491e-05
pier    2.304065e-05 0.003965765 5.115583e-06 4.049548e-05

Residual Deviance: 2531.988 
AIC: 2555.988

After you run it (how to read):
- One block of coefficients per non-baseline alternative (boat/charter/pier).
- A price coefficient of −0.05 under boat → +1 in boat price multiplies odds(boat vs beach) by exp(−0.05) ≈ 0.95 (≈ −5%).
- A catch coefficient of +0.20 under charter → exp(0.20) ≈ 1.22 (≈ +22% in odds).
- Intercepts behave as ASCs vs baseline. Signs should match intuition (price negative, catch positive).

Before you run it (Wald p-values):
Compute quick, per-coefficient significance.

coefs <- summary(m_nnet)$coefficients
ses   <- summary(m_nnet)$standard.errors
pvals <- 2*(1 - pnorm(abs(coefs/ses)))
kbl_safe(round(pvals, 3))

	price	income
boat	0.260	0.028
charter	0.000	0.000
pier	0.445	0.001

After you run it:
- Small p-values (e.g., < 0.05) suggest the predictor helps explain that specific contrast vs baseline. Prefer LR tests for overall signal.

Before you run it (LR test vs null):
Check whether predictors jointly improve the fit.

m0_nnet <- multinom(alt ~ 1, data = wide_df, trace = FALSE)
anova(m0_nnet, m_nnet, test = "Chisq")

After you run it:
- A small p-value means the full model has explanatory power beyond a model with only intercepts.

Before you run it (confusion matrix):
A quick intuition check of in-sample predictions.

pred_nnet_class <- predict(m_nnet, type = "class")
tab_nnet <- table(True = wide_df$alt, Pred = pred_nnet_class)
acc_nnet <- mean(pred_nnet_class == wide_df$alt)
tab_nnet; acc_nnet

         Pred
True      beach boat charter pier
  beach       0   96      38    0
  boat        0  314     102    2
  charter     0  138     314    0
  pier        0  129      46    3

[1] 0.5338409

After you run it:
- Accuracy is for intuition only; in multi-class choice with uneven shares, it can be misleading for model selection.

5.2 5B. `VGAM::vglm`

Before you run it:
Fit VGAM’s multinomial regression with the same predictors and baseline.

m_vgam <- vglm(alt ~ price + catch + income,
               family = multinomial(refLevel = "beach"),
               data   = wide_df)
summary(m_vgam)


Call:
vglm(formula = alt ~ price + catch + income, family = multinomial(refLevel = "beach"), 
    data = wide_df)

Coefficients: 
                Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  9.880e-01  2.231e-01   4.429 9.49e-06 ***
(Intercept):2  1.589e-01  2.367e-01   0.671 0.502172    
(Intercept):3  1.084e+00  2.584e-01   4.194 2.75e-05 ***
price:1        3.557e-03  3.343e-03   1.064 0.287253    
price:2        2.270e-02  3.333e-03   6.811 9.71e-12 ***
price:3       -3.026e-03  4.180e-03  -0.724 0.469130    
catch:1       -1.300e+00  3.406e-01  -3.816 0.000136 ***
catch:2        1.412e+00  2.760e-01   5.115 3.14e-07 ***
catch:3       -9.493e-01  4.023e-01  -2.360 0.018291 *  
income:1       6.931e-05  4.239e-05   1.635 0.102074    
income:2      -1.789e-04  4.891e-05  -3.658 0.000254 ***
income:3      -1.286e-04  5.514e-05  -2.332 0.019683 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1]), 
log(mu[,4]/mu[,1])

Residual deviance: 2531.988 on 3534 degrees of freedom

Log-likelihood: -1265.994 on 3534 degrees of freedom

Number of Fisher scoring iterations: 5 

No Hauck-Donner effect found in any of the estimates


Reference group is level  1  of the response

After you run it:
- Interpretation is identical to multinom: log-odds vs baseline.
- Check that signs make sense and p-values indicate useful signal.

Before you run it (LR test):

m0_vgam <- vglm(alt ~ 1, family = multinomial(refLevel = "beach"), data = wide_df)
VGAM::lrtest(m0_vgam, m_vgam)

Likelihood ratio test

Model 1: alt ~ 1
Model 2: alt ~ price + catch + income
   #Df  LogLik Df  Chisq Pr(>Chisq)    
1 3543 -1497.7                         
2 3534 -1266.0 -9 463.46  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After you run it:
- Significant LR test → predictors jointly matter.

6 6. Choice model view (recommended for marketing)

Before you run it:
This is the proper discrete choice model using one row per (chooser, alt) and the formula mode ~ price + catch | income (alt-specific on the left, individual-specific on the right).

m_mlogit <- mlogit(mode ~ price + catch | income,
                   data     = fishing_mlogit,
                   reflevel = "beach")
summary(m_mlogit)


Call:
mlogit(formula = mode ~ price + catch | income, data = fishing_mlogit, 
    reflevel = "beach", method = "nr")

Frequencies of alternatives:choice
  beach    boat charter    pier 
0.11337 0.35364 0.38240 0.15059 

nr method
7 iterations, 0h:0m:0s 
g'(-H)^-1g = 1.37E-05 
successive function values within tolerance limits 

Coefficients :
                       Estimate  Std. Error  z-value  Pr(>|z|)    
(Intercept):boat     5.2728e-01  2.2279e-01   2.3667 0.0179485 *  
(Intercept):charter  1.6944e+00  2.2405e-01   7.5624 3.952e-14 ***
(Intercept):pier     7.7796e-01  2.2049e-01   3.5283 0.0004183 ***
price               -2.5117e-02  1.7317e-03 -14.5042 < 2.2e-16 ***
catch                3.5778e-01  1.0977e-01   3.2593 0.0011170 ** 
income:boat          8.9440e-05  5.0067e-05   1.7864 0.0740345 .  
income:charter      -3.3292e-05  5.0341e-05  -0.6613 0.5084031    
income:pier         -1.2758e-04  5.0640e-05  -2.5193 0.0117582 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -1215.1
McFadden R^2:  0.18868 
Likelihood ratio test : chisq = 565.17 (p.value = < 2.22e-16)

After you run it (how to read):
- For each non-baseline alternative, read coefficients on price and catch.
- Example: β_price_boat = -0.05 → +1 in boat price multiplies odds(boat vs beach) by exp(-0.05) ≈ 0.95.
- income appears as alternative-specific effects (per non-baseline alt).
- Overall fit: note the Log-Likelihood, LR test, McFadden’s ρ², AIC/BIC.

Before you run it (LR test vs null):

m0_mlogit <- mlogit(mode ~ 1 | 1, data = fishing_mlogit, reflevel = "beach")
lmtest::lrtest(m0_mlogit, m_mlogit)

After you run it:
- Significant LR test indicates the predictors, as a set, explain choice beyond random.

6.1 6.1 Coefficient interpretation deep dive

Before you run it:
Translate key coefficients to odds ratios and plain-English statements.

or_tab <- exp(summary(m_mlogit)$coefficients)
kbl_safe(round(or_tab, 3))

	x
(Intercept):boat	1.694
(Intercept):charter	5.443
(Intercept):pier	2.177
price	0.975
catch	1.430
income:boat	1.000
income:charter	1.000
income:pier	1.000

After you run it:
- Each number is an odds ratio vs beach.
- OR < 1 (e.g., 0.95) means odds decrease per unit; OR > 1 means odds increase.
- For larger realistic changes (e.g., +20), multiply the coefficient first: OR = exp(β × 20).

6.2 6.2 Why `mlogit` is better for marketing inference & what-ifs

6.3 6.3 Managerial translation patterns

“Sign makes sense? price ↓ odds; catch ↑ odds.”
“Scale to realistic $\Delta x$: OR for $\beta\cdot \Delta x$ is $\exp(\beta \Delta x)$.”
“State the effect in odds; then show probability movement with marginal effects.”
“Be explicit about units (rescale price if needed, e.g., per $10).”

6.4 6.4 Checklist while reading output

Baseline stated (here: beach).
Signs match intuition (price $<0$, catch $>0$).
Magnitude via odds ratios $e^{\beta}$.
Significance: SEs, p-values; LR test vs null.
Substitution: who gains when one option loses? (see AMEs/elasticities).
IIA caveat and when to consider nested/mixed logit.

Before you run it (context):
We want credible factor significance and what-if predictions. mlogit (not classification) uses the full choice set, respects variable roles, and updates all alternatives when one attribute changes.

Guard (ensures objects exist if you jump here):

if (!exists("wide_df")) {
  wide_df <- subset(fishing_mlogit, mode == TRUE)
  wide_df$alt <- stats::relevel(wide_df$alt, ref = "beach")
}
if (!exists("m_nnet")) {
  m_nnet <- nnet::multinom(alt ~ price + catch + income, data = wide_df, trace = FALSE)
}

Before you run it (contrast demo plan):
Compare a classification what-if against a proper choice-model what-if.

# Classification shares
p_class0 <- predict(m_nnet, type = "probs")
share_class0 <- colMeans(p_class0)

wide_df_whatif <- wide_df
wide_df_whatif$price[wide_df_whatif$alt == "boat"] <- wide_df_whatif$price[wide_df_whatif$alt == "boat"] + 20
p_class1 <- predict(m_nnet, newdata = wide_df_whatif, type = "probs")
share_class1 <- colMeans(p_class1)

kbl_safe(round(rbind(Classifier_Base = share_class0,
                     Classifier_BoatUp = share_class1), 3))

	beach	boat	charter	pier
Classifier_Base	0.113	0.354	0.382	0.151
Classifier_BoatUp	0.108	0.345	0.407	0.140

After you run it (interpretation):
- Classification only “sees” chosen rows, so many probabilities don’t update realistically. This can understate cross-substitution.

# Proper mlogit what-if
p0_m <- predict(m_mlogit, newdata = fishing_mlogit)
share0_m <- colMeans(p0_m)

whatif_m <- fishing_mlogit
whatif_m$price[whatif_m$alt == "boat"] <- whatif_m$price[whatif_m$alt == "boat"] + 20
p1_m <- predict(m_mlogit, newdata = whatif_m)
share1_m <- colMeans(p1_m)

kbl_safe(round(rbind(mlogit_Base = share0_m,
                     mlogit_BoatUp = share1_m), 3))

	beach	boat	charter	pier
mlogit_Base	0.113	0.354	0.382	0.151
mlogit_BoatUp	0.123	0.259	0.454	0.164

After you run it (interpretation):
- mlogit updates every alternative because they’re in the same denominator; cross-effects appear naturally. This is why it’s the right tool for market-share what-ifs.

7 7. What-if simulation: Boat price +20

Before you run it:
Predict baseline shares, adjust boat price by +20, re-predict, and compare. This is how we turn coefficients into managerial stories.

# Baseline market shares
p0 <- predict(m_mlogit, newdata = fishing_mlogit)
share0 <- colMeans(p0)

# Counterfactual
whatif <- fishing_mlogit
whatif$price[whatif$alt == "boat"] <- whatif$price[whatif$alt == "boat"] + 20
p1 <- predict(m_mlogit, newdata = whatif)
share1 <- colMeans(p1)

shares <- rbind(Baseline = share0, BoatPriceUp = share1)
kbl_safe(round(shares, 3))

	beach	boat	charter	pier
Baseline	0.113	0.354	0.382	0.151
BoatPriceUp	0.123	0.259	0.454	0.164

After you run it (interpretation):
- Own effect: boat’s share should drop.
- Cross substitution: who gains? Those are the closer substitutes (e.g., beach, pier).
- Managerial sentence: “+$20 on boat reduces boat share by X points and shifts Y/Z points to beach/pier.”

Before you run it (visual):

barplot(t(shares), beside = TRUE, legend = TRUE,
        ylab = "Market Share", main = "What-if: Boat Price +20")

What-if: Boat Price +20 (Predicted Market Shares)

After you run it:
- Use the bars to show which alternatives pick up the lost share. Emphasize that probabilities must sum to 1.

8 8. Probability-level summaries (AMEs & Elasticities)

8.1 8A. Average Marginal Effects (AMEs)

Before you run it:
Compute average Δprobability from a small change in a variable (own & cross effects).

ame <- function(model, data, var, delta = 1, target_alt = NULL) {
  p0 <- predict(model, newdata = data)
  d1 <- data
  if (!is.null(target_alt)) {
    d1[[var]][d1$alt == target_alt] <- d1[[var]][d1$alt == target_alt] + delta
  } else {
    d1[[var]] <- d1[[var]] + delta
  }
  p1 <- predict(model, newdata = d1)
  colMeans(p1 - p0)
}

ame_price_boat <- ame(m_mlogit, fishing_mlogit, var = "price", delta = 1, target_alt = "boat")
kbl_safe(round(ame_price_boat, 4))

	x
beach	0.0006
boat	-0.0050
charter	0.0037
pier	0.0007

After you run it (interpretation):
- The boat entry is the own probability change (should be negative).
- Positive entries for others are cross effects — where the lost probability goes.
- Read as share points (not percentages).

8.2 8B. Elasticities (own & cross)

Before you run it:
Compute %Δshare / %Δprice for a small % change in an alt-specific price.

elasticities <- function(model, data, target_alt, pct = 0.01) {
  base_s <- colMeans(predict(model, newdata = data))

  d_up <- data
  idx  <- d_up$alt == target_alt
  d_up$price[idx] <- d_up$price[idx] * (1 + pct)

  s_up <- colMeans(predict(model, newdata = d_up))
  ((s_up - base_s) / base_s) / pct
}

elas_boat <- elasticities(m_mlogit, fishing_mlogit, target_alt = "boat", pct = 0.01)
kbl_safe(round(elas_boat, 3))

	x
beach	0.292
boat	-0.609
charter	0.376
pier	0.255

After you run it (interpretation):
- Own-price elasticity (boat entry) should be negative.
- Cross-price elasticities indicate substitution patterns (positive → substitutes).
- Use these to compare sensitivity across alternatives.

9 9. Diagnostics & good practice

Before you run it:
No code—this is a checklist you apply when reading outputs.

After you run it (what to check):
- One TRUE per chooser; sensible attribute ranges; missing data handled.
- Signs sensible; units interpretable (rescale price if needed).
- LR vs null significant; ρ² in a plausible range; AIC/BIC for like-with-like comparisons.
- Be aware of IIA; switch to nested/mixed logit if near-identical substitutes are present.

10 10. Extensions & Exercises

Targeting/heterogeneity: Interact income with price or catch to see if price sensitivity varies by income.

Before you run it (guard & construction): We explicitly create interaction columns to avoid aliasing/duplication inside mlogit, then keep income on the right-hand side.

# -- Guard: build interaction columns explicitly and ensure numeric types --
needed <- c("price","catch","income")
stopifnot(all(needed %in% names(fishing_mlogit)))

if (!"price_income" %in% names(fishing_mlogit)) {
  fishing_mlogit$price_income <- fishing_mlogit$price * fishing_mlogit$income
}
if (!"catch_income" %in% names(fishing_mlogit)) {
  fishing_mlogit$catch_income <- fishing_mlogit$catch * fishing_mlogit$income
}

for (v in c("price","catch","income","price_income","catch_income")) {
  fishing_mlogit[[v]] <- as.numeric(fishing_mlogit[[v]])
}

Then fit the interaction model:

m_mlogit_ix <- mlogit(
  mode ~ price + catch + price_income + catch_income | income,
  data     = fishing_mlogit,
  reflevel = "beach"
)
summary(m_mlogit_ix)


Call:
mlogit(formula = mode ~ price + catch + price_income + catch_income | 
    income, data = fishing_mlogit, reflevel = "beach", method = "nr")

Frequencies of alternatives:choice
  beach    boat charter    pier 
0.11337 0.35364 0.38240 0.15059 

nr method
7 iterations, 0h:0m:0s 
g'(-H)^-1g = 0.00023 
successive function values within tolerance limits 

Coefficients :
                       Estimate  Std. Error  z-value  Pr(>|z|)    
(Intercept):boat     6.2247e-01  2.1890e-01   2.8436 0.0044602 ** 
(Intercept):charter  2.0179e+00  2.5220e-01   8.0011 1.332e-15 ***
(Intercept):pier     7.9679e-01  2.2345e-01   3.5659 0.0003626 ***
price               -3.3249e-02  3.1352e-03 -10.6052 < 2.2e-16 ***
catch                3.6114e-01  2.0912e-01   1.7270 0.0841765 .  
price_income         1.7812e-06  5.0323e-07   3.5395 0.0004008 ***
catch_income        -2.4950e-06  4.7417e-05  -0.0526 0.9580354    
income:boat          5.6561e-05  4.6315e-05   1.2212 0.2220019    
income:charter      -1.1544e-04  5.4466e-05  -2.1195 0.0340522 *  
income:pier         -1.3280e-04  5.1541e-05  -2.5766 0.0099785 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -1210.1
McFadden R^2:  0.19203 
Likelihood ratio test : chisq = 575.2 (p.value = < 2.22e-16)

How to read it: The price effect at income $I$ is $\beta_{\text{price}} + \beta_{\text{price×income}} \cdot I$. If price_income > 0 (while price < 0), higher income reduces price sensitivity (less negative net effect).

Alternative-specific constants (ASCs): Include ASCs to capture average appeal of each option (beyond observed attributes). In mlogit, add 0/1 dummies per alternative (leave one out) or use ~ 1 on the right as needed.
Model comparison: Compare AIC/BIC across specifications; discuss out-of-sample validation by splitting choosers.
IIA discussion: Brainstorm situations where IIA might fail and what nested/mixed logit would change conceptually.
Elasticities: As a challenge, derive own/cross price elasticities via small perturbations to price and observing predicted shares (finite differences).

11 11. Companion Reference — Output & Interpretation

11.1 Reading `mlogit` output (reference)

What you see:
- Frequencies of alternatives (how often each was chosen).
- Coefficients grouped by non-baseline alternative (e.g., boat/charter/pier). You’ll see entries like price:boat, catch:charter, and possibly income effects per alternative.
- Log-Likelihood, McFadden’s $\rho^2$, LR test statistic and p-value, AIC/BIC.

Interpretation of a coefficient
- Example: price under “boat”: $\beta = -0.05$ means a +1 increase in boat price reduces log-odds(boat vs beach) by 0.05; odds multiply by $e^{-0.05} \approx 0.95$.
- Individual-specific variable (e.g., income) appears as separate effects for each non-baseline alternative. income = +0.03 under charter ⇒ higher income shifts odds toward charter vs beach.

Overall fit & tests
- LR test vs null (reported in summary(m_mlogit) or via lmtest::lrtest): if significant, the predictors jointly explain choice.
- McFadden’s $\rho^2$ (pseudo-$R^2$): closer to 0.2–0.4 can indicate good fit in choice contexts (rules-of-thumb, not strict).
- AIC/BIC: lower is better when comparing models fit to the same data.

Why this view is better for marketing
- Uses the full choice set (updates everyone’s probabilities when any attribute changes).
- Naturally separates alt-specific and individual-specific drivers.
- Outputs map to what-if statements: simulate new prices/features and read off share shifts.

11.2 From coefficients to business language

11.2.1 Odds ratio → sentence template

Template:
“A one-unit increase in X for ALT multiplies the odds(ALT vs beach) by $e^{\beta}$. That’s about $100\times (e^{\beta}-1)\%$ change in odds, holding other variables constant.”
Example: β = −0.05 → “−4.9% odds per unit (≈ −5%).”
Scaling: for $\Delta x$ units, use $e^{\beta \Delta x}$.
Example: $\Delta x = 20 \Rightarrow e^{-0.05 \times 20} \approx 0.37$ (odds fall to 37% of original).

11.2.2 Probability changes (Average Marginal Effects, AMEs)

Use the helper in the Rmd: it nudges a variable and averages Δprobability for each alternative.
Read it like this: For boat price +1, “boat probability” decreases by own AME points, and the others increase by cross AME points (where the lost probability goes).
Managerial translation: “A $1 price increase for boat reduces boat probability by X points on average, shifting Y/Z points to beach/pier.”

11.2.3 Elasticities (own & cross)

Elasticity = %Δshare / %Δprice.
Own-price elasticity (boat): should be negative. Cross-price elasticities (others): typically positive if substitutes.
Managerial translation: “A 1% increase in boat price reduces boat share by ε_{boat,price}% and raises beach/pier shares by …”

11.3 What-if simulation — how to read the tables and the plot

What we compute in the Rmd:
1) Baseline probabilities for each observation → average to shares (row Baseline).
2) Modify an attribute (e.g., boat price +20), re-run prediction → average to shares (row BoatPriceUp).
3) Compare rows to see shifts; plot with a grouped bar chart.

How to interpret:
- Own effect: If boat’s share falls, the sign matches intuition. The magnitude tells sensitivity.
- Cross substitution: Which alternatives gain? Those are the closest substitutes in the dataset.
- Policy phrasing: “Increasing boat price by $20 shifts A points from boat to mainly beach and pier.”

Caveats to say out loud:
- This is ceteris paribus under the model, not a causal field experiment.
- IIA means proportional substitution; if options are near clones, consider nested/mixed logit.

11.4 Goodness-of-fit & diagnostics (what matters, what doesn’t)

11.4.1 Overall model

Log-likelihood (LL): less negative is better.
LR test vs null: if significant, model explains choices beyond random.
McFadden’s $\rho^2 = 1 - \frac{\text{LL}_{\text{model}}}{\text{LL}_{\text{null}}}$: higher is better; 0.2–0.4 can be strong in discrete choice.
AIC/BIC: compare models fit on the same sample.

11.4.2 Coefficient sanity checks

Signs: price negative, “good” attributes positive.
Scale/units: rescale price (e.g., per $10) to talk clearly.
Significance: report p-values and confidence intervals where relevant.
Collinearity: highly correlated attributes can inflate SEs and blur interpretation.

11.4.3 Data structure checks

Variation: alt-specific variables must vary within a choice set; individual-specific variables shouldn’t.
One chosen per chooser: exactly one TRUE per chid.
Missing values & outliers: treat/inspect before modeling.

11.4.4 IIA reasonableness

If removing one alt implies unrealistic proportional shifts to the remaining, consider nested or mixed logit.
Optional: run mlogit’s IIA tests (e.g., Hausman–McFadden) with caution on small samples.

11.5 Worked example (fill-in-the-blank templates)

Replace the numbers with those from your summary() and predict() outputs.

Coefficient → odds statement
- “β_{price, boat} = ____ ⇒ OR = exp(β) = ____. A +1 in boat price changes odds(boat vs beach) by about 100×(OR−1)% = ____.”

Scaled change
- “For Δx = 20: OR = exp(β × 20) = ____. Odds become ____% of original.”

Probability AME (from Appendix function)
- “Boat price +1: boat probability Δ = ____ points; beach Δ = ____; pier Δ = ____; charter Δ = ____.”

Elasticity
- “Boat own-price elasticity: ____; cross-price elasticities (beach: ____, pier: ____, charter: ____).”

What-if share table
- “Baseline share (boat): ____; After +20 price: ____; Δ: ____. Biggest gainer: ____.”

Takeaway sentence
- “A $20 increase in boat price reduces its odds by ~____% and its share by ____ points, mainly shifting share to ____.”

11.6 FAQ (plain-English answers)

Q: Why “vs baseline”?
A: Logit needs a reference; all effects are relative to that option (here beach). Changing the baseline changes intercepts, not the substantive contrasts.
Q: Why talk in odds?
A: Coefficients are log-odds; converting to odds ratios is direct and easy to state. Probabilities require recomputing across all options.
Q: Can we say this is causal?
A: Treat results as predictive/associational unless the study was designed for causality (e.g., randomized pricing).
Q: My p-values are large—does that mean no effect?
A: Not necessarily; sample size, noise, and multicollinearity matter. Check signs, effect sizes, and the LR test.
Q: When is MNL’s IIA a problem?
A: If two options are near-identical substitutes; then removing one unrealistically boosts the other. Consider nested or mixed logit.

Multinomial Logit for Marketing — Textbook-Style Walkthrough

Aakash Shah

2025-10-02

1 1. What is Multinomial Logit (MNL)?

2 2. Setup

3 3. Data preview (wide format)

4 4. Data preparation (3 ways)

4.1 4A. Shortcut: `mlogit.data`

4.2 4B. Tidyverse: `pivot_longer`

4.3 4C. Classic tidyverse: `gather → separate → spread`

4.4 4D. Equivalence check

5 5. Classification view (single row per chooser)

5.1 5A. `nnet::multinom`

5.2 5B. `VGAM::vglm`

6 6. Choice model view (recommended for marketing)

6.1 6.1 Coefficient interpretation deep dive

6.2 6.2 Why `mlogit` is better for marketing inference & what-ifs

6.3 6.3 Managerial translation patterns

6.4 6.4 Checklist while reading output

7 7. What-if simulation: Boat price +20

8 8. Probability-level summaries (AMEs & Elasticities)

8.1 8A. Average Marginal Effects (AMEs)

8.2 8B. Elasticities (own & cross)

9 9. Diagnostics & good practice

10 10. Extensions & Exercises

11 11. Companion Reference — Output & Interpretation

11.1 Reading `mlogit` output (reference)

11.2 From coefficients to business language

11.2.1 Odds ratio → sentence template

11.2.2 Probability changes (Average Marginal Effects, AMEs)

11.2.3 Elasticities (own & cross)

11.3 What-if simulation — how to read the tables and the plot

11.4 Goodness-of-fit & diagnostics (what matters, what doesn’t)

11.4.1 Overall model

11.4.2 Coefficient sanity checks

11.4.3 Data structure checks

11.4.4 IIA reasonableness

11.5 Worked example (fill-in-the-blank templates)

11.6 FAQ (plain-English answers)

Multinomial Logit for Marketing — Textbook-Style Walkthrough

Aakash Shah

2025-10-02

1 1. What is Multinomial Logit (MNL)?

2 2. Setup

3 3. Data preview (wide format)

4 4. Data preparation (3 ways)

4.1 4A. Shortcut: mlogit.data

4.2 4B. Tidyverse: pivot_longer

4.3 4C. Classic tidyverse: gather → separate → spread

4.4 4D. Equivalence check

5 5. Classification view (single row per chooser)

5.1 5A. nnet::multinom

5.2 5B. VGAM::vglm

6 6. Choice model view (recommended for marketing)

6.1 6.1 Coefficient interpretation deep dive

6.2 6.2 Why mlogit is better for marketing inference & what-ifs

6.3 6.3 Managerial translation patterns

6.4 6.4 Checklist while reading output

7 7. What-if simulation: Boat price +20

8 8. Probability-level summaries (AMEs & Elasticities)

8.1 8A. Average Marginal Effects (AMEs)

8.2 8B. Elasticities (own & cross)

9 9. Diagnostics & good practice

10 10. Extensions & Exercises

11 11. Companion Reference — Output & Interpretation

11.1 Reading mlogit output (reference)

11.2 From coefficients to business language

11.2.1 Odds ratio → sentence template

11.2.2 Probability changes (Average Marginal Effects, AMEs)

11.2.3 Elasticities (own & cross)

11.3 What-if simulation — how to read the tables and the plot

11.4 Goodness-of-fit & diagnostics (what matters, what doesn’t)

11.4.1 Overall model

11.4.2 Coefficient sanity checks

11.4.3 Data structure checks

11.4.4 IIA reasonableness

11.5 Worked example (fill-in-the-blank templates)

11.6 FAQ (plain-English answers)

4.1 4A. Shortcut: `mlogit.data`

4.2 4B. Tidyverse: `pivot_longer`

4.3 4C. Classic tidyverse: `gather → separate → spread`

5.1 5A. `nnet::multinom`

5.2 5B. `VGAM::vglm`

6.2 6.2 Why `mlogit` is better for marketing inference & what-ifs

11.1 Reading `mlogit` output (reference)