This article aims to demonstrate how to perform Ordinal regression using the MASS package - POLR function

0.1 Initialization

Required Libraries

library(MASS)        # for polr
library(stargazer)
library(sjPlot)
library(ggplot2)
library(ggthemes)
library(dplyr)
library(tidyr)
library(viridis)
library(gt)
library(extrafont)
library(effects)    # For marginal effects and predictions
library(explore)
library(DescTools)  # for PseudoR2
library(stats)      # For AIC, BIC, logLik
library(afcommon)

Set the defaults theme for ggplot2 in order to match Latex and the requiremens of political science magazines

loadfonts(device = "win", quiet=TRUE)

Set the ggplot2 theme

theme_set(theme_bw() +
            theme(text = element_text(family = "LM Roman 10", size = 12)))

1 Create Dataset

The mtcars data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

# Create data frame with the data dictionary information
car_variables <- tibble(
  variable = c("mpg", "cylinder", "disp", "hp", "drat", 
               "wt", "qsec", "vs", "am", "gear", "carb"),
  description = c(
    "Miles/(US) gallon", 
    "Number of cylinders", 
    "Displacement (cu.in.)", 
    "Gross horsepower", 
    "Rear axle ratio", 
    "Weight (lb/1000)", 
    "1/4 mile time", 
    "V/S", 
    "Transmission (0 = automatic, 1 = manual)", 
    "Number of forward gears", 
    "Number of carburetors"
  )
)

# Create GT table
mtcars_data_dictionary <- car_variables %>%
  gt() %>%
  cols_label(
    variable = "Variable",
    description = "Description"
  ) %>%
  tab_header(
    title = "'mtcars' Data Dictionary",
    subtitle = "Detailed explanation of variables in mtcar automotive dataset"
  ) %>%
  opt_stylize(style = 1, color = "gray")

mtcars_data_dictionary

Variable	Description
'mtcars' Data Dictionary
Detailed explanation of variables in mtcar automotive dataset
mpg	Miles/(US) gallon
cylinder	Number of cylinders
disp	Displacement (cu.in.)
hp	Gross horsepower
drat	Rear axle ratio
wt	Weight (lb/1000)
qsec	1/4 mile time
vs	V/S
am	Transmission (0 = automatic, 1 = manual)
gear	Number of forward gears
carb	Number of carburetors

Careate gas consumption catgories variable mpg-cat

mtcars_prep <- mtcars
mtcars_prep$mpg_cat <- cut(mtcars$mpg, # Create mpg_cat variable with 5 categories
                      breaks = 5, 
                      labels = c("Very Low", "Low", "Medium", "High", "Very High"))

mtcars_prep %>% explore(mpg_cat)

2 Ordinal regression

\(mpg\_cat \sim wt + cyl + am\)

model <- polr(mpg_cat ~ wt + cyl + am, data = mtcars_prep, Hess=TRUE) 
gt_coeftest(model) # with p-values

Variable	Coefficient	Std. Error	t-statistic	p-value
Regression Coefficient Results
wt	−6.186	2.445	−2.529	0.018
cyl	−9.510	1.150	−8.272	0.000
am	−1.710	1.574	−1.086	0.288

Extract model fit information

logLik <- stats::logLik(model)
aic <- stats::AIC(model)
pseudo_r2 <- DescTools::PseudoR2(model)

Use Stargazer to output a regression table that can be used with LaTex.

stargazer(
  model,
  type = "html",
  omit = c("Constant"),
  star.cutoffs = c(0.05, 0.01, 0.001),
  title = "mtcars Gas Consumption - POLR Regression",
  label = "tab:mpg_cat_reg",
  keep.stat = c("n"),
  add.lines = list(c("Log-Likelihood", round(logLik,3)),
                   c("AIC", round(aic,3)),
                   c("Pseudo-R2", round(pseudo_r2,3))),
  style = "apsr",
  out = "mpg_cat.tex"
)

**mtcars Gas Consumption - POLR Regression**

	mpg_cat

wt	-6.186^*
	(2.445)
cyl	-9.510^***
	(1.150)
am	-1.710
	(1.574)
Log-Likelihood	-15.728
AIC	45.456
Pseudo-R2	0.664
N	32

p < .05; p < .01; p < .001

Results Interpretation

dep_var <- "mpg_cat"
ind_vars <- c("wt", "cyl", "am")
interpretation <- 
  af_reg_interpret(model, mtcars_prep, y_name = dep_var, x_list = ind_vars, reg_type= "polr")

To examine the correlation of wt, cyl and am with mpg_cat, we estimated a Ordinal Proportional Odds Logistic regression model (n=32). The dependent variable mpg_cat has 5 levels: Very Low, Low, Medium, High and Very High. The model yielded a significant result [χ²(3) = 62.1, p < 2.11e-13], with a log-likelihood value of -15.7 and pseudo R² = 0.664.
The wt variable has statistically significant correlation. An increase of 1 unit in wt decreases the likelyhood to move from one category of mpg_cat to the next one by 99.79%, assuming all other variables stay constant.
The cyl variable has statistically significant correlation. An increase of 1 unit in cyl decreases the likelyhood to move from one category of mpg_cat to the next one by 99.99%, assuming all other variables stay constant.
Variable am (odds ratio = 0.181, p_val = 0.288) is not statistically significant.

3 Plot model coeeficients

Note: the plotted coefficients are exponentiated

p <- plot_model(
  model,
  transform = "exp", # NULL,
  title = "",
  vline.color = "grey70",
  show.values = TRUE,
  show.p = TRUE,
  show.intercept = FALSE,
  digits = 3,
  sort.est = TRUE
)
af_ggsave("model_coef.png", plot = p)

4 Plot effects

The below plots show the marginal effects of the “wt” and “cyl” variables on the outcome variable “mpg_cat”, controlling for the other variables in the model.The plot visualizes this marginal effect, showing how the predicted probability or cumulative probability changes as “wt” or “cyl” varies. The y-axis shows the predicted cumulative probability of being in a particular category or lower.

plot(effect("wt", model))


# Save in a png file
png("wt_effect_on_mpg_cat", width = 800, height = 600)
plot(effect("wt", model))
dev.off()
png 
  2

Interpretation:

Vehicals in Very Low, High and Very High categories are almost not impacted by the vehicle weight. The higher the vehicle weight is, the probability that the vehicle remains in the Low category or in the below ones (the Very Low category), increases. The higher the vehicle weight is, the probability that the vehicle remains in the Medium category or in the below ones (the Low and Very Low categories), decreases, i.e. there is a higher probability that it will move to the category High.

plot(effect("cyl", model))


# Save in a png file
png("cyl_effect_on_mpg_cat.png", width = 800, height = 600)
plot(effect("cyl", model))
dev.off()
png 
  2

The interpretation for the cyl effect plot is simillar to that of the weight effect plot. Instead of the growing vehicle weight, the effect dependes on the growing number of vehcile cylinders.

5 Calculate Probabilities

Here is a plot of the probabilities for each level of satisfaction based on the fitted ordinal regression model.

# Convert housing dataset to a dataframe
df <- mtcars_prep
dep_var <- "mpg_cat"

# Predict probabilities for each level of Sat
t_prob <- as.data.frame(predict(object = model, df, type = "p"))

# Rename columns to match mpg_cat levels for clarity
colnames(t_prob) <- levels(df[[dep_var]])

# Reshape the probabilities to long format and add satisfaction levels
prob <- t_prob %>%
  pivot_longer(cols = everything(), names_to = dep_var, values_to = "Probability")

# Summarize mean probability and confidence intervals for each level of Sat
summary_stats <- prob %>%
  group_by(!!sym(dep_var)) %>%
  summarise(
    Mean = mean(Probability),
    SE = sd(Probability) / sqrt(n()),  # Standard error
    LowerCI = Mean - 1.96 * SE,        # Lower 95% confidence interval
    UpperCI = Mean + 1.96 * SE         # Upper 95% confidence interval
  )

# Plot the means with confidence intervals
p <- ggplot(summary_stats, aes(x = !!sym(dep_var), y = Mean, color = !!sym(dep_var))) +
  geom_pointrange(aes(ymin = LowerCI, ymax = UpperCI), size = 1.1) +
  geom_point(size = 3) +
  labs(
    x = dep_var,
    y = "Mean Probability",
    title = paste("Predicted Probabilities for", dep_var, "Levels"),
    color = dep_var
  ) +
  theme_minimal()

# Save the plot
af_ggsave(paste0(dep_var, "_means_CI.png"), plot = p)

Ordinal Regression

Performing ordinal regression using polr (mass package).

Amir Freund

2025-02-05