This article aims to demonstrate how to perform Ordinal regression using the MASS package - POLR function
Required Libraries
library(MASS) # for polr
library(stargazer)
library(sjPlot)
library(ggplot2)
library(ggthemes)
library(dplyr)
library(tidyr)
library(viridis)
library(gt)
library(extrafont)
library(effects) # For marginal effects and predictions
library(explore)
library(DescTools) # for PseudoR2
library(stats) # For AIC, BIC, logLik
library(afcommon)
Set the defaults theme for ggplot2 in order to match Latex and the requiremens of political science magazines
Set the ggplot2 theme
The mtcars data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
# Create data frame with the data dictionary information
car_variables <- tibble(
variable = c("mpg", "cylinder", "disp", "hp", "drat",
"wt", "qsec", "vs", "am", "gear", "carb"),
description = c(
"Miles/(US) gallon",
"Number of cylinders",
"Displacement (cu.in.)",
"Gross horsepower",
"Rear axle ratio",
"Weight (lb/1000)",
"1/4 mile time",
"V/S",
"Transmission (0 = automatic, 1 = manual)",
"Number of forward gears",
"Number of carburetors"
)
)
# Create GT table
mtcars_data_dictionary <- car_variables %>%
gt() %>%
cols_label(
variable = "Variable",
description = "Description"
) %>%
tab_header(
title = "'mtcars' Data Dictionary",
subtitle = "Detailed explanation of variables in mtcar automotive dataset"
) %>%
opt_stylize(style = 1, color = "gray")
mtcars_data_dictionary
'mtcars' Data Dictionary | |
Detailed explanation of variables in mtcar automotive dataset | |
Variable | Description |
---|---|
mpg | Miles/(US) gallon |
cylinder | Number of cylinders |
disp | Displacement (cu.in.) |
hp | Gross horsepower |
drat | Rear axle ratio |
wt | Weight (lb/1000) |
qsec | 1/4 mile time |
vs | V/S |
am | Transmission (0 = automatic, 1 = manual) |
gear | Number of forward gears |
carb | Number of carburetors |
Careate gas consumption catgories variable mpg-cat
mtcars_prep <- mtcars
mtcars_prep$mpg_cat <- cut(mtcars$mpg, # Create mpg_cat variable with 5 categories
breaks = 5,
labels = c("Very Low", "Low", "Medium", "High", "Very High"))
mtcars_prep %>% explore(mpg_cat)
\(mpg\_cat \sim wt + cyl + am\)
model <- polr(mpg_cat ~ wt + cyl + am, data = mtcars_prep, Hess=TRUE)
gt_coeftest(model) # with p-values
Regression Coefficient Results | ||||
Variable | Coefficient | Std. Error | t-statistic | p-value |
---|---|---|---|---|
wt | −6.186 | 2.445 | −2.529 | 0.018 |
cyl | −9.510 | 1.150 | −8.272 | 0.000 |
am | −1.710 | 1.574 | −1.086 | 0.288 |
Extract model fit information
Use Stargazer to output a regression table that can be used with LaTex.
stargazer(
model,
type = "html",
omit = c("Constant"),
star.cutoffs = c(0.05, 0.01, 0.001),
title = "mtcars Gas Consumption - POLR Regression",
label = "tab:mpg_cat_reg",
keep.stat = c("n"),
add.lines = list(c("Log-Likelihood", round(logLik,3)),
c("AIC", round(aic,3)),
c("Pseudo-R2", round(pseudo_r2,3))),
style = "apsr",
out = "mpg_cat.tex"
)
mpg_cat | |
wt | -6.186* |
(2.445) | |
cyl | -9.510*** |
(1.150) | |
am | -1.710 |
(1.574) | |
Log-Likelihood | -15.728 |
AIC | 45.456 |
Pseudo-R2 | 0.664 |
N | 32 |
p < .05; p < .01; p < .001 |
Results Interpretation
dep_var <- "mpg_cat"
ind_vars <- c("wt", "cyl", "am")
interpretation <-
af_reg_interpret(model, mtcars_prep, y_name = dep_var, x_list = ind_vars, reg_type= "polr")
To examine the correlation of wt, cyl and am with mpg_cat, we
estimated a Ordinal Proportional Odds Logistic regression model (n=32).
The dependent variable mpg_cat has 5 levels: Very Low, Low, Medium, High
and Very High. The model yielded a significant result [χ²(3) = 62.1, p
< 2.11e-13], with a log-likelihood value of -15.7 and pseudo R² =
0.664.
The wt variable has statistically significant correlation. An
increase of 1 unit in wt decreases the likelyhood to move from one
category of mpg_cat to the next one by 99.79%, assuming all other
variables stay constant.
The cyl variable has statistically
significant correlation. An increase of 1 unit in cyl decreases the
likelyhood to move from one category of mpg_cat to the next one by
99.99%, assuming all other variables stay constant.
Variable am
(odds ratio = 0.181, p_val = 0.288) is not statistically
significant.
Note: the plotted coefficients are exponentiated
p <- plot_model(
model,
transform = "exp", # NULL,
title = "",
vline.color = "grey70",
show.values = TRUE,
show.p = TRUE,
show.intercept = FALSE,
digits = 3,
sort.est = TRUE
)
af_ggsave("model_coef.png", plot = p)
The below plots show the marginal effects of the “wt” and “cyl” variables on the outcome variable “mpg_cat”, controlling for the other variables in the model.The plot visualizes this marginal effect, showing how the predicted probability or cumulative probability changes as “wt” or “cyl” varies. The y-axis shows the predicted cumulative probability of being in a particular category or lower.
# Save in a png file
png("wt_effect_on_mpg_cat", width = 800, height = 600)
plot(effect("wt", model))
dev.off()
png
2
Interpretation:
Vehicals in Very Low, High and Very High categories are almost not impacted by the vehicle weight. The higher the vehicle weight is, the probability that the vehicle remains in the Low category or in the below ones (the Very Low category), increases. The higher the vehicle weight is, the probability that the vehicle remains in the Medium category or in the below ones (the Low and Very Low categories), decreases, i.e. there is a higher probability that it will move to the category High.
# Save in a png file
png("cyl_effect_on_mpg_cat.png", width = 800, height = 600)
plot(effect("cyl", model))
dev.off()
png
2
The interpretation for the cyl effect plot is simillar to that of the weight effect plot. Instead of the growing vehicle weight, the effect dependes on the growing number of vehcile cylinders.
Here is a plot of the probabilities for each level of satisfaction based on the fitted ordinal regression model.
# Convert housing dataset to a dataframe
df <- mtcars_prep
dep_var <- "mpg_cat"
# Predict probabilities for each level of Sat
t_prob <- as.data.frame(predict(object = model, df, type = "p"))
# Rename columns to match mpg_cat levels for clarity
colnames(t_prob) <- levels(df[[dep_var]])
# Reshape the probabilities to long format and add satisfaction levels
prob <- t_prob %>%
pivot_longer(cols = everything(), names_to = dep_var, values_to = "Probability")
# Summarize mean probability and confidence intervals for each level of Sat
summary_stats <- prob %>%
group_by(!!sym(dep_var)) %>%
summarise(
Mean = mean(Probability),
SE = sd(Probability) / sqrt(n()), # Standard error
LowerCI = Mean - 1.96 * SE, # Lower 95% confidence interval
UpperCI = Mean + 1.96 * SE # Upper 95% confidence interval
)
# Plot the means with confidence intervals
p <- ggplot(summary_stats, aes(x = !!sym(dep_var), y = Mean, color = !!sym(dep_var))) +
geom_pointrange(aes(ymin = LowerCI, ymax = UpperCI), size = 1.1) +
geom_point(size = 3) +
labs(
x = dep_var,
y = "Mean Probability",
title = paste("Predicted Probabilities for", dep_var, "Levels"),
color = dep_var
) +
theme_minimal()
# Save the plot
af_ggsave(paste0(dep_var, "_means_CI.png"), plot = p)