This tutorial outlines the process of performing and interpreting an ordinal logistic regression analysis using the clm function from the ordinal package in R, including generating dummy data, fitting the model, interpreting coefficients, and understanding the assumptions and limitations of the proportional odds model.

Data preparation

set.seed(1)
# Generate dummy data
n         <- 620  # Number of observations
group     <- sample(c(0, 1),  size = n, replace = TRUE, prob = c(0.5, 0.5)) # Binary group variable
education <- sample(c(10, 12, 14, 16), size = n, replace = TRUE) # Education levels in years
age       <- sample(18:60, size = n, replace = TRUE) # Age in years
outcome   <- sample(c(0, 1, 2), size = n, replace = TRUE, prob = c(0.3, 0.5, 0.2))  # Ordinal outcome
outcome   <- as.factor(outcome)
# Create a data frame
df <- data.frame(
  group     = group,
  education = education,
  age       = age,
  outcome   = outcome
)
# View the first few rows
head(df)
##   group education age outcome
## 1     1        14  18       0
## 2     1        16  26       0
## 3     0        14  52       0
## 4     0        10  28       1
## 5     1        14  57       0
## 6     0        14  53       1

Ordinal logistic regression

Here, the outcome has three ordered categories, so it follows a multinomial distribution. A good choice for such data is a cumulative link model, fitted with clm from the ordinal package. With the logit link, which is the default, this model is often called ordinal logistic regression. The name highlights that the model is essentially an extension of logistic regression to an ordinal outcome.
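As a sketch of what this model looks like in symbols (the notation is introduced here for illustration and is not part of the original text), the cumulative link model with a logit link can be written as

\[ \log \frac{P(Y \le j)}{P(Y > j)} = \theta_j - \left(\beta_1\,\text{group} + \beta_2\,\text{education} + \beta_3\,\text{age}\right), \quad j \in \{0, 1\} \]

where \(\theta_j\) are the thresholds (labelled 0|1 and 1|2 in the clm summary) and the \(\beta\) terms are the coefficients. Because of the minus sign, a positive coefficient shifts probability toward higher outcome categories, and the same coefficients apply at both thresholds.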

library(ordinal)  # provides clm()
model <- clm(outcome ~ group + education + age, data=df)
summary(model)
## formula: outcome ~ group + education + age
## data:    df
## 
##  link  threshold nobs logLik  AIC     niter max.grad cond.H 
##  logit flexible  620  -622.50 1255.00 5(0)  1.35e-10 1.6e+05
## 
## Coefficients:
##            Estimate Std. Error z value Pr(>|z|)  
## group     -0.108847   0.154069  -0.706   0.4799  
## education -0.060993   0.034554  -1.765   0.0775 .
## age       -0.003180   0.006069  -0.524   0.6003  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 0|1  -1.9584     0.5197  -3.768
## 1|2   0.4627     0.5127   0.902

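The outcome is an ordinal variable with 3 ordered categories (i.e. \(0<1<2\)). The group variable is a binary (indicator) variable for group membership. Let’s study these model coefficients. They are on the log-odds scale, so exponentiating the group coefficient gives an odds ratio of \(\exp(-0.108847) \approx 0.897\).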
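As a minimal sketch of how these coefficients can be read, the estimates and standard errors below are copied by hand from the summary output above and exponentiated to give odds ratios with Wald 95% intervals. The variable names (beta, se, or_table) are illustrative, and the Wald interval is one common choice rather than the only one.

```r
# Estimates and standard errors copied from the summary(model) output above
beta <- c(group = -0.108847, education = -0.060993, age = -0.003180)
se   <- c(group =  0.154069, education =  0.034554, age =  0.006069)

# Exponentiate the log-odds coefficients to obtain odds ratios
odds_ratio <- exp(beta)

# Wald 95% limits: computed on the log-odds scale, then exponentiated
z     <- qnorm(0.975)
lower <- exp(beta - z * se)
upper <- exp(beta + z * se)

or_table <- round(cbind(odds_ratio, lower, upper), 3)
print(or_table)
```

The group row gives an odds ratio of about 0.897. With the fitted object available, exp(coef(model)) should give the same odds ratios, alongside the exponentiated thresholds.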

The Proportional Odds Assumption

The model assumes proportional odds: the effect of each predictor on the odds of being in a higher category of the outcome is the same at every threshold between categories. For example, the odds ratio of 0.897 for group (the exponentiated group coefficient) applies both to the split between outcome level 0 and levels 1 or 2, and to the split between levels 0 or 1 and level 2. This is a strong assumption, and it is important to check it. This can be done in different ways, and the ordinal package offers more flexible models that do not make this assumption.
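To make the assumption concrete, the sketch below rebuilds the cumulative probabilities from the thresholds and the group coefficient reported in the summary, and computes the group odds ratio at each threshold. It assumes the parameterization \(\text{logit}\,P(Y \le j) = \theta_j - \beta x\) used by clm, and it holds education and age at 0 purely for illustration (the ratio does not depend on that choice). The function names are hypothetical helpers.

```r
# Thresholds and group coefficient copied from the summary(model) output above
theta      <- c("0|1" = -1.9584, "1|2" = 0.4627)
beta_group <- -0.108847

# Cumulative logit model: logit(P(Y <= j)) = theta_j - beta * x
# (education and age are held at 0 here purely for illustration)
cum_prob <- function(x) plogis(theta - beta_group * x)

# Odds of being ABOVE each threshold: P(Y > j) / P(Y <= j)
odds_above <- function(x) (1 - cum_prob(x)) / cum_prob(x)

# Odds ratio, group 1 versus group 0, at each threshold
or_by_threshold <- odds_above(1) / odds_above(0)
print(round(or_by_threshold, 3))
```

Both thresholds give the same odds ratio, which is exactly what the model imposes. To test whether the data support this, the ordinal package provides nominal_test(), which can be called on the fitted model; treat that as a pointer to explore rather than a complete diagnostic workflow.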

What are odds and log-odds?

In short, odds are a different way of representing probabilities. They are popular in sports betting and gambling. Mathematically, the odds corresponding to a probability \(p\) are \(p /(1-p)\). For example, with a 50-50 chance of winning, \(p=0.5\) and the odds are \(0.5 /(1-0.5)=1 / 1\). We say the odds are 1-to-1. With a \(2 / 3\) chance of winning, the odds are \((2 / 3) /(1-2 / 3)=2 / 1\). We say the odds are 2-to-1.
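The two worked examples can be reproduced with a couple of one-line helpers. The function names odds and prob_from_odds are made up for this sketch.

```r
# Convert a probability into odds: p / (1 - p)
odds <- function(p) p / (1 - p)

# Convert odds back into a probability: odds / (1 + odds)
prob_from_odds <- function(o) o / (1 + o)

odds_even       <- odds(0.5)    # the 50-50 case from the text
odds_two_thirds <- odds(2 / 3)  # the 2/3 case from the text
print(c(odds_even, odds_two_thirds))

# Going the other way: odds of 2-to-1 correspond to a probability of 2/3
print(prob_from_odds(2))
```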

In our example with the clm model above, the exponentiated coefficient is a ratio of odds. This means that if the odds of falling in a higher category of the outcome are \(x\) for group 0, then the odds of falling in a higher category for group 1 are \(0.897 \times x\), with education and age held fixed. This is the interpretation of the odds ratio of 0.897.

This is a lot trickier to interpret, since odds are already a ratio of two probabilities, and they are then combined in another ratio. Formally, if group 1 has probability \(p_1\) of falling in a higher category of the outcome and group 0 has probability \(p_0\), we obtain the odds for group 1 as \(\text{odds}_1=p_1 /\left(1-p_1\right)\) and the odds for group 0 as \(\text{odds}_0=p_0 /\left(1-p_0\right)\). Then we put these in a ratio:

\[ \text{odds ratio}_{1/0}=\frac{\text{odds}_1}{\text{odds}_0}=\frac{p_1 /\left(1-p_1\right)}{p_0 /\left(1-p_0\right)} \]
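One way to get a feel for an odds ratio is to translate it back into probabilities. The sketch below takes a few hypothetical values of \(p_0\) (chosen for illustration, not estimated from the data), applies the odds ratio for group from the model output, and recovers the matching \(p_1\). The helper names are assumptions of this example.

```r
odds           <- function(p) p / (1 - p)   # probability -> odds
prob_from_odds <- function(o) o / (1 + o)   # odds -> probability

or_group <- exp(-0.108847)   # odds ratio for group, from the model output
p0 <- c(0.2, 0.5, 0.8)       # hypothetical probabilities for group 0

odds0 <- odds(p0)            # odds for group 0
odds1 <- or_group * odds0    # odds for group 1 under the odds ratio
p1    <- prob_from_odds(odds1)

print(round(data.frame(p0, odds0, odds1, p1), 3))

# Sanity check: the ratio of odds recovers the odds ratio we started from
or_check <- odds(p1) / odds(p0)
print(round(or_check, 3))
```

The same odds ratio produces different changes in probability depending on where \(p_0\) starts, which is part of why odds ratios are harder to read than differences in probability.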

Log-odds are a mathematical curiosity

The log-odds are the natural logarithm of the odds and are hard to interpret directly. They are used mainly for mathematical reasons during modelling: the model is linear on the log-odds scale, so the coefficients it reports are log odds ratios. That is why we exponentiate the coefficients with the exp() function to interpret them as odds ratios.
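A short sketch of moving between the three scales in base R: qlogis() maps probabilities to log-odds and plogis() maps them back. The probabilities used here are arbitrary illustrative values.

```r
p <- c(0.1, 0.5, 2 / 3, 0.9)   # arbitrary illustrative probabilities

log_odds <- qlogis(p)          # log-odds, i.e. log(p / (1 - p))
manual   <- log(p / (1 - p))   # the same quantity written out by hand
back     <- plogis(log_odds)   # inverse transform returns the probabilities

print(round(cbind(p, log_odds, back), 3))

# The group coefficient is a log odds ratio; exp() turns it into an odds ratio
log_or     <- -0.108847
odds_ratio <- exp(log_or)
print(round(odds_ratio, 3))
```

A probability of 0.5 sits at log-odds 0, and negative log-odds correspond to probabilities below 0.5, which is a handy anchor when reading coefficients on this scale.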