This tutorial outlines the process of performing and interpreting an ordinal logistic regression analysis using the clm function from the ordinal package in R, including generating dummy data, fitting the model, interpreting coefficients, and understanding the assumptions and limitations of the proportional odds model.

Data preparation

set.seed(1)
# Generate dummy data
n         <- 620  # Number of observations
group     <- sample(c(0, 1),  size = n, replace = TRUE, prob = c(0.5, 0.5)) # Binary group variable
education <- sample(c(10, 12, 14, 16), size = n, replace = TRUE) # Education levels in years
age       <- sample(18:60, size = n, replace = TRUE) # Age in years
outcome   <- sample(c(0, 1, 2), size = n, replace = TRUE, prob = c(0.3, 0.5, 0.2))  # Ordinal outcome
outcome   <- factor(outcome, levels = c(0, 1, 2), ordered = TRUE)  # Ordered factor: 0 < 1 < 2
# Create a data frame
df <- data.frame(
  group     = group,
  education = education,
  age       = age,
  outcome   = outcome
)
# View the first few rows
head(df)

##   group education age outcome
## 1     1        14  18       0
## 2     1        16  26       0
## 3     0        14  52       0
## 4     0        10  28       1
## 5     1        14  57       0
## 6     0        14  53       1

Ordinal logistic regression

Here, we model a three-category outcome, which follows a multinomial distribution. Because the categories are ordered, I would advise using a cumulative link model, fitted with the clm function from the ordinal package. This family of models is sometimes called ordinal logistic regression, a name that highlights that the model is essentially an extension of logistic regression to an ordinal outcome.

library(ordinal)  # provides clm()
model <- clm(outcome ~ group + education + age, data = df)
summary(model)

## formula: outcome ~ group + education + age
## data:    df
## 
##  link  threshold nobs logLik  AIC     niter max.grad cond.H 
##  logit flexible  620  -622.50 1255.00 5(0)  1.35e-10 1.6e+05
## 
## Coefficients:
##            Estimate Std. Error z value Pr(>|z|)  
## group     -0.108847   0.154069  -0.706   0.4799  
## education -0.060993   0.034554  -1.765   0.0775 .
## age       -0.003180   0.006069  -0.524   0.6003  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 0|1  -1.9584     0.5197  -3.768
## 1|2   0.4627     0.5127   0.902

The outcome is an ordinal variable with 3 ordered categories (i.e. \(0<1<2\)). The group variable is a binary (indicator) variable for group membership. Let’s study these model coefficients.
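To make the fitted model concrete, the sketch below works out by hand what it implies for one person. It assumes the parameterisation used by clm, \(\text{logit}\, P(Y \le j) = \theta_j - x\beta\), and hard-codes the rounded estimates from the summary above. The covariate values (group 0, 12 years of education, age 30) are made up purely for illustration.

```r
# Estimates copied (rounded) from the summary output above
theta <- c("0|1" = -1.9584, "1|2" = 0.4627)
beta  <- c(group = -0.108847, education = -0.060993, age = -0.003180)

# Hypothetical person, chosen only for illustration
x <- c(group = 0, education = 12, age = 30)

eta      <- sum(beta * x)              # linear predictor
cum_prob <- plogis(theta - eta)        # P(Y <= 0) and P(Y <= 1)
probs    <- diff(c(0, cum_prob, 1))    # P(Y = 0), P(Y = 1), P(Y = 2)
names(probs) <- c("0", "1", "2")
round(probs, 3)
```

The two thresholds split the logistic scale into three intervals, one per outcome category, so the three probabilities always sum to 1.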

Interpretation of coefficients in cumulative link models

All the variables other than group are treated as continuous. Interpreting continuous variables is a little trickier, but as far as I know, their coefficients are not the focus of this analysis; they serve as adjustment variables. So, we focus on the group variable. Its coefficient is the log odds ratio of being in a higher category of the outcome for group1 compared with group0, holding education and age fixed. Here, the coefficient is equal to -0.109. To remove the \(\log\) and obtain the odds ratio, we exponentiate the coefficient.

# coef(model) lists the two thresholds first, so select the coefficient by name
odds.ratio <- exp(coef(model)["group"])
odds.ratio

##    group 
## 0.896868

This means that the odds of being in a higher category of the outcome for group1 are 0.897 times those for group0, that is, about 10% lower. Looking at the p-value (0.48), this is not significantly different from an odds ratio of 1.

An odds ratio of 1 means no association.
Larger than 1 means positive association.
Smaller than 1 means negative association.

In other words, based on these data, we cannot conclude that the odds of falling into a higher category of the outcome differ between group1 and group0.
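A confidence interval for the odds ratio conveys the same conclusion more directly than the p-value. The sketch below computes a Wald-type 95% interval from the estimate and standard error printed in the summary above. The hard-coded numbers are copied from that output, and the Wald approximation is an assumption made here for simplicity.

```r
# Estimate and standard error of the group coefficient, copied from the summary above
est <- -0.108847
se  <-  0.154069

z         <- qnorm(0.975)                 # about 1.96 for a 95% interval
ci_log_or <- est + c(-1, 1) * z * se      # interval on the log odds ratio scale

# Exponentiate the estimate and both limits to move to the odds ratio scale
or_ci <- exp(c(estimate = est, lower = ci_log_or[1], upper = ci_log_or[2]))
round(or_ci, 3)
```

The interval contains 1, which matches the non-significant p-value. With the fitted object at hand, confint(model) should give comparable intervals on the log scale; it is worth checking the ordinal documentation for the interval type it uses by default.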

The Proportional Odds Assumption

The model assumes proportional odds: the odds ratio associated with each predictor is the same at every threshold of the outcome. For example, the odds ratio of 0.897 just mentioned applies both to being above level 0 (levels 1 or 2 versus level 0) and to being above level 1 (level 2 versus levels 0 or 1). This is a strong assumption, and it is important to check it. This can be done in different ways, and the ordinal package also offers more flexible models that do not make this assumption.
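The assumption can be seen numerically. The sketch below uses the thresholds and the group coefficient from the summary above (hard-coded, rounded) and computes the odds of being above each threshold for both groups. It illustrates what the model imposes; it is not a test of whether the assumption holds in the data.

```r
theta      <- c("0|1" = -1.9584, "1|2" = 0.4627)  # thresholds from the summary above
beta_group <- -0.108847                            # group coefficient from the summary above

# Odds of being above each threshold, P(Y > j) / P(Y <= j), for a given group value.
# Education and age are held at 0 here; any fixed values would cancel in the ratio.
odds_above <- function(group) {
  p_above <- 1 - plogis(theta - beta_group * group)
  p_above / (1 - p_above)
}

or_by_threshold <- odds_above(1) / odds_above(0)
or_by_threshold    # one odds ratio per threshold
exp(beta_group)    # the single odds ratio reported by the model
```

Both thresholds give the same odds ratio by construction. To check the assumption against the data, the ordinal package provides nominal_test(), and clm() accepts a nominal argument for effects that are allowed to differ across thresholds; see the package documentation for details.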

What are odds and log-odds?

In short, odds are a different way of representing probabilities. They are popular in sports betting and gambling. Mathematically, the odds corresponding to a probability \(p\) are \(p /(1-p)\). For example, with a 50-50 chance of winning, \(p=0.5\) and the odds are \(0.5 /(1-0.5)=1 / 1\). We say the odds are 1-to-1. With a \(2 / 3\) chance of winning, the odds are \((2 / 3) /(1-2 / 3)=2 / 1\). We say the odds are 2-to-1.
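The conversion between probabilities and odds is short enough to write as two helper functions. The names prob_to_odds and odds_to_prob are chosen here for illustration; they are not part of any package.

```r
# Convert a probability to odds, and odds back to a probability
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(odds) odds / (1 + odds)

prob_to_odds(0.5)     # 50-50 chance: odds of 1, i.e. 1-to-1
prob_to_odds(2 / 3)   # odds of 2, i.e. 2-to-1
odds_to_prob(2)       # odds of 2 correspond to a probability of 2/3
```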

In our example with the clm model above, we are obtaining a ratio of odds. This means that if the odds of falling in a higher category of the outcome for group0 are \(x\), then the odds of falling into a higher category for group1 are \(0.897 \times x\). This is the interpretation of the odds ratio of 0.897.

This is a lot trickier to interpret, since odds are already a ratio of two probabilities, and two such odds are then put together in another ratio. Formally, if group1 has probability \(p_1\) of falling in a higher category of the outcome and group0 has probability \(p_0\), we obtain the odds for group1 as \(\text{odds}_1=p_1 /\left(1-p_1\right)\) and the odds for group0 as \(\text{odds}_0=p_0 /\left(1-p_0\right)\). Then we put these in a ratio:

\[ \text { odds ratio }_{1 / 0}=\frac{\text {odds}_1}{\text {odds}_0}=\frac{p_1 /\left(1-p_1\right)}{p_0/\left(1-p_0\right)} \]
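A small numerical example may help show why an odds ratio is not a ratio or a difference of probabilities. The sketch below applies the estimated odds ratio of 0.897 to a few probabilities for group0; those probabilities are hypothetical and chosen only for illustration.

```r
odds_ratio <- 0.897         # estimated odds ratio for group1 versus group0
p0 <- c(0.2, 0.5, 0.8)      # hypothetical probabilities for group0

odds0 <- p0 / (1 - p0)      # odds for group0
odds1 <- odds_ratio * odds0 # odds for group1 implied by the odds ratio
p1    <- odds1 / (1 + odds1)

data.frame(p0 = p0, p1 = round(p1, 3), difference = round(p1 - p0, 3))
```

The same odds ratio translates into a different change in probability depending on where group0 starts, which is part of what makes odds ratios harder to communicate than probabilities.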

Log-odds are a mathematical curiosity

The log-odds are the natural logarithm of the odds and are hard to interpret directly. They are mainly used for mathematical reasons during modelling, which is why we exponentiate the coefficients with the exp() function before interpreting them.
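Base R already includes the log-odds transformation and its inverse as qlogis() and plogis(). The sketch below applies them to a few arbitrary probabilities to show how the three scales relate.

```r
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)   # arbitrary probabilities for illustration

log_odds <- qlogis(p)               # same as log(p / (1 - p))
log_odds                            # symmetric around 0, which corresponds to p = 0.5

plogis(log_odds)                    # the inverse transformation recovers p
exp(log_odds)                       # exponentiating gives the odds p / (1 - p)
```

The log-odds scale is unbounded and symmetric around zero, which is convenient for a linear model, while probabilities are confined to the interval from 0 to 1.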