This tutorial outlines the process of performing and interpreting an
ordinal logistic regression analysis using the clm function
from the ordinal package in R, including
generating dummy data, fitting the model, interpreting coefficients, and
understanding the assumptions and limitations of the proportional odds
model.
library(ordinal) # Provides clm()
set.seed(1)
# Generate dummy data
n <- 620 # Number of observations
group <- sample(c(0, 1), size = n, replace = TRUE, prob = c(0.5, 0.5)) # Binary group variable
education <- sample(c(10, 12, 14, 16), size = n, replace = TRUE) # Education levels in years
age <- sample(18:60, size = n, replace = TRUE) # Age in years
outcome <- sample(c(0, 1, 2), size = n, replace = TRUE, prob = c(0.3, 0.5, 0.2)) # Ordinal outcome
outcome <- as.factor(outcome)
# Create a data frame
df <- data.frame(
group = group,
education = education,
age = age,
outcome = outcome
)
# View the first few rows
head(df)
## group education age outcome
## 1 1 14 18 0
## 2 1 16 26 0
## 3 0 14 52 0
## 4 0 10 28 1
## 5 1 14 57 0
## 6 0 14 53 1
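Because every variable is drawn independently, the simulated outcome has no built-in relationship with group, education, or age, so any model fitted to these data should find no real effects. The following is a minimal sketch, using base R only and regenerating the same data, of a quick sanity check on the simulated outcome. The names `outcome_share` and `group_share` are introduced here for illustration.

```r
set.seed(1)
n <- 620
group <- sample(c(0, 1), size = n, replace = TRUE, prob = c(0.5, 0.5))
education <- sample(c(10, 12, 14, 16), size = n, replace = TRUE)
age <- sample(18:60, size = n, replace = TRUE)
outcome <- factor(sample(c(0, 1, 2), size = n, replace = TRUE, prob = c(0.3, 0.5, 0.2)))

# Observed share of each outcome category; should be near 0.3, 0.5, 0.2
outcome_share <- prop.table(table(outcome))
round(outcome_share, 3)

# Outcome shares within each group; similar rows suggest no group effect
group_share <- prop.table(table(group, outcome), margin = 1)
round(group_share, 3)
```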
Here, you are modelling an outcome with three ordered categories, which
follows a multinomial distribution. I would advise using a
cumulative link model, fitted with clm from the ordinal package. With the
default logit link, this family of models is often called ordinal
logistic regression. This naming highlights that the model is essentially
an extension of logistic regression to an ordinal outcome.
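Written out, the cumulative logit model with these three predictors has the form below. The symbols \(\theta_j\) (thresholds) and \(\beta\) (coefficients) are notation introduced here for illustration. The minus sign follows the parameterization documented for the ordinal package, under which a positive coefficient shifts probability towards higher categories.

\[ \operatorname{logit} P(Y \le j \mid x)=\log \frac{P(Y \le j \mid x)}{P(Y>j \mid x)}=\theta_j-\left(\beta_{\text{group}}\,\text{group}+\beta_{\text{education}}\,\text{education}+\beta_{\text{age}}\,\text{age}\right), \quad j=0,1 \]

There is one threshold per cut point (0|1 and 1|2), but only one coefficient per predictor, shared across both cut points.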
model <- clm(outcome ~ group + education + age, data=df)
summary(model)
## formula: outcome ~ group + education + age
## data: df
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 620 -622.50 1255.00 5(0) 1.35e-10 1.6e+05
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## group -0.108847 0.154069 -0.706 0.4799
## education -0.060993 0.034554 -1.765 0.0775 .
## age -0.003180 0.006069 -0.524 0.6003
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 0|1 -1.9584 0.5197 -3.768
## 1|2 0.4627 0.5127 0.902
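To see how the thresholds and coefficients combine, the sketch below computes category probabilities by hand from the printed estimates, using base R only. It assumes the cumulative logit form \(P(Y \le j)=\text{plogis}(\theta_j-\eta)\) used by the ordinal package. The covariate values in `x` describe a hypothetical person chosen only for illustration.

```r
# Estimates copied from the summary output above
theta <- c("0|1" = -1.9584, "1|2" = 0.4627)   # thresholds
beta <- c(group = -0.108847, education = -0.060993, age = -0.003180)

# Hypothetical person, chosen only for illustration
x <- c(group = 0, education = 14, age = 40)

eta <- sum(beta * x)              # linear predictor
cum_prob <- plogis(theta - eta)   # P(outcome <= 0), P(outcome <= 1)

# Category probabilities are differences of cumulative probabilities
probs <- diff(c(0, cum_prob, 1))
names(probs) <- c("0", "1", "2")
round(probs, 3)
```

With the fitted model available, `predict(model, newdata = ..., type = "prob")` should give the same numbers up to rounding of the copied estimates.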
The outcome is an ordinal variable with 3 ordered categories (i.e. \(0<1<2\)). The group variable is a binary (indicator) variable for group membership. Let’s study these model coefficients.
All the variables other than group are continuous. Interpreting continuous variables is a little trickier, but as far as I know, these coefficients are not the focus of this analysis; they serve as adjustment variables. So, we focus on the group variable. Its coefficient is the log odds ratio of being in a higher category of the outcome for group 1 compared with group 0, holding education and age fixed. Here, the coefficient is equal to -0.109. To remove the \(\log\) and obtain the odds ratio, we exponentiate the coefficient.
odds.ratio <- exp(coef(model)["group"]) # Select by name; coef() lists the two thresholds first
odds.ratio
## group
## 0.896868
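A point estimate is easier to judge alongside an interval. The sketch below computes a Wald 95% confidence interval for the odds ratio by hand from the estimate and standard error printed in the summary. This is an illustrative calculation rather than the only way to get an interval, and the names `est`, `se`, `or_hat`, and `or_ci` are introduced here.

```r
# Estimate and standard error for group, copied from the summary output
est <- -0.108847
se <- 0.154069

z <- qnorm(0.975)                        # about 1.96 for a 95% interval
or_hat <- exp(est)                       # odds ratio
or_ci <- exp(est + c(-1, 1) * z * se)    # interval built on the log scale, then exponentiated

round(c(odds_ratio = or_hat, lower = or_ci[1], upper = or_ci[2]), 3)
```

The interval contains 1, which agrees with the non-significant p-value. With the fitted model at hand, `exp(confint(model))` should give profile-likelihood intervals, which can differ slightly from the Wald interval.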
This means that the odds of being in a higher category of the outcome
for group1 are 0.897 times the odds for group0, that is, about 10% lower.
Judging by the p-value (0.48), this is not significantly different from
an odds ratio of 1.
An odds ratio of 1 means no association.
Larger than 1 means a positive association.
Smaller than 1 means a negative association.
In other words, based on these data we cannot conclude that the odds of falling into a higher category of the outcome differ between group1 and group0.
The model assumes proportional odds, that is, that the odds ratio for a predictor is the same at every cut point of the outcome. For example, the odds ratio of 0.897 just mentioned applies both to the split between outcome level 0 and levels 1 or 2, and to the split between levels 0 or 1 and level 2. This is a strong assumption, and it is important to check it. This can be done in different ways, and the ordinal package offers more flexible models that do not make this assumption.
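One possible way to check the assumption is sketched below, using `nominal_test()` and the `nominal` argument of `clm()` from the ordinal package. The block regenerates the dummy data so it runs on its own. The name `model_npo` and the choice to relax the assumption for group only are made here for illustration.

```r
library(ordinal)

# Regenerate the tutorial's dummy data so the block is self-contained
set.seed(1)
n <- 620
group <- sample(c(0, 1), size = n, replace = TRUE, prob = c(0.5, 0.5))
education <- sample(c(10, 12, 14, 16), size = n, replace = TRUE)
age <- sample(18:60, size = n, replace = TRUE)
outcome <- factor(sample(c(0, 1, 2), size = n, replace = TRUE, prob = c(0.3, 0.5, 0.2)))
df <- data.frame(group, education, age, outcome)

model <- clm(outcome ~ group + education + age, data = df)

# Likelihood ratio tests of proportional odds, one predictor at a time
nominal_test(model)

# Partial proportional odds: group gets a separate effect at each threshold
model_npo <- clm(outcome ~ education + age, nominal = ~ group, data = df)

# Compare the two nested fits
anova(model, model_npo)
```

A small p-value for a predictor would suggest that its effect differs across cut points. A large one gives no evidence against proportional odds, which is not the same as proving the assumption holds.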
In short, odds are a different way of representing probabilities. They are popular in sports betting and gambling. Mathematically, the odds corresponding to a probability \(p\) are \(p /(1-p)\). For example, with a 50-50 chance of winning, \(p=0.5\) and the odds are \(0.5 /(1-0.5)=1 / 1\). We say the odds are 1-to-1. With a \(2 / 3\) chance of winning, the odds are \((2 / 3) /(1-2 / 3)=2 / 1\). We say the odds are 2-to-1.
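The conversion between probabilities and odds can be written as two small helper functions. The names `odds` and `prob` are introduced here for illustration and are not part of any package.

```r
odds <- function(p) p / (1 - p)   # probability -> odds
prob <- function(o) o / (1 + o)   # odds -> probability

odds(0.5)     # 50-50 chance: odds of 1, i.e. 1-to-1
odds(2 / 3)   # 2/3 chance: odds of 2, i.e. 2-to-1
prob(2)       # odds of 2-to-1 converted back to a probability of 2/3
```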
In our example with the clm model above, we are
obtaining a ratio of odds. This means that if the odds of falling in a
higher category of the outcome for group0 are \(x\), then the odds of falling into a higher
category for group1 are \(0.897 \times x\). This is the interpretation of the odds ratio of
0.897.
This is a lot trickier to interpret, since we have odds, which are
already a ratio of two probabilities, which are then put together in
another ratio. Formally, if group1 has probability \(p_1\) of falling in a higher category of
the outcome and group0 has probability \(p_0\), we obtain the odds for group1 as
\(\text{odds}_1=p_1/\left(1-p_1\right)\) and the odds for group0 as
\(\text{odds}_0=p_0/\left(1-p_0\right)\). Then we put these in a ratio:
\[ \text { odds ratio }_{1 / 0}=\frac{\text {odds}_1}{\text {odds}_0}=\frac{p_1 /\left(1-p_1\right)}{p_0/\left(1-p_0\right)} \]
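To make the formula concrete, the sketch below applies an odds ratio of 0.897 to a few probabilities for group0 and converts the result back to probabilities for group1. The values in `p0` are hypothetical and chosen only for illustration.

```r
odds_ratio <- 0.897          # odds ratio for group, rounded as in the text
p0 <- c(0.2, 0.5, 0.8)       # hypothetical probabilities for group0

odds0 <- p0 / (1 - p0)       # odds for group0
odds1 <- odds_ratio * odds0  # odds for group1
p1 <- odds1 / (1 + odds1)    # back to probabilities for group1

round(data.frame(p0, odds0, odds1, p1), 3)
```

The same odds ratio corresponds to a different change in probability depending on the starting value \(p_0\), which is one reason odds ratios are harder to read than differences in probability.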
The log-odds are the natural logarithm of the odds and are
hard to interpret directly. The log-odds are mainly used for
mathematical reasons during modelling, and that is why we have to
exponentiate the coefficients using the exp() function to
be able to interpret them.