This document shows how to fit an ordinal logistic regression model in R using the polr function from the MASS package. The example constructs a fictitious dataset to predict student grades from study time and attendance.
Setting Up the Environment and Data
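First, you’ll need to install and load the necessary R package, MASS, and then generate a fictitious dataset. Note that the grades below are sampled at random, independently of study time and attendance, so the fitted model illustrates the workflow rather than a real relationship.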
# Install the MASS package if it is not already installed
if (!require(MASS)) install.packages("MASS")
Loading required package: MASS
library(MASS)

# Create a fictitious dataset
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),    # Study time in hours (mean = 5, sd = 2)
  attendance = round(runif(100, min = 80, max = 100)), # Attendance rate (80% to 100%)
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE) # Grades
)

# Preview the data
head(data)
  study_time attendance grade
1          4         85     B
2          5         99     B
3          8         92     A
4          5         90     B
5          5         88     A
6          8         98     F
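Before fitting, it can help to confirm how R has ordered the grade levels, because that order determines how the model's coefficients are read. The following is a minimal sketch that rebuilds the same dataset and inspects the factor; nothing here goes beyond base R.

```r
# Recreate the fictitious dataset from the example above
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)

# Confirm that grade is an ordered factor and inspect the level order
is.ordered(data$grade)
levels(data$grade)

# Count how many observations fall in each grade
grade_counts <- table(data$grade)
grade_counts
```

With the levels given as A, B, C, D, F, R treats A as the lowest category and F as the highest, so "higher" in the model means a worse grade.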
Fitting the Ordinal Logistic Regression Model
Next, use the polr function to fit an ordinal logistic regression model to the data.
# Fit the ordinal logistic regression model
model <- polr(grade ~ study_time + attendance, data = data, Hess = TRUE)

# Display the summary of the model
summary(model)
Call:
polr(formula = grade ~ study_time + attendance, data = data,
    Hess = TRUE)

Coefficients:
              Value Std. Error t value
study_time -0.16149    0.09926 -1.6268
attendance -0.02173    0.03024 -0.7185

Intercepts:
      Value Std. Error t value
A|B -4.0104     2.7278 -1.4702
B|C -2.9323     2.7079 -1.0829
C|D -2.3522     2.7055 -0.8694
D|F -1.7026     2.7093 -0.6284

Residual Deviance: 311.8796
AIC: 323.8796
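The summary reports t values but no p-values. One common workaround, sketched here on the assumption that a normal approximation to the t value is acceptable, is to compute two-sided p-values by hand and to exponentiate the coefficients to obtain odds ratios. The names ctable, p_values and odds_ratios are illustrative only.

```r
library(MASS)

# Recreate the dataset and the fitted model from the example above
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)
model <- polr(grade ~ study_time + attendance, data = data, Hess = TRUE)

# Coefficient table (predictors and intercepts) as a matrix
ctable <- coef(summary(model))

# Two-sided p-values, assuming the t value is approximately standard normal
p_values <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
ctable <- cbind(ctable, "p value" = p_values)
ctable

# Odds ratios for the two predictors
odds_ratios <- exp(coef(model))
odds_ratios
```

In the summary shown above, both predictor t values are smaller than 1.96 in magnitude, which is what one would expect given that the grades were sampled independently of the predictors.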
Making Predictions
Finally, you can make predictions on new data points to estimate the probability of each grade category.
# Predict grade probabilities for new data points
new_data <- data.frame(study_time = c(6, 7), attendance = c(90, 95))
predictions <- predict(model, newdata = new_data, type = "probs")
predictions
          A         B         C         D         F
1 0.2523633 0.2456461 0.1412492 0.1331143 0.2276271
2 0.3066289 0.2585378 0.1337760 0.1174174 0.1836399
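If a single predicted grade is wanted rather than a full probability distribution, predict can also return the most likely class. The sketch below refits the same model, checks that each row of probabilities sums to 1, and then requests type = "class"; the variable names probs and predicted_class are illustrative.

```r
library(MASS)

# Recreate the dataset and the fitted model from the example above
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)
model <- polr(grade ~ study_time + attendance, data = data, Hess = TRUE)

new_data <- data.frame(study_time = c(6, 7), attendance = c(90, 95))

# Each row is a probability distribution over the five grades
probs <- predict(model, newdata = new_data, type = "probs")
rowSums(probs)

# The most likely grade for each new observation
predicted_class <- predict(model, newdata = new_data, type = "class")
predicted_class
```

The predicted class is simply the grade with the largest probability in each row, so it discards the uncertainty visible in the full table.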
Explanation
This example demonstrates the following steps in R:
Loading Necessary Packages: Ensures that the MASS package, which contains the polr function, is available.
Data Preparation: Generates a dataset with variables like study time and attendance, and an ordered factor for grades.
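Model Fitting: Uses the polr function to fit an ordinal logistic regression (proportional odds) model. The model describes the cumulative log odds of being at or below each category of the ordered response: logit P(Y ≤ k) = ζ_k − η, where ζ_k is the intercept (threshold) for category k and η is the linear combination of the predictors. Because the levels here run from A to F, a negative coefficient shifts probability toward the better grades.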
Prediction: Uses the fitted model to predict the probability distribution over grades for new observations.
This process is useful for analyzing ordinal categorical data where the response variable categories have a meaningful order but no inherent numeric distance between them.
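To make the model structure concrete, the predicted probabilities can be reproduced by hand from the fitted thresholds and coefficients. This is a sketch under the parameterization logit P(Y ≤ k) = ζ_k − η used by polr; the names eta, cum_probs and manual_probs are illustrative.

```r
library(MASS)

# Recreate the dataset and the fitted model from the example above
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)
model <- polr(grade ~ study_time + attendance, data = data, Hess = TRUE)

new_data <- data.frame(study_time = 6, attendance = 90)

# Linear predictor: no intercept term, the thresholds live in model$zeta
eta <- unname(coef(model)["study_time"] * new_data$study_time +
              coef(model)["attendance"] * new_data$attendance)

# Cumulative probabilities P(grade <= k) for k = A, B, C, D
cum_probs <- plogis(model$zeta - eta)

# Category probabilities are differences of consecutive cumulative probabilities
manual_probs <- diff(c(0, unname(cum_probs), 1))
names(manual_probs) <- levels(data$grade)

manual_probs
predict(model, newdata = new_data, type = "probs")
```

The two printed vectors should agree, which shows that the prediction step is nothing more than the logistic function applied to shifted thresholds.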
Exercises
Extend this example by adding a categorical explanatory variable (X) to the model.
Once Exercise 1 works, adapt the model to your own data.