polr

Example of polr

This document shows how to fit an ordinal logistic regression model in R with the polr function from the MASS package. The example builds a fictitious dataset and models student grades as a function of study time and attendance.

Setting Up the Environment and Data

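First, install (if necessary) and load the MASS package, then generate a fictitious dataset. The grades are sampled at random, independently of study time and attendance. The data therefore only illustrate the workflow, and the model is not expected to find a real relationship.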

# Install and load the MASS package if not already installed
if (!require(MASS)) install.packages("MASS")
Loading required package: MASS
library(MASS)

# Create a fictitious dataset
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean=5, sd=2)),  # Study time in hours (mean=5, sd=2)
  attendance = round(runif(100, min=80, max=100)),  # Attendance rate (80% to 100%)
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace=TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)  # Grades
)

# Preview the data
head(data)
  study_time attendance grade
1          4         85     B
2          5         99     B
3          8         92     A
4          5         90     B
5          5         88     A
6          8         98     F
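polr reads the response's factor levels in the order they are listed and treats the first level as the lowest category. Before fitting, it is worth checking that ordering and confirming that every grade has observations. This is a minimal sketch that recreates the simulated data so it runs on its own.

```r
# Recreate the simulated data from above (same seed and code)
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)

# The response should be an ordered factor
is.ordered(data$grade)

# Level order as polr will use it: first = lowest category, last = highest
levels(data$grade)

# Number of students in each grade; empty categories would cause problems
table(data$grade)
```

Here "A" is the first level, so polr treats it as the lowest category even though it is the best grade. Keep this in mind when you interpret the signs of the coefficients.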

Fitting the Ordinal Logistic Regression Model

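Next, use the polr function to fit an ordinal logistic regression model to the data. Setting Hess = TRUE returns the Hessian (the observed information matrix) with the fit, so summary() can report standard errors without refitting the model.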

# Fit the ordinal logistic regression model
model <- polr(grade ~ study_time + attendance, data = data, Hess=TRUE)

# Display the summary of the model
summary(model)
Call:
polr(formula = grade ~ study_time + attendance, data = data, 
    Hess = TRUE)

Coefficients:
              Value Std. Error t value
study_time -0.16149    0.09926 -1.6268
attendance -0.02173    0.03024 -0.7185

Intercepts:
    Value   Std. Error t value
A|B -4.0104  2.7278    -1.4702
B|C -2.9323  2.7079    -1.0829
C|D -2.3522  2.7055    -0.8694
D|F -1.7026  2.7093    -0.6284

Residual Deviance: 311.8796 
AIC: 323.8796 
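summary() on a polr fit reports t values but no p-values. A common approach is to treat the t values as approximately standard normal, which is a large-sample assumption. You can also exponentiate the slopes to get odds ratios. In the sketch below, `ctable` and `p_value` are illustrative names, and the data are recreated so the snippet is self-contained.

```r
library(MASS)

# Recreate the simulated data and refit the model
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)
model <- polr(grade ~ study_time + attendance, data = data, Hess = TRUE)

# Coefficient table: slopes first, then the intercepts (A|B, B|C, ...)
ctable <- coef(summary(model))

# Approximate two-sided p-values from a normal approximation to the t values
p_value <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
ctable <- cbind(ctable, "p value" = p_value)
ctable

# exp(beta): multiplicative change in the odds of being beyond a given grade
# level (toward F) for a one-unit increase in the predictor
exp(coef(model))
```

In polr's parameterization, logit P(Y ≤ k) = ζ_k − η. A positive slope therefore raises the odds of the later levels, and a negative slope raises the odds of the earlier ones (here, toward A).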

Making Predictions

Finally, you can make predictions on new data points to estimate the probability of each grade category.

# Predicting grades for new data points
new_data <- data.frame(study_time = c(6, 7), attendance = c(90, 95))
predictions <- predict(model, newdata = new_data, type = "probs")
predictions
          A         B         C         D         F
1 0.2523633 0.2456461 0.1412492 0.1331143 0.2276271
2 0.3066289 0.2585378 0.1337760 0.1174174 0.1836399
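Each row of the probability output is a full distribution over the five grades, so each row should sum to 1. predict() for polr also accepts type = "class", which returns the most probable grade for each new observation. This is a self-contained sketch. `probs` and `cls` are illustrative names.

```r
library(MASS)

# Recreate the simulated data and refit the model
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)
model <- polr(grade ~ study_time + attendance, data = data, Hess = TRUE)

new_data <- data.frame(study_time = c(6, 7), attendance = c(90, 95))

# Predicted probability of each grade for each new student
probs <- predict(model, newdata = new_data, type = "probs")

# Each row is a probability distribution, so the rows sum to 1
rowSums(probs)

# Most likely grade for each new student
cls <- predict(model, newdata = new_data, type = "class")
cls
```

A single predicted class can hide how spread out the probabilities are. In this example the top two grades differ by only a few percentage points, so the probability matrix is usually the more informative output.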

Explanation

This example demonstrates the following steps in R:

  1. Loading Necessary Packages: Ensures that the MASS package, which contains the polr function, is available.

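  2. Data Preparation: Generates a dataset with two numeric predictors (study time and attendance) and an ordered factor for grades. polr treats the first level (A) as the lowest category and the last level (F) as the highest.

  3. Model Fitting: Uses the polr function to fit a proportional odds (cumulative logit) model. For each cutpoint k between adjacent grades, the model gives the cumulative log odds of the response falling at or below category k as logit P(grade ≤ k) = ζ_k − (β1 × study_time + β2 × attendance). The intercepts ζ_k (shown as A|B, B|C, C|D, D|F) differ by cutpoint, but the slopes β are shared across all cutpoints. A negative slope therefore shifts probability toward the earlier levels (here, toward A).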

  4. Prediction: Uses the fitted model to predict the probability distribution over grades for new observations.
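To see how the fitted coefficients turn into grade probabilities, you can reproduce predict() by hand. With polr's default logistic link, P(grade ≤ k) = plogis(ζ_k − η), where η = β1 × study_time + β2 × attendance. Each category probability is the difference between adjacent cumulative probabilities. This is a minimal sketch. `eta`, `cum_prob`, and `manual` are illustrative names, and the data are recreated so the block runs on its own.

```r
library(MASS)

# Recreate the simulated data and refit the model
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)
model <- polr(grade ~ study_time + attendance, data = data, Hess = TRUE)
new_data <- data.frame(study_time = c(6, 7), attendance = c(90, 95))

# Linear predictor eta = beta1 * study_time + beta2 * attendance
# (polr puts no intercept in eta; the cutpoints zeta play that role)
eta <- as.vector(as.matrix(new_data[, names(coef(model))]) %*% coef(model))

# Cumulative probabilities P(Y <= k) = plogis(zeta_k - eta); rows = students, cols = A|B..D|F
cum_prob <- plogis(outer(-eta, model$zeta, "+"))

# Category probabilities: differences of 0, cumulative probabilities, 1
manual <- t(apply(cbind(0, cum_prob, 1), 1, diff))
colnames(manual) <- model$lev
manual

# Compare with polr's own predictions (ignoring row names)
all.equal(manual, predict(model, newdata = new_data, type = "probs"),
          check.attributes = FALSE)
```

The same cutpoints ζ_k apply to every student, and only η changes with the predictors. This shared shift is what the proportional odds assumption means in practice.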

This process is useful for analyzing ordinal categorical data where the response variable categories have a meaningful order but no inherent numeric distance between them.

Exercises

  1. Extend this example by using a categorical explanatory variable (X) in the model.

  2. Once Exercise 1 works, adapt the example to your own data.
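One possible starting point for Exercise 1 is sketched below. It adds a hypothetical factor `section` (the class section, randomly assigned here and invented for illustration only) to the formula. polr expands an unordered factor into treatment (dummy) contrasts, and the first level in alphabetical order serves as the baseline.

```r
library(MASS)

# Recreate the simulated data
set.seed(123)
data <- data.frame(
  study_time = round(rnorm(100, mean = 5, sd = 2)),
  attendance = round(runif(100, min = 80, max = 100)),
  grade = factor(sample(c("A", "B", "C", "D", "F"), 100, replace = TRUE),
                 levels = c("A", "B", "C", "D", "F"), ordered = TRUE)
)

# Hypothetical categorical predictor (illustration only); baseline level = "afternoon"
data$section <- factor(sample(c("morning", "afternoon", "evening"), 100, replace = TRUE))

# Same model as before, plus the factor; one slope per non-baseline level
model_cat <- polr(grade ~ study_time + attendance + section, data = data, Hess = TRUE)
summary(model_cat)
```

Each `section` coefficient compares that section with the baseline section while holding study time and attendance fixed.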