Data Information (Agresti (1996), Table 6.9, p. 186)
Data set (impairment.sav) comes from a study of mental health for a random sample of adult residents of Alachua County, Florida. Mental impairment is ordinal, with categories (1 = well, 2 = mild symptom formation, 3 = moderate symptom formation, 4 = impaired).
The study related Y = mental impairment to two explanatory variables. The life events is a composite measure of the number and severity of important life events such as birth of child, new job, divorce, or death in family that occurred to the subject within the past three years. In this sample it has a mean of 4.3 and standard deviation of 2.7. Socioeconomic status (SES) is measured here as binary (0 = low & 1 = high).
library(haven)
impairment <- read_sav("impairment.sav")
head(impairment)
## # A tibble: 6 x 4
## subject Impairment ses lifeevents
## <dbl> <dbl+lbl> <dbl+lbl> <dbl>
## 1 1 1 [1= Well] 1 [1=high] 1
## 2 2 1 [1= Well] 1 [1=high] 9
## 3 3 1 [1= Well] 1 [1=high] 4
## 4 4 1 [1= Well] 1 [1=high] 3
## 5 5 1 [1= Well] 0 [0=low] 2
## 6 6 1 [1= Well] 1 [1=high] 0
# Prepare a copy for exercise
OLR_data <- impairment
# Ordering the dependent variable
OLR_data$Impairment <- factor(OLR_data$Impairment,levels=c(1,2,3,4),labels=c("well","mild","moderate","impaired"))
# Ordering the independent variable
OLR_data$ses <- factor(OLR_data$ses,levels=c(0,1),labels=c("low","high"))
# Exploratory data analysis
# Summarizing the data
summary(OLR_data)
## subject Impairment ses lifeevents
## Min. : 1.00 well :12 low :18 Min. :0.000
## 1st Qu.:10.75 mild :12 high:22 1st Qu.:2.000
## Median :20.50 moderate: 7 Median :4.000
## Mean :20.50 impaired: 9 Mean :4.275
## 3rd Qu.:30.25 3rd Qu.:6.250
## Max. :40.00 Max. :9.000
# Making frequency table
table(OLR_data$Impairment, OLR_data$ses)
##
## low high
## well 4 8
## mild 4 8
## moderate 5 2
## impaired 5 4
table(OLR_data$Impairment, OLR_data$lifeevents)
##
## 0 1 2 3 4 5 6 7 8 9
## well 1 3 2 3 1 0 0 1 0 1
## mild 0 2 1 3 0 3 1 0 1 1
## moderate 1 0 0 2 2 0 1 0 0 1
## impaired 0 0 1 0 2 1 0 1 3 1
# In order to run the ordinal logistic regression model, we need the polr function from the MASS package
library(MASS)
# Build ordinal logistic regression model
OLRmodel <- polr(Impairment ~ ses + lifeevents , data = OLR_data, Hess = TRUE)
summary(OLRmodel)
## Call:
## polr(formula = Impairment ~ ses + lifeevents, data = OLR_data,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## seshigh -1.1112 0.6109 -1.819
## lifeevents 0.3189 0.1210 2.635
##
## Intercepts:
## Value Std. Error t value
## well|mild -0.2819 0.6423 -0.4389
## mild|moderate 1.2128 0.6607 1.8357
## moderate|impaired 2.2094 0.7210 3.0644
##
## Residual Deviance: 99.0979
## AIC: 109.0979
# Run a "only intercept" model
OIM <- polr(Impairment ~ 1, data = OLR_data)
summary(OIM)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = Impairment ~ 1, data = OLR_data)
##
## No coefficients
##
## Intercepts:
## Value Std. Error t value
## well|mild -0.8473 0.3450 -2.4557
## mild|moderate 0.4055 0.3227 1.2562
## moderate|impaired 1.2368 0.3786 3.2663
##
## Residual Deviance: 109.0421
## AIC: 115.0421
# Compare the our test model with the "Only intercept" model
anova(OIM,OLRmodel)
## Likelihood ratio tests of ordinal regression models
##
## Response: Impairment
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 1 37 109.0421
## 2 ses + lifeevents 35 99.0979 1 vs 2 2 9.944156 0.006928734
Before proceeding to examine the individual coefficients, you want to look at an overall test of the null hypothesis that the location coefficients for all of the variables in the model are 0.
You can base this on the change in log-likelihood when the variables are added to a model that contains only the intercept. The change in likelihood function has a chi-square distribution even when there are cells with small observed and predicted counts.
From the table, you see that the chi-square is 9.944 and p = .007. This means that you can reject the null hypothesis that the model without predictors is as good as the model with the predictors.
# Check the predicted probability for each program
OLRmodel$fitted.values
## well mild moderate impaired
## 1 0.62491585 0.2564222 0.07131474 0.04734725
## 2 0.11502358 0.2518325 0.24398557 0.38915837
## 3 0.39028430 0.3502175 0.14495563 0.11454256
## 4 0.46822886 0.3287373 0.11707594 0.08595788
## 5 0.28503416 0.3548980 0.18808642 0.17198137
## 6 0.69621280 0.2146344 0.05428168 0.03487110
## 7 0.35416883 0.3555319 0.15911298 0.13118624
## 8 0.46822886 0.3287373 0.11707594 0.08595788
## 9 0.46822886 0.3287373 0.11707594 0.08595788
## 10 0.19738781 0.3255946 0.22513070 0.25188691
## 11 0.35416883 0.3555319 0.15911298 0.13118624
## 12 0.28503416 0.3548980 0.18808642 0.17198137
## 13 0.31756633 0.3571768 0.17419493 0.15106190
## 14 0.10019399 0.2315350 0.24178533 0.42648565
## 15 0.46822886 0.3287373 0.11707594 0.08595788
## 16 0.35416883 0.3555319 0.15911298 0.13118624
## 17 0.15167000 0.2918553 0.23993344 0.31654127
## 18 0.54775531 0.2959818 0.09227179 0.06399113
## 19 0.13282485 0.2729369 0.24333367 0.35090454
## 20 0.31756633 0.3571768 0.17419493 0.15106190
## 21 0.11502358 0.2518325 0.24398557 0.38915837
## 22 0.22469945 0.3390046 0.21407808 0.22221786
## 23 0.46822886 0.3287373 0.11707594 0.08595788
## 24 0.62491585 0.2564222 0.07131474 0.04734725
## 25 0.42998727 0.3408054 0.13029529 0.09891202
## 26 0.39028430 0.3502175 0.14495563 0.11454256
## 27 0.22469945 0.3390046 0.21407808 0.22221786
## 28 0.04102612 0.1191445 0.18048372 0.65934567
## 29 0.25278003 0.3485130 0.20206806 0.19663890
## 30 0.17402747 0.3103145 0.23352933 0.28212871
## 31 0.22469945 0.3390046 0.21407808 0.22221786
## 32 0.15167000 0.2918553 0.23993344 0.31654127
## 33 0.54775531 0.2959818 0.09227179 0.06399113
## 34 0.19738781 0.3255946 0.22513070 0.25188691
## 35 0.13282485 0.2729369 0.24333367 0.35090454
## 36 0.17402747 0.3103145 0.23352933 0.28212871
## 37 0.17402747 0.3103145 0.23352933 0.28212871
## 38 0.15167000 0.2918553 0.23993344 0.31654127
## 39 0.05557758 0.1522454 0.20761767 0.58455931
## 40 0.04102612 0.1191445 0.18048372 0.65934567
# We can get the predicted result by use predict functio
predict(OLRmodel)
## [1] well impaired well well mild well mild
## [8] well well mild mild mild mild impaired
## [15] well mild impaired well impaired mild impaired
## [22] mild well well well well mild impaired
## [29] mild mild mild impaired well mild impaired
## [36] mild mild impaired impaired impaired
## Levels: well mild moderate impaired
# Test the goodness of fit
chisq.test(OLR_data$Impairment,predict(OLRmodel))
## Warning in chisq.test(OLR_data$Impairment, predict(OLRmodel)): Chi-squared
## approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: OLR_data$Impairment and predict(OLRmodel)
## X-squared = 7.9614, df = 6, p-value = 0.2409
#Compute confusion table and misclassification error
predictOLR <- predict(OLRmodel,OLR_data)
cTab <- table(OLR_data$Impairment, predictOLR)
mean(as.character(OLR_data$Impairment) != as.character(predictOLR))
## [1] 0.625
(CCR <- sum(diag(cTab)) / sum(cTab)) # Calculate the classification rate
## [1] 0.375
The confusion matrix shows the performance of the ordinal logistic regression model. For example, it shows that, in the test dataset, 6 times category “well” is identified correctly. Similarly, 4 times category “mild” and 5 times category "" is identified correctly.
Using the confusion matrix, we can find that the misclassification error for our model is 62.5%.
# Load the DescTools package for calculate the R square
library("DescTools")
# Calculate the R Square
PseudoR2(OLRmodel, which = c("CoxSnell","Nagelkerke","McFadden"))
## CoxSnell Nagelkerke McFadden
## 0.22011118 0.23553327 0.09119561
There are several R^2-like statistics that can be used to measure the strength of the association between the dependent variable and the predictor variables.
They are not as useful as the statistic in multiple regression, since their interpretation is not straightforward.
For this example, there is the relationship of 23.6% between independent variables and dependent variable based on Nagelkerke’s R^2.
# We can use the coef() function to check the parameter estimates
(OLRestimates <- coef(summary(OLRmodel)))
## Value Std. Error t value
## seshigh -1.1112310 0.6108786 -1.819070
## lifeevents 0.3188613 0.1209922 2.635387
## well|mild -0.2819031 0.6422667 -0.438919
## mild|moderate 1.2127926 0.6606543 1.835745
## moderate|impaired 2.2093721 0.7209703 3.064443
# Add the p-value to our parameter estimates table
p <- pnorm(abs(OLRestimates[, "t value"]), lower.tail = FALSE) * 2
(OLRestimates_P <- cbind(OLRestimates, "p value" = p))
## Value Std. Error t value p value
## seshigh -1.1112310 0.6108786 -1.819070 0.068900746
## lifeevents 0.3188613 0.1209922 2.635387 0.008404133
## well|mild -0.2819031 0.6422667 -0.438919 0.660720194
## mild|moderate 1.2127926 0.6606543 1.835745 0.066395449
## moderate|impaired 2.2093721 0.7209703 3.064443 0.002180760
# Bulding the newpatient entry to test our model
newpatient <- data.frame("subject" = 1, "Impairment" = NA, "ses" = "high", "lifeevents"=3)
# Use predict function to predict the patient's situation
predict(OLRmodel,newpatient)
## [1] well
## Levels: well mild moderate impaired