A study looks at factors that influence the decision of whether to apply to graduate school. College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school. Hence, our outcome variable has three categories.
Data on parental educational status, whether the undergraduate institution is public or private, and current GPA is also collected. The researchers have reason to believe that the “distances” between these three points are not equal. For example, the “distance” between “unlikely” and “somewhat likely” may be shorter than the distance between “somewhat likely” and “very likely”.
dat <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ologit.csv")
dat <- dat %>% mutate(apply = factor(apply))
## one at a time, table apply, pared, and public
lapply(dat[, c("apply", "pared", "public")], table)
## $apply
##
## somewhat likely unlikely very likely
## 140 220 40
##
## $pared
##
## 0 1
## 337 63
##
## $public
##
## 0 1
## 343 57
## three way cross tabs (xtabs) and flatten the table
ftable(xtabs(~ public + apply + pared, data = dat))
## pared 0 1
## public apply
## 0 somewhat likely 98 26
## unlikely 175 14
## very likely 20 10
## 1 somewhat likely 12 4
## unlikely 25 6
## very likely 7 3
summary(dat$gpa)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.900 2.720 2.990 2.999 3.270 4.000
ggplot(dat, aes(x = apply, y = gpa)) +
geom_boxplot(size = .75) +
geom_jitter(alpha = .5) +
facet_grid(pared ~ public, margins = TRUE) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
Below we use the polr command from the MASS package to estimate an ordered logistic regression model. The command name comes from proportional odds logistic regression, highlighting the proportional odds assumption in our model. polr uses the standard formula interface in R for specifying a regression model with outcome followed by predictors. We also specify Hess=TRUE to have the model return the observed information matrix from optimization (called the Hessian) which is used to get standard errors.
## fit ordered logit model and store results 'm'
m <- polr(apply ~ pared + public + gpa, data = dat, Hess=TRUE)
## view a summary of the model
summary(m)
## Call:
## polr(formula = apply ~ pared + public + gpa, data = dat, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## pared -0.2635 0.2876 -0.9162
## public 0.5600 0.2955 1.8953
## gpa -0.0708 0.2533 -0.2795
##
## Intercepts:
## Value Std. Error t value
## somewhat likely|unlikely -0.7959 0.7516 -1.0589
## unlikely|very likely 2.0445 0.7606 2.6882
##
## Residual Deviance: 736.9261
## AIC: 746.9261
## store table
(ctable <- coef(summary(m)))
## Value Std. Error t value
## pared -0.26352826 0.2876425 -0.9161658
## public 0.55999663 0.2954713 1.8952655
## gpa -0.07080247 0.2533390 -0.2794772
## somewhat likely|unlikely -0.79587821 0.7516351 -1.0588625
## unlikely|very likely 2.04454674 0.7605593 2.6882148
## calculate and store p values
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
## combined table
(ctable <- cbind(ctable, "p value" = p))
## Value Std. Error t value p value
## pared -0.26352826 0.2876425 -0.9161658 0.359579940
## public 0.55999663 0.2954713 1.8952655 0.058057240
## gpa -0.07080247 0.2533390 -0.2794772 0.779878661
## somewhat likely|unlikely -0.79587821 0.7516351 -1.0588625 0.289662386
## unlikely|very likely 2.04454674 0.7605593 2.6882148 0.007183518
(ci <- confint(m)) # default method gives profiled CIs
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## pared -0.82849916 0.3019309
## public -0.01549255 1.1442197
## gpa -0.56786483 0.4264723
confint.default(m) # CIs assuming normality
## 2.5 % 97.5 %
## pared -0.82729727 0.3002408
## public -0.01911656 1.1391098
## gpa -0.56733779 0.4257329
## odds ratios
exp(coef(m))
## pared public gpa
## 0.7683359 1.7506666 0.9316459
## pared public gpa
## 2.8511 0.9429 1.8514
## OR and CI
exp(cbind(OR = coef(m), ci))
## OR 2.5 % 97.5 %
## pared 0.7683359 0.4367042 1.352468
## public 1.7506666 0.9846268 3.139990
## gpa 0.9316459 0.5667342 1.531844
sf <- function(y) {
c('Y>=1' = qlogis(mean(y >= 1)),
'Y>=2' = qlogis(mean(y >= 2)),
'Y>=3' = qlogis(mean(y >= 3)))
}
(s <- with(dat, summary(as.numeric(apply) ~ pared + public + gpa, fun=sf)))
## Length Class Mode
## 3 formula call
glm(I(as.numeric(apply) >= 2) ~ pared, family="binomial", data = dat)
##
## Call: glm(formula = I(as.numeric(apply) >= 2) ~ pared, family = "binomial",
## data = dat)
##
## Coefficients:
## (Intercept) pared
## 0.7245 -0.6292
##
## Degrees of Freedom: 399 Total (i.e. Null); 398 Residual
## Null Deviance: 518
## Residual Deviance: 512.9 AIC: 516.9
glm(I(as.numeric(apply) >= 3) ~ pared, family="binomial", data = dat)
##
## Call: glm(formula = I(as.numeric(apply) >= 3) ~ pared, family = "binomial",
## data = dat)
##
## Coefficients:
## (Intercept) pared
## -2.441 1.094
##
## Degrees of Freedom: 399 Total (i.e. Null); 398 Residual
## Null Deviance: 260.1
## Residual Deviance: 252.2 AIC: 256.2
newdat <- data.frame(
pared = rep(0:1, 200),
public = rep(0:1, each = 200),
gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4))
newdat <- cbind(newdat, predict(m, newdat, type = "probs"))
##show first few rows
head(newdat)
## pared public gpa somewhat likely unlikely very likely
## 1 0 0 1.900000 0.3404356 0.5579187 0.10164569
## 2 1 0 1.921212 0.4021960 0.5179330 0.07987096
## 3 0 0 1.942424 0.3411104 0.5575179 0.10137173
## 4 1 0 1.963636 0.4029184 0.5174311 0.07965049
## 5 0 0 1.984848 0.3417858 0.5571157 0.10109843
## 6 1 0 2.006061 0.4036413 0.5169282 0.07943058
lnewdat <- melt(newdat, id.vars = c("pared", "public", "gpa"),
variable.name = "Level", value.name="Probability")
## view first few rows
head(lnewdat)
## pared public gpa Level Probability
## 1 0 0 1.900000 somewhat likely 0.3404356
## 2 1 0 1.921212 somewhat likely 0.4021960
## 3 0 0 1.942424 somewhat likely 0.3411104
## 4 1 0 1.963636 somewhat likely 0.4029184
## 5 0 0 1.984848 somewhat likely 0.3417858
## 6 1 0 2.006061 somewhat likely 0.4036413
ggplot(lnewdat, aes(x = gpa, y = Probability, colour = Level)) +
geom_line() + facet_grid(pared ~ public, labeller="label_both")