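The survey was conducted among 400 students to investigate how likely a student is to apply to college based on three predictors: whether a parent has a college education (pared), whether the student attends a public school (public), and the student's GPA.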
The response variable (application to college) is an ordered variable with 3 levels: Not Likely, Likely and Very Likely. In the data these levels are coded 0, 1 and 2, in that order.
Given this data structure, the data can be modelled using ordinal logistic regression. This type of generalized linear model is used when the response variable is ordinal.
Here we shall explore the command used to create the model. But first we need to load the libraries used for this analysis and the data itself.
library(ordinal) #Ordinal regression package
library(MASS) #polr function
library(brant) #test of proportional odds
appdat<-read.csv(file.choose(), header = TRUE, sep = ",") ##select the csv file from location
head(appdat)
##   apply pared public  GPA
## 1     2     0      0 3.26
## 2     1     1      0 3.21
## 3     0     1      1 3.94
## 4     1     0      0 2.81
## 5     1     0      0 2.53
## 6     0     0      1 2.59
We need to confirm the structure of the data and make any necessary transformations before creating a model.
str(appdat)
## 'data.frame': 400 obs. of 4 variables:
## $ apply : int 2 1 0 1 1 0 1 1 0 1 ...
## $ pared : int 0 1 1 0 0 0 0 0 0 1 ...
## $ public: int 0 0 1 0 0 1 0 0 0 0 ...
## $ GPA : num 3.26 3.21 3.94 2.81 2.53 2.59 2.56 2.73 3 3.5 ...
#Binaries
appdat$pared<-as.factor(appdat$pared)
appdat$public<-as.factor(appdat$public)
## Ordered category
appdat$apply<-factor(appdat$apply,
                     levels = c(0,1,2),
                     ordered = TRUE)
str(appdat)
## 'data.frame': 400 obs. of 4 variables:
## $ apply : Ord.factor w/ 3 levels "0"<"1"<"2": 3 2 1 2 2 1 2 2 1 2 ...
## $ pared : Factor w/ 2 levels "0","1": 1 2 2 1 1 1 1 1 1 2 ...
## $ public: Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 1 1 1 ...
## $ GPA : num 3.26 3.21 3.94 2.81 2.53 2.59 2.56 2.73 3 3.5 ...
Now the response variable is an ordered factor, and pared and public are binary factors.
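The ordered factor above keeps the numeric codes as its level names. As an optional illustration, the sketch below attaches descriptive labels to a small made-up vector that uses the same 0/1/2 coding. The vector `apply_raw` and the name `apply_lab` are hypothetical, and the label text assumes the three categories run from Not Likely to Very Likely.

```r
# Toy vector using the same 0/1/2 coding as appdat$apply (values are made up)
apply_raw <- c(2, 1, 0, 1, 0)

# Attach descriptive labels while keeping the category order
apply_lab <- factor(apply_raw,
                    levels = c(0, 1, 2),
                    labels = c("Not Likely", "Likely", "Very Likely"),
                    ordered = TRUE)

levels(apply_lab)        # categories, lowest to highest
apply_lab >= "Likely"    # comparisons respect the ordering
table(apply_lab)         # counts per category
```

If labels like these were applied to the real data, the model output would show the labels in place of the codes 0, 1 and 2. The fitted coefficients should not change.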
We now proceed to model the data using the polr function from the MASS package.
## Create an ordinal logistic regression model
appmod<-polr(apply ~ pared + public + GPA, data = appdat, Hess = T)
summary(appmod)
## Call:
## polr(formula = apply ~ pared + public + GPA, data = appdat, Hess = T)
##
## Coefficients:
##            Value Std. Error t value
## pared1   1.04769     0.2658  3.9418
## public1 -0.05879     0.2979 -0.1974
## GPA      0.61594     0.2606  2.3632
##
## Intercepts:
##      Value Std. Error t value
## 0|1 2.2039     0.7795  2.8272
## 1|2 4.2994     0.8043  5.3453
##
## Residual Deviance: 717.0249
## AIC: 727.0249
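polr parameterizes the cumulative logit as the intercept minus the linear predictor, so the coefficients enter with the opposite sign. With apply coded 0, 1 and 2, the fitted model is of the form:
\[logit(\hat P(Y\leq 0)) = 2.204 - 1.048(Pared1) + 0.059(Public1) - 0.616(GPA) \] \[logit(\hat P(Y\leq 1)) = 4.299 - 1.048(Pared1) + 0.059(Public1) - 0.616(GPA) \]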
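To make the equations concrete, the sketch below turns the rounded estimates from the summary into predicted category probabilities for one student. It assumes the parameterization documented for MASS::polr, logit P(Y <= j) = zeta_j - eta, where eta is the linear predictor. The student profile `x` is hypothetical and is not a row from the data.

```r
# Rounded estimates copied from summary(appmod) above
zeta <- c("0|1" = 2.2039, "1|2" = 4.2994)                      # cutpoints
beta <- c(pared1 = 1.04769, public1 = -0.05879, GPA = 0.61594) # slopes

# Hypothetical student: parent with college education, private school, GPA 3.0
x <- c(pared1 = 1, public1 = 0, GPA = 3.0)

eta   <- sum(beta * x)        # linear predictor
cum_p <- plogis(zeta - eta)   # P(Y <= 0) and P(Y <= 1)

# Category probabilities are differences of the cumulative probabilities
probs <- c("0" = cum_p[[1]],
           "1" = cum_p[[2]] - cum_p[[1]],
           "2" = 1 - cum_p[[2]])
round(probs, 3)
```

With the fitted object available, `predict(appmod, newdata = ..., type = "probs")` should give the same quantities without the manual arithmetic, provided the new data uses the same factor coding as appdat.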
### Let's find the p-values
aptable <- coef(summary(appmod))
pv<-pnorm(abs(aptable[, "t value"]), lower.tail = FALSE)*2
(aptable<-cbind(aptable, "p value" = pv))
##               Value Std. Error    t value      p value
## pared1   1.04769011  0.2657894  3.9418050 8.087070e-05
## public1 -0.05878572  0.2978614 -0.1973593 8.435464e-01
## GPA      0.61594057  0.2606340  2.3632399 1.811594e-02
## 0|1      2.20391472  0.7795455  2.8271792 4.696004e-03
## 1|2      4.29936313  0.8043267  5.3452947 9.027008e-08
The p-values for pared and GPA are below 0.05, hence they are significant at α = 0.05 in predicting how likely a student is to apply to college. The p-value for public (0.84) is well above 0.05, so public is not a significant predictor.
Let’s check the significance of the coefficients using confidence intervals.
## Confidence intervals
confint(appmod)
## Waiting for profiling to be done...
##              2.5 %    97.5 %
## pared1   0.5281768 1.5721750
## public1 -0.6522060 0.5191384
## GPA      0.1076202 1.1309148
Using the confidence intervals (CI), if the 95% CI doesn’t cross 0, the parameter estimate is statistically significant. In this case, pared and GPA are statistically significant in predicting the likelihood of a student applying to college since the range doesn’t contain 0.
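The intervals printed by confint() for a polr fit are profile-likelihood intervals, which is what the "Waiting for profiling" message refers to. As a rough cross-check, the sketch below computes the simpler Wald intervals (estimate ± z × standard error) from the rounded values in the summary. These are an approximation and will differ slightly from the profiled ones.

```r
# Estimates and standard errors copied from summary(appmod) above
est <- c(pared1 = 1.04769, public1 = -0.05879, GPA = 0.61594)
se  <- c(pared1 = 0.2658,  public1 = 0.2979,   GPA = 0.2606)

z <- qnorm(0.975)   # about 1.96 for a 95% interval

wald_ci <- cbind(lower = est - z * se,
                 upper = est + z * se)

# TRUE when the interval excludes 0, i.e. significant at the 5% level
excludes_zero <- wald_ci[, "lower"] > 0 | wald_ci[, "upper"] < 0
cbind(round(wald_ci, 3), excludes_zero)
```

With the fitted object available, `confint.default(appmod)` should produce the same kind of Wald interval directly.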
### Odds ratios
exp(coef(appmod))
##    pared1   public1       GPA
## 2.8510579 0.9429088 1.8513971
exp(cbind(OR = coef(appmod), confint(appmod)))
## Waiting for profiling to be done...
##                OR     2.5 %   97.5 %
## pared1  2.8510579 1.6958376 4.817114
## public1 0.9429088 0.5208954 1.680579
## GPA     1.8513971 1.1136247 3.098490
For a student whose parent(s) has a college education, the odds of being in a higher category of likelihood to apply are 2.85 times the odds for a student whose parents did not get a college degree, holding all other factors constant.
For a student in a public school, the odds of being in a higher category are about 5.7% lower than for a student in a private school, holding all other factors constant. This effect is not statistically significant, since the 95% CI for its odds ratio (0.52 to 1.68) contains 1.
For every unit increase in GPA, the odds of being in a higher category are about 85% higher, holding all other factors constant.
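The percentages quoted in these interpretations come from the odds ratios through (OR - 1) × 100. The sketch below reproduces that arithmetic from the rounded coefficients in the summary, so small rounding differences from the output above are expected.

```r
# Coefficients copied from summary(appmod) above
beta <- c(pared1 = 1.04769, public1 = -0.05879, GPA = 0.61594)

or         <- exp(beta)        # odds ratios
pct_change <- (or - 1) * 100   # percent change in the odds of a higher category

round(cbind(OR = or, pct_change), 2)
```

A negative percentage, as for public1, means lower odds of being in a higher category. These are changes in odds, not in probabilities.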