Introduction

A survey of 400 students was conducted to investigate how likely a student is to apply to college, based on:

Whether one or both parents have a college education (pared)
Whether the student attends a public or private school (public), and
The GPA of the student (GPA)

The response variable in question (application to college) is an ordered variable with 3 levels, coded in the data as 0 = Not Likely, 1 = Likely and 2 = Very Likely.

Given this data structure, the data can be modelled using ordinal logistic regression. This extension of logistic regression is used when the response variable is ordinal, i.e. when its categories have a natural order.
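In the proportional odds form fitted later with polr(), the model works with the cumulative probabilities of the ordered response and uses a single set of slopes for every cut-point. A sketch of the general form, where \(\zeta_k\) is the cut-point (intercept) for category \(k\), \(x_1, \dots, x_p\) are the predictors and \(\beta_1, \dots, \beta_p\) their coefficients (symbols introduced here for illustration, written with the minus sign that MASS::polr uses):

\[logit(P(Y\leq k)) = \zeta_k - (\beta_1x_1 + \dots + \beta_px_p)\]
\[P(Y = k) = P(Y\leq k) - P(Y\leq k-1), \quad \text{with } P(Y\leq k-1) = 0 \text{ for the lowest category}\]

Because the same \(\beta\)s appear in every cumulative logit, each coefficient describes how a predictor shifts the odds of being in a higher category, whichever cut-point is considered.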

Loading packages and data

Here we explore the commands used to create the model. First, we load the libraries used for this analysis and the data itself.

library(ordinal)  # Ordinal regression package
library(MASS)     # polr() function for ordinal logistic regression
library(brant)    # Brant test of the proportional odds assumption

appdat<-read.csv(file.choose(), header = TRUE, sep = ",") ##select the csv file from location
head(appdat)

##   apply pared public  GPA
## 1     2     0      0 3.26
## 2     1     1      0 3.21
## 3     0     1      1 3.94
## 4     1     0      0 2.81
## 5     1     0      0 2.53
## 6     0     0      1 2.59

Transforming data

We need to confirm the structure of the data and make any necessary transformations before creating the model.

str(appdat)

## 'data.frame':    400 obs. of  4 variables:
##  $ apply : int  2 1 0 1 1 0 1 1 0 1 ...
##  $ pared : int  0 1 1 0 0 0 0 0 0 1 ...
##  $ public: int  0 0 1 0 0 1 0 0 0 0 ...
##  $ GPA   : num  3.26 3.21 3.94 2.81 2.53 2.59 2.56 2.73 3 3.5 ...

# Convert the binary predictors to factors
appdat$pared <- as.factor(appdat$pared)
appdat$public <- as.factor(appdat$public)

# Convert the response to an ordered factor (0 < 1 < 2)
appdat$apply <- factor(appdat$apply,
                       levels = c(0, 1, 2),
                       ordered = TRUE)

str(appdat)

## 'data.frame':    400 obs. of  4 variables:
##  $ apply : Ord.factor w/ 3 levels "0"<"1"<"2": 3 2 1 2 2 1 2 2 1 2 ...
##  $ pared : Factor w/ 2 levels "0","1": 1 2 2 1 1 1 1 1 1 2 ...
##  $ public: Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 1 1 1 ...
##  $ GPA   : num  3.26 3.21 3.94 2.81 2.53 2.59 2.56 2.73 3 3.5 ...

The response variable is now an ordered factor, and pared and public are binary (two-level) factors.
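The codes 0, 1 and 2 can also be given the descriptive labels from the introduction, which can make later output easier to read. A minimal sketch on a small hypothetical vector `x` (not the survey data); ordered factors support comparisons such as `<` and `>` because R knows the order of the levels:

```r
# Toy response values coded like the survey data (hypothetical example)
x <- c(2, 1, 0, 1)

# Map the codes 0/1/2 to ordered, labelled categories
x_ord <- factor(x,
                levels = c(0, 1, 2),
                labels = c("Not Likely", "Likely", "Very Likely"),
                ordered = TRUE)

print(x_ord)

# Comparisons follow the level order: here "Very Likely" vs "Likely"
print(x_ord[1] > x_ord[2])

# Counts per category
print(table(x_ord))
```

If labels like these were used for apply, polr() would name its intercepts after them (for example `Not Likely|Likely` instead of `0|1`); the estimates themselves would not change.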

Creating the model

We now model the data using the polr() function (proportional odds logistic regression) from the MASS package.

## Create an ordinal logistic regression model

appmod<-polr(apply ~ pared + public + GPA, data = appdat, Hess = T)
summary(appmod)

## Call:
## polr(formula = apply ~ pared + public + GPA, data = appdat, Hess = T)
## 
## Coefficients:
##            Value Std. Error t value
## pared1   1.04769     0.2658  3.9418
## public1 -0.05879     0.2979 -0.1974
## GPA      0.61594     0.2606  2.3632
## 
## Intercepts:
##     Value   Std. Error t value
## 0|1  2.2039  0.7795     2.8272
## 1|2  4.2994  0.8043     5.3453
## 
## Residual Deviance: 717.0249 
## AIC: 727.0249
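Before reading off the fitted equations, it can help to see the sign convention polr() uses on data where the true parameters are known. A minimal sketch, assuming simulated data (not the survey): the slopes 1.0 and 0.6, the cut-points 2.2 and 4.3, the seed and the sample size are all made-up values chosen only to resemble the estimates above. The response is generated from the latent-variable form of the proportional odds model, where \(P(Y\leq k) = P(\varepsilon \leq \zeta_k - \eta)\) with logistic noise \(\varepsilon\):

```r
library(MASS)  # provides polr()

set.seed(1)
n <- 20000

# Simulated predictors (hypothetical, not the survey data)
pared <- rbinom(n, 1, 0.3)
gpa   <- runif(n, 2, 4)

# Latent variable y* = eta + logistic noise; Y is the interval y* falls into
eta   <- 1.0 * pared + 0.6 * gpa
ystar <- eta + rlogis(n)
y <- cut(ystar, breaks = c(-Inf, 2.2, 4.3, Inf),
         labels = c("0", "1", "2"), ordered_result = TRUE)

fit <- polr(y ~ pared + gpa, Hess = TRUE)

# Slopes should come out near +1.0 and +0.6 and cut-points near 2.2 and 4.3,
# consistent with logit P(Y <= k) = zeta_k - eta
print(coef(fit))
print(fit$zeta)
```

The slopes are recovered with a positive sign even though they lower \(P(Y\leq k)\), which is why positive polr coefficients point towards the higher categories.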

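polr() parameterises the model as \(logit(P(Y\leq k)) = \zeta_k - \eta\), where \(\zeta_k\) are the intercepts (cut-points) and \(\eta\) is the linear predictor, so the coefficients enter with a minus sign. A positive coefficient therefore lowers \(P(Y\leq k)\) and shifts students towards the higher categories. With apply coded 0, 1 and 2, the fitted model is:

\[logit(\hat P(Y\leq0)) = 2.204 - (1.048(Pared1) - 0.059(Public1) + 0.616(GPA)) \]
\[logit(\hat P(Y\leq1)) = 4.299 - (1.048(Pared1) - 0.059(Public1) + 0.616(GPA)) \]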
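The fitted equations can be turned into predicted probabilities for a particular student. A minimal sketch using the coefficients and intercepts copied from summary(appmod), polr's convention \(logit(P(Y\leq k)) = \zeta_k - \eta\), and a hypothetical student (college-educated parent, private school, GPA 3.0) chosen only for illustration:

```r
# Coefficients and cut-points copied from summary(appmod) above
beta <- c(pared1 = 1.04769011, public1 = -0.05878572, GPA = 0.61594057)
zeta <- c("0|1" = 2.20391472, "1|2" = 4.29936313)

# Hypothetical student: a parent with a college degree, private school, GPA 3.0
x <- c(pared1 = 1, public1 = 0, GPA = 3.0)

# Linear predictor eta, then cumulative probabilities P(Y <= 0) and P(Y <= 1)
eta <- sum(beta * x)
cum_prob <- plogis(zeta - eta)

# Category probabilities are differences of the cumulative probabilities
probs <- c(cum_prob[1], cum_prob[2] - cum_prob[1], 1 - cum_prob[2])
names(probs) <- c("Not Likely (0)", "Likely (1)", "Very Likely (2)")
print(round(probs, 3))
```

With appmod in the session, `predict(appmod, newdata = data.frame(pared = factor("1", levels = c("0", "1")), public = factor("0", levels = c("0", "1")), GPA = 3), type = "probs")` should give the same probabilities up to rounding.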

### Let's find the p-values (two-sided, normal approximation to the t values)

aptable <- coef(summary(appmod))
pv<-pnorm(abs(aptable[, "t value"]), lower.tail = FALSE)*2
(aptable<-cbind(aptable, "p value" = pv))

##               Value Std. Error    t value      p value
## pared1   1.04769011  0.2657894  3.9418050 8.087070e-05
## public1 -0.05878572  0.2978614 -0.1973593 8.435464e-01
## GPA      0.61594057  0.2606340  2.3632399 1.811594e-02
## 0|1      2.20391472  0.7795455  2.8271792 4.696004e-03
## 1|2      4.29936313  0.8043267  5.3452947 9.027008e-08

The p-values for pared and GPA are below 0.05, so both are significant at α = 0.05 in predicting how likely a student is to apply to college. The p-value for public (0.84) is well above 0.05, so public is not significant.
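The p-values above treat each t value as a standard normal (Wald z) statistic, so \(p = 2P(Z > |t|)\). A minimal sketch of that calculation, using the t values copied from the coefficient table:

```r
# t values copied from the coefficient table above
t_vals <- c(pared1 = 3.9418050, public1 = -0.1973593, GPA = 2.3632399)

# Two-sided p-value against a standard normal reference: 2 * P(Z > |t|)
p_vals <- 2 * pnorm(abs(t_vals), lower.tail = FALSE)

print(signif(p_vals, 3))

# Which predictors are significant at alpha = 0.05
print(p_vals < 0.05)
```

The normal reference is an approximation that is reasonable here because the sample (400 students) is fairly large.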

We can also check the significance of each predictor using 95% confidence intervals.

## Confidence intervals

confint(appmod)

## Waiting for profiling to be done...

##              2.5 %    97.5 %
## pared1   0.5281768 1.5721750
## public1 -0.6522060 0.5191384
## GPA      0.1076202 1.1309148

If the 95% confidence interval (CI) of a coefficient does not contain 0, the estimate is statistically significant at the 5% level. Here the intervals for pared and GPA exclude 0, so both are statistically significant predictors of the likelihood of a student applying to college, while the interval for public contains 0.
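confint() on a polr model profiles the likelihood (hence the "Waiting for profiling to be done..." message). A quicker approximation is the Wald interval, estimate ± 1.96 × SE. A minimal sketch using the estimates and standard errors copied from the coefficient table; the Wald bounds should be close to, but not identical with, the profile intervals:

```r
# Estimates and standard errors copied from the coefficient table above
est <- c(pared1 = 1.04769011, public1 = -0.05878572, GPA = 0.61594057)
se  <- c(pared1 = 0.2657894,  public1 = 0.2978614,  GPA = 0.2606340)

# Wald 95% interval: estimate +/- z_0.975 * SE
z <- qnorm(0.975)
wald <- cbind(lower = est - z * se, upper = est + z * se)
print(round(wald, 4))

# Significant at the 5% level when the interval excludes 0
excludes_zero <- wald[, "lower"] > 0 | wald[, "upper"] < 0
print(excludes_zero)
```

Both kinds of interval lead to the same conclusion here: pared and GPA exclude 0, public does not.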

Odds ratio

###Odds ratio

exp(coef(appmod))

##    pared1   public1       GPA 
## 2.8510579 0.9429088 1.8513971

exp(cbind(OR = coef(appmod), confint(appmod)))

## Waiting for profiling to be done...

##                OR     2.5 %   97.5 %
## pared1  2.8510579 1.6958376 4.817114
## public1 0.9429088 0.5208954 1.680579
## GPA     1.8513971 1.1136247 3.098490

Interpretation of the odds ratios

A student with at least one college-educated parent has 2.85 times the odds of being in a higher application category (for example, Likely or Very Likely versus Not Likely) compared with a student whose parents did not get a college degree, holding all other factors constant.

A student in a public school has about 5.7% lower odds of being in a higher application category than a student in a private school, holding all other factors constant. This effect is not statistically significant, since its CI for the odds ratio (0.52 to 1.68) contains 1.

For every one-unit increase in GPA, the odds of being in a higher application category increase by about 85%, holding all other factors constant.
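The percentages above come from (OR − 1) × 100, and under the proportional odds assumption the same odds ratio applies at both cut-points. A minimal sketch checking both, using the coefficients copied from summary(appmod) and two hypothetical students (private school, GPA 3.0) who differ only in pared:

```r
# Coefficients and cut-points copied from summary(appmod) above
beta <- c(pared1 = 1.04769011, public1 = -0.05878572, GPA = 0.61594057)
zeta <- c("0|1" = 2.20391472, "1|2" = 4.29936313)

# Percentage change in the odds of a higher category per one-unit change
pct_change <- (exp(beta) - 1) * 100
print(round(pct_change, 1))

# Odds of Y > k for a hypothetical private-school student with GPA 3.0
odds_above <- function(k, pared) {
  eta  <- beta[["pared1"]] * pared + beta[["GPA"]] * 3.0
  p_le <- plogis(zeta[[k]] - eta)  # P(Y <= k)
  (1 - p_le) / p_le                # odds of Y > k
}

# Odds ratio for pared at each cut-point, next to exp(beta_pared1)
or_cut1 <- odds_above("0|1", 1) / odds_above("0|1", 0)
or_cut2 <- odds_above("1|2", 1) / odds_above("1|2", 0)
print(c(or_cut1, or_cut2, exp(beta[["pared1"]])))
```

The three printed values agree because the odds of \(Y > k\) equal \(\exp(\eta - \zeta_k)\), so the cut-point cancels in the ratio. The Brant test from the brant package is one way to check whether the data support this shared odds ratio.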

Likelihood of a student to apply to college

Hilda Ngatia

2024-01-28