Code page for Stephen Peplow's article on takeup prediction

This page contains the code that I used in the analyses. The code is written in R, a language for statistical computing.

The classification tree

Several classification tree algorithms are available. I used the conditional classification tree (the ctree function) from the 'party' package.

Load the data

KPU2 <- read.csv("G:/KPU_takeup/KPU2.csv")

Load the party library (having first installed the package)

library(party)

Now convert some of the variables to factors

KPU2$refuse <- as.factor(KPU2$refuse)
KPU2$age <- as.factor(KPU2$agefac)
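
The conversion matters because R treats a numerically coded variable as a quantity unless told otherwise. A factor response makes ctree fit a classification tree, and a factor predictor enters a model as one indicator per non-reference level, which is why the logistic regression output further down lists age2 to age11 separately. The following is a minimal sketch in base R; the vector codes is a made-up stand-in, not the KPU2 data.

```r
# Hypothetical age-group codes (illustration only, not taken from KPU2)
codes <- c(1, 2, 3, 2, 1, 3)

# Left numeric, the codes enter a model as a single slope term
n_numeric <- ncol(model.matrix(~ codes))   # intercept + 1 slope

# Converted to a factor, each non-reference level gets its own indicator
age <- as.factor(codes)
age_levels <- levels(age)
n_factor <- ncol(model.matrix(~ age))      # intercept + 2 indicators

print(age_levels)
print(c(numeric = n_numeric, factor = n_factor))
```

The first level of the factor ("1" here) becomes the reference category, so its effect is absorbed into the intercept.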

Now build the tree


kpctree <- ctree(refuse ~ faculty + ratecode + age + level, data = KPU2)

And plot it

plot(kpctree)

[Figure: plot of the fitted conditional classification tree, kpctree]
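
Beyond plotting, a fitted tree can be checked by comparing its predicted classes with the observed ones. The sketch below shows one way to do this, assuming the party package is installed. It uses a simulated data frame (toy, with invented faculty and age values and an invented refusal rate), so the numbers it prints say nothing about the KPU2 results.

```r
library(party)

# Simulated stand-in data; every value here is invented for illustration
set.seed(1)
n <- 500
toy <- data.frame(
  faculty = factor(sample(c("Arts", "Business", "Science"), n, replace = TRUE)),
  age     = factor(sample(1:3, n, replace = TRUE))
)
# Assumed refusal probabilities: higher for one faculty than the others
p_refuse <- ifelse(toy$faculty == "Business", 0.6, 0.2)
toy$refuse <- factor(rbinom(n, 1, p_refuse))

# Fit a conditional classification tree on the toy data
toy_tree <- ctree(refuse ~ faculty + age, data = toy)

# Predicted class for each row of the training data
pred <- predict(toy_tree)

# Cross-tabulate predicted against observed classes
conf_mat <- table(predicted = pred, observed = toy$refuse)
print(conf_mat)

# Share of rows classified correctly (in-sample, so optimistic)
accuracy <- mean(as.character(pred) == as.character(toy$refuse))
print(accuracy)
```

The same two calls, predict(kpctree) and table(), should apply to the tree fitted above. Accuracy measured on the training data tends to flatter the model, so a held-out sample gives a fairer check.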

The Logistic Regression

kp <- glm(refuse ~ faculty + ratecode + age, family = binomial, data = KPU2)

Obtain the summary

summary(kp)

Call:
glm(formula = refuse ~ faculty + ratecode + age, family = binomial, 
    data = KPU2)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.060  -0.751  -0.621  -0.143   3.030  

Coefficients:
                                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                                -0.28837    0.19128   -1.51  0.13167
facultyArts                                -0.09435    0.19502   -0.48  0.62853
facultyBusiness                            -0.51995    0.19827   -2.62  0.00873 **
facultyCommunity and Health Studies        -2.74882    0.36300   -7.57  3.7e-14 ***
facultyDesign                             -14.67735  103.18533   -0.14  0.88689
facultyNon-credential students (Academic)   0.06487    0.28874    0.22  0.82224
facultyScience and Horticulture             0.00544    0.19977    0.03  0.97827
facultyTrades and Technology               -3.29720    0.31385  -10.51  < 2e-16 ***
ratecodeINTERNATIONAL                      -0.36663    0.11809   -3.10  0.00190 **
age2                                       -0.73845    0.05220  -14.15  < 2e-16 ***
age3                                       -0.95962    0.06606  -14.53  < 2e-16 ***
age4                                       -0.99611    0.13187   -7.55  4.2e-14 ***
age5                                       -0.88555    0.13967   -6.34  2.3e-10 ***
age6                                       -0.65572    0.17214   -3.81  0.00014 ***
age7                                       -0.39242    0.21666   -1.81  0.07010 .
age8                                       -0.67001    0.33625   -1.99  0.04631 *
age9                                       -0.34340    0.49413   -0.69  0.48708
age10                                      -1.01235    0.43022   -2.35  0.01862 *
age11                                      -0.66577    0.24205   -2.75  0.00595 **
---
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13939  on 12967  degrees of freedom
Residual deviance: 12579  on 12949  degrees of freedom
AIC: 12617

Number of Fisher Scoring iterations: 14
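
The estimates in the summary are on the log-odds scale, so a few lines of arithmetic help when reading them. The sketch below copies a handful of numbers from the printed output and converts them to odds ratios, a baseline probability, and an overall likelihood-ratio test. The 1.96 multiplier gives approximate Wald intervals, which is an assumption of this sketch rather than something the summary reports.

```r
# Estimates and standard errors copied from the summary output above
est <- c(facultyBusiness = -0.51995,
         ratecodeINTERNATIONAL = -0.36663,
         age2 = -0.73845)
se  <- c(facultyBusiness = 0.19827,
         ratecodeINTERNATIONAL = 0.11809,
         age2 = 0.05220)

# Odds ratios with approximate 95% Wald intervals
odds_ratio <- exp(est)
ci_lower   <- exp(est - 1.96 * se)
ci_upper   <- exp(est + 1.96 * se)
print(round(cbind(odds_ratio, ci_lower, ci_upper), 3))

# Predicted probability of refusal when every predictor is at its
# reference level: the inverse logit of the intercept
baseline_prob <- plogis(-0.28837)
print(round(baseline_prob, 3))

# Likelihood-ratio test of the fitted model against the null model,
# using the deviances and degrees of freedom from the output
lr_stat <- 13939 - 12579
lr_df   <- 12967 - 12949
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))
```

With the fitted object available, exp(coef(kp)) returns the odds ratios for all terms at once. An odds ratio below 1, as for all three terms here, means lower odds of refusal than in the reference category.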