Bricklayer project

I am trying to find the relationship between testscore and wordscore: can wordscore predict testscore? There are four different boards, which I have renumbered 1, 2, 3, 4 in ascending order of difficulty. I expect a strong positive relationship between testscore and wordscore regardless of the difficulty of the board.

testscore is a binary variable. I have coded it as a factor because the actual number is meaningless (it is just a category). The same goes for board: we don't want the numeric value to start confusing things. Some of the code appears below; ignore it, I leave it in for my own reference. I then ran a classification tree with testscore as the dependent variable and wordscore and board as the explanatory variables:

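# party provides ctree() for conditional inference trees
library(party)
heidiboard <- read.csv("C:/Users/Stephen/Desktop/Heidi/heidiboard.csv")
# Treat testscore and board as categories, not numbers
heidiboard$testscore <- as.factor(heidiboard$testscore)
heidiboard$board <- as.factor(heidiboard$board)
# Classification tree: testscore explained by wordscore and board
qq <- ctree(testscore ~ wordscore + board, data = heidiboard)
plot(qq)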

[Figure: plot of the classification tree qq]

What we can see from this is that:

a. At Node 1, there is a split in wordscore at 21.43. If your wordscore was at or below 21.43, the board doesn't matter and you proceed to Node 2. There, if your wordscore was smaller than or equal to zero, you had a 0.25 probability of getting a 1 on the testscore; if you scored more than zero, you had a 0.4 probability.

b. If your wordscore was above 21.43, the boards come into play at Node 5. On board 2 you had a 0.4 probability of getting a 1 on the testscore. On boards 1, 3 and 4 you had a 0.6 probability.
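
To make the node probabilities concrete: the probability reported in a node is simply the share of observations in that node with testscore equal to 1. The sketch below illustrates this with made-up data (the `toy` data frame and its values are assumptions, not the heidiboard file) and mimics only the first split at 21.43 using base R.

```r
# Hypothetical data: NOT the heidiboard file, just made-up values for illustration.
set.seed(1)
toy <- data.frame(
  wordscore = round(runif(200, min = -10, max = 60), 2),
  board     = factor(sample(1:4, 200, replace = TRUE))
)
toy$testscore <- rbinom(200, size = 1, prob = plogis(-0.5 + 0.02 * toy$wordscore))

# Mimic the first split of the tree: at or below 21.43 versus above it.
toy$branch <- ifelse(toy$wordscore <= 21.43,
                     "wordscore <= 21.43",
                     "wordscore > 21.43")

# The "probability of a 1" in a branch is the share of its rows with testscore == 1.
node_prob <- tapply(toy$testscore, toy$branch, mean)
print(round(node_prob, 2))
```

The real tree chooses its split points from the data; here the cut is fixed by hand just to show what the bar in each terminal node is counting.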

Then I tried logistic regression to quantify the size of each variable's effect and its significance:

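# Re-reading the CSV discards the factor conversions made earlier,
# so testscore and board enter this model as plain numbers.
heidiboard <- read.csv("C:/Users/Stephen/Desktop/Heidi/heidiboard.csv")
lb1 <- glm(testscore ~ wordscore + board, family = binomial, data = heidiboard)
summary(lb1)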

Call:
glm(formula = testscore ~ wordscore + board, family = binomial, 
    data = heidiboard)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.476  -1.026  -0.897   1.175   1.510  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.54610    0.12771   -4.28  1.9e-05 ***
wordscore    0.01987    0.00229    8.68  < 2e-16 ***
board       -0.05229    0.04249   -1.23     0.22    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2575.4  on 1871  degrees of freedom
Residual deviance: 2495.7  on 1869  degrees of freedom
AIC: 2502

Number of Fisher Scoring iterations: 4
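
The coefficient table above has a single board row, which is what glm produces when board is a number rather than a factor. A small sketch with made-up data (the `toy` data frame is an assumption, not the heidiboard file) shows how the set of coefficients differs between the two codings.

```r
# Hypothetical data, made up for illustration only.
set.seed(2)
toy <- data.frame(
  wordscore = runif(400, min = 0, max = 60),
  board     = sample(1:4, 400, replace = TRUE)
)
toy$testscore <- rbinom(400, size = 1, prob = plogis(-0.5 + 0.02 * toy$wordscore))

# board as a number: one slope, assuming the same step between consecutive boards.
m_num <- glm(testscore ~ wordscore + board, family = binomial, data = toy)

# board as a factor: one coefficient each for boards 2, 3 and 4,
# all measured against board 1 as the reference level.
m_fac <- glm(testscore ~ wordscore + factor(board), family = binomial, data = toy)

print(names(coef(m_num)))
print(names(coef(m_fac)))
```

With the numeric coding there is no way to see whether one particular board stands out; the factor coding is the one that gives each board its own coefficient.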

In the output, note that the number of asterisks indicates statistical significance. There is only one row for board: because the data were read in again before fitting, board entered this model as a single numeric variable rather than as a factor. If board were a factor, board 1 would be the unlisted reference level and boards 2, 3 and 4 would each have a coefficient relative to it.

So: wordscore is highly significant, with an extremely small p value (< 2e-16) and a positive coefficient (0.01987 on the log-odds scale). It is a statistically reliable predictor of testscore. The board coefficient is negative (-0.05229), which is the direction it should have if higher-numbered boards are harder, but it is not significant (p = 0.22).
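
The estimates are on the log-odds scale, which is hard to read directly. As a rough aid, the sketch below turns the numbers printed in the summary(lb1) table into odds ratios and predicted probabilities. It uses only the rounded coefficients from that table and treats board as a number, as the single board row implies; the helper name `p_hat` and the example wordscore values are my own choices for illustration.

```r
# Coefficients copied from the summary(lb1) output (rounded as printed).
b0      <- -0.54610   # intercept
b_word  <-  0.01987   # wordscore
b_board <- -0.05229   # board (numeric coding)

# Odds ratio for one extra point of wordscore, and for ten extra points.
or_1  <- exp(b_word)
or_10 <- exp(10 * b_word)

# Predicted probability of testscore == 1 (hypothetical helper).
p_hat <- function(wordscore, board) {
  plogis(b0 + b_word * wordscore + b_board * board)
}

p_low  <- p_hat(wordscore = 0,  board = 1)
p_high <- p_hat(wordscore = 50, board = 1)

print(round(c(or_1 = or_1, or_10 = or_10, p_low = p_low, p_high = p_high), 3))
```

On these numbers, one extra point of wordscore multiplies the odds of a 1 by about 1.02, and moving from a wordscore of 0 to 50 on board 1 moves the predicted probability from roughly 0.35 to roughly 0.60.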

Some thoughts: there is something odd about board 2. I tried an interaction between wordscore and board:

lb2 <- glm(testscore ~ wordscore * board, family = binomial, data = heidiboard)
summary(lb2)

Call:
glm(formula = testscore ~ wordscore * board, family = binomial, 
    data = heidiboard)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.472  -1.030  -0.886   1.169   1.550  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)  
(Intercept)     -0.40266    0.17533   -2.30    0.022 *
wordscore        0.01383    0.00557    2.48    0.013 *
board           -0.11018    0.06476   -1.70    0.089 .
wordscore:board  0.00243    0.00205    1.19    0.236  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2575.4  on 1871  degrees of freedom
Residual deviance: 2494.3  on 1868  degrees of freedom
AIC: 2502

Number of Fisher Scoring iterations: 4

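In this output the wordscore:board interaction is not significant (p = 0.236), and the AIC is unchanged at 2502, so the interaction model is no improvement. Because board is numeric here, there is a single interaction term and no separate term for board 2; testing whether board 2 behaves differently would need board entered as a factor. This is out of my field, but the tree suggests there is something going on with board 2.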
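
As a rough cross-check on the interaction term, the residual deviances printed in the two summaries can be compared with a likelihood-ratio test. This sketch uses only the rounded numbers shown in the output; with the fitted objects available, anova(lb1, lb2, test = "Chisq") should give the same comparison.

```r
# Residual deviances and degrees of freedom copied from the two summaries.
dev_main <- 2495.7; df_main <- 1869   # testscore ~ wordscore + board
dev_int  <- 2494.3; df_int  <- 1868   # testscore ~ wordscore * board

# Likelihood-ratio statistic: drop in deviance from adding the interaction.
lr_stat <- dev_main - dev_int
lr_df   <- df_main - df_int

# Upper-tail chi-squared probability for that drop.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(round(c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value), 3))
```

The drop in deviance is 1.4 on 1 degree of freedom, which gives a p value of about 0.24, in line with the 0.236 shown for wordscore:board in the coefficient table.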