Problem Set #4

Rob Leteff

date()
## [1] "Thu Nov 15 14:59:55 2012"

Due Date: November 20, 2012
Total Points: 30

1 Use the petrol consumption data set from Lecture 16 and build a regression tree to predict petrol consumption based on petrol tax, average income, amount of pavement and the proportion of the population with driver's licenses. Plot the tree. Which variables are split first and second? Prune the tree, leaving only three terminal nodes. Plot the final tree. (10)

require(tree)
## Loading required package: tree
PC = read.table("http://myweb.fsu.edu/jelsner/PetrolConsumption.txt", header = TRUE)
head(PC)
##   Petrol.Tax Avg.Inc Pavement Prop.DL Petrol.Consumption
## 1        9.0    3571     1976   0.525                541
## 2        9.0    4092     1250   0.572                524
## 3        9.0    3865     1586   0.580                561
## 4        7.5    4870     2351   0.529                414
## 5        8.0    4399      431   0.544                410
## 6       10.0    5342     1333   0.571                457
PCtree = tree(Petrol.Consumption ~ ., data = PC)
plot(PCtree)
text(PCtree)

[Figure: unpruned regression tree (PCtree)]

The tree splits first on the proportion of the population with driver's licenses (Prop.DL) and second on average income (Avg.Inc).
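A regression tree chooses its first split greedily: among all predictors and cutpoints it takes the one that most reduces the residual sum of squares, which is the deviance for a numeric response. The second split is then found by repeating the same search inside each child node. The base-R sketch below illustrates that criterion. `best_split()` and `first_split()` are hypothetical helpers written for illustration, not functions from the `tree` package, and they ignore the package's minimum node-size rules, so the cutpoint they report need not match the one printed on the plot exactly.

```r
# Hypothetical helper, not part of the tree package: for one numeric
# predictor x, return the cutpoint that minimizes the pooled residual
# sum of squares (RSS) of y in the two resulting groups.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2) return(list(cut = NA_real_, rss = Inf))
  # Candidate cutpoints: midpoints between adjacent distinct values of x
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  rss <- sapply(cuts, function(cp) {
    left <- y[x < cp]
    right <- y[x >= cp]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  i <- which.min(rss)
  list(cut = cuts[i], rss = rss[i])
}

# Hypothetical helper: search every predictor (all assumed numeric) and keep
# the one whose best cutpoint gives the smallest RSS, i.e. the greedy root split.
first_split <- function(data, response) {
  y <- data[[response]]
  preds <- setdiff(names(data), response)
  res <- lapply(preds, function(v) best_split(data[[v]], y))
  rss <- vapply(res, function(r) r$rss, numeric(1))
  j <- which.min(rss)
  list(var = preds[j], cut = res[[j]]$cut, rss = rss[[j]])
}

# Toy check: y jumps between a = 3 and a = 4, so "a" should win at 3.5
toy <- data.frame(a = 1:6, b = c(3, 1, 4, 1, 5, 9), y = c(1, 1, 1, 5, 5, 5))
first_split(toy, "y")

# With the petrol data loaded as PC (see above), the same call would be:
# first_split(PC, "Petrol.Consumption")
```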

PCtree2 = prune.tree(PCtree, best = 3)
plot(PCtree2)
text(PCtree2)

[Figure: pruned regression tree with three terminal nodes (PCtree2)]
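According to the `tree` package documentation, `prune.tree(..., best = 3)` returns the subtree with three terminal nodes from the cost-complexity sequence (or the next largest subtree if none of that exact size exists). For a regression tree the deviance is the residual sum of squares, so, assuming the default deviance method, each subtree $T$ in that sequence minimizes the following criterion for some penalty $\alpha \ge 0$:

```latex
R_\alpha(T) = \sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left(y_i - \bar{y}_{R_m}\right)^2 + \alpha\,|T|
```

Here $|T|$ is the number of terminal nodes, $R_m$ is the region covered by leaf $m$, and $\bar{y}_{R_m}$ is the mean petrol consumption in that leaf; larger values of $\alpha$ favor smaller trees.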

2 Use the data from Lecture 18 to model the probability of O-ring damage as a logistic regression using launch temperature as the explanatory variable. Is the temperature a significant predictor of damage? Is it adequate? What are the odds of damage when launch temperature is 60F relative to the odds of damage when the temperature is 75F? Use the model to predict the probability of damage given a launch temperature of 55F. (20)

temp = c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70, 
    81, 76, 79, 75, 76, 58)
damage = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 
    1)
logrm = glm(damage ~ temp, family = binomial)
summary(logrm)
## 
## Call:
## glm(formula = damage ~ temp, family = binomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.061  -0.761  -0.378   0.452   2.217  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   15.043      7.379    2.04    0.041 *
## temp          -0.232      0.108   -2.14    0.032 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 28.267  on 22  degrees of freedom
## Residual deviance: 20.315  on 21  degrees of freedom
## AIC: 24.32
## 
## Number of Fisher Scoring iterations: 5
pchisq(28.267 - 20.315, 1, lower.tail = FALSE)
## [1] 0.004803

Is the temperature a significant predictor of damage?
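Yes. The drop in deviance from the null model (28.267 - 20.315 = 7.952 on 1 degree of freedom) gives a chi-squared p-value of 0.0048, so temperature is a significant predictor of damage at the 0.01 level. The Wald test in the summary agrees (p = 0.032).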
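The same likelihood-ratio test can be computed directly from the fitted object, which avoids retyping the rounded deviances printed by `summary()`. A minimal sketch, re-entering the O-ring data from above so it runs on its own; `lr_stat`, `lr_df` and `lr_p` are illustrative names:

```r
# O-ring data as listed above
temp <- c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70,
          81, 76, 79, 75, 76, 58)
damage <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1)
logrm <- glm(damage ~ temp, family = binomial)

# Drop in deviance from the intercept-only model and its degrees of freedom
lr_stat <- logrm$null.deviance - logrm$deviance
lr_df <- logrm$df.null - logrm$df.residual
lr_p <- pchisq(lr_stat, lr_df, lower.tail = FALSE)
lr_p

# anova() reports the same likelihood-ratio test in table form
anova(logrm, test = "Chisq")
```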

pchisq(20.315, 21, lower.tail = FALSE)
## [1] 0.5014

Is it adequate?
The residual deviance is 20.315 on 21 degrees of freedom, which gives a p-value of 0.50. There is no evidence of lack of fit, so we fail to reject the null hypothesis that the model is adequate.
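For a 0/1 response the saturated model fits every observation exactly and has log-likelihood zero, so the residual deviance reduces to minus twice the log-likelihood of the fitted model. The test above compares it to a chi-squared distribution on the residual degrees of freedom. That approximation is known to be rough for ungrouped binary data, so the p-value is best read as a loose check.

```latex
D = -2\sum_{i=1}^{23}\Big[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\Big] = 20.315,
\qquad
\Pr\left(\chi^2_{21} > 20.315\right) \approx 0.50
```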

What are the odds of damage when launch temperature is 60F relative to the odds of damage when the temperature is 75F?

exp(-0.2322 * (75 - 60))
## [1] 0.03072

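This value, 0.0307, is the odds of damage at 75F relative to the odds at 60F. Inverting it, the odds of O-ring damage at 60F are about 1/0.0307 ≈ 33 times the odds of damage at 75F.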
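Because the log-odds are linear in temperature, the intercept cancels in the ratio and only the temperature coefficient matters. A worked version, using the rounded estimate $\hat\beta_1 = -0.2322$ from the fitted model:

```latex
\frac{\text{odds}(60)}{\text{odds}(75)}
  = \frac{e^{\hat\beta_0 + 60\hat\beta_1}}{e^{\hat\beta_0 + 75\hat\beta_1}}
  = e^{(60 - 75)\hat\beta_1}
  = e^{-15 \times (-0.2322)}
  = e^{3.483}
  \approx 32.6
```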

Use the model to predict the probability of damage given a launch temperature of 55F.

predict(logrm, newdata = data.frame(temp = 55), type = "response")
##      1 
## 0.9067

The model predicts a probability of O-ring damage of about 0.91 (91%) at a launch temperature of 55F.
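The `predict()` call above can be reproduced by hand, which makes the logistic link explicit: compute the log-odds from the fitted coefficients, then apply the inverse logit. A minimal sketch that re-enters the data so it runs on its own; `eta55` and `p55` are illustrative names:

```r
# O-ring data as listed above
temp <- c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70,
          81, 76, 79, 75, 76, 58)
damage <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1)
logrm <- glm(damage ~ temp, family = binomial)

b <- coef(logrm)
# Log-odds of damage at 55F from the fitted linear predictor
eta55 <- unname(b["(Intercept)"] + b["temp"] * 55)
# plogis() is the inverse logit: exp(eta) / (1 + exp(eta))
p55 <- plogis(eta55)
round(p55, 4)
```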