Problem Set # 4

William “Luke” Tabbert

date()
## [1] "Mon Nov 19 15:36:24 2012"

Due Date: November 20, 2012 Total Points: 30

1 Use the petrol consumption data set from Lecture 16 and build a regression tree to predict petrol consumption based on petrol tax, average income, amount of pavement and the proportion of the population with driver's licences. Plot the tree. Which variables are split first and second? Prune the tree leaving only three terminal nodes. Plot the final tree. (10)

library("tree")
petrol = read.table("http://myweb.fsu.edu/jelsner/PetrolConsumption.txt", header = TRUE)
tr = tree(Petrol.Consumption ~ ., data = petrol)
plot(tr)
text(tr)

[Figure: full regression tree for petrol consumption]

The first split is on “Prop.DL”; the second is on “Avg.Inc”.

tr2 = prune.tree(tr, best = 3)
plot(tr2)
text(tr2)

[Figure: pruned regression tree with three terminal nodes]

2 Use the data from Lecture 18 to model the probability of O-ring damage as a logistic regression using launch temperature as the explanatory variable. Is the temperature a significant predictor of damage? Is it adequate? What are the odds of damage when launch temperature is 60F relative to the odds of damage when the temperature is 75F? Use the model to predict the probability of damage given a launch temperature of 55F. (20)

temp = c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70, 
    81, 76, 79, 75, 76, 58)
damage = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 
    1)
launch = data.frame(temp, damage)
logrm = glm(damage ~ ., data = launch, family = binomial)
summary(logrm)
## 
## Call:
## glm(formula = damage ~ ., family = binomial, data = launch)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.061  -0.761  -0.378   0.452   2.217  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   15.043      7.379    2.04    0.041 *
## temp          -0.232      0.108   -2.14    0.032 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 28.267  on 22  degrees of freedom
## Residual deviance: 20.315  on 21  degrees of freedom
## AIC: 24.32
## 
## Number of Fisher Scoring iterations: 5

The coefficient on temp is negative (-0.232) and its p-value (0.032) is below 0.05, so temperature is a significant predictor of damage: colder launches have higher odds of damage.
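As a cross-check on the Wald test above, a likelihood ratio test compares the null deviance with the residual deviance. The following is a minimal sketch that uses only the deviances and degrees of freedom printed in the summary; the variable names are illustrative and not part of the original analysis.

```r
# Deviances and degrees of freedom copied from the summary(logrm) output
null_dev  <- 28.267
null_df   <- 22
resid_dev <- 20.315
resid_df  <- 21

# Likelihood ratio statistic: the drop in deviance from adding temp
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df

# Upper-tail chi-square probability of a drop at least this large
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p.value = lr_p))
```

The p-value is below 0.01, which agrees with the conclusion that temperature is a significant predictor. With the fitted model in the workspace, anova(logrm, test = "Chisq") should report the same comparison.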

pchisq(20.315, 21, lower.tail = FALSE)
## [1] 0.5014

The residual deviance of 20.315 on 21 degrees of freedom gives a large p-value (0.50), so there is no evidence of lack of fit and the model is adequate.

exp(-0.2322 * (75 - 60))
## [1] 0.03072

The odds of damage for a 75F launch are about 3.1% of the odds for a 60F launch. Equivalently, the odds of damage at 60F are about 1/0.03072 = 32.6 times the odds at 75F.
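In a logistic regression the odds ratio between two temperatures t1 and t2 is exp(b * (t1 - t2)), where b is the temperature coefficient. The sketch below computes the ratio in both directions, since the question asks for 60F relative to 75F. It assumes the rounded coefficient -0.2322 used above; the variable names are illustrative.

```r
# Temperature coefficient from the fitted model (rounded, as used above)
b_temp <- -0.2322

# Odds of damage at 75F relative to 60F
or_75_vs_60 <- exp(b_temp * (75 - 60))

# Odds of damage at 60F relative to 75F (the reciprocal of the above)
or_60_vs_75 <- exp(b_temp * (60 - 75))

print(c(or_75_vs_60 = or_75_vs_60, or_60_vs_75 = or_60_vs_75))
```

The two ratios are reciprocals, so either one answers the question as long as the direction of the comparison is stated.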

predict(logrm, data.frame(temp = 55), type = "response")
##      1 
## 0.9067

The model predicts a 90.7% probability of O-ring damage for a launch at 55F.
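The predicted probability can also be computed by hand by applying the inverse logit to the linear predictor, which makes explicit what predict() does with type = "response". This is a minimal sketch that refits the same model so it runs on its own; the names fit, eta and p55 are illustrative.

```r
# Launch temperatures (F) and damage indicators, as given above
temp <- c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70,
          81, 76, 79, 75, 76, 58)
damage <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
            1)
launch <- data.frame(temp, damage)

# Same logistic regression as above, with the predictor named explicitly
fit <- glm(damage ~ temp, data = launch, family = binomial)

# Linear predictor at 55F: intercept + slope * 55
eta <- sum(coef(fit) * c(1, 55))

# Inverse logit, exp(eta) / (1 + exp(eta)), turns log-odds into a probability
p55 <- plogis(eta)

# The same value as returned by predict()
p_pred <- predict(fit, data.frame(temp = 55), type = "response")

print(c(by_hand = p55, predict = unname(p_pred)))
```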