Problem Set # 4

Holly Widen

date()
## [1] "Mon Nov 19 22:01:46 2012"

Due Date: November 20, 2012
Total Points: 30

1 Use the petrol consumption data set from Lecture 16 and build a regression tree to predict petrol consumption based on petrol tax, average income, amount of pavement, and the proportion of the population with driver's licences. Plot the tree. Which variables are split first and second? Prune the tree, leaving only three terminal nodes. Plot the final tree. (10)

PC = read.table("http://myweb.fsu.edu/jelsner/PetrolConsumption.txt", header = TRUE)
head(PC)
##   Petrol.Tax Avg.Inc Pavement Prop.DL Petrol.Consumption
## 1        9.0    3571     1976   0.525                541
## 2        9.0    4092     1250   0.572                524
## 3        9.0    3865     1586   0.580                561
## 4        7.5    4870     2351   0.529                414
## 5        8.0    4399      431   0.544                410
## 6       10.0    5342     1333   0.571                457
suppressMessages(require(tree))
## Warning: package 'tree' was built under R version 2.15.2
tr = tree(Petrol.Consumption ~ ., data = PC)
plot(tr)
text(tr)

plot of chunk RegressionTree

The tree splits first on Prop.DL (the proportion of the population with driver's licences) and second on Avg.Inc (average income).

tr2 = prune.tree(tr, best = 3)
plot(tr2)
text(tr2)

plot of chunk RegressionTree2

tr2
## node), split, n, deviance, yval
##       * denotes terminal node
## 
## 1) root 48 6e+05 600  
##   2) Prop.DL < 0.646 42 3e+05 600  
##     4) Avg.Inc < 4395 27 9e+04 600 *
##     5) Avg.Inc > 4395 15 5e+04 500 *
##   3) Prop.DL > 0.646 6 9e+04 800 *
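The printed pruned tree can be read as a pair of nested threshold rules. As a minimal sketch in base R, the function below returns the terminal node number an observation falls into. The name `pruned_leaf` is hypothetical, the cut points 0.646 and 4395 are copied from the printed output and may be rounded, and sending ties to the right-hand branch is an assumption.

```r
# Hypothetical helper: map an observation to a terminal node of the pruned tree.
# Cut points come from the printed tree and may be rounded; ties go right (assumption).
pruned_leaf <- function(prop_dl, avg_inc) {
  if (prop_dl >= 0.646) {
    return(3L)   # node 3: Prop.DL > 0.646
  }
  if (avg_inc < 4395) {
    4L           # node 4: Prop.DL < 0.646 and Avg.Inc < 4395
  } else {
    5L           # node 5: Prop.DL < 0.646 and Avg.Inc > 4395
  }
}

# First and fourth rows shown by head(PC)
pruned_leaf(0.525, 3571)
pruned_leaf(0.529, 4870)
```

Under these rules the first row of head(PC) lands in node 4 and the fourth row in node 5.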

2 Use the data from Lecture 18 to model the probability of O-ring damage as a logistic regression using launch temperature as the explanatory variable. Is the temperature a significant predictor of damage? Is it adequate? What are the odds of damage when launch temperature is 60F relative to the odds of damage when the temperature is 75F? Use the model to predict the probability of damage given a launch temperature of 55F. (20)

temp = c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70, 
    81, 76, 79, 75, 76, 58)
damage = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 
    1)
logrm = glm(damage ~ temp, family = binomial)
summary(logrm)
## 
## Call:
## glm(formula = damage ~ temp, family = binomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.061  -0.761  -0.378   0.452   2.217  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   15.043      7.379    2.04    0.041 *
## temp          -0.232      0.108   -2.14    0.032 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 28.267  on 22  degrees of freedom
## Residual deviance: 20.315  on 21  degrees of freedom
## AIC: 24.32
## 
## Number of Fisher Scoring iterations: 5

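The p-value for the temp coefficient (0.032) is below 0.05, so temperature is a statistically significant predictor of damage. The negative coefficient (-0.232) indicates that the odds of damage decrease as launch temperature increases.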
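The z value and p-value in the coefficient table can be reproduced by hand as a Wald test. The sketch below refits the same model so that it runs on its own. The approximate 95% Wald interval is an addition for illustration and was not part of the original answer.

```r
# Same data and model as above
temp <- c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70,
          81, 76, 79, 75, 76, 58)
damage <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
            1)
logrm <- glm(damage ~ temp, family = binomial)

ct  <- coef(summary(logrm))        # columns: Estimate, Std. Error, z value, Pr(>|z|)
est <- ct["temp", "Estimate"]
se  <- ct["temp", "Std. Error"]

z  <- est / se                               # Wald z statistic
p  <- 2 * pnorm(-abs(z))                     # two-sided p-value
ci <- est + c(-1, 1) * qnorm(0.975) * se     # approximate 95% Wald interval

c(z = z, p = p)
ci
```

If the interval excludes zero, that agrees with the coefficient being significant at the 0.05 level.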

The difference between the null and residual deviance is 28.267 - 20.315 = 7.952 on 22 - 21 = 1 degree of freedom. We compare this drop in deviance to a chi-squared distribution with 1 degree of freedom:

pchisq(7.952, 1, lower.tail = FALSE)
## [1] 0.004803

The p-value is approximately 0.005, so there is convincing evidence that the model with temperature fits significantly better than the null (intercept-only) model. We take this as support for the model's adequacy in explaining damage.
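The drop-in-deviance calculation can also be taken straight from the fitted model object rather than from typed-in numbers. The sketch below does that. It also compares the residual deviance to a chi-squared distribution on the residual degrees of freedom, which is one common but, for ungrouped 0/1 data, only rough check of adequacy. That second check is an addition and not part of the original answer.

```r
# Same data and model as above
temp <- c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70,
          81, 76, 79, 75, 76, 58)
damage <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
            1)
logrm <- glm(damage ~ temp, family = binomial)

# Drop in deviance relative to the intercept-only model
drop_dev <- logrm$null.deviance - logrm$deviance
drop_df  <- logrm$df.null - logrm$df.residual
p_drop   <- pchisq(drop_dev, drop_df, lower.tail = FALSE)

# Rough lack-of-fit check (assumption: the chi-squared approximation is only
# approximate for binary responses); a large p-value gives no evidence of lack of fit
p_resid <- pchisq(deviance(logrm), df.residual(logrm), lower.tail = FALSE)

c(p_drop = p_drop, p_resid = p_resid)
```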

We can write the logistic regression model, where pi represents the probability of damage, as:
logit(pi) = 15.043 - 0.232 * temp
In words: the logarithm of the odds of damage equals 15.043 minus 0.232 times temp.
Because the intercept cancels when two odds are divided, the odds of damage at a launch temperature of 60F relative to the odds of damage at 75F depend only on the temp coefficient and the temperature difference:
exp(-0.232 * (60 - 75)) = 32.460 ≈ 32.5
Thus, the odds of damage at a 60F launch temperature are about 32.5 times the odds of damage at a 75F launch temperature.
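As a check on the odds-ratio arithmetic, the sketch below computes the ratio two ways from the fitted model: from the temp coefficient, and directly from the predicted odds at 60F and 75F. The variable names are illustrative. The result may differ slightly from 32.5 because the model object holds unrounded coefficients.

```r
# Same data and model as above
temp <- c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70,
          81, 76, 79, 75, 76, 58)
damage <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
            1)
logrm <- glm(damage ~ temp, family = binomial)

# Odds ratio from the slope: exp(beta * (60 - 75))
b1 <- coef(logrm)[["temp"]]
or_coef <- exp(b1 * (60 - 75))

# Odds ratio from predicted probabilities at the two temperatures
p <- predict(logrm, data.frame(temp = c(60, 75)), type = "response")
odds <- p / (1 - p)
or_direct <- odds[[1]] / odds[[2]]

c(or_coef = or_coef, or_direct = or_direct)
```

The two values agree because the intercept cancels in the ratio of odds.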

predict(logrm, data.frame(temp = 55), type = "response")
##      1 
## 0.9067

Based on the data and the model, we can predict a damage probability of 90.7% given a launch temperature of 55F.
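The predicted probability can be reproduced by hand by applying the inverse logit to the linear predictor. The sketch below does so. It also adds an approximate 95% interval built on the link scale, which is an addition for illustration and not part of the original answer.

```r
# Same data and model as above
temp <- c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70,
          81, 76, 79, 75, 76, 58)
damage <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
            1)
logrm <- glm(damage ~ temp, family = binomial)

# Linear predictor at 55F, then inverse logit
b <- coef(logrm)
eta <- b[["(Intercept)"]] + b[["temp"]] * 55
p_hand <- plogis(eta)

# Same prediction from predict()
p_pred <- predict(logrm, data.frame(temp = 55), type = "response")

# Approximate 95% interval: build it on the logit scale, then back-transform
lp <- predict(logrm, data.frame(temp = 55), type = "link", se.fit = TRUE)
ci <- plogis(lp$fit + c(-1, 1) * qnorm(0.975) * lp$se.fit)

c(p_hand = p_hand, p_pred = unname(p_pred))
ci
```

Building the interval on the logit scale keeps both limits between 0 and 1.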