Week 8 HW

8.2 (a)

Birth weight of baby = 120.07 - 1.93 * parity

(b)

The slope indicates that babies who are not first born are predicted to weigh 1.93 ounces less, on average, than first-born babies. The predicted birth weight is 120.07 ounces for a first-born baby and 120.07 - 1.93 = 118.14 ounces for a baby who is not first born.
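The two predicted weights can be reproduced by plugging both parity values into the fitted line. This is a minimal sketch; the function name `predict_bwt` is made up for illustration, and the coding 0 = first born, 1 = not first born is assumed from the interpretation above.

```r
# Fitted line from part (a).
# Assumed coding: parity = 0 for a first-born baby, 1 otherwise.
predict_bwt <- function(parity) 120.07 - 1.93 * parity

# Predicted birth weight (ounces) for each group
predict_bwt(c(first_born = 0, not_first_born = 1))
```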

(c)

The p-value for the parity slope is 0.1052, which is larger than 0.05, so the data do not provide statistically significant evidence of a relationship between parity and birth weight.

8.4 (a)

Number of days absent = 18.93 - 9.11 * ethnicity + 3.10 * sex + 2.15 * learner status

(b)

Ethnicity has the largest estimated effect on the number of days absent: the -9.11 slope indicates that, all else held constant, non-aboriginal students miss about 9 fewer days on average than aboriginal students. The 3.10 slope for sex indicates that boys miss about 3 more days on average than girls, all else held constant. The 2.15 slope for learner status indicates that slow learners miss about 2 more days on average than average learners, all else held constant.

(c)

The residual is:

2 - (18.93 - 9.11*0 + 3.1*1 + 2.15*1)

## [1] -22.18

(d)

R-squared is:

1-(240.57/264.17)

## [1] 0.08933641

Adjusted R-squared is:

1-((240.57/(146-3-1))/(264.17/(146-1)))

## [1] 0.07009704
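The two calculations in (d) can be wrapped in small helper functions so the inputs are named rather than hard-coded. This is only a sketch; the function and argument names are made up for illustration, with `n` the number of observations and `k` the number of predictors.

```r
# R-squared from the variance of the residuals and the variance of the response
r_squared <- function(var_resid, var_y) {
  1 - var_resid / var_y
}

# Adjusted R-squared: each variance is divided by its degrees of freedom
adj_r_squared <- function(var_resid, var_y, n, k) {
  1 - (var_resid / (n - k - 1)) / (var_y / (n - 1))
}

r_squared(240.57, 264.17)
adj_r_squared(240.57, 264.17, n = 146, k = 3)
```

The adjusted value is always a little smaller than the unadjusted one because of the penalty for the number of predictors.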

8.6 (a)

95% confidence interval:

0.34+(1.96*0.13)

## [1] 0.5948

0.34-(1.96*0.13)

## [1] 0.0852

(0.09, 0.59). We are 95% confident that each additional foot of tree height is associated with an average increase of 0.09 to 0.59 cubic feet in tree volume, holding the other variables in the model constant.
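The interval above uses the normal critical value 1.96. With a small sample, a critical value from the t-distribution is the more usual choice. The sketch below shows how that could be computed; the helper name `ci_slope` is made up, and the degrees of freedom are an assumption (31 trees and 2 predictors, neither of which is stated in this write-up).

```r
# Confidence interval for a regression slope using a t critical value
ci_slope <- function(estimate, se, df, level = 0.95) {
  t_star <- qt(1 - (1 - level) / 2, df)
  estimate + c(lower = -1, upper = 1) * t_star * se
}

# ASSUMPTION: n = 31 trees and k = 2 predictors, so df = 31 - 2 - 1 = 28
ci_slope(estimate = 0.34, se = 0.13, df = 28)
```

Because the t critical value is larger than 1.96 for any finite degrees of freedom, this interval comes out slightly wider than the one above.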

(b)

The model estimates that this tree would have a volume of:

-57.99 + 0.34*79 + 4.71*11.3

## [1] 22.093

The residual (observed minus predicted) is:

24.2 - (-57.99 + 0.34*79 + 4.71*11.3)

## [1] 2.107

The residual is positive, so the model underestimates the volume of this tree by 2.107 cubic feet.

8.8

The learner status variable should be removed first because the model without it has a higher adjusted R-squared than the full model.
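One step of backward elimination by adjusted R-squared can be sketched as follows. The absenteeism data are not included here, so the example uses R's built-in `mtcars` data as a stand-in; the variables and the resulting numbers have nothing to do with this exercise.

```r
# Stand-in example on mtcars (NOT the absenteeism data from this exercise)
predictors <- c("wt", "hp", "qsec")
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)

adj_r2 <- function(fit) summary(fit)$adj.r.squared

# Refit the model with each predictor dropped in turn
drop_one <- sapply(predictors, function(v) {
  reduced <- update(full, as.formula(paste(". ~ . -", v)))
  adj_r2(reduced)
})

# Compare the full model with each reduced model
c(full = adj_r2(full), drop_one)
```

If any reduced model has a higher adjusted R-squared than the full model, the variable whose removal gives the largest value is dropped, and the step is repeated on the smaller model.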

8.10

Under both the adjusted R-squared and the p-value approaches, the ethnicity variable would be added first. Its model has the largest adjusted R-squared, and it is the only variable with a significant p-value.

8.12

Since the goal of the recommendation system is to make its recommendations as accurate as possible, the adjusted R-squared approach should be used.

8.14

Nearly normal residuals: The normal probability plot shows residuals that are nearly normal, apart from some large outliers in the lower tail.

Constant variability of residuals: The plot of the residuals versus the fitted values shows some structure in the middle, and overall the variance of the residuals does not appear to be constant.

Independent residuals: The plot of the residuals in the order of their collection has a random scatter, so there is no apparent structure that would indicate a problem.

Linear relationships between the response variable and numerical explanatory variables: The residuals vs IQ are randomly distributed around 0 with the exception of some outliers. The residuals vs gender also appear to be randomly distributed around 0.

Every condition except independence raises some concern about the data, so a linear regression model may not be the most appropriate choice.
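The four checks above correspond to four standard residual plots. The sketch below shows one way to draw them in base R; it uses the built-in `mtcars` data as a stand-in because the data for this exercise are not included here.

```r
# Stand-in fit on mtcars (NOT the data from this exercise)
fit <- lm(mpg ~ wt + am, data = mtcars)
res <- resid(fit)

par(mfrow = c(2, 2))

# Nearly normal residuals: points should follow the line
qqnorm(res)
qqline(res)

# Constant variability: no fan shape in |residuals| vs fitted values
plot(fitted(fit), abs(res), xlab = "Fitted values", ylab = "|Residuals|")

# Independence: no pattern in residuals by order of collection
plot(seq_along(res), res, xlab = "Order of collection", ylab = "Residuals")

# Linearity: residuals vs a numerical predictor scatter around 0
plot(mtcars$wt, res, xlab = "wt", ylab = "Residuals")
abline(h = 0, lty = 2)
```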

8.18 (a)

Probability of damage at 51 degrees:

exp(11.663 - 0.2162*51)/(1+exp(11.663 - 0.2162*51))

## [1] 0.6540297

Probability of damage at 53 degrees:

exp(11.663 - 0.2162*53)/(1+exp(11.663 - 0.2162*53))

## [1] 0.5509228

Probability of damage at 55 degrees:

exp(11.663 - 0.2162*55)/(1+exp(11.663 - 0.2162*55))

## [1] 0.4432456
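The same calculation can be done for every temperature at once. The sketch below uses base R's `plogis()`, the logistic function, which is equivalent to exp(x)/(1 + exp(x)); the variable names are made up for illustration.

```r
# Temperatures from 51 to 71 degrees in steps of 2
temps <- seq(51, 71, by = 2)

# plogis(x) = exp(x) / (1 + exp(x)), applied to the fitted log-odds
p_damage <- plogis(11.663 - 0.2162 * temps)

# Label each probability with its temperature and round to 3 decimals
round(setNames(p_damage, temps), 3)
```

Rounded to three decimals, the first three values match the probabilities computed one at a time above.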

(b)

Temperature <- c(51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71)
Probability.Damage <- c(0.654, 0.551, 0.443, 0.341, 0.251, 0.179, 0.124, 0.084, 0.056, 0.037, 0.024)
Challenger <- data.frame(Temperature, Probability.Damage)

Challenger

##    Temperature Probability.Damage
## 1           51              0.654
## 2           53              0.551
## 3           55              0.443
## 4           57              0.341
## 5           59              0.251
## 6           61              0.179
## 7           63              0.124
## 8           65              0.084
## 9           67              0.056
## 10          69              0.037
## 11          71              0.024

require(ggplot2)

## Loading required package: ggplot2

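# Draw the fitted logistic curve directly; geom_smooth(method = "glm") defaults to a straight-line fit
ggplot(Challenger, aes(x = Temperature, y = Probability.Damage)) + geom_point() + stat_function(fun = function(x) plogis(11.663 - 0.2162 * x))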

Greg Adelsberger

3/10/2018