require("ggplot2")
## Loading required package: ggplot2

### 8.2 Baby Weights

##### a) Write the equation of the regression line.

Weight = 120.07 - 1.93(Parity)

##### b) Interpret the slope in this context, and calculate the predicted birth weight of first borns and others.

The slope of -1.93 means that a baby who is not a first born is predicted to weigh, on average, 1.93 ounces less than a first-born baby.

According to this model, a first-born baby weighs, on average, 120.07 ounces.

According to this model, a baby who is not a first born weighs, on average, 120.07 - 1.93 = 118.14 ounces.
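
The two predictions above come from plugging the indicator into the fitted line. A minimal sketch in R, assuming parity is coded 0 for a first born and 1 otherwise (the function name `predict_weight` is illustrative and not part of the exercise):

```r
# Fitted line from part (a); parity is assumed to be 0 = first born, 1 = otherwise
predict_weight <- function(parity) 120.07 - 1.93 * parity

first_born <- predict_weight(0)  # intercept only
not_first  <- predict_weight(1)  # intercept plus slope
c(first_born = first_born, not_first = not_first)
```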

##### c) Is there a statistically significant relationship between the average birth weight and parity?

Because the p-value is greater than .05 (equivalently, the t-statistic lies between -2 and 2), there is not a statistically significant relationship between the average birth weight and parity.

### 8.4 Absenteeism

##### a) Write the equation of the regression line.

Days = 18.93 -9.11(eth) + 3.10(sex) + 2.15(lrn)

##### b) Interpret each one of the slopes in this context.

According to this model, holding the other variables constant, a student who is not aboriginal is predicted to be absent 9.11 fewer days on average than an aboriginal student.

According to this model, holding the other variables constant, a male student is predicted to be absent 3.10 more days on average than a female student.

According to this model, holding the other variables constant, a slow learner is predicted to be absent 2.15 more days on average than an average learner.

##### c) Calculate the residual for the first observation in the data set: a student who is aboriginal, male, a slow learner, and missed 2 days of school.

Days(predicted) = 18.93 -9.11(0) + 3.10(1) + 2.15(1)

Days(predicted) = 24.18

Days(actual) = 2

residual = Days(actual) - Days(predicted) = 2 - 24.18 = -22.18

##### d) The variance of the residuals is 240.57, and the variance of the number of absent days for all students in the data set is 264.17. Calculate the R2 and the adjusted R2. Note that there are 146 observations in the data set.
R2 = 1 - (240.57)/(264.17)
R2
## [1] 0.08933641
AdjR2 = 1 - (240.57/264.17)*((146-1)/(146-3-1))
AdjR2
## [1] 0.07009704
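
The two formulas used above can be wrapped in a small helper so the roles of n and the number of predictors are explicit. This is only a sketch: `r2_from_variances` is an illustrative name, and k = 3 is taken from the three predictors (eth, sex, lrn) in the model.

```r
# Illustrative helper (name and signature are not from the exercise).
# var_resid: variance of the residuals, var_y: variance of the response,
# n: number of observations, k: number of predictors.
r2_from_variances <- function(var_resid, var_y, n, k) {
  ratio <- var_resid / var_y
  c(r2 = 1 - ratio,
    adj_r2 = 1 - ratio * (n - 1) / (n - k - 1))
}

fit_stats <- r2_from_variances(240.57, 264.17, n = 146, k = 3)
round(fit_stats, 4)
```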

### 8.6 Cherry Trees

##### a) Calculate a 95% confidence interval for the coefficient of height, and interpret it in the context of the data.

Confidence interval at the 95% level: alpha = .05, t = 2.05

CI = (b1 - t * se, b1 + t * se)

Lower = .34 - 2.05 * .13
Lower
## [1] 0.0735
Upper = .34 + 2.05 * .13
Upper
## [1] 0.6065

CI = (0.0735, 0.6065)

We are 95% confident that, holding diameter constant, each additional foot of height is associated with an average increase in volume of between 0.0735 and 0.6065 cubic feet.
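
The critical value 2.05 can be obtained from the t-distribution rather than a table. A minimal sketch, assuming the sample has n = 31 trees and the model has k = 2 predictors (height and diameter), which gives 28 degrees of freedom; neither number is stated in the text above, so treat both as assumptions:

```r
b1 <- 0.34   # estimated coefficient of height
se <- 0.13   # its standard error
n  <- 31     # assumed number of trees in the sample
k  <- 2      # predictors: height and diameter

t_star <- qt(0.975, df = n - k - 1)   # two-sided 95% critical value
ci <- b1 + c(-1, 1) * t_star * se     # lower and upper bounds
round(ci, 4)
```

Because `qt()` returns an unrounded critical value, the bounds can differ slightly in the fourth decimal place from those computed with t = 2.05.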

##### b) One tree in this sample is 79 feet tall, has a diameter of 11.3 inches, and is 24.2 cubic feet in volume. Determine if the model overestimates or underestimates the volume of this tree, and by how much.

Volume(predicted) = -57.99 + .34(height) + 4.71(diameter)

Volume = -57.99 + .34*(79) + 4.71*(11.3)
Volume
## [1] 22.093

Volume(predicted) = 22.093

Volume(actual) = 24.2

The model underestimates the volume of this tree by 24.2 - 22.093 = 2.107 cubic feet.

### 8.8 Absenteeism

##### Which, if any, variable should be removed from the model first?

The learner status variable should be removed first because removing it gives a better adjusted R2 value.

### 8.10 Absenteeism

##### Exercise 8.4 provides regression output for the full model, including all explanatory variables available in the data set, for predicting the number of days absent from school. In this exercise we consider a forward-selection algorithm and add variables to the model one-at-a-time. The table below shows the p-value and adjusted R2 of each model where we include only the corresponding predictor. Based on this table, which variable should be added to the model first?

Ethnicity should be added to the model first because of its adjusted R2 value and because its p-value is below .05, meaning it is statistically significant.

### 8.14 GPA and IQ

##### A regression model for predicting GPA from gender and IQ was fit, and both predictors were found to be statistically significant. Using the plots given below, determine if this regression model is appropriate for these data.

Looking at the plots, the residuals seem mostly normal and follow the regression line very well. It seems like the regression should have a strong R2 value. There are just a few lower values that do not follow the regression line. It is also good to note that there does not seem to be a correlation between the residuals and the fitted values. The regression model looks appropriate for these data.

### 8.18 Challenger Disaster

##### a) $\hat{p}_{65} = 0.084$, $\hat{p}_{67} = 0.056$, $\hat{p}_{69} = 0.037$, $\hat{p}_{71} = 0.024$
Temperature1 = 51

Damage1 = 11.6630 - 0.2162 * Temperature1
P1 = exp(Damage1) / (1 + exp(Damage1))

P1
## [1] 0.6540297
Temperature2 = 53

Damage2 = 11.6630 - 0.2162 * Temperature2
P2 = exp(Damage2) / (1 + exp(Damage2))

P2
## [1] 0.5509228
Temperature3 = 55

Damage3 = 11.6630 - 0.2162 * Temperature3
P3 = exp(Damage3) / (1 + exp(Damage3))

P3
## [1] 0.4432456
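
The three calculations above repeat the same inverse-logit transformation. A compact sketch using the same coefficients; `p_damage` is an illustrative name, and `plogis()` is the logistic function from base R's stats package:

```r
# Coefficients as used in the calculations above
b0 <- 11.6630
b1 <- -0.2162

# plogis(x) computes exp(x) / (1 + exp(x))
p_damage <- function(temp) plogis(b0 + b1 * temp)

temps <- c(51, 53, 55)
round(p_damage(temps), 4)
```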
##### b) Add the model-estimated probabilities from part (a) on the plot, then connect these dots using a smooth curve to represent the model-estimated probabilities.
Temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)

Damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)

Undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)

Challenger <- data.frame(Temperature, Damaged, Undamaged)
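library(ggplot2)
# Plot the observed proportion of damaged O-rings (6 per launch) with a logistic fit.
# family must be passed through method.args; stat_smooth() ignores it as a direct argument.
ggplot(Challenger, aes(x = Temperature, y = Damaged / 6, weight = 6)) + geom_point() +
  stat_smooth(method = "glm", method.args = list(family = "binomial"))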

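# Model-estimated probabilities at 51, 53, ..., 71 degrees
Temp <- seq(from = 51, to = 71, by = 2)
Prob <- c(P1, P2, P3, 0.341, 0.251, 0.179, 0.124, 0.084, 0.056, 0.037, 0.024)
# plot() is called for its side effect, so its result does not need to be assigned
plot(Temp, Prob, type = "o", col = "red")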
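
As a cross-check on the coefficients used in part (a), the logistic model can be refit from the launch data entered above. This is a sketch under the assumption that each launch had 6 O-rings, so the undamaged count is 6 minus the damaged count; if the data match those behind the exercise's output, the estimates should be close to 11.6630 and -0.2162.

```r
Temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
Damaged <- c(5, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
             1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
Undamaged <- 6 - Damaged  # assumes 6 O-rings per launch

# Binomial response given as (successes, failures) per launch
fit <- glm(cbind(Damaged, Undamaged) ~ Temperature, family = binomial)
coef(fit)

# Model-estimated probabilities of damage at 51, 53, and 55 degrees
predict(fit, newdata = data.frame(Temperature = c(51, 53, 55)),
        type = "response")
```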

##### c) Describe any concerns you may have regarding applying logistic regression in this application, and note any assumptions that are required to accept the model’s validity.

Two conditions must hold in order to apply logistic regression in this application. The first condition is:

Each predictor $x_i$ must be linearly related to logit($p_i$) if all other predictors are held constant.

This condition is tough to verify.

The second condition is:

Each outcome $Y_i$ is independent of the other outcomes.

Each launch should be independent of the others; therefore, this condition is met.