DATA 606 - Presentation (8.3)

N Nedd

May 4, 2017

Question

Read data

url <- "https://raw.githubusercontent.com/jbryer/DATA606Fall2016/master/Data/Data%20from%20openintro.org/Ch%208%20Exercise%20Data/babies.csv"
babies <- read.csv(url)

The equation of the regression line is:

\[\hat{bwt} = -80.41 + 0.44 \times gestation - 3.33 \times parity - 0.01 \times age + 1.15 \times height + 0.05 \times weight - 8.40 \times smoke\]
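An equation of this form is typically obtained by fitting a multiple linear regression with `lm()`. The sketch below assumes the data frame has the columns `bwt`, `gestation`, `parity`, `age`, `height`, `weight` and `smoke`, as used elsewhere in this document. The helper name `fit_bwt_model` is hypothetical, and whether the fitted estimates reproduce the rounded coefficients above has not been verified here.

```r
# Hypothetical helper (not part of the original analysis):
# fit birth weight on all six predictors by ordinary least squares.
fit_bwt_model <- function(data) {
  lm(bwt ~ gestation + parity + age + height + weight + smoke, data = data)
}

# Usage, assuming `babies` has already been read with read.csv(url):
# fit <- fit_bwt_model(babies)
# round(coef(fit), 2)   # intercept and slopes, rounded to two decimals
```

Note that `lm()` drops rows with missing values by default, so the number of observations used in the fit may be smaller than `nrow(babies)`.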

Gestation has a positive relationship with the baby’s birth weight. The coefficient of 0.44 means that, with the other variables held constant, birth weight is predicted to increase by 0.44 ounces for each additional day of gestation.

Conversely, age has a negative relationship with the baby’s birth weight. The coefficient of -0.01 means that, with the other variables held constant, birth weight is predicted to decrease by 0.01 ounces for each additional year of the mother’s age.

In exercise 8.2, the coefficient for parity was -1.93. In this exercise, the coefficient is -3.33.

The difference arises because parity is correlated with at least one of the other predictors included in this exercise, so the estimate for parity changes once those variables are in the model.
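The effect of correlated predictors on a coefficient can be illustrated with a small simulation. This is only a sketch on simulated data: `x1`, `x2` and `y` are hypothetical stand-ins, not variables from the babies data, and the slopes and seed are arbitrary choices.

```r
# Simulated illustration only; nothing here comes from the babies data.
set.seed(606)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # x2 is correlated with x1
y  <- 1 * x1 + 2 * x2 + rnorm(n)      # true slope on x1 is 1

fit_small <- lm(y ~ x1)        # x2 omitted: x1 also picks up part of x2's effect
fit_full  <- lm(y ~ x1 + x2)   # x2 included: x1's slope is estimated separately

b_small <- unname(coef(fit_small)["x1"])
b_full  <- unname(coef(fit_full)["x1"])
print(round(c(without_x2 = b_small, with_x2 = b_full), 2))
```

The two printed slopes for `x1` differ noticeably even though the data are the same, which is the same mechanism that changes the parity coefficient between the two exercises.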

The residual can be calculated as follows:

predict1 <- -80.41 + 0.44 * babies$gestation[1] -3.33 * babies$parity[1] - 0.01 * babies$age[1] + 1.15 * babies$height[1] + 0.05 * babies$weight[1] - 8.40 * babies$smoke[1]

residual1 <- babies$bwt[1] - predict1

The prediction for the first observation is 120.58. The actual value is 120. Therefore the residual is -0.58.
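Written out as a worked equation, using \(e_1\) (a symbol introduced here) for the residual of the first observation:

\[e_1 = bwt_1 - \hat{bwt}_1 = 120 - 120.58 = -0.58\]

The negative sign indicates that the model slightly overpredicts this baby’s birth weight.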

size <- nrow(babies)
VarTotal <- 332.57
VarError <- 249.28
# Var = SS / (size - 1), therefore SS = Var * (size - 1)
SSTotal <- VarTotal * (size - 1)
SSError <- VarError * (size - 1)


Rsquared <- (SSTotal - SSError)/SSTotal

# Adjusted R-squared: penalize for the 6 predictors in the model
Rsquaredadj <- 1 - ((SSError/SSTotal) * ((size - 1)/(size - 6 - 1)))

\(R^{2}\) = 0.2504

\(R^{2}_{adj}\) = 0.2468
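As a check on the arithmetic, here is a sketch of the formulas the code above appears to implement, with \(n\) standing for `size` and \(k = 6\) for the number of predictors (both symbols are introduced here for illustration). Because the same factor \(n - 1\) converts each variance into a sum of squares, it cancels in the ratio:

\[R^{2} = 1 - \frac{SS_{Error}}{SS_{Total}} = 1 - \frac{Var_{Error} \times (n - 1)}{Var_{Total} \times (n - 1)} = 1 - \frac{249.28}{332.57} \approx 0.2504\]

\[R^{2}_{adj} = 1 - \frac{SS_{Error}}{SS_{Total}} \times \frac{n - 1}{n - k - 1}\]

The adjusted value is slightly smaller than \(R^{2}\) because the factor \((n - 1)/(n - k - 1)\) is greater than 1 whenever \(k > 0\).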