Multiple Regression

Anil Akyildirim

11/18/2019

Intro to Multiple Regression

Baby weights, Part III. (9.3, P. 351). A more realistic approach to modeling infant weights is to consider all possibly related variables at once. Other variables of interest include length of pregnancy in days (gestation), mother’s age in years (age), mother’s height in inches (height), and mother’s pregnancy weight in pounds (weight). Below are three observations from this data set.

The summary table below shows the results of a regression model for predicting the average birth weight of babies based on all of the variables included in the data set.

  1. Write the equation of the regression model that includes all of the variables.
  2. Interpret the slopes of gestation and age in this context.
  3. Calculate the residual for the first observation in the data set.
  4. The variance of the residuals is 249.28, and the variance of the birth weights of all babies in the data set is 332.57. Calculate \(R^2\) and the adjusted \(R^2\). Note that there are 1,236 observations in the data set.

Equation of the Regression Model

\(Birth weight = \beta_{0} + \beta_{1}*gestation + \beta_{2}*parity + \beta_{3}*age + \beta_{4}*height + \beta_{5}*weight + \beta_{6}*smoke\)

\(Birth weight = -80.41 + 0.44*gestation - 3.33*parity - 0.01*age +1.15*height + 0.05*weight -8.40*smoke\)

Interpretation of Slopes

Gestation: All else held constant, babies weigh on average 0.44 ounces more for each additional day of pregnancy.

Parity: All else held constant, first-born babies weigh on average 3.33 ounces less than other babies.

Age: All else held constant, babies weigh on average 0.01 ounces less for each additional year of the mother’s age.

Height: All else held constant, babies weigh on average 1.15 ounces more for each additional inch of the mother’s height.

Weight: All else held constant, babies weigh on average 0.05 ounces more for each additional pound of the mother’s weight.

Smoke: All else held constant, babies born to mothers who smoke weigh on average 8.40 ounces less than babies born to mothers who do not smoke.

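Intercept: For babies who are not first born, with a gestation of 0 days and a mother who is 0 years old, 0 inches tall, weighs 0 pounds, and does not smoke, the model predicts a birth weight of -80.41 ounces. This is not meaningful in context; the intercept only adjusts the height of the regression surface.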
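The "all else held constant" reading of a slope can be checked numerically: changing one variable by one unit while fixing the others should change the prediction by exactly that variable's coefficient. The sketch below is an illustration only. The function name `predict_birth_weight` and the baseline mother are hypothetical, and it assumes smoke is coded 1 for smokers and 0 for non-smokers, as the intercept interpretation implies.

```python
# Coefficients copied from the fitted equation (birth weight in ounces).
COEFS = {
    "gestation": 0.44,
    "parity": -3.33,
    "age": -0.01,
    "height": 1.15,
    "weight": 0.05,
    "smoke": -8.40,
}
INTERCEPT = -80.41


def predict_birth_weight(**values):
    """Return the predicted birth weight in ounces for one set of inputs."""
    return INTERCEPT + sum(COEFS[name] * values[name] for name in COEFS)


# Hypothetical baseline mother, chosen only for illustration.
base = dict(gestation=280, parity=0, age=30, height=64, weight=130, smoke=0)

# Change exactly one variable at a time and keep everything else fixed.
one_more_day = dict(base, gestation=base["gestation"] + 1)
one_more_year = dict(base, age=base["age"] + 1)
smoker = dict(base, smoke=1)  # assumption: 1 = smoker

gestation_effect = predict_birth_weight(**one_more_day) - predict_birth_weight(**base)
age_effect = predict_birth_weight(**one_more_year) - predict_birth_weight(**base)
smoke_effect = predict_birth_weight(**smoker) - predict_birth_weight(**base)

print(round(gestation_effect, 2), round(age_effect, 2), round(smoke_effect, 2))
```

Each difference equals the corresponding coefficient regardless of the baseline chosen, which is what the linear form of the model guarantees.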

Calculating Residuals

A residual is the difference between the observed and the predicted value of birth weight. For the first observation, the predicted value is:

\(Birth Weight=-80.41+(0.44*284)-(3.33*0)-(0.01*27)+(1.15*62)+(0.05*100)-(8.40*0)\)

\(=-80.41 + 124.96 - 0 -0.27+71.3+5-0\)

\(=120.58\)

The predicted birth weight for the first observation is 120.58 ounces.

The actual birth weight for the first observation is 120 ounces.

Conclusion: The residual is \(120-120.58=-0.58\), so the model over-predicts the birth weight by 0.58 ounces.
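The same arithmetic can be reproduced in a few lines of Python. This is a minimal sketch that uses the coefficients and first-observation values from the calculation above; the variable names are illustrative choices.

```python
# Coefficients of the fitted model (birth weight in ounces).
intercept = -80.41
coefs = {
    "gestation": 0.44,
    "parity": -3.33,
    "age": -0.01,
    "height": 1.15,
    "weight": 0.05,
    "smoke": -8.40,
}

# First observation, with the values used in the hand calculation.
first_obs = {
    "gestation": 284,
    "parity": 0,
    "age": 27,
    "height": 62,
    "weight": 100,
    "smoke": 0,
}
observed = 120

# Prediction: intercept plus the sum of coefficient * value.
predicted = intercept + sum(coefs[name] * first_obs[name] for name in coefs)

# Residual: observed minus predicted; a negative value means over-prediction.
residual = observed - predicted

print(round(predicted, 2), round(residual, 2))  # 120.58 -0.58
```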

Calculating \(R^2\) and Adjusted \(R^2\)

By definition, \(R^2=(explained \ variability \ in \ y)/(total \ variability \ in \ y)\)

Total variability (variance of the birth weights): \(SS_{Total}=332.57\)

Unexplained variability (variance of the residuals): \(SS_{Error}=249.28\)

Explained variability (the difference between the two): \(SS_{Model}=332.57-249.28=83.29\)

\(R^2=(83.29)/332.57= 0.25044\)

\(R^2_{adj}=1-((SS_{Error}/SS_{Total})*((n-1)/(n-p-1)))\)

Here n is the number of cases and p is the number of explanatory variables in the model.

\(R^2_{adj}=1-((249.28/332.57)*((1236-1)/(1236-6-1)))\)

\(R^2_{adj}=0.2468\)
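Both quantities can be computed directly from the two variances given in the exercise. The sketch below assumes the formulas stated above; the function names `r_squared` and `adjusted_r_squared` are illustrative and not part of any library.

```python
def r_squared(var_resid, var_total):
    """R^2 as one minus the share of unexplained variability."""
    return 1 - var_resid / var_total


def adjusted_r_squared(var_resid, var_total, n, p):
    """Adjusted R^2 with n cases and p explanatory variables."""
    return 1 - (var_resid / var_total) * (n - 1) / (n - p - 1)


# Values given in the exercise.
var_resid = 249.28   # variance of the residuals
var_total = 332.57   # variance of the birth weights
n, p = 1236, 6       # observations and explanatory variables

r2 = r_squared(var_resid, var_total)
r2_adj = adjusted_r_squared(var_resid, var_total, n, p)

print(f"{r2:.4f} {r2_adj:.4f}")
```

With the exercise's numbers this gives roughly 0.2504 for \(R^2\) and 0.2468 for the adjusted \(R^2\).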

\(R^2_{adj}\) is smaller than \(R^2\) whenever the model has at least one explanatory variable and the fit is not perfect, because the factor \((n-1)/(n-p-1)\) is then greater than 1.
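As a short check, the gap between the two measures can be written out explicitly from the two formulas above. This derivation is an added sketch rather than part of the exercise, and it reuses the same ratio, n, and p:

\(R^2-R^2_{adj}=(SS_{Error}/SS_{Total})*((n-1)/(n-p-1)-1)=(SS_{Error}/SS_{Total})*(p/(n-p-1))\)

\(=(249.28/332.57)*(6/1229)\approx 0.0037\)

The gap is zero only when p is 0 or the unexplained variability is 0, and it widens when explanatory variables are added without reducing the unexplained variability.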

When comparing candidate models, we prefer the one with the higher \(R^2_{adj}\).