The Data

The data set gives the systolic blood pressure (SBP), body size (QUET), age (AGE), and smoking history (SMK = 0 if nonsmoker, SMK = 1 if a current or previous smoker) for a hypothetical sample of 32 white males over 40 years old from the town of Angina.

data <- read.csv("week2-HW-data.csv", header = TRUE, sep = ",", row.names = 1)
attach(data)

Exercise One:

Determine the ANOVA tables for the following regressions:

  1. SBP (Y) on SMK (X)
  2. SBP (Y) on QUET (X)
  3. QUET (Y) on AGE (X)
  4. SBP (Y) on AGE (X)

Solving

  1. SBP (Y) on SMK (X):
anova(lm( SBP ~ SMK))
## Analysis of Variance Table
## 
## Response: SBP
##           Df Sum Sq Mean Sq F value Pr(>F)
## SMK        1  393.1   393.1  1.9548 0.1723
## Residuals 30 6032.9   201.1
  2. SBP (Y) on QUET (X):
anova(lm( SBP ~ QUET))
## Analysis of Variance Table
## 
## Response: SBP
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## QUET       1 3537.9  3537.9  36.751 1.172e-06 ***
## Residuals 30 2888.0    96.3                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  3. QUET (Y) on AGE (X):
anova(lm( QUET ~ AGE))
## Analysis of Variance Table
## 
## Response: QUET
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## AGE        1 4.9360  4.9360  54.367 3.253e-08 ***
## Residuals 30 2.7237  0.0908                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  4. SBP (Y) on AGE (X):
anova(lm( SBP ~ AGE))
## Analysis of Variance Table
## 
## Response: SBP
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## AGE        1 3861.6  3861.6  45.177 1.894e-07 ***
## Residuals 30 2564.3    85.5                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Exercise Two:

Use the ANOVA tables to perform the F-test for the significance of each straight-line regression.

Solving

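For each table, we test whether the independent variable contributes significantly to the model, that is, whether the straight-line model fits better than the naive model that uses only the mean of Y. The null hypothesis is \(\beta_1 = 0\), and the F statistic is the ratio of the regression mean square to the residual mean square.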
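As a minimal sketch of where the F value in each table comes from, the lines below recompute it by hand for SBP on QUET. The sums of squares and degrees of freedom are copied from the ANOVA table above; the variable names (`ss_reg`, `ss_res`, and so on) are illustrative only and do not come from the original analysis.

```r
# Values copied from the "SBP ~ QUET" ANOVA table (rounded as printed there)
ss_reg <- 3537.9  # sum of squares explained by QUET
df_reg <- 1       # regression degrees of freedom
ss_res <- 2888.0  # residual sum of squares
df_res <- 30      # residual degrees of freedom (n - 2 with n = 32)

# Mean squares are sums of squares divided by their degrees of freedom
ms_reg <- ss_reg / df_reg
ms_res <- ss_res / df_res

# F statistic: regression mean square over residual mean square
f_stat <- ms_reg / ms_res

# Upper-tail probability of the F(1, 30) distribution
p_value <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

Because the inputs are the rounded values printed in the table, the result should agree with the table only up to rounding.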

If we take a significance level of 5%, for:

  1. SBP (Y) on SMK (X): we do not have enough evidence to reject the null hypothesis (\(\beta_1 = 0\)), because the p-value of the F-test is 0.1723, which is greater than 0.05.

  2. SBP (Y) on QUET (X): we reject the null hypothesis (\(\beta_1 = 0\)), because the p-value of the F-test is 1.172e-06, which is far below 0.05.

  3. QUET (Y) on AGE (X): we reject the null hypothesis (\(\beta_1 = 0\)), because the p-value of the F-test is 3.253e-08, which is far below 0.05.

  4. SBP (Y) on AGE (X): we reject the null hypothesis (\(\beta_1 = 0\)), because the p-value of the F-test is 1.894e-07, which is far below 0.05.
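The four decisions above can be cross-checked with a short sketch that applies the same rule to every regression at once. It assumes the F values printed in the ANOVA tables (1 and 30 degrees of freedom) and compares each with the 5% critical value of the F distribution; the object names are hypothetical and chosen only for this illustration.

```r
alpha  <- 0.05  # significance level
df_reg <- 1     # regression degrees of freedom
df_res <- 30    # residual degrees of freedom

# F values copied from the four ANOVA tables
f_values <- c("SBP ~ SMK"  = 1.9548,
              "SBP ~ QUET" = 36.751,
              "QUET ~ AGE" = 54.367,
              "SBP ~ AGE"  = 45.177)

# Critical value: reject H0 when the observed F exceeds it
f_crit <- qf(1 - alpha, df_reg, df_res)

# Equivalent rule: reject H0 when the upper-tail p-value is below alpha
p_values <- pf(f_values, df_reg, df_res, lower.tail = FALSE)
reject   <- f_values > f_crit

print(f_crit)
print(data.frame(F_value = f_values,
                 p_value = signif(p_values, 4),
                 reject_H0 = reject))
```

Comparing F with the critical value and comparing the p-value with `alpha` are two forms of the same test, so both always lead to the same decision.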

Exercise Three:

Interpret your results.

Solving

At a significance level of 5%, we can conclude:

  - part of the variation in SBP can be explained by a straight-line regression on QUET, and likewise by a straight-line regression on AGE;
  - there is a significant linear association between QUET and AGE;
  - the linear association between SBP and SMK is not significant here, although, based on homework 2, smoking may still influence SBP.
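To put a number on how much of the variation each regression explains, one option is the coefficient of determination, \(R^2 = SS_{reg} / (SS_{reg} + SS_{res})\). This quantity is not part of the original exercise; the sketch below only rearranges the sums of squares already printed in the ANOVA tables, and the vector names are illustrative.

```r
# Regression and residual sums of squares copied from the four ANOVA tables
ss_reg <- c("SBP ~ SMK"  = 393.1,
            "SBP ~ QUET" = 3537.9,
            "QUET ~ AGE" = 4.9360,
            "SBP ~ AGE"  = 3861.6)
ss_res <- c(6032.9, 2888.0, 2.7237, 2564.3)

# Share of the total variation in Y accounted for by the straight line
r_squared <- ss_reg / (ss_reg + ss_res)

print(round(r_squared, 3))
```

A small share for SBP on SMK alongside much larger shares for the other three regressions would be consistent with the F-test decisions.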

detach(data)