The Ohio State University. Applied Regression Analysis by Stanley Lemeshow, PhD.

The following table gives the systolic blood pressure (SBP), body size (QUET), age (AGE), and smoking history (SMK = 0 if nonsmoker, SMK = 1 if a current or previous smoker) for a hypothetical sample of 32 white males over 40 years old from the town of Angina:

##    Person SBP  QUET AGE SMK
## 1       1 135 2.876  45   0
## 2       2 122 3.251  41   0
## 3       3 130 3.100  49   0
## 4       4 148 3.768  52   0
## 5       5 146 2.979  54   1
## 6       6 129 2.790  47   1
## 7       7 162 3.668  60   1
## 8       8 160 3.612  48   1
## 9       9 144 2.368  44   1
## 10     10 180 4.637  64   1
## 11     11 166 3.877  59   1
## 12     12 138 4.032  51   1
## 13     13 152 4.116  64   0
## 14     14 138 3.673  56   0
## 15     15 140 3.562  54   1
## 16     16 134 2.998  50   1
## 17     17 145 3.360  49   1
## 18     18 142 3.024  46   1
## 19     19 135 3.171  57   0
## 20     20 142 3.401  56   0
## 21     21 150 3.628  56   1
## 22     22 144 3.751  58   0
## 23     23 137 3.296  53   0
## 24     24 132 3.210  50   0
## 25     25 149 3.301  54   1
## 26     26 132 3.017  48   1
## 27     27 120 2.789  43   0
## 28     28 126 2.956  43   1
## 29     29 161 3.800  63   0
## 30     30 170 4.132  63   1
## 31     31 152 3.962  62   0
## 32     32 164 4.010  65   0
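To make the analyses below reproducible, the table can be entered as an R data frame. This is only a sketch: the name `data` is taken from the `lm` calls shown in the output further down, and the construction itself is an assumption about how the data were loaded.

```r
# Enter the 32 observations from the table as a data frame named `data`.
data <- data.frame(
  Person = 1:32,
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  QUET = c(2.876, 3.251, 3.100, 3.768, 2.979, 2.790, 3.668, 3.612, 2.368,
           4.637, 3.877, 4.032, 4.116, 3.673, 3.562, 2.998, 3.360, 3.024,
           3.171, 3.401, 3.628, 3.751, 3.296, 3.210, 3.301, 3.017, 2.789,
           2.956, 3.800, 4.132, 3.962, 4.010),
  AGE = c(45, 41, 49, 52, 54, 47, 60, 48, 44, 64, 59, 51, 64, 56, 54, 50,
          49, 46, 57, 56, 56, 58, 53, 50, 54, 48, 43, 43, 63, 63, 62, 65),
  SMK = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1,
          1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0)
)

# Quick sanity checks on the entered values.
str(data)
table(data$SMK)  # group sizes for nonsmokers (0) and smokers (1)
```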

Generate scatter diagrams for each of the following variable pairs:

Plot 1:

Plot 2:

Plot 3:

Plot 4:

Comparing Blood Pressure with Smoking History

1. Determine the least-squares estimates of slope \(\beta_1\) and intercept \(\beta_0\) for the straight-line regression of SBP (Y) on SMK (X).

## 
## Call:
## lm(formula = SBP ~ SMK, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.824  -9.056  -2.812  11.200  32.176 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  140.800      3.661  38.454   <2e-16 ***
## SMK            7.024      5.023   1.398    0.172    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.18 on 30 degrees of freedom
## Multiple R-squared:  0.06117,    Adjusted R-squared:  0.02988 
## F-statistic: 1.955 on 1 and 30 DF,  p-value: 0.1723

\(\hat \beta_0\) (intercept) is 140.800

\(\hat \beta_1\) (slope) is 7.024
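The two estimates can also be pulled out of the fitted model programmatically rather than read off the summary. A minimal sketch, assuming the same `lm(SBP ~ SMK, data = data)` call as in the output above; the names `fit_smk`, `b0_hat` and `b1_hat` are illustrative only.

```r
# Only the two columns needed for this regression are entered here.
data <- data.frame(
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  SMK = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1,
          1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0)
)

# Fit the straight-line regression of SBP on SMK.
fit_smk <- lm(SBP ~ SMK, data = data)

# Extract the least-squares estimates by name.
b0_hat <- coef(fit_smk)[["(Intercept)"]]
b1_hat <- coef(fit_smk)[["SMK"]]
round(c(b0_hat = b0_hat, b1_hat = b1_hat), 3)
```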

2. Compare the value of \(\hat \beta_0\) with the mean SBP for nonsmokers. Compare the value of \(\hat \beta_0\) + \(\hat \beta_1\) with the mean SBP for smokers.

Mean systolic blood pressure for nonsmokers:

## [1] 140.8

The mean systolic blood pressure for nonsmokers equals the intercept \(\hat \beta_0\). SMK is a 0/1 indicator variable, and the intercept is the fitted value of the dependent variable (SBP) when the independent variable is zero, which here is the nonsmoker group (SMK = 0).

Mean systolic blood pressure for smokers:

## [1] 147.824

\(\hat \beta_0\) + \(\hat \beta_1\):

##         
## 147.824

The value of \(\hat \beta_0\) + \(\hat \beta_1\) equals the mean systolic blood pressure for smokers. The slope \(\hat \beta_1\) is the change in fitted SBP for a one-unit increase in SMK, and going from SMK = 0 to SMK = 1 is exactly the step from nonsmokers to smokers, so \(\hat \beta_1\) is the difference between the two group means.
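The comparison in this question can be checked numerically. The sketch below assumes the data frame is named `data` as in the `lm` call above; `group_means`, `mean_nonsmokers` and `mean_smokers` are hypothetical helper names.

```r
data <- data.frame(
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  SMK = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1,
          1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0)
)

# Mean SBP within each smoking group (names are "0" and "1").
group_means <- tapply(data$SBP, data$SMK, mean)
mean_nonsmokers <- unname(group_means["0"])
mean_smokers <- unname(group_means["1"])

# Coefficients of the regression of SBP on the 0/1 indicator.
b <- coef(lm(SBP ~ SMK, data = data))

# Intercept vs. nonsmoker mean, intercept + slope vs. smoker mean.
all.equal(mean_nonsmokers, b[["(Intercept)"]])
all.equal(mean_smokers, b[["(Intercept)"]] + b[["SMK"]])
```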

3. Test the hypothesis that the true slope \(\beta_1\) is 0.

\(H_0: \beta_1 = 0\)

\(H_A: \beta_1 \neq 0\)

n (number of observations) = 32

\(\alpha\) = 0.05

Non-rejection region for the test statistic (critical values \(\pm t_{0.975,\,30}\)):

## [1] "-2.0423 : 2.0423"

Test statistic:

##        
## 1.3981

Because |t| = 1.3981 is less than the critical value 2.0423, we:

## [1] "failed to reject the null hypothesis"

p-value:

##        
## 0.1723
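The steps of this test can be sketched in R as follows. Note that the interval -2.0423 : 2.0423 shown above matches the critical values of the t distribution with 30 degrees of freedom; a confidence interval for \(\beta_1\) itself can be obtained with `confint`. Variable names such as `t_stat`, `t_crit` and `ci_beta1` are illustrative, and the code is an assumption about how the numbers were produced, not the original script.

```r
data <- data.frame(
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  SMK = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1,
          1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0)
)
fit_smk <- lm(SBP ~ SMK, data = data)
alpha <- 0.05

# Row of the coefficient table for the slope.
est <- coef(summary(fit_smk))["SMK", ]
df_resid <- df.residual(fit_smk)          # n - 2

# Test statistic, two-sided critical value, and p-value.
t_stat <- est[["t value"]]
t_crit <- qt(1 - alpha / 2, df_resid)
p_value <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)
reject <- abs(t_stat) > t_crit

# Confidence interval for the slope itself.
ci_beta1 <- confint(fit_smk, "SMK", level = 1 - alpha)

round(c(t_stat = t_stat, t_crit = t_crit, p_value = p_value), 4)
ci_beta1
```

The confidence interval for the slope contains zero exactly when the test fails to reject at the same level, so the two views agree.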

Plot:

4. Is the test in question (3) equivalent to the usual two-sample t test for the equality of two population means assuming equal but unknown variances?

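Yes. The pooled-variance two-sample t test gives the same t statistic (1.3981), degrees of freedom (30), and p-value (0.1723) as the test of zero slope: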

## 
##  Two Sample t-test
## 
## data:  data$SBP[data$SMK == 1] and data$SBP[data$SMK == 0]
## t = 1.3981, df = 30, p-value = 0.1723
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.235823 17.282882
## sample estimates:
## mean of x mean of y 
##  147.8235  140.8000
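The header "Two Sample t-test" in the output indicates the pooled-variance version, which in R requires `var.equal = TRUE` (the default is the Welch test). A short sketch of the comparison, with `tt` and `slope_t` as hypothetical names:

```r
data <- data.frame(
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  SMK = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1,
          1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0)
)

# Pooled-variance two-sample t test: smokers (x) vs. nonsmokers (y).
tt <- t.test(data$SBP[data$SMK == 1], data$SBP[data$SMK == 0],
             var.equal = TRUE)

# t statistic for the slope in the regression of SBP on SMK.
slope_t <- coef(summary(lm(SBP ~ SMK, data = data)))["SMK", "t value"]

# The two statistics coincide, as do the degrees of freedom (n - 2).
all.equal(unname(tt$statistic), slope_t)
unname(tt$parameter)
```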

Comparing Blood Pressure with Body Size

1. Determine the least-squares estimates of slope \(\beta_1\) and intercept \(\beta_0\) for the straight-line regression of SBP (Y) on QUET (X).

## 
## Call:
## lm(formula = SBP ~ QUET, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.231  -7.145  -1.604   7.798  22.531 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   70.576     12.322   5.728 2.99e-06 ***
## QUET          21.492      3.545   6.062 1.17e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.812 on 30 degrees of freedom
## Multiple R-squared:  0.5506, Adjusted R-squared:  0.5356 
## F-statistic: 36.75 on 1 and 30 DF,  p-value: 1.172e-06

\(\hat \beta_0\) (intercept) is 70.576

\(\hat \beta_1\) (slope) is 21.492
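As a cross-check on the `lm` output, the same estimates follow from the closed-form least-squares formulas \(\hat \beta_1 = S_{xy} / S_{xx}\) and \(\hat \beta_0 = \bar y - \hat \beta_1 \bar x\). The sketch below uses illustrative names (`Sxy`, `Sxx`, `b0_hat`, `b1_hat`) that do not appear in the original.

```r
data <- data.frame(
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  QUET = c(2.876, 3.251, 3.100, 3.768, 2.979, 2.790, 3.668, 3.612, 2.368,
           4.637, 3.877, 4.032, 4.116, 3.673, 3.562, 2.998, 3.360, 3.024,
           3.171, 3.401, 3.628, 3.751, 3.296, 3.210, 3.301, 3.017, 2.789,
           2.956, 3.800, 4.132, 3.962, 4.010)
)
x <- data$QUET
y <- data$SBP

# Corrected sums of cross-products and squares.
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)

# Closed-form least-squares estimates.
b1_hat <- Sxy / Sxx
b0_hat <- mean(y) - b1_hat * mean(x)

# Compare with the estimates returned by lm().
fit_quet <- lm(SBP ~ QUET, data = data)
all.equal(unname(coef(fit_quet)), c(b0_hat, b1_hat))
```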

2. Test the hypothesis of zero slope.

\(H_0\): \(\beta_1\) = 0

\(H_A\): \(\beta_1\) \(\neq\) 0

n (number of observations) = 32

\(\alpha\) = 0.05

Non-rejection region for the test statistic (critical values \(\pm t_{0.975,\,30}\)):

## [1] "-2.0423 : 2.0423"

Test statistic:

##        
## 6.0623

Because |t| = 6.0623 exceeds the critical value 2.0423, we:

## [1] "reject the null hypothesis"

p-value:

##            
## 0.00000117

Plot:

3. Find a 95% confidence interval for \(\mu_{y|\bar x}\).

## [1] "140.989 : 148.073"
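At \(x = \bar x\) the fitted value is \(\bar y\), and the interval reduces to \(\bar y \pm t_{0.975,\,30}\, s / \sqrt{n}\). One way to obtain it in R is `predict` with `interval = "confidence"`; this is a sketch, with `fit_quet`, `x_bar` and `ci_mean` as hypothetical names.

```r
data <- data.frame(
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  QUET = c(2.876, 3.251, 3.100, 3.768, 2.979, 2.790, 3.668, 3.612, 2.368,
           4.637, 3.877, 4.032, 4.116, 3.673, 3.562, 2.998, 3.360, 3.024,
           3.171, 3.401, 3.628, 3.751, 3.296, 3.210, 3.301, 3.017, 2.789,
           2.956, 3.800, 4.132, 3.962, 4.010)
)
fit_quet <- lm(SBP ~ QUET, data = data)

# Confidence interval for the mean response at the sample mean of QUET.
x_bar <- mean(data$QUET)
ci_mean <- predict(fit_quet, newdata = data.frame(QUET = x_bar),
                   interval = "confidence", level = 0.95)
round(ci_mean, 3)

# Same interval from the simplified formula ybar +/- t * s / sqrt(n).
n <- nrow(data)
half_width <- qt(0.975, n - 2) * sigma(fit_quet) / sqrt(n)
manual <- mean(data$SBP) + c(-1, 1) * half_width
```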

4. Calculate 95% prediction bands.
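Prediction bands can be computed with `predict` and `interval = "prediction"` over a grid of QUET values. The following is a sketch under assumptions: the grid of 50 points, and the names `grid`, `pred_band` and `conf_band`, are illustrative choices, not taken from the original analysis.

```r
data <- data.frame(
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  QUET = c(2.876, 3.251, 3.100, 3.768, 2.979, 2.790, 3.668, 3.612, 2.368,
           4.637, 3.877, 4.032, 4.116, 3.673, 3.562, 2.998, 3.360, 3.024,
           3.171, 3.401, 3.628, 3.751, 3.296, 3.210, 3.301, 3.017, 2.789,
           2.956, 3.800, 4.132, 3.962, 4.010)
)
fit_quet <- lm(SBP ~ QUET, data = data)

# Evenly spaced QUET values spanning the observed range.
grid <- data.frame(QUET = seq(min(data$QUET), max(data$QUET),
                              length.out = 50))

# 95% bands: for a new individual (prediction) and for the mean (confidence).
pred_band <- predict(fit_quet, newdata = grid,
                     interval = "prediction", level = 0.95)
conf_band <- predict(fit_quet, newdata = grid,
                     interval = "confidence", level = 0.95)
head(cbind(grid, pred_band))

# Draw the data, the fitted line, and the prediction bands when run
# interactively.
if (interactive()) {
  plot(SBP ~ QUET, data = data)
  abline(fit_quet)
  lines(grid$QUET, pred_band[, "lwr"], lty = 2)
  lines(grid$QUET, pred_band[, "upr"], lty = 2)
}
```

The prediction band is wider than the confidence band at every QUET value because it includes the variability of an individual observation around the regression line in addition to the uncertainty in the line itself.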

5. Based on the above, would you conclude that blood pressure increases as body size increases?

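Yes. The estimated slope is positive (\(\hat \beta_1\) = 21.492) and significantly different from zero (t = 6.0623, p = 0.00000117), so in this sample SBP tends to increase as QUET increases.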

6. Are any of the assumptions for straight-line regression clearly not satisfied in this example?

No. None of the assumptions (a linear relationship, homoscedasticity, i.e. equal variances, and normality of the residuals) is clearly violated:
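Checks of this kind are usually based on the residuals of the fitted line. The sketch below shows one possible set of diagnostics; the Shapiro-Wilk test is an addition for illustration and is not part of the original analysis, and the names `res`, `fitted_vals` and `sw` are hypothetical.

```r
data <- data.frame(
  SBP = c(135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152,
          138, 140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132,
          120, 126, 161, 170, 152, 164),
  QUET = c(2.876, 3.251, 3.100, 3.768, 2.979, 2.790, 3.668, 3.612, 2.368,
           4.637, 3.877, 4.032, 4.116, 3.673, 3.562, 2.998, 3.360, 3.024,
           3.171, 3.401, 3.628, 3.751, 3.296, 3.210, 3.301, 3.017, 2.789,
           2.956, 3.800, 4.132, 3.962, 4.010)
)
fit_quet <- lm(SBP ~ QUET, data = data)

# Residuals and fitted values from the straight-line fit.
res <- resid(fit_quet)
fitted_vals <- fitted(fit_quet)

# Formal check of normality of the residuals (illustrative addition).
sw <- shapiro.test(res)
sw$p.value

# Graphical checks when run interactively:
#  - residuals vs. fitted values (linearity, equal variances)
#  - normal Q-Q plot of the residuals (normality)
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(fitted_vals, res, xlab = "Fitted SBP", ylab = "Residual")
  abline(h = 0, lty = 2)
  qqnorm(res)
  qqline(res)
  par(op)
}
```

A residual plot with no curvature and roughly constant spread, together with a Q-Q plot close to a straight line, would support the assumptions; with only 32 observations, such checks can detect only fairly pronounced departures.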