The Ohio State University. Applied Regression Analysis by Stanley Lemeshow, PhD.
The following table gives the systolic blood pressure (SBP), body size (QUET), age (AGE), and smoking history (SMK = 0 if nonsmoker, SMK = 1 if a current or previous smoker) for a hypothetical sample of 32 white males over 40 years old from the town of Angina:
## Person SBP QUET AGE SMK
## 1 1 135 2.876 45 0
## 2 2 122 3.251 41 0
## 3 3 130 3.100 49 0
## 4 4 148 3.768 52 0
## 5 5 146 2.979 54 1
## 6 6 129 2.790 47 1
## 7 7 162 3.668 60 1
## 8 8 160 3.612 48 1
## 9 9 144 2.368 44 1
## 10 10 180 4.637 64 1
## 11 11 166 3.877 59 1
## 12 12 138 4.032 51 1
## 13 13 152 4.116 64 0
## 14 14 138 3.673 56 0
## 15 15 140 3.562 54 1
## 16 16 134 2.998 50 1
## 17 17 145 3.360 49 1
## 18 18 142 3.024 46 1
## 19 19 135 3.171 57 0
## 20 20 142 3.401 56 0
## 21 21 150 3.628 56 1
## 22 22 144 3.751 58 0
## 23 23 137 3.296 53 0
## 24 24 132 3.210 50 0
## 25 25 149 3.301 54 1
## 26 26 132 3.017 48 1
## 27 27 120 2.789 43 0
## 28 28 126 2.956 43 1
## 29 29 161 3.800 63 0
## 30 30 170 4.132 63 1
## 31 31 152 3.962 62 0
## 32 32 164 4.010 65 0
##
## Call:
## lm(formula = SBP ~ SMK, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.824 -9.056 -2.812 11.200 32.176
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 140.800 3.661 38.454 <2e-16 ***
## SMK 7.024 5.023 1.398 0.172
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.18 on 30 degrees of freedom
## Multiple R-squared: 0.06117, Adjusted R-squared: 0.02988
## F-statistic: 1.955 on 1 and 30 DF, p-value: 0.1723
\(\hat \beta_0\) (intercept) is 140.800
\(\hat \beta_1\) (slope) is 7.024
Mean systolic blood pressure for nonsmokers:
## [1] 140.8
The mean systolic blood pressure for nonsmokers equals the intercept (\(\hat \beta_0\)) because smoking history (SMK) is an indicator variable that takes only the values 0 and 1, and it equals 0 for nonsmokers. The intercept is the fitted value of the dependent variable (SBP) when the independent variable equals zero, and with a single 0/1 regressor the least-squares fitted value for each group is that group's sample mean.
Mean systolic blood pressure for smokers:
## [1] 147.824
\(\hat \beta_0 + \hat \beta_1\):
##
## 147.824
The value of \(\hat \beta_0 + \hat \beta_1\) equals the mean systolic blood pressure for smokers because \(\hat \beta_1\) is the change in the fitted value of the dependent variable (SBP) when the independent variable (SMK) increases by one unit, and SMK equals 1 for smokers. Equivalently, \(\hat \beta_1\) is the difference between the two group means: \(147.824 - 140.800 = 7.024\).
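The relationship between the coefficients and the group means can be checked directly. The following is a minimal sketch in plain Python (the output in this report comes from R); the data are transcribed from the table above, and the helper names `mean` and `simple_ols` are illustrative, not part of the original analysis.

```python
# SBP and SMK for the 32 subjects, transcribed from the table above.
SBP = [135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152, 138,
       140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132, 120, 126,
       161, 170, 152, 164]
SMK = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
       0, 1, 0, 0]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


b0, b1 = simple_ols(SMK, SBP)
mean_nonsmokers = mean([y for y, s in zip(SBP, SMK) if s == 0])
mean_smokers = mean([y for y, s in zip(SBP, SMK) if s == 1])

# The intercept should match the nonsmoker mean,
# and intercept + slope should match the smoker mean.
print(round(b0, 3), round(mean_nonsmokers, 3))
print(round(b0 + b1, 3), round(mean_smokers, 3))
```

Each printed pair should agree, reproducing the 140.8 and 147.824 reported above.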
\(H_0: \beta_1 = 0\)
\(H_A: \beta_1 \neq 0\)
n (number of observations) = 32
\(\alpha\) = 0.05
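Critical values of the t distribution with \(n - 2 = 30\) degrees of freedom (\(H_0\) is rejected if the test statistic falls outside this range):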
## [1] "-2.0423 : 2.0423"
Test statistic:
##
## 1.3981
Based on our test statistic we:
## [1] "failed to reject the null hypothesis"
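The test statistic is the slope estimate divided by its standard error, compared with the critical value 2.0423. A minimal Python sketch of that calculation follows, assuming the usual simple-regression formulas \(s^2 = \mathrm{SSE}/(n-2)\) and \(\mathrm{SE}(\hat \beta_1) = s/\sqrt{S_{xx}}\). The variable names are illustrative, and the critical value is taken from the output above rather than computed.

```python
import math

# SBP and SMK for the 32 subjects, transcribed from the table above.
SBP = [135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152, 138,
       140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132, 120, 126,
       161, 170, 152, 164]
SMK = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
       0, 1, 0, 0]

n = len(SBP)
x_bar = sum(SMK) / n
y_bar = sum(SBP) / n

# Least-squares estimates.
sxx = sum((x - x_bar) ** 2 for x in SMK)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(SMK, SBP))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard error on n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(SMK, SBP))
resid_se = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: beta_1 = 0.
se_b1 = resid_se / math.sqrt(sxx)
t_stat = b1 / se_b1

T_CRIT = 2.0423  # two-sided 5% critical value for 30 df, from the output above
decision = ("reject the null hypothesis" if abs(t_stat) > T_CRIT
            else "failed to reject the null hypothesis")

print(round(t_stat, 4), decision)
```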
p-value:
##
## 0.1723
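The p-value is the two-sided tail probability \(P(|T| > |t|)\) under a t distribution with 30 degrees of freedom. As a rough illustration, it can be approximated without a statistics library by integrating the t density numerically. The function names `t_pdf` and `two_sided_p` and the use of Simpson's rule are choices made for this sketch, not the method used to produce the output above.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Approximate P(|T| > |t_stat|) with composite Simpson's rule.

    Integrates the density over [0, |t_stat|]; steps must be even.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t_stat|)
    return 1 - 2 * central_half


p_value = two_sided_p(1.3981, 30)
print(round(p_value, 4))
```

Because the p-value exceeds \(\alpha = 0.05\), it leads to the same decision as comparing the test statistic with the critical value.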
Plot:
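Yes, the two tests are equivalent: the t test for the slope in the regression of SBP on SMK and the two-sample t test with equal variances give the same test statistic (t = 1.3981 on 30 degrees of freedom) and the same p-value (0.1723):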
##
## Two Sample t-test
##
## data: data$SBP[data$SMK == 1] and data$SBP[data$SMK == 0]
## t = 1.3981, df = 30, p-value = 0.1723
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.235823 17.282882
## sample estimates:
## mean of x mean of y
## 147.8235 140.8000
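The equivalence can be reproduced by computing the pooled-variance two-sample t statistic by hand. The sketch below is a plain-Python illustration with hypothetical helper names. It uses the critical value 2.0423 for 30 degrees of freedom to form the 95% interval for the difference in means, which is also a 95% confidence interval for \(\beta_1\).

```python
import math

# SBP and SMK for the 32 subjects, transcribed from the table above.
SBP = [135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152, 138,
       140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132, 120, 126,
       161, 170, 152, 164]
SMK = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
       0, 1, 0, 0]

smokers = [y for y, s in zip(SBP, SMK) if s == 1]
nonsmokers = [y for y, s in zip(SBP, SMK) if s == 0]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the sample mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


n1, n0 = len(smokers), len(nonsmokers)

# Pooled variance estimate on n1 + n0 - 2 degrees of freedom.
pooled_var = (sum_sq_dev(smokers) + sum_sq_dev(nonsmokers)) / (n1 + n0 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n0))

diff = mean(smokers) - mean(nonsmokers)
t_stat = diff / se_diff

T_CRIT = 2.0423  # two-sided 5% critical value for 30 df
ci = (diff - T_CRIT * se_diff, diff + T_CRIT * se_diff)

print(round(t_stat, 4), round(ci[0], 3), round(ci[1], 3))
```

The difference in means equals the slope estimate 7.024, and its standard error equals the slope's standard error 5.023, which is why the two t statistics coincide.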
##
## Call:
## lm(formula = SBP ~ QUET, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.231 -7.145 -1.604 7.798 22.531
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.576 12.322 5.728 2.99e-06 ***
## QUET 21.492 3.545 6.062 1.17e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.812 on 30 degrees of freedom
## Multiple R-squared: 0.5506, Adjusted R-squared: 0.5356
## F-statistic: 36.75 on 1 and 30 DF, p-value: 1.172e-06
\(\hat \beta_0\) (intercept) is 70.576
\(\hat \beta_1\) (slope) is 21.492
\(H_0: \beta_1 = 0\)
\(H_A: \beta_1 \neq 0\)
n (number of observations) = 32
\(\alpha\) = 0.05
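Critical values of the t distribution with \(n - 2 = 30\) degrees of freedom (\(H_0\) is rejected if the test statistic falls outside this range):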
## [1] "-2.0423 : 2.0423"
Test statistic:
##
## 6.0623
Based on our test statistic we:
## [1] "reject the null hypothesis"
p-value:
##
## 0.00000117
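The same calculation applies when the regressor is continuous. The following plain-Python sketch refits SBP on QUET from the table above and forms the t statistic together with a 95% confidence interval for \(\beta_1\), \(\hat \beta_1 \pm t_{0.975,\,30} \cdot \mathrm{SE}(\hat \beta_1)\). Variable names are illustrative, and the critical value 2.0423 is taken as given.

```python
import math

# SBP and QUET for the 32 subjects, transcribed from the table above.
SBP = [135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152, 138,
       140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132, 120, 126,
       161, 170, 152, 164]
QUET = [2.876, 3.251, 3.100, 3.768, 2.979, 2.790, 3.668, 3.612, 2.368, 4.637,
        3.877, 4.032, 4.116, 3.673, 3.562, 2.998, 3.360, 3.024, 3.171, 3.401,
        3.628, 3.751, 3.296, 3.210, 3.301, 3.017, 2.789, 2.956, 3.800, 4.132,
        3.962, 4.010]

n = len(SBP)
x_bar = sum(QUET) / n
y_bar = sum(SBP) / n

# Least-squares estimates.
sxx = sum((x - x_bar) ** 2 for x in QUET)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(QUET, SBP))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard error and standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(QUET, SBP))
resid_se = math.sqrt(sse / (n - 2))
se_b1 = resid_se / math.sqrt(sxx)

t_stat = b1 / se_b1

T_CRIT = 2.0423  # two-sided 5% critical value for 30 df
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

print(round(b0, 3), round(b1, 3), round(t_stat, 4))
print(round(ci_b1[0], 3), round(ci_b1[1], 3))
```

An interval for \(\beta_1\) that excludes zero is consistent with rejecting \(H_0\) at \(\alpha = 0.05\).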
Plot:
95% confidence interval for the mean SBP at the sample mean of QUET:
## [1] "140.989 : 148.073"
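The interval printed above is centred on the overall mean SBP (144.531), which is what a 95% confidence interval for the mean response at \(x_0 = \bar x\) would give. Under that assumption, a minimal Python sketch using \(\hat y_0 \pm t_{0.975,\,30} \cdot s \sqrt{1/n + (x_0 - \bar x)^2 / S_{xx}}\) follows. The function `mean_response_ci` is a hypothetical helper written for this illustration.

```python
import math

# SBP and QUET for the 32 subjects, transcribed from the table above.
SBP = [135, 122, 130, 148, 146, 129, 162, 160, 144, 180, 166, 138, 152, 138,
       140, 134, 145, 142, 135, 142, 150, 144, 137, 132, 149, 132, 120, 126,
       161, 170, 152, 164]
QUET = [2.876, 3.251, 3.100, 3.768, 2.979, 2.790, 3.668, 3.612, 2.368, 4.637,
        3.877, 4.032, 4.116, 3.673, 3.562, 2.998, 3.360, 3.024, 3.171, 3.401,
        3.628, 3.751, 3.296, 3.210, 3.301, 3.017, 2.789, 2.956, 3.800, 4.132,
        3.962, 4.010]

n = len(SBP)
x_bar = sum(QUET) / n
y_bar = sum(SBP) / n
sxx = sum((x - x_bar) ** 2 for x in QUET)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(QUET, SBP))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(QUET, SBP))
resid_se = math.sqrt(sse / (n - 2))

T_CRIT = 2.0423  # two-sided 5% critical value for 30 df


def mean_response_ci(x0):
    """95% confidence interval for the mean of SBP at QUET = x0."""
    fitted = b0 + b1 * x0
    half_width = T_CRIT * resid_se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return fitted - half_width, fitted + half_width


lower, upper = mean_response_ci(x_bar)
print(round(lower, 3), round(upper, 3))
```

The interval is narrowest at \(x_0 = \bar x\) and widens as \(x_0\) moves away from the mean of QUET.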
Yes.
The assumptions of a linear relationship, homoskedasticity (equal variances), and normality of residuals are all satisfied: