\[ \hat{y} = \hat\beta_0 + (\hat\beta_1 \cdot x_1) + (\hat\beta_2 \cdot x_2) + (\hat\beta_3 \cdot x_3) \] or \[ \widehat{\text{heartrate}} = \hat\beta_0 + (\hat\beta_1 \cdot \text{Rest}) + (\hat\beta_2 \cdot \text{Wgt}) + (\hat\beta_3 \cdot \text{Exc}) \]
Null hypothesis (H0): Weight is not a significant predictor in the model, i.e. \(\beta_2 = 0\)
Alternate hypothesis (H1): Weight is a significant predictor in the model, i.e. \(\beta_2 \ne 0\)
The p-value for the Wgt coefficient is 0.282, which is greater than the 5% significance level (0.05). Thus, we fail to reject the null hypothesis. This suggests that weight is not a significant predictor of heart rate in the model.
Additionally, the small t-statistic (1.63) is consistent with this conclusion: the Wgt coefficient is not significantly different from 0.
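The link between a coefficient's t-statistic and its two-sided p-value can be sketched in R as follows. This is a minimal illustration only: the function name `coef_p_value` and the example values are hypothetical and are not taken from the heart-rate model, whose residual degrees of freedom are not shown here.

```r
# Two-sided p-value for a coefficient's t-statistic.
# `df` is the residual degrees of freedom of the fitted model.
coef_p_value <- function(t_stat, df) {
  2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
}

# Hypothetical illustration (values NOT taken from the heart-rate model)
p <- coef_p_value(t_stat = 1.2, df = 40)
p > 0.05  # TRUE: fail to reject H0 at the 5% level
```

A t-statistic close to 0 gives a large p-value, while a t-statistic far from 0 in either direction gives a small one.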
In a previous problem set, you modeled the number of calories in a serving of breakfast cereal as a function of the number of grams of sugar in the cereal. In this problem, you’ll revisit the breakfast cereal data, this time with a multiple regression model that predicts the number of calories in a serving of breakfast cereal as a function of the grams of sugar and the grams of fiber per serving.
##
## Call:
## lm(formula = Calories ~ Sugar + Fiber, data = Cereal)
##
## Coefficients:
## (Intercept) Sugar Fiber
## 109.308 1.005 -3.744
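To see how the fitted coefficients are used, here is a minimal sketch that plugs them into the regression equation. The function name `predict_calories` and the example cereal (10 g of sugar, 3 g of fiber) are hypothetical and not part of the original analysis.

```r
# Coefficients copied from the lm() output above
b0      <- 109.308  # intercept
b_sugar <- 1.005    # calories per gram of sugar, holding fiber fixed
b_fiber <- -3.744   # calories per gram of fiber, holding sugar fixed

# Predicted calories per serving for given sugar and fiber content (grams)
predict_calories <- function(sugar_g, fiber_g) {
  b0 + b_sugar * sugar_g + b_fiber * fiber_g
}

# Hypothetical cereal: 10 g sugar, 3 g fiber
predict_calories(10, 3)  # 109.308 + 10.05 - 11.232 = 108.126
```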
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Sugar | 1 | 4567.222 | 4567.2222 | 19.21640 | 0.0001119 |
| Fiber | 1 | 4782.973 | 4782.9735 | 20.12416 | 0.0000831 |
| Residuals | 33 | 7843.214 | 237.6732 | NA | NA |
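## [1] 0.5438
R-squared formula = \(SS_{Model} / SS_{Total}\), where \(SS_{Model}\) is the sum of the Sugar and Fiber sums of squares and \(SS_{Total}\) also includes the residual sum of squares => \((4567.2 + 4783.0)/(4567.2 + 4783.0 + 7843.2)\)
Hence, R-squared = 0.5438
Thus, 54.4% of the variation in cereal calories can be explained by the regression model based on sugar and fiber.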
## [1] 15.41752
Standard Error formula = \(\sqrt{MS_{Error}}\) => \(\sqrt{237.7}\)
Hence, SE = 15.417
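As a check on the arithmetic, both summary quantities can be recomputed directly from the ANOVA table. This is a minimal sketch; the variable names are illustrative, and the inputs are the sums of squares and residual degrees of freedom printed in the table.

```r
# Sums of squares and residual df copied from the ANOVA table
ss_sugar <- 4567.222
ss_fiber <- 4782.973
ss_resid <- 7843.214
df_resid <- 33

ss_model <- ss_sugar + ss_fiber         # variation explained by both predictors
ss_total <- ss_model + ss_resid         # total variation in Calories

r_squared <- ss_model / ss_total        # share of variation explained
resid_se  <- sqrt(ss_resid / df_resid)  # residual standard error = sqrt(MSE)

round(r_squared, 4)  # 0.5438
round(resid_se, 3)   # 15.417
```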
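# Calculate F-statistic:
# Model mean square: Sugar and Fiber sums of squares over 2 model df
MS_Model <- (4567.2 + 4783.0)/2
# F = model mean square / residual mean square (237.7 on 33 df)
F_stat <- MS_Model/237.7
# Upper-tail p-value from the F(2, 33) distribution
pf(F_stat, df1 = 2, df2 = 33, lower.tail = FALSE)
## [1] 2.377482e-06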
Thus, the p-value of the F-statistic = 2.377482e-06
The p-value provides strong evidence to reject the null hypothesis that both slope coefficients are zero at the 1% significance level (since 2.377482e-06 < 0.01). Taken together, the explanatory variables in this model (Sugar and Fiber) explain a significant share of the variability in the outcome, Calories; that is, at least one of them has a significant relationship with Calories.
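For reference, the F-statistic behind this p-value can be written out from the ANOVA table, with \(k = 2\) predictors and \(n - k - 1 = 33\) residual degrees of freedom (the symbols here are notational assumptions, not taken from the original output):

\[ F = \frac{SS_{Model}/k}{SS_{Error}/(n-k-1)} = \frac{(4567.2 + 4783.0)/2}{237.7} \approx 19.67 \]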
# Simple linear regression model:
cereal_model <- lm(formula = Calories ~ Sugar, data = Cereal)
cereal_model
##
## Call:
## lm(formula = Calories ~ Sugar, data = Cereal)
##
## Coefficients:
## (Intercept) Sugar
## 87.428 2.481
In the simple linear regression model, each additional gram of sugar is associated with an estimated increase of 2.481 calories.
On the other hand, the multiple regression model estimates that each additional gram of sugar is associated with an increase of 1.005 calories, holding the other explanatory variable (i.e. fiber) constant.
The estimated effect of sugar is positive in both models. However, in the multiple regression model, when fiber is also taken into account, the magnitude of sugar’s effect decreases from 2.481 to 1.005. This implies that fiber and sugar may be confounded, i.e. the sugar and fiber content of cereals may be correlated, so the simple model attributes part of fiber’s effect to sugar. Additionally, each additional gram of fiber is associated with an estimated 3.74-calorie decrease, holding sugar constant. Thus, including fiber improves our understanding of how sugar relates to calories.
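The way a correlated predictor shifts a slope can be illustrated with simulated data. The sketch below does not use the Cereal data; all variable names and generating values are hypothetical. It checks the least-squares identity that the simple-regression slope equals the multiple-regression slope plus the omitted predictor's coefficient times the slope from regressing that predictor on the included one.

```r
# Simulated data only; the Cereal data set from the text is not used here.
set.seed(1)
n <- 200
sugar    <- runif(n, 0, 15)
fiber    <- 5 - 0.3 * sugar + rnorm(n)  # fiber falls as sugar rises
calories <- 110 + 1 * sugar - 4 * fiber + rnorm(n, sd = 10)
sim <- data.frame(calories, sugar, fiber)

simple_fit   <- lm(calories ~ sugar, data = sim)
multiple_fit <- lm(calories ~ sugar + fiber, data = sim)
aux_fit      <- lm(fiber ~ sugar, data = sim)  # how fiber moves with sugar

b_simple <- coef(simple_fit)[["sugar"]]
b_sugar  <- coef(multiple_fit)[["sugar"]]
b_fiber  <- coef(multiple_fit)[["fiber"]]
delta    <- coef(aux_fit)[["sugar"]]

# Identity for least-squares fits: simple slope = b_sugar + b_fiber * delta
all.equal(b_simple, b_sugar + b_fiber * delta)
```

Applying the same identity to the reported coefficients gives (2.481 - 1.005) / (-3.744) ≈ -0.39, which would suggest, assuming both models were fit to the same cereals, that cereals with more sugar tend to have less fiber.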