library(HistData)
lmmodel <- lm(childHeight ~ midparentHeight, data = GaltonFamilies)
summary(lmmodel)
##
## Call:
## lm(formula = childHeight ~ midparentHeight, data = GaltonFamilies)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.9570 -2.6989 -0.2155  2.7961 11.6848
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     22.63624    4.26511   5.307 1.39e-07 ***
## midparentHeight  0.63736    0.06161  10.345  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.392 on 932 degrees of freedom
## Multiple R-squared: 0.103, Adjusted R-squared: 0.102
## F-statistic: 107 on 1 and 932 DF, p-value: < 2.2e-16
This model examines the relationship between child height (dependent variable) and midparent height (independent variable) using data from the GaltonFamilies dataset.
These statistics summarize the distribution of the residuals (the differences between the observed and predicted values of child height). The middle 50% of residuals lie between -2.70 (1Q) and 2.80 (3Q) inches, so half of the model’s predictions are within roughly 2.8 inches of the observed height. The median (-0.22) is close to zero, but the extremes are large: the minimum is -8.96 and the maximum is 11.68.
Intercept (Estimate = 22.63624): This is the expected child height when the midparent height is zero. A midparent height of zero lies far outside the observed data, so the intercept has no practical interpretation on its own. It only fixes the vertical position of the regression line, and predictions are meaningful only within the observed range of midparent heights.
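The five-number summary of the residuals can be reproduced directly from the fitted model. This is a minimal sketch, assuming the HistData package is installed; the object name `res` is illustrative and not part of the original analysis.

```r
library(HistData)

# Refit the first model so the example is self-contained
lmmodel <- lm(childHeight ~ midparentHeight, data = GaltonFamilies)

res <- resid(lmmodel)   # observed minus predicted child height

quantile(res)           # Min, 1Q, Median, 3Q, Max, as shown by summary()
IQR(res)                # width of the middle 50% of the residuals
mean(res)               # zero up to rounding error when the model has an intercept
```

The quartiles bound only the middle half of the residuals, which is why they should not be read as the range of "most" prediction errors.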
midparentHeight (Estimate = 0.63736): For every 1 unit increase in midparent height (in inches), the child’s height is expected to increase by 0.637 inches, on average. This positive coefficient indicates a direct relationship between the two variables.
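One way to see the slope at work is to predict for two families whose midparent heights differ by exactly one inch. This is a sketch under stated assumptions: the values 68 and 69 and the name `new_parents` are hypothetical, chosen only for illustration.

```r
library(HistData)

lmmodel <- lm(childHeight ~ midparentHeight, data = GaltonFamilies)

# Two hypothetical families whose midparent heights differ by one inch
new_parents <- data.frame(midparentHeight = c(68, 69))
pred <- predict(lmmodel, newdata = new_parents)

pred         # predicted child heights for the two families
diff(pred)   # difference between the two predictions, equal to the slope
```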
Residual standard error = 3.392: This value estimates the standard deviation of the residuals, that is, the typical distance between the observed values and the regression line. Here a typical prediction misses the observed child height by about 3.39 inches.
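The residual standard error can be recomputed by hand as the square root of the residual sum of squares divided by the residual degrees of freedom. A minimal sketch, with `rss` and `rse` as illustrative names:

```r
library(HistData)

lmmodel <- lm(childHeight ~ midparentHeight, data = GaltonFamilies)

rss <- sum(resid(lmmodel)^2)              # residual sum of squares
rse <- sqrt(rss / df.residual(lmmodel))   # divide by n - 2 = 932, then take the root

rse              # hand-computed residual standard error
sigma(lmmodel)   # the same quantity as extracted by R
```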
Multiple R-squared = 0.103: This indicates that about 10.3% of the variation in child height can be explained by midparent height. This is a relatively low value, meaning midparent height alone doesn’t explain much of the variation in child height.
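With a single numeric predictor, R-squared is simply the squared Pearson correlation between the two variables. The sketch below checks this; `r` is an illustrative name.

```r
library(HistData)

lmmodel <- lm(childHeight ~ midparentHeight, data = GaltonFamilies)

# Pearson correlation between child height and midparent height
r <- with(GaltonFamilies, cor(childHeight, midparentHeight))

r^2                          # squared correlation
summary(lmmodel)$r.squared   # R-squared reported by the model
```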
Adjusted R-squared = 0.102: Adjusted R-squared is similar but takes into account the number of predictors in the model. Since there is only one predictor, it’s nearly the same as the R-squared.
F-statistic = 107, p-value: < 2.2e-16: This indicates that the model as a whole is statistically significant, meaning that midparent height is a significant predictor of child height.
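Because this model has only one predictor, the overall F-test carries the same information as the t-test for the slope: the F-statistic is the square of the slope’s t value (10.345 squared is about 107). A short sketch, with `s` and `t_slope` as illustrative names:

```r
library(HistData)

lmmodel <- lm(childHeight ~ midparentHeight, data = GaltonFamilies)
s <- summary(lmmodel)

t_slope <- coef(s)["midparentHeight", "t value"]

t_slope^2               # squared t statistic of the slope
s$fstatistic["value"]   # overall F-statistic
```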
lmmodel2 <- lm(childHeight ~ mother + father, data = GaltonFamilies)
summary(lmmodel2)
##
## Call:
## lm(formula = childHeight ~ mother + father, data = GaltonFamilies)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -9.117 -2.741 -0.218  2.766 11.694
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.64328    4.26213   5.313 1.35e-07 ***
## mother       0.29051    0.04852   5.987 3.05e-09 ***
## father       0.36828    0.04489   8.204 7.66e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.389 on 931 degrees of freedom
## Multiple R-squared: 0.1052, Adjusted R-squared: 0.1033
## F-statistic: 54.74 on 2 and 931 DF, p-value: < 2.2e-16
This model examines the relationship between child height (dependent variable) and both mother’s height and father’s height (independent variables).
These statistics summarize the distribution of residuals (the differences between the observed and predicted child heights). As in the first model, the middle 50% of residuals lie between about -2.7 and 2.8 inches, with extreme values of about -9.1 and +11.7.
Intercept (Estimate = 22.64328): This is the expected child height when both mother’s and father’s heights are zero, which isn’t meaningful in this context but helps define the regression line. It represents the base value from which the heights of the parents influence the prediction.
Mother (Estimate = 0.29051): For every 1 unit (inch) increase in the mother’s height, the child’s height is expected to increase by about 0.29 inches, on average, holding the father’s height constant. This is a positive relationship, and the point estimate is smaller than the father’s.
Father (Estimate = 0.36828): For every 1 unit (inch) increase in the father’s height, the child’s height is expected to increase by about 0.37 inches, on average, holding the mother’s height constant. The father’s estimate is slightly larger than the mother’s, but the gap (about 0.08) is small relative to the standard errors (about 0.05 each), and the summary does not test whether the two coefficients differ.
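Whether the two parental coefficients really differ can be checked rather than read off the point estimates. The sketch below is one possible approach and not part of the original analysis: it prints confidence intervals and then compares the model against a restricted version that forces a single shared slope for both parents. The name `lm_equal` is hypothetical.

```r
library(HistData)

lmmodel2 <- lm(childHeight ~ mother + father, data = GaltonFamilies)

# 95% confidence intervals for the intercept and both slopes
confint(lmmodel2)

# Restricted model: mother and father share one slope
lm_equal <- lm(childHeight ~ I(mother + father), data = GaltonFamilies)

# F-test of the restriction (equal slopes) against the unrestricted model
cmp <- anova(lm_equal, lmmodel2)
cmp
```

A large p-value in the second row would mean the data do not distinguish the mother’s effect from the father’s.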
Residual standard error = 3.389: The residuals have an estimated standard deviation of about 3.39 inches, essentially the same as in the previous model.
Multiple R-squared = 0.1052: About 10.52% of the variation in child height can be explained by the heights of the mother and father. This is only marginally higher than the midparent model (10.3%). Because midparent height is itself computed from the two parents’ heights, entering them separately adds flexibility but little explanatory power; the adjusted R-squared barely changes (0.1033 versus 0.102).
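The two models are closely related because midparent height is derived from the parents’ heights. The sketch below assumes the definition given in the HistData documentation, (father + 1.08 × mother) / 2; that formula is an assumption to verify against your installed version, and the code checks it before comparing the models. The name `mp` is illustrative.

```r
library(HistData)

lmmodel  <- lm(childHeight ~ midparentHeight, data = GaltonFamilies)
lmmodel2 <- lm(childHeight ~ mother + father, data = GaltonFamilies)

# Assumed definition of midparent height (check against the package docs)
mp <- with(GaltonFamilies, (father + 1.08 * mother) / 2)
max(abs(mp - GaltonFamilies$midparentHeight))   # near zero if the formula holds

# If it holds, the midparent model is a restricted case of the two-parent model,
# and the gain from separating the parents can be tested directly
anova(lmmodel, lmmodel2)
```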
Adjusted R-squared = 0.1033: Adjusted R-squared is slightly lower than the R-squared and accounts for the number of predictors. It still suggests that the model has a low explanatory power.
F-statistic = 54.74, p-value: < 2.2e-16: The overall F-test is highly significant, meaning that mother’s and father’s heights together predict child height better than an intercept-only model. Statistical significance is not the same as a good fit, however: the model still explains only about 10.5% of the variation.
lmmodel3 <- lm(childHeight ~ gender, data = GaltonFamilies)
summary(lmmodel3)
##
## Call:
## lm(formula = childHeight ~ gender, data = GaltonFamilies)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -9.234 -1.604 -0.104  1.766  9.766
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  64.1040     0.1173  546.32   <2e-16 ***
## gendermale    5.1301     0.1635   31.38   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.497 on 932 degrees of freedom
## Multiple R-squared: 0.5137, Adjusted R-squared: 0.5132
## F-statistic: 984.4 on 1 and 932 DF, p-value: < 2.2e-16
This model examines the relationship between child height (dependent variable) and gender (independent variable). The coefficient name gendermale shows that gender is treated as a categorical variable, with “female” as the reference level and “male” as the contrasted level.
The residuals summarize the differences between the observed and predicted child heights. Their spread is tighter than in the previous models: the middle 50% fall between -1.60 and 1.77 inches, compared with roughly -2.7 to 2.8 before. This suggests the model fits the data better than the previous ones.
Intercept (Estimate = 64.1040): This is the predicted height for females, the reference level (by default R uses the first factor level, which is alphabetical unless set otherwise). The expected height of a female child is about 64.1 inches.
gendermale (Estimate = 5.1301): The coefficient for “male” indicates that male children are, on average, about 5.13 inches taller than female children. This significant positive coefficient shows that gender has a strong effect on child height, with males being taller on average.
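With a single two-level factor as the only predictor, the regression coefficients are just group means in disguise. A minimal sketch, with `means` as an illustrative name:

```r
library(HistData)

lmmodel3 <- lm(childHeight ~ gender, data = GaltonFamilies)

levels(GaltonFamilies$gender)   # the first level is the reference group

# Mean child height within each gender
means <- tapply(GaltonFamilies$childHeight, GaltonFamilies$gender, mean)

means["female"]                   # equals the intercept
means["male"] - means["female"]   # equals the gendermale coefficient
coef(lmmodel3)
```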
Residual standard error = 2.497: This represents the typical deviation of observed child heights from the predicted values, and at 2.497 inches, this model is more accurate than the previous models with higher residual errors (~3.39).
Multiple R-squared = 0.5137: This suggests that 51.37% of the variation in child height can be explained by gender. This is a substantial improvement compared to the previous models (which explained around 10% of the variation).
Adjusted R-squared = 0.5132: The adjusted R-squared is almost identical to the R-squared because the model has a single predictor and a large sample (934 observations), so the penalty for model size is negligible.
F-statistic = 984.4, p-value: < 2.2e-16: The F-statistic is very high, indicating that the model as a whole is highly significant. Gender is a strong predictor of child height in this dataset.
lmmodel4a <- lm(childHeight ~ midparentHeight + gender, data = GaltonFamilies)  # additive model: main effects only
summary(lmmodel4a)
##
## Call:
## lm(formula = childHeight ~ midparentHeight + gender, data = GaltonFamilies)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.5317 -1.4600  0.0979  1.4566  9.1110
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     16.51410    2.73392    6.04 2.22e-09 ***
## midparentHeight  0.68702    0.03944   17.42  < 2e-16 ***
## gendermale       5.21511    0.14216   36.69  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.17 on 931 degrees of freedom
## Multiple R-squared: 0.6332, Adjusted R-squared: 0.6324
## F-statistic: 803.6 on 2 and 931 DF, p-value: < 2.2e-16
This model examines the relationship between child height (dependent variable) and two predictors: midparent height (the average height of the parents) and gender.
The residuals indicate that most of the errors between predicted and actual child heights are fairly small, with 50% of the residuals falling between -1.46 and 1.46 inches. The spread of residuals is much tighter compared to the previous models, suggesting a better fit.
Intercept (Estimate = 16.51410): This is the predicted child height when midparent height is zero and the child is female (the reference level of gender). While zero midparent height isn’t realistic, this intercept helps define the regression line.
midparentHeight (Estimate = 0.68702): For every 1 inch increase in the average height of the parents (midparent height), the child’s height is expected to increase by about 0.69 inches, on average, for children of the same gender. This coefficient indicates a positive relationship between midparent height and child height, similar to the earlier model that used only midparent height (0.64).
gendermale (Estimate = 5.21511): At the same midparent height, a male child is predicted to be about 5.22 inches taller than a female child. This is a very strong predictor, suggesting that gender significantly affects child height, with males being taller on average.
Residual standard error = 2.17: This is the typical deviation between the observed and predicted child heights, indicating a smaller error than in the previous models (which had residual errors around 2.5 to 3.4 inches). The lower error indicates that this model fits the data better.
Multiple R-squared = 0.6332: This means that 63.32% of the variation in child height can be explained by midparent height and gender together. This is a substantial improvement over models that used only one of these predictors.
Adjusted R-squared = 0.6324: The adjusted R-squared is very close to the R-squared, as expected with only two predictors and 934 observations. It is well above the adjusted R-squared of either single-predictor model (0.102 and 0.5132), so both predictors contribute.
F-statistic = 803.6, p-value: < 2.2e-16: The F-statistic is extremely high, indicating that the model as a whole is statistically significant. The F-test is a joint test; it is the individual t-tests in the coefficient table (both p < 2e-16) that show midparent height and gender are each strong predictors of child height.
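The four models fitted so far can be lined up side by side to make the comparison explicit. This is a sketch only; the list labels and the name `comparison` are illustrative.

```r
library(HistData)

# Refit the four models discussed so far
models <- list(
  midparent        = lm(childHeight ~ midparentHeight,          data = GaltonFamilies),
  parents          = lm(childHeight ~ mother + father,          data = GaltonFamilies),
  gender           = lm(childHeight ~ gender,                   data = GaltonFamilies),
  midparent_gender = lm(childHeight ~ midparentHeight + gender, data = GaltonFamilies)
)

# Adjusted R-squared and residual standard error for each model
comparison <- data.frame(
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  rse    = sapply(models, sigma)
)
comparison
```

Higher adjusted R-squared and lower residual standard error both point to the model that combines midparent height and gender.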
lmmodel4 <- lm(childHeight ~ midparentHeight + gender + midparentHeight*gender, data = GaltonFamilies)  # main effects plus interaction
summary(lmmodel4)
##
## Call:
## lm(formula = childHeight ~ midparentHeight + gender + midparentHeight *
## gender, data = GaltonFamilies)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.5431 -1.4568  0.0769  1.4795  9.0860
##
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)                18.33348    3.86636   4.742 2.45e-06 ***
## midparentHeight             0.66075    0.05580  11.842  < 2e-16 ***
## gendermale                  1.57998    5.46264   0.289    0.772
## midparentHeight:gendermale  0.05252    0.07890   0.666    0.506
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.171 on 930 degrees of freedom
## Multiple R-squared: 0.6334, Adjusted R-squared: 0.6322
## F-statistic: 535.6 on 3 and 930 DF, p-value: < 2.2e-16
This model investigates the relationship between child height (dependent variable) and the predictors midparent height (the average height of the parents) and gender, including the interaction between midparent height and gender. The interaction term allows us to examine whether the effect of midparent height on child height differs by gender.
The residuals are nearly identical to those of the previous model: the median is close to zero (0.08) and the middle 50% lie between about -1.46 and 1.48 inches. The minimum (-9.54) and maximum (9.09) show that a few children are still predicted poorly.
Intercept (Estimate = 18.33348): This is the predicted height for a female child (reference group) when midparent height is zero. Again, while this value is not practically meaningful, it provides a baseline for the model.
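midparentHeight (Estimate = 0.66075): Because the model includes an interaction, this is the slope for the reference group (females): for every 1-inch increase in midparent height, a female child’s height increases by approximately 0.66 inches. This coefficient is statistically significant (p < 2e-16).
gendermale (Estimate = 1.57998): In the presence of the interaction, this coefficient is the predicted male-female difference when midparent height is zero, not the average difference between male and female children. That is an extrapolation far outside the data, which is why the estimate is so imprecise (standard error 5.46) and not statistically significant (p = 0.772). It does not mean that gender is unimportant: at a midparent height of 69 inches, for example, the predicted difference is 1.58 + 0.0525 × 69 ≈ 5.2 inches, in line with the previous model.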
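Centering midparent height is one common way to make the gender coefficient interpretable in a model with an interaction. The sketch below is not part of the original analysis; `GF`, `mp_centered`, and `lm_centered` are hypothetical names. After centering, the gendermale coefficient is the predicted male-female difference at the average midparent height rather than at zero.

```r
library(HistData)

lmmodel4 <- lm(childHeight ~ midparentHeight * gender, data = GaltonFamilies)

# Center midparent height at its sample mean
GF <- transform(GaltonFamilies,
                mp_centered = midparentHeight - mean(midparentHeight))

lm_centered <- lm(childHeight ~ mp_centered * gender, data = GF)
coef(summary(lm_centered))

# The same quantity computed from the uncentered model
b <- coef(lmmodel4)
diff_at_mean <- b["gendermale"] +
  b["midparentHeight:gendermale"] * mean(GaltonFamilies$midparentHeight)
diff_at_mean
```

The fitted values and the interaction coefficient are unchanged by centering; only the meaning and precision of the intercept and gendermale terms change.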
midparentHeight:gendermale (Estimate = 0.05252): This interaction term indicates that the effect of midparent height on child height is slightly greater for males, with an additional increase of 0.052 inches for each additional inch in midparent height. However, this effect is also not statistically significant (p = 0.506), meaning that the interaction does not provide strong evidence that the relationship between midparent height and child height differs by gender.
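The interaction implies a separate slope for each gender, and its contribution can be tested by comparing the model with and without it. A minimal sketch, with `slope_female`, `slope_male`, and `cmp` as illustrative names:

```r
library(HistData)

lmmodel4a <- lm(childHeight ~ midparentHeight + gender, data = GaltonFamilies)
lmmodel4  <- lm(childHeight ~ midparentHeight * gender, data = GaltonFamilies)

b <- coef(lmmodel4)
slope_female <- b["midparentHeight"]                                    # reference group
slope_male   <- b["midparentHeight"] + b["midparentHeight:gendermale"]  # reference slope plus interaction

slope_female
slope_male

# F-test for adding the interaction to the additive model;
# with one added term its p-value matches the interaction's t-test
cmp <- anova(lmmodel4a, lmmodel4)
cmp
```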
Residual standard error = 2.171: This indicates a similar level of accuracy in predicting child height compared to the previous model (which had a residual standard error of about 2.17 inches).
Multiple R-squared = 0.6334: The model explains 63.34% of the variability in child height, similar to the previous model that included only midparent height and gender without the interaction term.
Adjusted R-squared = 0.6322: This is slightly lower than the adjusted R-squared of the previous model (0.6324), which indicates that the interaction term does not improve the fit enough to justify the extra parameter.
F-statistic = 535.6, p-value: < 2.2e-16: The F-statistic is high, indicating that the model as a whole is statistically significant. Only the interaction term fails to add predictive power, so the simpler additive model with midparent height and gender is preferable.