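library("Hmisc")   # provides getHdata()
library("ggplot2") # provides qplot(), used for the plots below
getHdata(FEV)      # loads the FEV data set into the workspace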
The FEV data set contains the outcome variable forced expiratory volume (fev) and four additional variables: age in years (age), height in inches (height), sex (sex), and current or non-current smoking status (smoke). I would expect to see a positive association between age and FEV, because it seems that lungs would get stronger as children age. For height, I would expect taller people to have greater lung volume and therefore higher FEV. Sex seems like it would be associated with FEV as well, since men tend to be stronger than women. Finally, I would expect smoking to be negatively associated with FEV, since smoking damages the lungs.
I think that smoking and age could interact to influence FEV, because people who are older might have been smoking longer and would therefore have worse lung strength. I also think that smoking and sex could interact, because men and women have different metabolisms and hormones. I know that women are more vulnerable to lung carcinogens than men are, so I think that women’s FEV would be more heavily influenced by smoking.
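In R's formula syntax, an interaction of the kind hypothesized here is written with `*`, which expands to both main effects plus their product term. The following is a minimal sketch using a four-row toy data frame (hypothetical, not the FEV data). The level names "non-current smoker" and "current smoker" are an assumption chosen to mirror the coefficient labels in the model output.

```r
# Toy data frame (hypothetical; only the factor levels are meant to mirror FEV).
toy <- data.frame(
  sex   = factor(c("female", "female", "male", "male")),
  smoke = factor(
    c("non-current smoker", "current smoker",
      "non-current smoker", "current smoker"),
    levels = c("non-current smoker", "current smoker")
  )
)

# `sex * smoke` is shorthand for `sex + smoke + sex:smoke`.
term_labels <- attr(terms(~ sex * smoke), "term.labels")
print(term_labels)

# The design matrix shows the dummy columns that lm() would estimate
# coefficients for; the interaction column is 1 only for male current smokers.
X <- model.matrix(~ sex * smoke, data = toy)
print(colnames(X))
print(X)
```

The interaction column is the product of the two dummy variables, so its coefficient measures how much the smoking effect differs between the sexes.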
I do think that linear relationships are sufficient to describe these associations. The graphs show a linear trend, and the variation of the points appears constant:
plot(FEV$age, FEV$fev, xlab = "Age (Years)", ylab = "Forced Expiratory Volume",
     main = "Age and Forced Expiratory Volume", pch = 16)
plot(FEV$height, FEV$fev, xlab = "Height (Inches)", ylab = "Forced Expiratory Volume",
     main = "Height and Forced Expiratory Volume", pch = 16)
qplot(sex, fev, data = FEV)
Here is the R code and output for my two best-fitting models, along with histograms of their residuals. I do not see a trend in either model’s residuals:
mlr1 <- lm(fev ~ age + height + sex * smoke, data = FEV)
summary(mlr1)
##
## Call:
## lm(formula = fev ~ age + height + sex * smoke, data = FEV)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.38404 -0.25547  0.00456  0.24666  1.93203
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                 -4.421924   0.222835 -19.844  < 2e-16 ***
## age                          0.065990   0.009465   6.972 7.73e-12 ***
## height                       0.103734   0.004750  21.840  < 2e-16 ***
## sexmale                      0.135409   0.034638   3.909 0.000102 ***
## smokecurrent smoker         -0.183075   0.074189  -2.468 0.013857 *
## sexmale:smokecurrent smoker  0.234147   0.109605   2.136 0.033032 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4111 on 648 degrees of freedom
## Multiple R-squared: 0.7769, Adjusted R-squared: 0.7752
## F-statistic: 451.4 on 5 and 648 DF, p-value: < 2.2e-16
qplot(mlr1$residuals)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
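Because mlr1 includes a sex-by-smoking interaction, the smoking coefficient alone describes only the reference group (females). The worked arithmetic below is a sketch that copies the estimates from the summary output and combines them to get the estimated smoking effect for each sex, holding age and height fixed. The variable names are illustrative.

```r
# Coefficients copied from the summary(mlr1) output.
b_smoke     <- -0.183075  # smokecurrent smoker (reference sex: female)
b_sex_smoke <-  0.234147  # sexmale:smokecurrent smoker

# Estimated difference in FEV, current vs. non-current smokers, by sex.
smoke_effect_female <- b_smoke
smoke_effect_male   <- b_smoke + b_sex_smoke

cat("Smoking effect, females:", smoke_effect_female, "\n")
cat("Smoking effect, males:  ", smoke_effect_male, "\n")
```

Under this model the estimated smoking effect is negative for females and close to zero (about 0.05) for males. The sum for males comes with no standard error here, so this arithmetic alone says nothing about whether it differs from zero.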
mlr2 <- lm(fev ~ age + height * sex, data = FEV)
summary(mlr2)
##
## Call:
## lm(formula = fev ~ age + height * sex, data = FEV)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.25462 -0.24277  0.00342  0.24299  1.85454
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -3.098624   0.328952  -9.420  < 2e-16 ***
## age             0.066850   0.008929   7.487 2.31e-13 ***
## height          0.081243   0.006304  12.888  < 2e-16 ***
## sexmale        -1.808288   0.360662  -5.014 6.89e-07 ***
## height:sexmale  0.032418   0.005913   5.483 6.00e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4037 on 649 degrees of freedom
## Multiple R-squared: 0.7846, Adjusted R-squared: 0.7833
## F-statistic: 591 on 4 and 649 DF, p-value: < 2.2e-16
qplot(mlr2$residuals)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
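A histogram shows the shape of the residual distribution, but a trend is easier to judge from a plot of residuals against fitted values. The helper below is a hypothetical sketch, not part of the original analysis. So that the block runs on its own, it is demonstrated on a stand-in model fit to R's built-in `cars` data; with the FEV models available, `mlr1` or `mlr2` could be passed instead.

```r
# Hypothetical helper: scatter residuals against fitted values and overlay
# a lowess smoother; a roughly flat smoother near zero suggests no trend.
plot_resid_vs_fitted <- function(model, main = "Residuals vs fitted") {
  f <- fitted(model)
  r <- resid(model)
  plot(f, r, xlab = "Fitted values", ylab = "Residuals", main = main, pch = 16)
  abline(h = 0, lty = 2)            # reference line at zero
  lines(lowess(f, r), col = "red")  # smoothed trend in the residuals
  invisible(data.frame(fitted = f, resid = r))
}

# Stand-in model on built-in data, used only so the example is self-contained.
stand_in <- lm(dist ~ speed, data = cars)
checked  <- plot_resid_vs_fitted(stand_in)
```

A funnel shape in this plot would also indicate non-constant variance, which a histogram cannot reveal.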
interaction.plot(FEV$height, FEV$sex, FEV$fev)
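The height-by-sex interaction in mlr2 means each sex has its own height slope. As a sketch, the arithmetic below copies the estimates from the summary output to compute the two slopes and the height at which the fitted lines for males and females cross, at any fixed age. The variable names are illustrative.

```r
# Coefficients copied from the summary(mlr2) output.
b_height     <-  0.081243  # height slope for females (reference group)
b_sexmale    <- -1.808288  # male vs. female offset at height = 0
b_height_sex <-  0.032418  # extra height slope for males

slope_female <- b_height
slope_male   <- b_height + b_height_sex

# Male and female fitted lines cross where b_sexmale + b_height_sex * h = 0.
crossover_height <- -b_sexmale / b_height_sex

cat("Height slope, females:", slope_female, "\n")
cat("Height slope, males:  ", slope_male, "\n")
cat("Crossover height (inches):", round(crossover_height, 1), "\n")
```

The large negative sexmale coefficient is the male offset extrapolated to a height of zero inches, so it is not a sign that males have lower FEV overall. Below roughly 55.8 inches the model predicts slightly lower FEV for males, and above it predicts higher FEV for males.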
If I were to collect similar data for a new study, I would collect more detailed information on smoking (e.g. packs smoked) to get a better picture of whether smoking influences FEV among children. I might also collect information on parents’ smoking status, since children are likely exposed to smoke if their parents are smokers. Finally, I might collect data on weight to see if weight or BMI influences FEV.