Overview

Forced expiratory volume (FEV) is a measure of lung capacity: the volume of air a person can exhale during a forced breath. The data were collected from youths in Boston during the 1970s. This data set is a cross-sectional subset of a larger study that also examined second-hand smoke. It contains 654 cases, each one a child.

Question

Can FEV be predicted from age, height, sex, and smoking status?

Response

  • fev, the forced expiratory volume, a measure of the lung capacity, in liters

Explanatory

  • age: quantitative, age of the youth in years
  • height: quantitative, height of the youth in inches
  • sex: qualitative, sex of the youth (female, male)
  • smoke: qualitative, smoking status of the youth (current smoker, non-current smoker)

Summary Statistics

summary(FEV)
##       age              fev            height          sex     
##  Min.   : 3.000   Min.   :0.791   Min.   :46.00   female:318  
##  1st Qu.: 8.000   1st Qu.:1.981   1st Qu.:57.00   male  :336  
##  Median :10.000   Median :2.547   Median :61.50               
##  Mean   : 9.931   Mean   :2.637   Mean   :61.14               
##  3rd Qu.:12.000   3rd Qu.:3.119   3rd Qu.:65.50               
##  Max.   :19.000   Max.   :5.793   Max.   :74.00               
##                 smoke    
##  non-current smoker:589  
##  current smoker    : 65  
##                          
##                          
##                          
## 
summary(FEV~smoke)
##  Length   Class    Mode 
##       3 formula    call
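
Calling summary() on a formula only describes the formula object itself, which is why the output above lists Length, Class, and Mode rather than FEV values. A per-group summary is probably what was intended. A minimal sketch, assuming a small made-up data frame `toy` that mimics the `fev` and `smoke` columns (the values are illustrative and not taken from the FEV data):

```r
# Hypothetical toy data standing in for the FEV data frame
toy <- data.frame(
  fev   = c(1.8, 2.4, 2.9, 3.1, 3.4, 3.6),
  smoke = factor(c("non-current smoker", "non-current smoker",
                   "non-current smoker", "non-current smoker",
                   "current smoker", "current smoker"))
)

# Mean FEV within each smoking group
group_means <- aggregate(fev ~ smoke, data = toy, FUN = mean)
print(group_means)

# Min, quartiles, median, mean, and max of FEV per group
group_summary <- tapply(toy$fev, toy$smoke, summary)
print(group_summary)
```

With the real data, the same calls would use `FEV` in place of `toy`.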

Summary Statistics

library(ggplot2)
# Fitted FEV-vs-age regression line for each smoking group
ggplot(data = FEV, aes(x = age, y = fev, color = smoke)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  ggtitle("Age vs. FEV")

Linear Regression

# Adjusted R-squared as predictors were added: 0.5716 -> 0.7657 -> 0.7736 -> 0.774
model <- lm(fev ~ age + height + sex + smoke, data = FEV)
summary(model)
## 
## Call:
## lm(formula = fev ~ age + height + sex + smoke, data = FEV)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.37656 -0.25033  0.00894  0.25588  1.92047 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -4.456974   0.222839 -20.001  < 2e-16 ***
## age                  0.065509   0.009489   6.904 1.21e-11 ***
## height               0.104199   0.004758  21.901  < 2e-16 ***
## sexmale              0.157103   0.033207   4.731 2.74e-06 ***
## smokecurrent smoker -0.087246   0.059254  -1.472    0.141    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4122 on 649 degrees of freedom
## Multiple R-squared:  0.7754, Adjusted R-squared:  0.774 
## F-statistic:   560 on 4 and 649 DF,  p-value: < 2.2e-16
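
The comment above the model call appears to track adjusted R-squared as predictors are added one at a time (an interpretation, since the numbers are not labeled in the source). A sketch of how such a sequence could be produced, using simulated data in place of FEV; the data frame `sim` and the coefficients used to generate it are made up for illustration:

```r
set.seed(1)
n <- 200
# Simulated stand-in for the FEV data (all values are made up)
sim <- data.frame(
  age   = sample(3:19, n, replace = TRUE),
  sex   = factor(sample(c("female", "male"), n, replace = TRUE)),
  smoke = factor(sample(c("non-current smoker", "current smoker"), n, replace = TRUE))
)
sim$height <- 45 + 1.5 * sim$age + rnorm(n, sd = 3)
sim$fev    <- -4 + 0.06 * sim$age + 0.1 * sim$height + rnorm(n, sd = 0.4)

# Formulas that add one predictor at a time
formulas <- list(
  fev ~ age,
  fev ~ age + height,
  fev ~ age + height + sex,
  fev ~ age + height + sex + smoke
)

# Adjusted R-squared of each fitted model
adj_r2 <- sapply(formulas, function(f) summary(lm(f, data = sim))$adj.r.squared)
print(round(adj_r2, 4))
```

Adjusted R-squared rises only when a new predictor improves the fit by more than the lost degree of freedom costs, which makes it a reasonable yardstick for this kind of comparison.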

Interpretation of the regression coefficients

\(\widehat{fev} = -4.4570 + 0.0655 \cdot age + 0.1042 \cdot height + 0.1571 \cdot sex_{male} - 0.0872 \cdot smoke_{\text{current smoker}}\)
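
To see how the fitted equation turns into a prediction, the sketch below plugs in the estimates from the summary(model) output for a hypothetical child. The child's values (10 years old, 60 inches tall, male, non-current smoker) are an assumption chosen for illustration:

```r
# Coefficient estimates copied from the summary(model) output
coefs <- c(intercept = -4.456974, age = 0.065509, height = 0.104199,
           sexmale = 0.157103, smoker = -0.087246)

# Hypothetical child: 10 years old, 60 inches tall, male, non-current smoker
child <- c(intercept = 1, age = 10, height = 60, sexmale = 1, smoker = 0)

# Predicted FEV in liters: sum of coefficient * predictor value
pred_fev <- sum(coefs * child)
print(round(pred_fev, 3))  # 2.607
```

With the fitted object available, predict(model, newdata = ...) should give the same value up to rounding, provided the factor levels in the new data match those in FEV.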

Slope of age: All else held constant, children who are 1 year older tend to have 0.066 liters more lung capacity.

Slope of height: All else held constant, children who are 1 inch taller tend to have 0.104 liters more lung capacity.

Slope of sex: All else held constant, the model predicts that males have 0.157 more liters in lung capacity than females.

Slope of smoke: All else held constant, the model predicts that current smokers have 0.087 liters less lung capacity than non-current smokers. This estimate is not statistically significant (p = 0.141).

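Intercept: A female non-current smoker who is 0 years old and 0 inches tall has a predicted lung capacity of -4.457 liters. This value is not meaningful in context, since it lies far outside the observed data (ages 3 to 19, heights 46 to 74 inches); the intercept only positions the fitted plane.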

\(R^2_{adj}\): 77.4% of the variance in FEV can be explained by the independent variables.
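
As a check on the reported value, adjusted \(R^2\) can be recomputed from the multiple \(R^2\) with the standard formula, where \(n = 654\) is the number of cases and \(p = 4\) the number of estimated slopes (so \(n - p - 1 = 649\), matching the residual degrees of freedom in the output):

\(R^2_{adj} = 1 - (1 - R^2) \cdot \frac{n - 1}{n - p - 1} = 1 - (1 - 0.7754) \cdot \frac{653}{649} \approx 0.774\)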

Residual Analysis
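
A residual analysis typically checks whether the residuals scatter evenly around zero across the fitted values and whether they are roughly normal. A minimal sketch of those two checks, using simulated data rather than the FEV data; the data frame `sim` and its generating coefficients are made up for illustration:

```r
set.seed(42)
n <- 150
# Simulated stand-in data (made up; not the FEV data)
sim <- data.frame(age = runif(n, 3, 19))
sim$height <- 45 + 1.5 * sim$age + rnorm(n, sd = 3)
sim$fev    <- -4 + 0.06 * sim$age + 0.1 * sim$height + rnorm(n, sd = 0.4)

fit      <- lm(fev ~ age + height, data = sim)
res      <- resid(fit)
fit_vals <- fitted(fit)

# Residuals vs. fitted values: look for curvature or a funnel shape
plot(fit_vals, res, xlab = "Fitted FEV", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points near the line suggest roughly normal residuals
qqnorm(res)
qqline(res)
```

For the models in this report, `fit` would be replaced by `model` or `model2`.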

Linear Regression

# In an R formula, age / height expands to age + age:height (it is not the ratio age/height)
model2 <- lm(fev ~ (age / height) + sex + smoke, data = FEV)
summary(model2)
## 
## Call:
## lm(formula = fev ~ (age/height) + sex + smoke, data = FEV)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71508 -0.23557 -0.00791  0.23193  1.61211 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.8264848  0.0903125  20.224  < 2e-16 ***
## age                 -0.6134754  0.0365213 -16.798  < 2e-16 ***
## sexmale              0.0869255  0.0330242   2.632  0.00869 ** 
## smokecurrent smoker -0.1872152  0.0575954  -3.251  0.00121 ** 
## age:height           0.0110817  0.0004755  23.306  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4011 on 649 degrees of freedom
## Multiple R-squared:  0.7873, Adjusted R-squared:  0.786 
## F-statistic: 600.7 on 4 and 649 DF,  p-value: < 2.2e-16
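
The coefficient table lists `age` and `age:height` but no term for a ratio. Inside an R formula, `/` is a nesting operator, so `age / height` is shorthand for `age + age:height`. The sketch below inspects the terms R derives from the formula and contrasts them with `I()`, which is the usual way to request the arithmetic ratio; no data are needed to run it:

```r
# Term labels R derives from the model2 formula
nested_terms <- attr(terms(fev ~ (age / height) + sex + smoke), "term.labels")
print(nested_terms)

# Wrapping the division in I() asks for the arithmetic ratio instead
ratio_terms <- attr(terms(fev ~ I(age / height) + sex + smoke), "term.labels")
print(ratio_terms)
```

The first call returns the main effect of age plus the age-by-height product, which matches the rows in the output above; the second returns a single ratio term.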

Interpretation of the regression coefficients

\(\widehat{fev} = 1.8265 - 0.6135 \cdot age + 0.0869 \cdot sex_{male} - 0.1872 \cdot smoke_{\text{current smoker}} + 0.0111 \cdot age \cdot height\)

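Slope of age: Because age also enters the model through the age:height term, the effect of one additional year depends on the child's height. The coefficient -0.613 on its own is the slope of age only for a child who is 0 inches tall, so it should not be read as "older children have less lung capacity."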

Slope of sex: All else held constant, the model predicts that males have 0.0869 liters more lung capacity than females.

Slope of smoke: All else held constant, the model predicts that current smokers have 0.1872 liters less lung capacity than non-current smokers.

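Slope of age:height: This term is the product of age and height (an interaction), not their ratio. Each additional inch of height raises the slope of age by 0.0111 liters per year, and each additional year of age raises the slope of height by 0.0111 liters per inch.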
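
Since the output lists the last term as age:height, which is R's notation for the product of the two variables, the change in predicted FEV per additional year of age works out as follows, using the unrounded estimates from the output:

\(\frac{\partial \widehat{fev}}{\partial age} = -0.6135 + 0.0111 \cdot height\)

At the mean height of 61.14 inches this gives \(-0.6135 + 0.0111 \cdot 61.14 \approx 0.064\) liters per year, close to the age slope in the first model. The slope is positive for any child taller than about \(0.6135 / 0.0111 \approx 55.4\) inches and negative below that height.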

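Intercept: A female non-current smoker who is 0 years old has a predicted lung capacity of 1.8265 liters; at age 0 the age:height term is also 0, so height plays no role. The intercept is not meaningful in context, since age 0 lies outside the observed range of 3 to 19 years; it only positions the fitted surface.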

\(R^2_{adj}\): 78.6% of the variance in FEV can be explained by the independent variables.

Residual Analysis

Conclusion

One limitation is that smoking status was self-reported, so the smoke variable may not be reliable. The data also do not record how long or how often each child has smoked.

Other factors also contribute to FEV, such as weight, physical activity, muscle strength, body mass index, medical conditions and history, and environmental factors.

In conclusion, both models indicate that males tend to have a higher FEV and that current smokers tend to have a lower FEV, although the smoking effect is statistically significant only in the second model (p = 0.001 versus p = 0.141 in the first). Lung capacity also increases as a child gets older and taller. Because the data were collected in the 1970s, they may not be representative of children today. Prediction of FEV could likely be improved with additional explanatory variables.

References

The data for this project were obtained from Vanderbilt, where a description of the data set is also available. The file is labeled FEV.csv.

Rosner, B. (1999), Fundamentals of Biostatistics, 5th ed., Pacific Grove, CA: Duxbury. Data obtained from http://biostat.mc.vanderbilt.edu/DataSets.