Context

It is now widely believed that smoking tends to impair lung function. Much of the data to support this arises from studies of pulmonary function in adults who are long-time smokers. The question then arises whether such deleterious effects of smoking can be detected in children who smoke. To address this question, measures of lung function were made in 654 children seen for a routine checkup in a particular pediatric clinic. The children participating in this study were asked whether they were current smokers.

A common measurement of lung function is the forced expiratory volume (FEV), which measures how much air you can blow out of your lungs in a short period of time. A higher FEV is usually associated with better respiratory function. It is well known that prolonged smoking diminishes FEV in adults, and those adults with diminished FEV also tend to have decreased pulmonary function as measured by other clinical variables, such as blood oxygen and carbon dioxide levels.

Here, we will practice fitting linear regressions using the FEV dataset.

Reading in the Data

library(tidyverse)
fev <- read_csv("fev.csv")
head(fev)
## # A tibble: 6 × 7
##   seqnbr subjid   age   fev height sex    smoke    
##    <dbl>  <dbl> <dbl> <dbl>  <dbl> <chr>  <chr>    
## 1      1    301     9  1.71   57   female nonsmoker
## 2      2    451     8  1.72   67.5 female nonsmoker
## 3      3    501     7  1.72   54.5 female nonsmoker
## 4      4    642     9  1.56   53   male   nonsmoker
## 5      5    901     9  1.90   57   male   nonsmoker
## 6      6   1701     8  2.34   61   female nonsmoker

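The columns are seqnbr (row sequence number), subjid (subject ID), age, fev (forced expiratory volume, in liters), height (in inches), sex, and smoke (current smoking status: smoker or nonsmoker).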

Question 1: Fit a linear regression model with fev as response and smoke as the predictor. Provide an interpretation for the coefficient on smoke including an estimate and 95% confidence interval.

model1 <- lm(fev ~ smoke, data = fev)
summary(model1)
confint(model1)

###Ans1: The estimated coefficient for smokesmoker is 0.71, meaning that children who smoke have a mean FEV about 0.71 liters higher than children who do not smoke (95% CI: 0.49 to 0.92). Because 0 is not in this interval and the p-value is very small, the association is statistically significant. The higher FEV among smokers may also be the result of confounding rather than an effect of smoking itself.
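To make the interpretation concrete, the sketch below checks that the coefficient on a two-level predictor is just the difference in group means. It uses simulated data rather than fev.csv: the data frame `sim`, the group sizes, and the group means of 2.6 and 3.3 are all hypothetical values chosen for illustration.

```r
# Simulated stand-in for the FEV data (hypothetical values, not the real dataset)
set.seed(1)
sim <- data.frame(
  smoke = rep(c("nonsmoker", "smoker"), times = c(80, 20))
)
sim$fev <- ifelse(sim$smoke == "smoker", 3.3, 2.6) + rnorm(100, sd = 0.8)

fit <- lm(fev ~ smoke, data = sim)

# Coefficient on the indicator for smokers (nonsmoker is the reference level)
slope <- unname(coef(fit)["smokesmoker"])

# Difference in sample means: smokers minus nonsmokers
group_means <- tapply(sim$fev, sim$smoke, mean)
mean_diff <- unname(group_means["smoker"] - group_means["nonsmoker"])

# The two quantities agree up to floating-point error
print(all.equal(slope, mean_diff))
```

The same identity is why the confidence interval for the coefficient can be read as an interval for the difference in mean FEV between the two groups.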

Question 2: Assess the assumptions of linearity, constant variance and normality using residual plots. Comment on what you notice.

Residuals by Smoking Status (Linearity & Constant Variance)

fev %>%
  mutate(resid = resid(model1)) %>%
  ggplot(aes(x = smoke, y = resid)) +
  geom_boxplot(fill = "skyblue") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    x = "Smoking Status",
    y = "Residuals",
    title = "Residuals by Smoking Status"
  )

Residuals vs Fitted (Linearity & Constant Variance)

fev %>%
  mutate(
    fitted = fitted(model1),
    resid = resid(model1)
  ) %>%
  ggplot(aes(x = fitted, y = resid)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    x = "Fitted Values",
    y = "Residuals",
    title = "Residuals vs Fitted Values"
  )

Normal Q-Q Plot (Normality)

fev %>%
  mutate(resid = resid(model1)) %>%
  ggplot(aes(sample = resid)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Normal Q-Q Plot of Residuals")

###Ans2: The residual plots look fairly random with no strong patterns, suggesting the linearity assumption is reasonable. The spread of residuals is similar across the two smoking groups, and the residuals appear roughly normal, deviating from the Q-Q line only in the tails. Overall, there are no serious violations of the model assumptions.
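A scale-location plot is another common check of constant variance: it plots the square root of the absolute standardized residuals against the fitted values. The sketch below is one way to build it, assuming simulated data in place of fev.csv (the data frame `sim` and its parameters are hypothetical) and using base R graphics so that it runs without extra packages.

```r
# Simulated stand-in for the FEV data (hypothetical values, not the real dataset)
set.seed(3)
sim <- data.frame(
  smoke = rep(c("nonsmoker", "smoker"), times = c(80, 20))
)
sim$fev <- ifelse(sim$smoke == "smoker", 3.3, 2.6) + rnorm(100, sd = 0.8)

fit <- lm(fev ~ smoke, data = sim)

# Square root of the absolute standardized residuals, paired with fitted values
sl <- data.frame(
  fitted = fitted(fit),
  sqrt_abs_std_resid = sqrt(abs(rstandard(fit)))
)

# Scale-location plot: similar vertical spread and level at each fitted
# value is consistent with constant variance
plot(
  sl$fitted, sl$sqrt_abs_std_resid,
  xlab = "Fitted Values",
  ylab = "sqrt(|Standardized Residuals|)",
  main = "Scale-Location Plot"
)

# With a two-level predictor there are only two fitted values, so the
# group averages summarize what the plot shows
print(tapply(sl$sqrt_abs_std_resid, sim$smoke, mean))
```

With a binary predictor this plot carries much the same information as side-by-side boxplots of the residuals; it becomes more informative once the predictor is continuous.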

Question 3: Fit a linear regression model with fev as response and height as the predictor. Provide an interpretation for the coefficient on height, including an estimate and 95% confidence interval.

model_height <- lm(fev ~ height, data = fev)
summary(model_height)
confint(model_height)

###Ans3: Taller children tend to have higher FEV. On average, each additional inch of height is associated with an increase in FEV of about 0.13 liters (95% CI: 0.13 to 0.14). This association is statistically significant, and height explains a large part of the variation in FEV.
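For reference, the 95% interval reported by confint() for a slope is the estimate plus or minus a t critical value times the standard error. The sketch below reproduces it by hand on simulated data; the data frame `sim` and the intercept, slope, and noise level used to generate it are hypothetical and are not taken from fev.csv.

```r
# Simulated stand-in for the FEV data (hypothetical values, not the real dataset)
set.seed(2)
sim <- data.frame(height = runif(100, min = 46, max = 74))
sim$fev <- -5.4 + 0.13 * sim$height + rnorm(100, sd = 0.4)

fit <- lm(fev ~ height, data = sim)

# Estimate and standard error of the slope from the coefficient table
coef_table <- coef(summary(fit))
est <- coef_table["height", "Estimate"]
se <- coef_table["height", "Std. Error"]

# Critical value from the t distribution with residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))

# Interval built by hand versus the one returned by confint()
manual_ci <- c(est - t_crit * se, est + t_crit * se)
builtin_ci <- unname(confint(fit)["height", ])

print(all.equal(manual_ci, builtin_ci))
```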

Question 4: Convert height from inches to centimeters and fit a new simple linear regression model. Interpret the coefficient, including an estimate and 95% confidence interval.

fev <- fev %>%
  mutate(height_cm = height * 2.54)
model_height_cm <- lm(fev ~ height_cm, data = fev)

summary(model_height_cm)

confint(model_height_cm)

###Ans4: Taller children have higher FEV. On average, each additional centimeter of height is associated with an increase in FEV of about 0.052 liters (95% CI: 0.050 to 0.054). This association is statistically significant, and the overall relationship is the same as when height is measured in inches; only the number is smaller because of the unit change.

Question 5: Compare and contrast your results in Questions 3 and 4.

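###Ans5: The two models describe the same relationship: taller children have higher FEV. The only difference is the unit of the slope. In inches, each additional inch is associated with about 0.13 liters more FEV; in centimeters, each additional centimeter is associated with about 0.052 liters more, which is the inch coefficient divided by 2.54 (up to rounding). The confidence interval rescales by the same factor, while the t-statistic, p-value, and R-squared for the slope do not change.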
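The effect of the unit change can be checked directly. The sketch below fits both versions of the model to simulated data and compares them; the data frame `sim` and the values used to generate it are hypothetical and stand in for fev.csv.

```r
# Simulated stand-in for the FEV data (hypothetical values, not the real dataset)
set.seed(4)
sim <- data.frame(height = runif(100, min = 46, max = 74))
sim$fev <- -5.4 + 0.13 * sim$height + rnorm(100, sd = 0.4)
sim$height_cm <- sim$height * 2.54

fit_in <- lm(fev ~ height, data = sim)
fit_cm <- lm(fev ~ height_cm, data = sim)

# The slope per centimeter is the slope per inch divided by 2.54
slope_in <- unname(coef(fit_in)["height"])
slope_cm <- unname(coef(fit_cm)["height_cm"])
print(all.equal(slope_in / 2.54, slope_cm))

# The confidence interval rescales by the same factor
ci_in <- unname(confint(fit_in)["height", ])
ci_cm <- unname(confint(fit_cm)["height_cm", ])
print(all.equal(ci_in / 2.54, ci_cm))

# The t-statistic and R-squared are unaffected by the change of units
t_in <- coef(summary(fit_in))["height", "t value"]
t_cm <- coef(summary(fit_cm))["height_cm", "t value"]
print(all.equal(t_in, t_cm))
print(all.equal(summary(fit_in)$r.squared, summary(fit_cm)$r.squared))
```

Because the fitted values are identical in the two models, any residual diagnostics would also look the same.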