suppressPackageStartupMessages(library(tidyverse))
library(ggplot2)

Q.1 - Error assumptions: The errors are independent, with zero expected value and constant variance and are normally distributed.

Q.2 - View data

fev <- read.table("http://people.bath.ac.uk/trs35/fev.txt", header = T)
glimpse(fev)
## Observations: 654
## Variables: 7
## $ seqnbr <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
## $ subjid <int> 301, 451, 501, 642, 901, 1701, 1752, 1753, 1901, 1951, ...
## $ age    <int> 9, 8, 7, 9, 9, 8, 6, 6, 8, 9, 6, 8, 8, 8, 8, 7, 5, 6, 9...
## $ fev    <dbl> 1.708, 1.724, 1.720, 1.558, 1.895, 2.336, 1.919, 1.415,...
## $ height <dbl> 57.0, 67.5, 54.5, 53.0, 57.0, 61.0, 58.0, 56.0, 58.5, 6...
## $ sex    <int> 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 1...
## $ smoke  <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2...
#View(fev)

We have numerical data. Need to recode the smoke variable as it’s currently numeric but we want it to be a factor.

fev$smoke <- factor(fev$smoke,labels = c("yes", "no"))
#View(fev$smoke)

Q.3 - Plot fev vs. age coloured by smoking status.

fev %>% ggplot(aes(x = age, y = fev, colour= smoke)) +
  geom_point(aes(shape = smoke, color = smoke), alpha = 0.6, position = "jitter") +
  scale_shape_manual(values = c(17, 19)) +
  scale_color_manual(values = c("#1380A1", "#FAAB18")) +
  ggtitle("Fev vs. Age") +
  theme_bw() +
  theme(legend.position = "top", plot.title = element_text(face = "bold", hjust = 0.5)) + labs(x = "Age", y = "Fev")

Q.4 - Fit a linear model

fevLm <- lm(fev ~ age, data = fev)
summary(fevLm)
## 
## Call:
## lm(formula = fev ~ age, data = fev)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.57539 -0.34567 -0.04989  0.32124  2.12786 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.431648   0.077895   5.541 4.36e-08 ***
## age         0.222041   0.007518  29.533  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5675 on 652 degrees of freedom
## Multiple R-squared:  0.5722, Adjusted R-squared:  0.5716 
## F-statistic: 872.2 on 1 and 652 DF,  p-value: < 2.2e-16

We conclude that there is a significant, positive association between age and fev. For every additional year of age, the fev score increases by 0.22 L/s.

Q.5 - Make another plot with geom_smooth line fitted.

fev %>% ggplot(aes(x = height, y = fev, colour= smoke)) +
  geom_point(aes(shape = smoke, color = smoke), alpha = 0.6, position = "jitter") +
  geom_smooth(method = "lm", se = F) +
  scale_shape_manual(values = c(17, 19)) +
  scale_color_manual(values = c("#1380A1", "#FAAB18")) +
  ggtitle("Fev vs. Height") +
  theme_bw() +
  theme(legend.position = "top", plot.title = element_text(face = "bold", hjust = 0.5)) + 
  labs(x = "Height", y = "Fev")

Q.6 - Diagonstic plots

fevLm <- lm(fev ~ age, data = fev)
plot(fitted(fevLm), residuals(fevLm), 
     xlab = "Fitted", ylab = "Residuals")
abline(h=0)

As per the plot we can see some pattern in the error. If there was constant variance we would expect there to be no pattern. HK may be an issue.

Q.7 - Relevel the fev data

fevMod <- lm(fev ~ age+smoke, data = fev)
summary(fevMod)
## 
## Call:
## lm(formula = fev ~ age + smoke, data = fev)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6653 -0.3564 -0.0508  0.3494  2.0894 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.158378   0.131003   1.209  0.22712    
## age         0.230605   0.008184  28.176  < 2e-16 ***
## smokeno     0.208995   0.080745   2.588  0.00986 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5651 on 651 degrees of freedom
## Multiple R-squared:  0.5766, Adjusted R-squared:  0.5753 
## F-statistic: 443.3 on 2 and 651 DF,  p-value: < 2.2e-16

The relationship between FEV and age is more or less unchanged with the previous result. Non-smokers have 0.21 L/s greater lung capacity compared to smokers to smokers on average.