In this exercise you will study a dataset concerning vital capacity, which is the maximal amount of air, that can be exhaled after a maximal inhalation.
library(multcomp)
## Warning: package 'multcomp' was built under R version 3.2.5
## Loading required package: mvtnorm
## Warning: package 'mvtnorm' was built under R version 3.2.5
## Loading required package: survival
## Warning: package 'survival' was built under R version 3.2.5
## Loading required package: TH.data
## Warning: package 'TH.data' was built under R version 3.2.5
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 3.2.5
##
## Attaching package: 'TH.data'
## Det følgende objekt er maskeret fra 'package:MASS':
##
## geyser
library(ggplot2)
Read in the data:
vitcap <- read.table("http://asta.math.aau.dk/dan/static/datasets?file=vitcap.dat", header=TRUE)
head(vitcap)
## exposure age vital.capacity z1 z2
## 1 C 39 4.62 0 1
## 2 C 40 5.29 0 1
## 3 C 41 5.52 0 1
## 4 C 41 3.71 0 1
## 5 C 45 4.02 0 1
## 6 C 49 5.09 0 1
In the dataset, the variable vital.capacity has been measured on 84 workers in the cadmium industry.
The next variable is the factor exposure with 3 levels, indicating the level of cadmium exposure:
The data set also contains dummy variables for the factor exposure:
z1=1 if exposure=B and 0 otherwise.z2=1 if exposure=C and 0 otherwise.You will use these two variables later on.
exposure on the response vital.capacity. In that connection you should calculate/interpretboxplot(vital.capacity ~ exposure, data = vitcap )
-(2) F-test for no effect of exposure
Since the p-value is not significant (0.09021), we can accept the Null Hypothesis of using the simple model for estimation. Therefore, we should NOT use a more complex model, namely one with exposure as a predictor, to better express the change in vital.capacity with the increase of age.
model = lm(vital.capacity ~ exposure, data = vitcap )
summary(model)
##
## Call:
## lm(formula = vital.capacity ~ exposure, data = vitcap)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77179 -0.45205 0.09808 0.51295 1.57083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.46204 0.11223 39.757 <2e-16 ***
## exposureB 0.00974 0.17997 0.054 0.9570
## exposureC -0.51288 0.24245 -2.115 0.0375 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7445 on 81 degrees of freedom
## Multiple R-squared: 0.05767, Adjusted R-squared: 0.0344
## F-statistic: 2.478 on 2 and 81 DF, p-value: 0.09021
-(3) confidence intervals for multiple pairwise comparisons. Hint (assuming your fitted model is called model and that the multcomp package is installed):
test <- glht(model, mcp(exposure = "Tukey"))
summary(test)
The difference between B time and A time is close to 0, meaning that exposure below 10 years does not appear to affect the vital capacity by a significant amount. However, the difference between C and B or A is significant. Working for more than 10 years will affect you.
We expand the analysis to include the workers age - the variable age - as a predictor. Make a model and carry out an analysis investigating the effect of the predictors exposure and age on the response vital.capacity. In that connection you should:
exposure and agemodelA = lm(vital.capacity ~ exposure + age, data = vitcap)
modelB = lm(vital.capacity ~ exposure * age, data = vitcap)
anova(modelA, modelB)
## Analysis of Variance Table
##
## Model 1: vital.capacity ~ exposure + age
## Model 2: vital.capacity ~ exposure * age
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 80 30.035
## 2 78 27.535 2 2.4995 3.5402 0.03376 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A p-value of around 3% means that we should use the more complex model to determining whether there is a significant interaction between the effects of exposure and age.
p <- ggplot(vitcap, aes(x=age, y=vital.capacity)) #empty plot
p <- p + geom_point(aes(color=exposure)) #add points
p <- p + geom_smooth(aes(color = exposure), method="lm", se=FALSE) #add lines
p
model1 <- lm(vital.capacity ~ age*z2, data = vitcap)
model2 <- lm(vital.capacity ~ age*z1 + age*z2, data = vitcap)
model1
model2
model1 and model2.