#1) Demonstrate ID variable does not have to be in order
library(HRW)
library(gamm4)
## Loading required package: Matrix
## Loading required package: lme4
## Loading required package: mgcv
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:lme4':
##
## lmList
## This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
## This is gamm4 0.2-6
data(femSBMD)
names(femSBMD)
## [1] "idnum" "spnbmd" "age" "ethnicity" "black" "hispanic"
## [7] "white"
fit <- gamm4(spnbmd ~ s(age,k=10,bs="cr") + black + hispanic + white, random= ~(1|idnum),data = femSBMD)
summary(fit$gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## spnbmd ~ s(age, k = 10, bs = "cr") + black + hispanic + white
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.92535 0.01246 74.239 < 2e-16 ***
## black 0.08195 0.01722 4.758 2.24e-06 ***
## hispanic -0.01511 0.01759 -0.859 0.391
## white 0.01508 0.01753 0.860 0.390
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(age) 7.233 7.233 224.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.519
## lmer.REML = -2386.2 Scale est. = 0.0013566 n = 1003
Above is the original fit of the spinal bone mineral density data. If the theory that the value of the id number does not effect the model is true, then altering the id number variable should still produce the same model.
#1.a) Define new IDnum variable
femSBMD$idnum2 = 2*femSBMD$idnum
#1.b) Rerun gamm4::gamm4 code to check results
fit2 <- gamm4(spnbmd ~ s(age,k=10,bs="cr") + black + hispanic + white,
random= ~(1|idnum2),data = femSBMD)
summary(fit2$gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## spnbmd ~ s(age, k = 10, bs = "cr") + black + hispanic + white
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.92535 0.01246 74.239 < 2e-16 ***
## black 0.08195 0.01722 4.758 2.24e-06 ***
## hispanic -0.01511 0.01759 -0.859 0.391
## white 0.01508 0.01753 0.860 0.390
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(age) 7.233 7.233 224.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.519
## lmer.REML = -2386.2 Scale est. = 0.0013566 n = 1003
As we can see, using altered id numbers produces the same model.
#2) Import the pig weight data
pigWeights <- read.csv("~/Grad School/STAT 689/Data/pigWeights.csv")
#2.a) Display the lattice plot
library(lattice)
library(HRW)
pigvis <- xyplot(weight ~ num.weeks,
group = id.num,as.table = TRUE,
data = pigWeights,
xlab = list("age (weeks)"),
ylab = list("weight"),
main = "Pig Weight Data",
panel = function(x,y,subscripts,groups)
{
panel.grid()
panel.superpose(x,y,subscripts,groups,
type = "b",pch = 16,lwd = 2)
})
plot(pigvis)
#2.b) Assessing applicability of a random intercept model
Based on the plot of the pig weight data, a random intercept model seems like it could be applied to this data. The plot of each pig’s growth seems to have roughly the same shape, although each pig has a different starting weight. The only area of concern is that the range of pig weights at 9 weeks is greater than the range of pig weights at 1 week; further analysis would be needed to assess if this difference is significant enough to warrant a random function model.
#2.c) Fit random intercept model
fitsp <- gamm4(weight ~ s(num.weeks,k=8,bs="cr"),
random= ~(1|id.num),data = pigWeights)
fitsp$mer
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 2027.055
## Random effects:
## Groups Name Std.Dev.
## id.num (Intercept) 3.8926
## Xr s(num.weeks) 0.1179
## Residual 2.0732
## Number of obs: 432, groups: id.num, 48; Xr, 6
## Fixed Effects:
## X(Intercept) Xs(num.weeks)Fx1
## 50.41 -46.02
Between-pig variance aka var(U): (3.8926)^2 = 15.152
Within-pig variance aka var(e): (2.0732)^2 = 4.298
#2.d & 2.e) Check spline fit against linear & quadratic effects
fitL <- gamm4(weight ~ I(num.weeks),
random= ~(1|id.num),data = pigWeights)
fitQ <- gamm4(weight ~ I(num.weeks)+I(num.weeks^2),
random= ~(1|id.num),data = pigWeights)
anova(fitsp$mer,fitL$mer)
## refitting model(s) with ML (instead of REML)
## Data: NULL
## Models:
## fitL$mer: NULL
## fitsp$mer: NULL
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## fitL$mer 4 2037.8 2054.1 -1014.9 2029.8
## fitsp$mer 5 2037.1 2057.4 -1013.5 2027.1 2.7577 1 0.09679 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fitsp$mer,fitQ$mer)
## refitting model(s) with ML (instead of REML)
## Data: NULL
## Models:
## fitsp$mer: NULL
## fitQ$mer: NULL
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## fitsp$mer 5 2037.1 2057.4 -1013.5 2027.1
## fitQ$mer 5 2039.1 2059.4 -1014.5 2029.1 0 0
anova(fitQ$mer,fitL$mer)
## refitting model(s) with ML (instead of REML)
## Data: NULL
## Models:
## fitL$mer: NULL
## fitQ$mer: NULL
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## fitL$mer 4 2037.8 2054.1 -1014.9 2029.8
## fitQ$mer 5 2039.1 2059.4 -1014.5 2029.1 0.7488 1 0.3869
Based on the analysis of variance, it does not appear that a spline is needed for this model. The ANOVA comparison of the spline model versus the linear model yielded a p-value of 0.0979, so we can conclude that the spline model was not a significantly better fit than the linear model. Comparing the spline model with the quadratic model does not give a p-value because the two models have the same degrees of freedom, but we see that the deviance is almost the same as the linear model deviance (2029.1 vs. 2029.8). Finally, comparing the linear and the quadratic models yields a p-value of 0.3869, so the quadratic model is not a significantly better fit than the linear model. Based on these analyses, a linear effect model seems to be the best fit for the data.
#2.f) Plot the function
par(mai=c(1.02,0.9,0.5,0.3))
par(mfrow=c(1,1))
plot(fitsp$gam,shade = TRUE,
shade.col = "palegreen",ylab="fitted splines",main='Pig Weight Data')
#2.g) Calculate the expected within-pig correlation
#corr = var(U)/(var(U)+var(e))
corr = 15.15233/(15.15233+4.298158)
corr
## [1] 0.7790206
The estimated within-pig correlation for this model is 0.779. This indicates that values measured from the same pig are highly correlated.