Stat689 HW11

#1) Demonstrate ID variable does not have to be in order

library(HRW)
library(gamm4)

## Loading required package: Matrix

## Loading required package: lme4

## Loading required package: mgcv

## Loading required package: nlme

## 
## Attaching package: 'nlme'

## The following object is masked from 'package:lme4':
## 
##     lmList

## This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.

## This is gamm4 0.2-6

data(femSBMD)
names(femSBMD)

## [1] "idnum"     "spnbmd"    "age"       "ethnicity" "black"     "hispanic" 
## [7] "white"

fit <- gamm4(spnbmd ~ s(age,k=10,bs="cr") + black + hispanic + white, random= ~(1|idnum),data = femSBMD)

summary(fit$gam)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## spnbmd ~ s(age, k = 10, bs = "cr") + black + hispanic + white
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.92535    0.01246  74.239  < 2e-16 ***
## black        0.08195    0.01722   4.758 2.24e-06 ***
## hispanic    -0.01511    0.01759  -0.859    0.391    
## white        0.01508    0.01753   0.860    0.390    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df     F p-value    
## s(age) 7.233  7.233 224.9  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.519   
## lmer.REML = -2386.2  Scale est. = 0.0013566  n = 1003

Above is the original fit of the spinal bone mineral density data. If the theory that the value of the id number does not effect the model is true, then altering the id number variable should still produce the same model.

#1.a) Define new IDnum variable
femSBMD$idnum2 = 2*femSBMD$idnum

#1.b) Rerun gamm4::gamm4 code to check results
fit2 <- gamm4(spnbmd ~ s(age,k=10,bs="cr") + black + hispanic + white,
             random= ~(1|idnum2),data = femSBMD)

summary(fit2$gam)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## spnbmd ~ s(age, k = 10, bs = "cr") + black + hispanic + white
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.92535    0.01246  74.239  < 2e-16 ***
## black        0.08195    0.01722   4.758 2.24e-06 ***
## hispanic    -0.01511    0.01759  -0.859    0.391    
## white        0.01508    0.01753   0.860    0.390    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df     F p-value    
## s(age) 7.233  7.233 224.9  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.519   
## lmer.REML = -2386.2  Scale est. = 0.0013566  n = 1003

As we can see, using altered id numbers produces the same model.

#2) Import the pig weight data

pigWeights <- read.csv("~/Grad School/STAT 689/Data/pigWeights.csv")

#2.a) Display the lattice plot

library(lattice)
library(HRW)

pigvis <- xyplot(weight ~ num.weeks,
                     group = id.num,as.table = TRUE,
                     data = pigWeights,
                     xlab = list("age (weeks)"),
                     ylab = list("weight"),
                     main = "Pig Weight Data",
                     panel = function(x,y,subscripts,groups)
                     {  
                       panel.grid() 
                       panel.superpose(x,y,subscripts,groups,
                                       type = "b",pch = 16,lwd = 2)
                     })
plot(pigvis)

#2.b) Assessing applicability of a random intercept model

Based on the plot of the pig weight data, a random intercept model seems like it could be applied to this data. The plot of each pig’s growth seems to have roughly the same shape, although each pig has a different starting weight. The only area of concern is that the range of pig weights at 9 weeks is greater than the range of pig weights at 1 week; further analysis would be needed to assess if this difference is significant enough to warrant a random function model.

#2.c) Fit random intercept model

fitsp <- gamm4(weight ~ s(num.weeks,k=8,bs="cr"),
             random= ~(1|id.num),data = pigWeights)

fitsp$mer

## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 2027.055
## Random effects:
##  Groups   Name         Std.Dev.
##  id.num   (Intercept)  3.8926  
##  Xr       s(num.weeks) 0.1179  
##  Residual              2.0732  
## Number of obs: 432, groups:  id.num, 48; Xr, 6
## Fixed Effects:
##     X(Intercept)  Xs(num.weeks)Fx1  
##            50.41            -46.02

Between-pig variance aka var(U): (3.8926)^2 = 15.152

Within-pig variance aka var(e): (2.0732)^2 = 4.298

#2.d & 2.e) Check spline fit against linear & quadratic effects

fitL <- gamm4(weight ~ I(num.weeks),
               random= ~(1|id.num),data = pigWeights)
fitQ <- gamm4(weight ~ I(num.weeks)+I(num.weeks^2),
               random= ~(1|id.num),data = pigWeights)

anova(fitsp$mer,fitL$mer)

## refitting model(s) with ML (instead of REML)

## Data: NULL
## Models:
## fitL$mer: NULL
## fitsp$mer: NULL
##           npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## fitL$mer     4 2037.8 2054.1 -1014.9   2029.8                       
## fitsp$mer    5 2037.1 2057.4 -1013.5   2027.1 2.7577  1    0.09679 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(fitsp$mer,fitQ$mer)

## refitting model(s) with ML (instead of REML)

## Data: NULL
## Models:
## fitsp$mer: NULL
## fitQ$mer: NULL
##           npar    AIC    BIC  logLik deviance Chisq Df Pr(>Chisq)
## fitsp$mer    5 2037.1 2057.4 -1013.5   2027.1                    
## fitQ$mer     5 2039.1 2059.4 -1014.5   2029.1     0  0

anova(fitQ$mer,fitL$mer)

## refitting model(s) with ML (instead of REML)

## Data: NULL
## Models:
## fitL$mer: NULL
## fitQ$mer: NULL
##          npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## fitL$mer    4 2037.8 2054.1 -1014.9   2029.8                     
## fitQ$mer    5 2039.1 2059.4 -1014.5   2029.1 0.7488  1     0.3869

Based on the analysis of variance, it does not appear that a spline is needed for this model. The ANOVA comparison of the spline model versus the linear model yielded a p-value of 0.0979, so we can conclude that the spline model was not a significantly better fit than the linear model. Comparing the spline model with the quadratic model does not give a p-value because the two models have the same degrees of freedom, but we see that the deviance is almost the same as the linear model deviance (2029.1 vs. 2029.8). Finally, comparing the linear and the quadratic models yields a p-value of 0.3869, so the quadratic model is not a significantly better fit than the linear model. Based on these analyses, a linear effect model seems to be the best fit for the data.

#2.f) Plot the function

par(mai=c(1.02,0.9,0.5,0.3))
par(mfrow=c(1,1))
plot(fitsp$gam,shade = TRUE,
     shade.col = "palegreen",ylab="fitted splines",main='Pig Weight Data')

#2.g) Calculate the expected within-pig correlation

#corr = var(U)/(var(U)+var(e))
corr = 15.15233/(15.15233+4.298158)
corr

## [1] 0.7790206

The estimated within-pig correlation for this model is 0.779. This indicates that values measured from the same pig are highly correlated.

Stat689 HW11

Bailey Stewart

4/11/2022