STAT 700, Fall 2014 Homework 9 Problems due Wed. Nov. 19

Francis Alvarez
Gonzalo Urritia

Following the Rat Pup R code, make a new variable sex1 which is a 1 for Female and 0 for Male. Let’s center the ages, so use I(age-11) in your R function calls. Fita linear mixed model using REML, that has fixed effects that include sex1, I(age-11), sex1:I(age-11) interaction, an intercept and a random effect for just the intercept. Call this fit1.

require(nlme)

## Loading required package: nlme

Ortho=Orthodont
Ortho$sex1 = c(rep(0,64),rep(1,44)) #add column, 0's if male 1's if female
Ortho$age = Ortho$age-11
fit1 = lme(distance~age+sex1+age*sex1, data=Ortho, random=~1 | Subject) #model with random effect, intercept
fit2 = gls(distance~age+sex1+age*sex1, data=Ortho)  #model with no random effect

(b) Use the LRT to test \(H_0: \sigma^{2}_{intercept} = {0}\) against \(H_1: \sigma^{2}_{intercept} \ne {0}\).. State your conclusions and state which is the “best” model.

anova(fit2, fit1)

##      Model df   AIC   BIC logLik   Test L.Ratio p-value
## fit2     1  5 493.6 506.8 -241.8                       
## fit1     2  6 445.8 461.6 -216.9 1 vs 2    49.8  <.0001

The better model is fit1 with the added random effect for the intercept. Its AIC is signigicantly lower and the anova test yielded a p-value < .0001 rejecting our null hypothesis of \(H_0: \sigma^{2}_{intercept} = {0}\).

plot(fit1)

plot of chunk unnamed-chunk-3

The residual plot is scattered so there is no reason to doubt this model.

qqnorm(resid(fit1))

plot of chunk unnamed-chunk-4 The Q-Q plot yields a straight line with the exception of a few points on the tails but we have strong reason to believe the data is iid N(0,\(\sigma^{2}\)).

(c) Examine the summary of the model fit1 from part (a). Is there evidence to suggest that boys and girls have significantly different growth patterns? Explain.

summary(fit1)

## Linear mixed-effects model fit by REML
##  Data: Ortho 
##     AIC   BIC logLik
##   445.8 461.6 -216.9
## 
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept) Residual
## StdDev:       1.816    1.386
## 
## Fixed effects: distance ~ age + sex1 + age * sex1 
##              Value Std.Error DF t-value p-value
## (Intercept) 24.969    0.4860 79   51.38  0.0000
## age          0.784    0.0775 79   10.12  0.0000
## sex1        -2.321    0.7614 25   -3.05  0.0054
## age:sex1    -0.305    0.1214 79   -2.51  0.0141
##  Correlation: 
##          (Intr) age    sex1  
## age       0.000              
## sex1     -0.638  0.000       
## age:sex1  0.000 -0.638  0.000
## 
## Standardized Within-Group Residuals:
##      Min       Q1      Med       Q3      Max 
## -3.59804 -0.45462  0.01578  0.50245  3.68621 
## 
## Number of Observations: 108
## Number of Groups: 27

After looking at the p-value for sex1, (0.0054), we notice that it is below \(\alpha=0.05\) which would lead us to believe that gender does have a significant effect on growth patterns.

(d)Make a plot of the standardized residuals versus fitted values by gender.

plot(fit1, resid(.,type="p")~fitted(.) | Subject, abline=0, lty=2)

plot of chunk unnamed-chunk-6 It doesn’t appear that males and females have the same residual variance. They both appear to be pretty close but the males seem to have a larger variance in comparison to the females.

(e) You can test part (d) as in the Rat Pup Lab, but for this homework just use your model from part (a) with the VarIdent option to allow for different residual variances for each gender.

fit1b = lme(distance~age+sex1+age*sex1, data=Ortho, random=~1 | Subject, weights = varIdent(form=~1|sex1))
plot(fit1b, resid(.,type="p")~fitted(.) | Subject, abline=0, lty=2)

plot of chunk unnamed-chunk-7 After looking at the residuals vs fitted plot with the VarIdent option now it appears that the Males and Females have the same residual variance.

(f) Which model fit1 or fit1b in part (e) is the “best” model based on AIC? Plot the fitted values over time for all the subjects from the best model.

anova(fit1,fit1b)

##       Model df   AIC   BIC logLik   Test L.Ratio p-value
## fit1      1  6 445.8 461.6 -216.9                       
## fit1b     2  7 429.2 447.7 -207.6 1 vs 2   18.54  <.0001

The best model based on AIC is fit1b (allowed different residual variances for each gender).

Plot the fitted values over time for all the subjects from the best model.

plot(augPred(fit1))

plot of chunk unnamed-chunk-9

PROBLEM 2 Briefly describe your dataset for the Project. Include a description of the fixed and random effects you will consider in the model.

The data set is a survey of 40 food items that claim to be “low-calorie” or “low-fat.” The variables include difference of labeled and actual calories per gram and per item. Each item is also classified by its advertisement/distribution. The fixed effects will be the the classification of the item (N if nationally advertised, R if regionally distributed, L if locally prepared). We can include a random effect for the intercept for the model. We will be comparing whether each classication has a significant effect on the error of marked calories. After that we can follow up with how much a difference could we suspect after modeling the data. http://lib.stat.cmu.edu/DASL/Datafiles/Calories.html