require(nlme)
## Loading required package: nlme
Ortho=Orthodont
Ortho$sex1 = c(rep(0,64),rep(1,44)) #add column, 0's if male 1's if female
Ortho$age = Ortho$age-11
fit1 = lme(distance~age+sex1+age*sex1, data=Ortho, random=~1 | Subject) #model with random effect, intercept
fit2 = gls(distance~age+sex1+age*sex1, data=Ortho) #model with no random effect
(b) Use the LRT to test \(H_0: \sigma^{2}_{intercept} = {0}\) against \(H_1: \sigma^{2}_{intercept} \ne {0}\).. State your conclusions and state which is the “best” model.
anova(fit2, fit1)
## Model df AIC BIC logLik Test L.Ratio p-value
## fit2 1 5 493.6 506.8 -241.8
## fit1 2 6 445.8 461.6 -216.9 1 vs 2 49.8 <.0001
The better model is fit1 with the added random effect for the intercept. Its AIC is signigicantly lower and the anova test yielded a p-value < .0001 rejecting our null hypothesis of \(H_0: \sigma^{2}_{intercept} = {0}\).
plot(fit1)
The residual plot is scattered so there is no reason to doubt this model.
qqnorm(resid(fit1))
The Q-Q plot yields a straight line with the exception of a few points on the tails but we have strong reason to believe the data is iid N(0,\(\sigma^{2}\)).
(c) Examine the summary of the model fit1 from part (a). Is there evidence to suggest that boys and girls have significantly different growth patterns? Explain.
summary(fit1)
## Linear mixed-effects model fit by REML
## Data: Ortho
## AIC BIC logLik
## 445.8 461.6 -216.9
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 1.816 1.386
##
## Fixed effects: distance ~ age + sex1 + age * sex1
## Value Std.Error DF t-value p-value
## (Intercept) 24.969 0.4860 79 51.38 0.0000
## age 0.784 0.0775 79 10.12 0.0000
## sex1 -2.321 0.7614 25 -3.05 0.0054
## age:sex1 -0.305 0.1214 79 -2.51 0.0141
## Correlation:
## (Intr) age sex1
## age 0.000
## sex1 -0.638 0.000
## age:sex1 0.000 -0.638 0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.59804 -0.45462 0.01578 0.50245 3.68621
##
## Number of Observations: 108
## Number of Groups: 27
After looking at the p-value for sex1, (0.0054), we notice that it is below \(\alpha=0.05\) which would lead us to believe that gender does have a significant effect on growth patterns.
(d)Make a plot of the standardized residuals versus fitted values by gender.
plot(fit1, resid(.,type="p")~fitted(.) | Subject, abline=0, lty=2)
It doesn’t appear that males and females have the same residual variance. They both appear to be pretty close but the males seem to have a larger variance in comparison to the females.
(e) You can test part (d) as in the Rat Pup Lab, but for this homework just use your model from part (a) with the VarIdent option to allow for different residual variances for each gender.
fit1b = lme(distance~age+sex1+age*sex1, data=Ortho, random=~1 | Subject, weights = varIdent(form=~1|sex1))
plot(fit1b, resid(.,type="p")~fitted(.) | Subject, abline=0, lty=2)
After looking at the residuals vs fitted plot with the VarIdent option now it appears that the Males and Females have the same residual variance.
(f) Which model fit1 or fit1b in part (e) is the “best” model based on AIC? Plot the fitted values over time for all the subjects from the best model.
anova(fit1,fit1b)
## Model df AIC BIC logLik Test L.Ratio p-value
## fit1 1 6 445.8 461.6 -216.9
## fit1b 2 7 429.2 447.7 -207.6 1 vs 2 18.54 <.0001
The best model based on AIC is fit1b (allowed different residual variances for each gender).
Plot the fitted values over time for all the subjects from the best model.
plot(augPred(fit1))
PROBLEM 2 Briefly describe your dataset for the Project. Include a description of the fixed and random effects you will consider in the model.
The data set is a survey of 40 food items that claim to be “low-calorie” or “low-fat.” The variables include difference of labeled and actual calories per gram and per item. Each item is also classified by its advertisement/distribution. The fixed effects will be the the classification of the item (N if nationally advertised, R if regionally distributed, L if locally prepared). We can include a random effect for the intercept for the model. We will be comparing whether each classication has a significant effect on the error of marked calories. After that we can follow up with how much a difference could we suspect after modeling the data. http://lib.stat.cmu.edu/DASL/Datafiles/Calories.html