dog.dat=read.table("http://www-rohan.sdsu.edu/~babailey/stat700/dog.dat",header=T)
stripchart(PEconc ~ Anesthesia, vertical=T,method="stack",xlab="Anesthesia",ylab="PEconc",data=dog.dat,main="Dog Data Stripchart")
It appears that Anesthesia 3 has a different variance than the first two anesthesias. Anesthesia 1 and 2 don’t differ too much, we could believe that they have the same variance and mean.
(B)Fit a linear mixed effects model with lme. Give summary and diagnostics plots of the residuals. What do you conclude? Make sure that a factor is a factor! You can use the as.factor funcion inside of lme.
require(nlme)
## Loading required package: nlme
dog.dat$Anesthesia = as.factor(dog.dat$Anesthesia) #anesthesia is a factor so don't use numerical values
fit1 = lme(PEconc ~1,data=dog.dat,random=~1| Anesthesia)
summary(fit1)
## Linear mixed-effects model fit by REML
## Data: dog.dat
## AIC BIC logLik
## 28.55 32.65 -11.27
##
## Random effects:
## Formula: ~1 | Anesthesia
## (Intercept) Residual
## StdDev: 0.2096 0.3177
##
## Fixed effects: PEconc ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 0.5853 0.1342 27 4.361 2e-04
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.6464 -0.5659 -0.3195 0.6612 2.2885
##
## Number of Observations: 30
## Number of Groups: 3
plot(fit1)
qqnorm(fit1)
After looking at the residual of the lme fit. It doesnt appear that there is any specific patter. The data falls pretty evenly on both sides across the x-axis. As for the QQ plot the data does have a slight deiviation from the a linear shape. There is a small bump towards the center and the tails aren’t straight. Overall is it still believable that the data is normal.
(c)Test the hypothesis H0 : σ2α = 0 vs H1 : σ2α6= 0, using a LRT. What do you conclude?
fit2 = gls(PEconc ~1,data=dog.dat)
anova(fit1,fit2)
## Model df AIC BIC logLik Test L.Ratio p-value
## fit1 1 3 28.55 32.65 -11.27
## fit2 2 2 30.80 33.54 -13.40 1 vs 2 4.26 0.039
Because we get a p-value of 0.039 we reject the null hypothesis and have reason to assume the std deviation is significant.
(d) The default method with lme is REML. Repeat (c) using ML for BOTH fits. How do the p-values of the LRT using REML and ML compare?
fitb1 = lme(PEconc ~1,data=dog.dat,random=~1| Anesthesia,method="ML")
fitb2 = gls(PEconc ~1,data=dog.dat, method="ML")
anova(fitb1,fitb2)
## Model df AIC BIC logLik Test L.Ratio p-value
## fitb1 1 3 26.15 30.35 -10.07
## fitb2 2 2 27.19 30.00 -11.60 1 vs 2 3.044 0.081
The p-value is no longer significant (p-value=0.081 > 0.05), therefore we fail to reject the null hpothesis, unlike the previous question. REML and ML result in different results.
PROBLEM 2 Return to Nonlinear Models: Create 100 observations where the underlying signal is a sine function with amplitude of 4 and a horizontal phase shift of π. Noise is added in the form of normal (Gaussian) random numbers with mean equal to zero and standard deviation equal to 0.5.
x = seq(0,2*pi,length=100) #sequence from 0 to 2pi, 100 terms
set.seed(1)
e = rnorm(100,0,.5)
y = 4*sin(x-pi)+e
plot(x,y,main=expression('y'[i]*' = 4sin(x'[i]*'-pi)+e'[i]))
Use non linear least squares to create a model of the data.
Start with the approximation: amp=1, horshft=0
y1=nls(y~amp*sin(x-horshft),start=list(amp=1, horshft=0))
summary(y1)
##
## Formula: y ~ amp * sin(x - horshft)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## amp -4.01269 0.06461 -62.1 <2e-16 ***
## horshft -0.00311 0.01594 -0.2 0.85
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.455 on 98 degrees of freedom
##
## Number of iterations to convergence: 3
## Achieved convergence tolerance: 4.69e-07
The model converges with an amp=-4.012691 and horshft=-0.003115 and Residual Standard Errorr=.4545
Repeate the process with approximation: amp=2, horshft=2
y2=y1=nls(y~amp*sin(x-horshft),start=list(amp=2, horshft=2))
summary(y2)
##
## Formula: y ~ amp * sin(x - horshft)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## amp 4.0127 0.0646 62.1 <2e-16 ***
## horshft 3.1385 0.0159 196.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.455 on 98 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 6.99e-07
The model converges with an amp=4.0169, horshft=3.13848 and Residual Standard Error=.4545
Show that if you start with initial estimates of amp and horshft closer to their true value, nls gives the estimates you want.For illustration, a print out of the summary of the fit is acceptable. What properties of the sine function might cause this behavior?
We notice that the amplitude and horizontal shifts of both NLS models are different and yet the Residual Standard Errors are the same. The reason this occurs is the sine function is periodic. As long as the period is the same you can add 2*PI to any initial guess and it would yield the same result. The same concept occurs with amplitude except the sign would be opposite and the other parameters would have to be adjusted accordingly. We also plot the two results below and notice that they are exactly the same model except the parameters are different, proving what we expect would happen with the sine function.
par(mfrow=c(2,1))
plot(x,y,main=expression('y'[i]*' = 4sin(x'[i]*'-pi)+e'[i]))
lines(x,predict(y1),col="red")
legend(4,0,"Prediction 1",col="red",pch=NA,lty=1,cex=.5)
plot(x,y,main=expression('y'[i]*' = 4sin(x'[i]*'-pi)+e'[i]))
lines(x,predict(y2),col="green")
legend(4,0,"Prediction 2 (correct)",col="green",pch=NA,lty=1,cex=.5)