D. Write a program to compute the p-value of the Durbin-Watson test in Section 2.11.2 by simulation, and use it to repeat the computation in examples (Mount Airy-Charleston and Amherst) on p.94-95 of the S&Y textbook.
First do linear regression on the two data sets:
rm(list=ls(all=TRUE)) #reset environment
setwd('~/Dropbox/Statistics_study/STOR_664/Homework/HW1')
amh <- read.table('amherst.dat', header = T)
charles <- read.table('charles.dat', header = F)
names(charles) = c('year', 'airy', 'char')
Define a function to calculate D, given x, y
calculateD <- function(x,y) #input as the result of lm
{
datalm <- lm(y~x)
e <- datalm$residual #residual
n <- length(y) #number of observations
e_sq <- e^2
sum_e_sq <- sum(e_sq) #the denominator of D
D <- 0
for (i in 2:n)
{D <- D + (e[i]-e[i-1])^2} #calculate the nominator part of D
D <- D/sum_e_sq
names(D) <- NULL
return(D)
}
Running the simulation
corr.test <- function(y, x, nsim=10000)
{
n <- length(y)
d1 <- calculateD(x, y)
count <- 0
for (i in 1:nsim)
{
tempy <- rnorm(n)
d2 <- calculateD(x, tempy)
if (d2<d1) count = count + 1
}
return(count/nsim)
}
amh_p <- corr.test(amh$temp,amh$year)
amh_p
## [1] 0.0065
char_p <- corr.test(charles$char,charles$airy)
char_p
## [1] 0.4211
For Amherest data, p-value is 0.0065, which is highly significant –strong evidence for correlation among residuals.
For Mount Airy Charleston, p-value is 0.4211, which is not significant – no evidence for correlation among residuals.