D. Write a program to compute the p-value of the Durbin-Watson test in Section 2.11.2 by simulation, and use it to repeat the computation in examples (Mount Airy-Charleston and Amherst) on p.94-95 of the S&Y textbook.

First do linear regression on the two data sets:

rm(list=ls(all=TRUE))  #reset environment 
setwd('~/Dropbox/Statistics_study/STOR_664/Homework/HW1')
amh <- read.table('amherst.dat', header = T)

charles <- read.table('charles.dat', header = F)
names(charles) = c('year', 'airy', 'char')

Define a function to calculate D, given x, y

calculateD <- function(x,y)  #input as the result of lm
  {
    datalm <- lm(y~x)
    e <- datalm$residual   #residual
    n <- length(y)   #number of observations
    e_sq <- e^2  
    sum_e_sq <- sum(e_sq) #the denominator of D
    D <- 0
    for (i in 2:n) 
       {D <- D + (e[i]-e[i-1])^2}  #calculate the nominator part of D
    D <- D/sum_e_sq
    names(D) <- NULL
    return(D)
  }

Running the simulation

corr.test <- function(y, x, nsim=10000)
  {
    n <- length(y)
    d1 <- calculateD(x, y)
    count <- 0
    for (i in 1:nsim)
      {
        tempy <- rnorm(n)
        d2 <- calculateD(x, tempy)
        if (d2<d1) count = count + 1 
        
      }
    return(count/nsim)
  }

amh_p <- corr.test(amh$temp,amh$year)
amh_p
## [1] 0.0065
char_p <- corr.test(charles$char,charles$airy)
char_p
## [1] 0.4211

For Amherest data, p-value is 0.0065, which is highly significant –strong evidence for correlation among residuals.

For Mount Airy Charleston, p-value is 0.4211, which is not significant – no evidence for correlation among residuals.