## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.1
## Loading required package: leaps
## Loading required package: locfit
## locfit 1.5-9.1    2013-03-22
## Loading required package: mgcv
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:forecast':
## 
##     getResponse
## This is mgcv 1.8-12. For overview type 'help("mgcv-package")'.
## Loading required package: tseries
## 
## Attaching package: 'TSA'
## The following objects are masked from 'package:forecast':
## 
##     fitted.Arima, plot.Arima
## The following objects are masked from 'package:timeDate':
## 
##     kurtosis, skewness
## The following objects are masked from 'package:stats':
## 
##     acf, arima
## The following object is masked from 'package:utils':
## 
##     tar
## Loading required package: lattice
## Loading required package: ltsa
## Loading required package: bestglm
## 
## Attaching package: 'FitAR'
## The following object is masked from 'package:forecast':
## 
##     BoxCox

Model

Fit the following transfer function noise model \[ co2_t =\alpha + \sum^s_{j} v_j \cdot gas_{t-j-b} + e_t, (1)\] where \(e_t\) is serially correlated.

1. Use the ideas of prewhitening taught in class to identify b and s.

Before doing any modelling, we start by cleaning up our variables.

First note that y does not have a mean of 0. We can simply subtract its mean from each entry of y to obtain a 0-mean process for further analysis.

Also, note that X and e are both serially correlated, which presents problems for modelling. We use AR to model X as hinted and finds that the program chooses an AR(4) process. In theory by applying this process to X we can reduce it to white noise.

y1 <- y - mean(y)
mod <- ar(x, order = 5)
mod
## 
## Call:
## ar(x = x, order.max = 5)
## 
## Coefficients:
##       1        2        3        4  
##  1.9290  -1.1997   0.1002   0.1212  
## 
## Order selected 4  sigma^2 estimated as  0.03579
alpha <- residuals(mod)

Before proceeding further we check the adequecy of this model by verifying whether it is in fact white noise.

par(mfrow=c(3,1), cex=0.5)
acf(na.omit(as.ts(alpha)), lwd=5, col="red", main=NA)
pacf(na.omit(as.ts(alpha)), lwd=5, col="red", main=NA)
LBQPlot(na.omit(alpha),lag.max = 10)
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple
## of shorter object length

The ACF and PACF are generally in line and the LB test looks good as well. Thus we proceed having model x as this AR(4) process:

\[ Phi(B)X_t = a_t \]

where \[Phi(B) = (1-1.929B+1.1997B^2-0.1002B^3-0.1212B^4)\]

Applying this process to the model equation we now have:

\[ Phi(B) y_t = \sum^s_{j} v_j \cdot Phi(B) x_{t-j-b} + Phi(B)e_t \]

\[ beta_t = \sum^s_{j} v_j \cdot a_{t-j-b} + n_t \]

where \[beta_t = Phi(B) y_t\] and \[ n_t = Phi(B) e_t\]

Since our regressors are no longer serially correlated, we now proceed to identify b and s in our model by checking which ccf are significant between a and beta. First we obtain beta by filtering our y using the process identified above.

beta <- filter(y1,c(1,-1.929,1.1997,-0.1002,-0.1212), sides = 1)

ccf( beta,na.omit(alpha), main = "ccf of prewhitened series")

From this graph we can clearly see that significant correlation between the two variables start at lag 3 and continues for 5 periods. Thus we have identified b = 3 and s = 5.

Also, this process could have been done simply using the builtin ‘prewhiten function’, and reversing the time order. We obtain the same result.

prewhiten(x,y)

2. Fit Eqn. (1) based on your preliminary identification and R arima function

From the previous part we have identified the model as

\[ co2_t = c + v_1 gas_{t-3} + v_2 gas_{t-4} + v_3 gas_{t-5} + v_4 gas_{t-6}+ v_5 gas_{t-7} + n_t \]

where \(c = mean(co2) = 53.509\) and \(v_1, ...v_5\) can be estimated using

\[ beta_t = \sum^5_{j} v_j \cdot a_{t-j-3} + n_t\]

since our regressors are now serially uncorrelated, we can proceed using an OLS regression. First we introduce series of lagged variables, using a customized function to deal with the missing values.

lagpad <- function(x, k) {
  if (!is.vector(x)) 
    stop
  if (!is.numeric(x)) 
    stop
  if (!is.numeric(k))
    stop
  if (1 != length(k))
    stop
  c(rep(NA, k), x)[1 : length(x)] 
}

a1 <-as.ts(lagpad(as.vector(alpha),1))
a2 <-as.ts(lagpad(as.vector(alpha),2))
a3 <-as.ts(lagpad(as.vector(alpha),3))
a4 <-as.ts(lagpad(as.vector(alpha),4))
a5 <-as.ts(lagpad(as.vector(alpha),5))
a6 <-as.ts(lagpad(as.vector(alpha),6))
a7 <-as.ts(lagpad(as.vector(alpha),7))

Now for the regression:

model <- (lm(beta~ a3 + a4 + a5 + a6 + a7+0 ))
summary(model)
## 
## Call:
## lm(formula = beta ~ a3 + a4 + a5 + a6 + a7 + 0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.93723 -0.13479 -0.00533  0.13876  1.51490 
## 
## Coefficients:
##    Estimate Std. Error t value Pr(>|t|)    
## a3 -0.53936    0.08128  -6.636 1.66e-10 ***
## a4 -0.63920    0.08064  -7.927 5.37e-14 ***
## a5 -0.86875    0.08043 -10.801  < 2e-16 ***
## a6 -0.48749    0.08059  -6.049 4.65e-09 ***
## a7 -0.33368    0.08120  -4.109 5.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2574 on 280 degrees of freedom
##   (11 observations deleted due to missingness)
## Multiple R-squared:  0.5009, Adjusted R-squared:  0.492 
## F-statistic:  56.2 on 5 and 280 DF,  p-value: < 2.2e-16

As expected, we have obtained statistically significant coefficients for each of the regressors. We have now constructed a model in the form of:

\[ CO2_t = 53.509 - 0.53936 gas_{t-3} - 0.63920 gas_{t-4}- 0.86875 gas_{t-5} - 0.48749 gas_{t-6} - 0.33368 gas_{t-7} + e_t \]

Our next step is to model the serially correlated residuals using an arima process. We obtain our residuals and using the auto.arima function to obtain the appropriate model.

x3 <-as.ts(lagpad(as.vector(x),3))
x4 <-as.ts(lagpad(as.vector(x),4))
x5 <-as.ts(lagpad(as.vector(x),5))
x6 <-as.ts(lagpad(as.vector(x),6))
x7 <-as.ts(lagpad(as.vector(x),7))

y.pred <- mean(y)-0.53936*x3 - 0.68920*x4 - 0.86875*x5 - 0.48749*x6 - 0.33368*x7

re <- y - y.pred

mod1 <- auto.arima(re)

3. Checking model adequacy of your fitted model.

We have two things to verify with regards to our models: whether the predicted value for CO2 generatted by the model fits closely to the actual values, and whether the model for the residuals is suitable.

In terms of the prediction power of the model:

ts.plot(y,y.pred, main = "predicted vs actual CO2")

We can see that the actual and predicted values move quite closely together, which generally indicates that we have a good model.

Lastly for the residuals, we need to check whether our ARIMA(4,1,0) is suitable. We do so by checking whether the residuals of this model (residuals of residuals) is in fact white noise.

par(mfrow=c(3,1), cex=0.5)
acf(na.omit(mod1$resid), lwd=1, col="red", main=NA)
pacf(na.omit(mod1$resid), lwd=1, col="red", main=NA)
LBQPlot(na.omit(mod1$resid),lag.max = 25)
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple
## of shorter object length

We can see that this is acceptable as a white noise series, and thus our work is complete and we conclude that our model is adequate.

\[ CO2_t = 53.509 - 0.53936 gas_{t-3} - 0.63920 gas_{t-4}- 0.86875 gas_{t-5} - 0.48749 gas_{t-6} - 0.33368 gas_{t-7} + e_t \]

\[ e_t = ARIMA(4,1,0) \]