Question 1

I use R to compute \(\hat{\phi}_{\text{YW}}\) and \(\hat{\phi}_{\text{MLE}}\) in an AR(1) process, and use a series of \(\phi\) to generate the estimated values \(\hat{\phi}\) across different sample sizes \(n\), where \[ \hat\phi_{\text{YW}} =\frac{\sum_{t=1}^{n-1} X_t X_{t+1}}{\sum_{t=1}^{n} X_t^2}, \qquad \hat\phi_{\text{MLE}} =\frac{\sum_{t=2}^{n} X_t X_{t-1}}{\sum_{t=2}^{n} X_{t-1}^2}. \]

ar1_sim <- function(Phi, n, seed){
  set.seed(seed)
  Ns <- as.integer(n)
  out <- lapply(Ns, function(nn){
    phi_yw_hat <-  1:length(Phi)
    phi_mle_hat <- 1:length(Phi)
    phi <- Phi
    for(i in 1:length(Phi)){
      x <- arima.sim(list(ar=Phi[i]), nn)
      phi_yw_hat[i] <- sum(x[1:(nn-1)]*x[2:nn]) / sum(x**2)
      phi_mle_hat[i] <- sum(x[1:(nn-1)]*x[2:nn]) / sum(x[1:(nn-1)]**2)
    }
    cbind("phi"=phi, "yw"=phi_yw_hat, "mle"=phi_mle_hat, "n"=nn)
  })
  phi_hat <- as.data.frame(do.call(rbind, out))
  ggplot(phi_hat, aes(x=phi)) +
    geom_line(aes(y=mle, color="mle"))+
    geom_line(aes(y=yw, color="yw"))+
    scale_color_manual(values = c("mle"="#1170AA", "yw"="#d1495b"),
                         name = "Estimator") +
    labs(title = expression(hat(phi)~" vs true " * phi * " across sample sizes"),
         subtitle = "Panels show different sample sizes n",
         x = expression(true~phi),
         y = expression(hat(phi)))+
    facet_wrap(~ n, scales = "free_y")
}
n <- c(3:7,10,100,1000,10000)
Phi <- seq(0.0001, 0.9999, by = 0.0001)
seed <- 1234
ar1_sim(Phi,n,seed)

It is clear that the Yule-Walker estimator of the autoregressive coefficient in the AR(1) model is guaranteed never to exceed 1. However, the maximum likelihood estimator exceed 1 in some cases, especially when n is small

Question 2

2.(a)

AIC for a Gaussian AR(\(p\)) under the conditional likelihood

Assume a zero-mean AR(\(p\)) process \[ X_t=\phi_1 X_{t-1}+\cdots+\phi_p X_{t-p}+\epsilon_t,\qquad \epsilon_t\stackrel{i.i.d.}{\sim}\mathcal N(0,\sigma^2). \] Conditioning on \(X_1,\ldots,X_p\), the Gaussian likelihood of \(X_{p+1},\ldots,X_n\) is (based on the lecture note) \[ L(\phi_p,\sigma^2) =\left(\frac{1}{\sqrt{2\pi\sigma}}\right)^{\,n-p} \prod_{i=p+1}^{n} \exp\!\left(-\,\frac{(X_i-\phi_1X_{i-1}-\cdots-\phi_pX_{i-p})^2}{2\sigma^2}\right). \] Thus, the log-likelihood function is \[ l(\phi_p,\sigma^2) =\frac{-(n-p)}2\log(2\pi\sigma^2) -\frac{1}{2\sigma^2}\sum_{i=p+1}^{n}(X_i-\phi_1X_{i-1}-\cdots-\phi_pX_{i-p})^2. \] and the MLE estimators are \[ (\hat\sigma^{\text{MLE}})^2 =\frac{1}{\,n-p\,}\sum_{i=p+1}^{n}\bigl(X_i-\hat\phi^{\text{MLE}}_1X_{i-1}-\cdots-\hat\phi^{\text{MLE}}_pX_{i-p}\bigr)^2, \quad \hat{\boldsymbol\phi}^{\text{MLE}}_p =\arg\min_{\boldsymbol\phi_p}\sum_{i=p+1}^{n}\bigl(X_i-\phi_1X_{i-1}-\cdots-\phi_pX_{i-p}\bigr)^2. \] In this case, \(\hat{\boldsymbol\phi}\) is computed in R by qr.solve(x, y), and the residuals \(e\) yield \(\hat\sigma^2=\text{mean}(e^2)\).

Substituting \((\hat{\boldsymbol\phi},\hat\sigma^2)\) into the log-likelihood, the AIC is \[ \mathrm{AIC} =2k-2\log L(\hat{\boldsymbol\phi},\hat\sigma^2), \qquad k=p+1\ \ (\text{\(p\) AR coefficients plus } \sigma^2). \]

ar_aic <- function(x,p){
  M <- embed(x, p+1)
  x <- M[,-1, drop=FALSE]
  y <- M[,1]
  phi_hat <- qr.solve(x,y)
  e <- y-x%*%phi_hat
  n_e <- length(e)
  sigma2_hat <- mean(e**2)
  loglikelihood <- -0.5*n_e*log(2*pi*sigma2_hat)-0.5*sum(e**2)/sigma2_hat
  k <- p+1
  aic <- -2*loglikelihood+2*k
  aic
}
ar3 <- arima.sim(list(ar=c(0.1, 0.5, 0.2)), n=10000)
ar_aic(ar3, 7)
## [1] 28384.6

2.(b)

I simulate an AR(3) process of length \(n=\) 1,000 with coefficients \(\phi=(0.1,\,0.5,\,0.2)\) in R by arima.sim(), and using the function in 2(a) to calculate AIC for orders 1,…,P, where p = 10 in this case.

The AR(3) process is \[ X_t \;=\; 0.1\,X_{t-1} + 0.5\,X_{t-2} + 0.2\,X_{t-3} + \epsilon_t,\qquad \epsilon_t \sim \mathcal N(0,\sigma^2). \]

arp_aic <- function(data,p){
  p_aic <- 1:p
  for(i in 1:p){
    p_aic[i] <- ar_aic(data,i)
  }
  AIC_table <- cbind("p"=1:p, "AIC"=p_aic) |> as.data.frame()
  cat("AIC is smallest when p =",which.min(p_aic),"\nAIC =",p_aic[which.min(p_aic)],"\n")
  return(AIC_table)
  # Based on section from “Time Series: Theory and Methods” by Brockwell and Davis
  # models with AIC-min(AIC) <= 2 should also be considered competitive
  # if(length(which(p_aic-min(p_aic)<=2 & p_aic != min(p_aic)) > 0))
  # cat("AR(",which(p_aic-min(p_aic)<=2 & p_aic != min(p_aic)),") should also be considered competitive\n")
}

ar3 <- arima.sim(list(ar=c(0.1, 0.5, 0.2)), n=1000)
arp_aic(ar3,10)
## AIC is smallest when p = 10 
## AIC = 2789.21

2(c)

By rewriting the function in 2(a) and combine it with the following aic_choose(data, p)function, the output shows the maximum order P, the order at which the AIC is minimized, and some models should be considered based on the contents in section 9.3 from Brockwell & Davis. (models with AIC values within c of the minimum value should be consider competitive, with c = 2 as a typical value)

arp_aic <- function(data,p){
  p_aic <- 1:p
  for(i in 1:p){
    p_aic[i] <- ar_aic(data,i)
  }
  #AIC_table <- cbind("p"=1:p, "AIC"=p_aic) |> as.data.frame()
  cat("AIC is smallest when p =",which.min(p_aic),"\nAIC =",p_aic[which.min(p_aic)],"\n")
  #return(AIC_table)
  # Based on section from “Time Series: Theory and Methods” by Brockwell and Davis
  # models with AIC-min(AIC) <= 2 should also be considered competitive
  if(length(which(p_aic-min(p_aic)<=2 & p_aic != min(p_aic)) > 0))
    cat("AR(",which(p_aic-min(p_aic)<=2 & p_aic != min(p_aic)),") should also be considered competitive\n")
}

aic_choose <- function(data,p){
  for (i in 1:p) {
    cat(sprintf("max order = %d\n", i))
    tbl <- arp_aic(data, i)
    cat("\n")
  }
}

aic_choose(ar3, 20)
## max order = 1
## AIC is smallest when p = 1 
## AIC = 3155.648 
## 
## max order = 2
## AIC is smallest when p = 2 
## AIC = 2853.55 
## 
## max order = 3
## AIC is smallest when p = 3 
## AIC = 2797.089 
## 
## max order = 4
## AIC is smallest when p = 4 
## AIC = 2796.21 
## AR( 3 ) should also be considered competitive
## 
## max order = 5
## AIC is smallest when p = 5 
## AIC = 2794.345 
## AR( 4 ) should also be considered competitive
## 
## max order = 6
## AIC is smallest when p = 6 
## AIC = 2793.473 
## AR( 5 ) should also be considered competitive
## 
## max order = 7
## AIC is smallest when p = 7 
## AIC = 2792.994 
## AR( 5 6 ) should also be considered competitive
## 
## max order = 8
## AIC is smallest when p = 8 
## AIC = 2791.694 
## AR( 6 7 ) should also be considered competitive
## 
## max order = 9
## AIC is smallest when p = 9 
## AIC = 2790.821 
## AR( 8 ) should also be considered competitive
## 
## max order = 10
## AIC is smallest when p = 10 
## AIC = 2789.21 
## AR( 9 ) should also be considered competitive
## 
## max order = 11
## AIC is smallest when p = 11 
## AIC = 2785.359 
## 
## max order = 12
## AIC is smallest when p = 12 
## AIC = 2785.205 
## AR( 11 ) should also be considered competitive
## 
## max order = 13
## AIC is smallest when p = 13 
## AIC = 2784.454 
## AR( 11 12 ) should also be considered competitive
## 
## max order = 14
## AIC is smallest when p = 14 
## AIC = 2784.126 
## AR( 11 12 13 ) should also be considered competitive
## 
## max order = 15
## AIC is smallest when p = 15 
## AIC = 2783.893 
## AR( 11 12 13 14 ) should also be considered competitive
## 
## max order = 16
## AIC is smallest when p = 16 
## AIC = 2783.157 
## AR( 13 14 15 ) should also be considered competitive
## 
## max order = 17
## AIC is smallest when p = 17 
## AIC = 2782.852 
## AR( 13 14 15 16 ) should also be considered competitive
## 
## max order = 18
## AIC is smallest when p = 18 
## AIC = 2781.793 
## AR( 16 17 ) should also be considered competitive
## 
## max order = 19
## AIC is smallest when p = 19 
## AIC = 2777.92 
## 
## max order = 20
## AIC is smallest when p = 19 
## AIC = 2777.92 
## AR( 20 ) should also be considered competitive

We can see that AIC tends to overestimate p. As discussed in the lecture, this tendency likely results from an insufficient penalty when the number of parameters is large.

Question 3

3.(a)

According to Brockwell & Daivs (section 9.3), AIC is asymptotically efficient for autoregressive models, but it is not a consistent order selector. AIC tend to overestimate p since its penalty term is relatively small,which align with the conclusion in 2(c)

3.(b)

AIC-based selection is more likely to choose a larger order p to achieve a smaller prediction error.

In contrast, BIC-based selection is more likely to choose the true order p, even if the error is larger.