Problem

Select a time series data set and obtain the following:

  1. Mean and Variance

  2. ACVF, ACF and PACF

  3. Plot ACF and PACF

Theory

Autocorrelation function (ACF)

Autocorrelation is the correlation of a variable with itself at differing time lags. Recall from lecture that we defined the sample autocovariance function (ACVF), \(c_k\), for some lag \(k\) as

\[\begin{equation} (\#eq:ACVF) c_k = \frac{1}{n}\sum_{t=1}^{n-k} \left(x_t-\bar{x}\right) \left(x_{t+k}-\bar{x}\right) \end{equation}\]

Note that the sample autocovariance of \(\{x_t\}\) at lag 0, \(c_0\), equals the sample variance of \(\{x_t\}\) calculated with a denominator of \(n\). The sample autocorrelation function (ACF) is defined as

\[\begin{equation} (\#eq:ACF) r_k = \frac{c_k}{c_0} = \text{Cor}(x_t,x_{t+k}) \end{equation}\]
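As a rough check on these two definitions, the sketch below computes \(c_k\) and \(r_k\) by hand and compares them with the values returned by acf(). The simulated series, the helper name sample_acvf and the choice of five lags are assumptions made for illustration only; they are not part of the analysis in this report.

```r
# Simulated random-walk series, used only for illustration
set.seed(1)
x <- cumsum(rnorm(50))
n <- length(x)

# Sample autocovariance at lag k, using the 1/n denominator from the definition
# (sample_acvf is a hypothetical helper name)
sample_acvf <- function(x, k) {
  n <- length(x)
  xbar <- mean(x)
  sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n
}

max_lag <- 5
c_k <- sapply(0:max_lag, function(k) sample_acvf(x, k))  # lags 0..5
r_k <- c_k / c_k[1]                                      # r_k = c_k / c_0

# Values from the built-in function for comparison
c_builtin <- drop(acf(x, lag.max = max_lag, type = "covariance", plot = FALSE)$acf)
r_builtin <- drop(acf(x, lag.max = max_lag, type = "correlation", plot = FALSE)$acf)

all.equal(c_k, c_builtin)
all.equal(r_k, r_builtin)

# c_0 is the sample variance rescaled from an (n - 1) to an n denominator
all.equal(c_k[1], var(x) * (n - 1) / n)
```

The last comparison is why var() and the lag-0 autocovariance reported by acf() differ slightly for the same series.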

Recall also that an approximate 95% confidence interval on the ACF can be estimated by

\[\begin{equation} (\#eq:ACF95CI) -\frac{1}{n} \pm \frac{2}{\sqrt{n}} \end{equation}\]

where \(n\) is the number of data points used in the calculation of the ACF.
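A minimal sketch of this interval in R is given below, assuming \(n = 453\), the length of the recruitment series analysed in this report. The variable names are illustrative. For comparison it also computes the bound that, as far as we are aware, plot.acf() draws by default, which is centred at zero and uses qnorm(0.975) in place of 2.

```r
n <- 453  # assumed series length (the recruitment series used in this report)

# Approximate 95% interval from the formula above: -1/n +/- 2/sqrt(n)
ci_centre <- -1 / n
ci_half_width <- 2 / sqrt(n)
ci <- c(lower = ci_centre - ci_half_width, upper = ci_centre + ci_half_width)
ci

# Bound drawn on the default correlogram: +/- qnorm(0.975)/sqrt(n), centred at 0
acf_plot_bound <- qnorm(0.975) / sqrt(n)
acf_plot_bound
```

For a series of this length the two versions differ only in the third decimal place, so they lead to nearly the same reading of the correlogram.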

Partial autocorrelation function (PACF)

The partial autocorrelation function (PACF) measures the linear correlation of a series \(\{x_t\}\) and a lagged version of itself \(\{x_{t+k}\}\) with the linear dependence on the intermediate values \(\{x_{t+1},x_{t+2},\dots,x_{t+(k-1)}\}\) removed. Recall from lecture that we define the PACF as

\[\begin{equation} (\#eq:PACFdefn) f_k = \begin{cases} \text{Cor}(x_1,x_0)=r_1 & \text{if } k = 1;\\ \text{Cor}(x_k-x_k^{k-1},x_0-x_0^{k-1}) & \text{if } k \geq 2; \end{cases} \end{equation}\]

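where \(x_k^{k-1}\) and \(x_0^{k-1}\) denote the linear regressions (best linear predictions) of \(x_k\) and \(x_0\), respectively, on the intermediate values \(\{x_1,x_2,\dots,x_{k-1}\}\).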

It’s easy to compute the PACF for a variable in R using the pacf() function, which will automatically plot a correlogram when called by itself (similar to acf()).
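To make the link between the ACF and the PACF concrete, the sketch below recovers the sample PACF from the sample ACF with the Durbin-Levinson recursion and compares the result with pacf(). The simulated AR(2) series, its coefficients and the helper name pacf_from_acf are assumptions for illustration; the recursion is one standard way to obtain the PACF, not necessarily the derivation used in lecture.

```r
# Simulated stationary AR(2) series, used only for illustration
set.seed(123)
x <- arima.sim(model = list(ar = c(1.2, -0.4)), n = 200)

K <- 10
r <- drop(acf(x, lag.max = K, plot = FALSE)$acf)[-1]  # r_1, ..., r_K (lag 0 dropped)

# Durbin-Levinson recursion (pacf_from_acf is a hypothetical helper name)
pacf_from_acf <- function(r) {
  K <- length(r)
  f <- numeric(K)
  phi <- numeric(0)  # coefficients of the order-(k - 1) predictor
  for (k in seq_len(K)) {
    if (k == 1) {
      phi_kk <- r[1]
      phi <- phi_kk
    } else {
      phi_kk <- (r[k] - sum(phi * r[(k - 1):1])) / (1 - sum(phi * r[1:(k - 1)]))
      phi <- c(phi - phi_kk * rev(phi), phi_kk)
    }
    f[k] <- phi_kk  # partial autocorrelation at lag k
  }
  f
}

f_manual <- pacf_from_acf(r)
f_builtin <- drop(pacf(x, lag.max = K, plot = FALSE)$acf)
all.equal(f_manual, f_builtin)

# At lag 2 the recursion reduces to a closed form in r_1 and r_2
f2_closed_form <- (r[2] - r[1]^2) / (1 - r[1]^2)
all.equal(f_manual[2], f2_closed_form)
```

The lag-1 value equals \(r_1\), matching the first case of the definition.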

White noise (WN)

A time series \(\{w_t\}\) is a discrete white noise series (DWN) if the \(w_1, w_2, \dots, w_t\) are independent and identically distributed (IID) with a mean of zero. For most of the examples in this course we will assume that the \(w_t \sim \text{N}(0,q)\), and therefore we refer to the time series \(\{w_t\}\) as Gaussian white noise. If our time series model has done an adequate job of removing all of the serial autocorrelation in the time series with trends, seasonal effects, etc., then the model residuals (\(e_t = y_t - \hat{y}_t\)) will be a WN sequence with the following properties for its mean (\(\bar{e}\)), covariance (\(c_k\)), and autocorrelation (\(r_k\)):

\[\begin{equation} (\#eq:WNprops) \begin{aligned} \bar{e} &= 0 \\ c_k &= \text{Cov}(e_t,e_{t+k}) = \begin{cases} q & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases} \\ r_k &= \text{Cor}(e_t,e_{t+k}) = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{if } k \neq 0. \end{cases} \end{aligned} \end{equation}\]
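These properties can be checked informally on simulated Gaussian white noise, as in the sketch below. The variance \(q = 4\), the series length and the number of lags are assumed values chosen for illustration. Sample estimates only approximate the theoretical values, so a few autocorrelations at nonzero lags may fall outside the approximate bounds by chance.

```r
set.seed(42)
q <- 4     # assumed white noise variance
n <- 1000  # assumed series length
w <- rnorm(n, mean = 0, sd = sqrt(q))

# Sample mean should be close to 0
mean(w)

# Lag-0 autocovariance should be close to q
c0 <- drop(acf(w, lag.max = 0, type = "covariance", plot = FALSE)$acf)
c0

# Autocorrelations: exactly 1 at lag 0, close to 0 at all other lags
r <- drop(acf(w, lag.max = 20, plot = FALSE)$acf)
round(r, 3)

# Count of lags 1..20 whose autocorrelation exceeds 2/sqrt(n) in absolute value
bound <- 2 / sqrt(n)
sum(abs(r[-1]) > bound)
```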

Data Description

The series is recruitment (an index of the number of new fish) over a period of 453 months spanning the years 1950-1987. Recruitment is loosely defined as an indicator of new members of a population to the first life stage at which natural mortality stabilizes near adult levels.

Source

Data furnished by Dr. Roy Mendelssohn of the Pacific Fisheries Environmental Laboratory, 
NOAA (personal communication).

R Code

library(astsa)
library(psych)
data(rec)
summary(rec)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.72   39.62   68.63   62.26   86.85  100.00
#psych::describe(data, is.na(TRUE))
data<-ts(rec)
head(data, n=100)
##   [1] 68.63000 68.63000 68.63000 68.63000 68.63000 68.63000 59.16000 48.70000
##   [9] 47.54000 50.91000 44.70000 42.85000 39.62000 44.45000 38.98000 42.62000
##  [17] 48.27000 59.39000 51.66000 38.55000 60.33000 72.27000 68.62000 69.63000
##  [25] 72.20000 67.87000 64.91001 53.85000 37.96000 23.23000 12.68000  9.84000
##  [33]  7.82000 11.78000 10.22000 12.19000 18.60000 26.97000 22.52000 19.18000
##  [41] 17.14000 18.61000 20.02000 22.65000 38.99000 76.55000 87.99000 99.80000
##  [49] 96.69000 87.45000 88.57000 97.43000 99.99000 94.88000 86.99000 79.73001
##  [57] 92.35000 91.29000 94.31000 84.95000 82.97000 92.98001 81.06000 62.37000
##  [65] 52.99000 39.53000 42.90000 33.76000 40.97000 60.50000 66.61000 80.38000
##  [73] 95.86000 97.74000 80.24000 73.44000 65.67000 47.81000 33.51000 34.22000
##  [81] 32.95000 32.55000 46.92000 44.64000 53.02000 41.98000 30.43000 24.43000
##  [89] 18.05000 20.98000 12.37000 12.03000 12.41000 15.89000 20.46000 26.95000
##  [97] 30.29000 26.21000 23.34000 25.55000
plot(rec, col='red')

mean_ts<-mean(data)
mean_ts
## [1] 62.26278
variance_ts<-var(data)
variance_ts
## [1] 782.7188
auto_correlation<-acf(data,plot= TRUE,type = 'correlation',main='ACF Plot',col='green')

auto_correlation
## 
## Autocorrelations of series 'data', by lag
## 
##      0      1      2      3      4      5      6      7      8      9     10 
##  1.000  0.922  0.783  0.627  0.477  0.355  0.259  0.182  0.127  0.094  0.074 
##     11     12     13     14     15     16     17     18     19     20     21 
##  0.057  0.024 -0.037 -0.116 -0.188 -0.240 -0.267 -0.268 -0.241 -0.185 -0.110 
##     22     23     24     25     26 
## -0.033  0.030  0.064  0.057  0.021
acvf<-acf(data,plot= TRUE,type = 'covariance',main='ACVF Plot',col='green')

acvf
## 
## Autocovariances of series 'data', by lag
## 
##      0      1      2      3      4      5      6      7      8      9     10 
##  781.0  719.9  611.5  489.7  372.8  277.6  202.5  142.5   99.1   73.1   57.9 
##     11     12     13     14     15     16     17     18     19     20     21 
##   44.6   18.7  -29.0  -90.4 -146.6 -187.4 -208.4 -209.1 -188.4 -144.5  -85.8 
##     22     23     24     25     26 
##  -26.1   23.7   50.2   44.2   16.4
partial_auto_correlation<-pacf(data,plot=TRUE,main="PACF Plot",col='green')

partial_auto_correlation
## 
## Partial autocorrelations of series 'data', by lag
## 
##      1      2      3      4      5      6      7      8      9     10     11 
##  0.922 -0.445 -0.048 -0.016  0.073 -0.029 -0.031  0.036  0.048 -0.018 -0.055 
##     12     13     14     15     16     17     18     19     20     21     22 
## -0.140 -0.149 -0.054  0.052  0.010  0.006  0.024  0.087  0.109  0.029 -0.027 
##     23     24     25     26 
## -0.008 -0.068 -0.120 -0.030
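
One way to read the PACF output above is to compare each value with the approximate 95% interval from the Theory section. The sketch below does this with the partial autocorrelations transcribed by hand from the printed output (rounded to three decimals) and \(n = 453\). The variable names are illustrative, and the rounding means values very close to the bounds should be treated with caution.

```r
# Partial autocorrelations for lags 1..26, copied from the printed output above
pacf_printed <- c(
   0.922, -0.445, -0.048, -0.016,  0.073, -0.029, -0.031,  0.036,  0.048,
  -0.018, -0.055, -0.140, -0.149, -0.054,  0.052,  0.010,  0.006,  0.024,
   0.087,  0.109,  0.029, -0.027, -0.008, -0.068, -0.120, -0.030
)

n <- 453  # number of monthly observations in the series

# Approximate 95% interval: -1/n +/- 2/sqrt(n)
lower <- -1 / n - 2 / sqrt(n)
upper <- -1 / n + 2 / sqrt(n)

# Lags whose partial autocorrelation falls outside the interval
outside_lags <- which(pacf_printed < lower | pacf_printed > upper)
outside_lags
```

Under these assumptions the values at lags 1, 2, 12, 13, 20 and 25 fall outside the interval, with lags 1 and 2 by far the largest in magnitude.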