ACF and PACF with different values of ϕ.

The code below will simulate four different AR(1) time series with different values of ϕ. All of the time series have the exact same residuals (ϵ).

First order time series AR(1) Model:

\[y_t = \phi y_{t-1} + \epsilon_t\]

n <- 100
epsilon <- rnorm(n = n,mean = 0,sd = 1) # epsilon will contain 100 values drawn from the normal distribution that have a mean of 0 and standard deviation of 1.

# Create numberic vectors with length of n (n=100)
x1 <- numeric(length = n)
x2 <- numeric(length = n)
x3 <- numeric(length = n)
x4 <- numeric(length = n)

# Run loop function for each time series with different values of ϕ, but the same risiduals (ϵ).
for(i in 2:n){
  x1[i] <- 0.95 * x1[i-1] + epsilon[i]  
  x2[i] <- 0.75 * x2[i-1] + epsilon[i]  
  x3[i] <- 0.50 * x3[i-1] + epsilon[i]  
  x4[i] <- 0.25 * x4[i-1] + epsilon[i]  
}

Plots for each AR(1) time series

Plot 1 for:

\[y_t = 0.95 y_{t-1} + \epsilon_t\]

layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x1,type="b",ylab="x1")
acf(x1,main="")
pacf(x1,main="")

The ACF plot indicates that there is a high positive autocorrelation at lag 1 that is very close to the phi value we set at 0.95. It also shows that there is autocorrelation for higher lags, which decreases with each lag until it becomes nonsignificant by the time it reaches lag 7 or 8. The Partial ACF plot indicates that once we take the AR(1) model into account, the correlation at subsequent lags falls to zero. This indicates this is an AR(1) process.

Plot 2 for:

\[y_t = 0.75 y_{t-1} + \epsilon_t\]

layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x2,type="b",ylab="x1")
acf(x2,main="")
pacf(x2,main="")

The ACF plot indicates that there is positive autocorrelation (similar to 0.75, which we set for phi) and that there is some autocorrelation at higher lags, but it decays geometrically and falls to zero before lag 5. The Partial ACF plot indicates that this is an AR(1) process, because once we account for the AR(1) the remaining autocorrelation at higher lags falls to zero.

Plot 3 for:

\[y_t = 0.50 y_{t-1} + \epsilon_t\]

layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x3,type="b",ylab="x1")
acf(x3,main="")
pacf(x3,main="")

The ACF plot indicates that there is positive autocorrelation (similar to 0.50, which we set for phi) and that there is autocorrelation at k=2, but it falls to zero for higher lags. The Partial ACF plot indicates that this is an AR(1) process, because once we accoutn for the AR(1) the remaining autocorrelation at higher lags falls to zero.

Plot 4 for:

\[y_t = 0.25 y_{t-1} + \epsilon_t\]

layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x4,type="b",ylab="x1")
acf(x4,main="")
pacf(x4,main="")

The ACF plot indicates that there is not autocorrelation, because the autocorrelation falls to zero by the first lag.

Create second order model AR(2)

\[y_t = \phi_1y_{t-1} + \phi_2y_{t-2} + \epsilon_t\]

mean(\(\mu\))=0 and variance(\(\sigma\))=1

n <- 100
epsilon2 <- rnorm(n = n,mean = 0,sd = 1)

x5 <- numeric(length = n)

phi1 <- 0.05
phi2 <- 0.80

for(i in 3:n){
  x5[i] <- phi1 * x5[i-1] + phi2 * x5[i-2] + epsilon2[i]  
   
}

layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x5,type="b",ylab="x1")
acf(x5,main="")
pacf(x5,main="")

Now, play with variance.

n <- 100
epsilon3 <- rnorm(n = n,mean = 0,sd = 5)

x6 <- numeric(length = n)

phi1 <- 0.05
phi2 <- 0.80

for(i in 3:n){
  x6[i] <- phi1 * x6[i-1] + phi2 * x6[i-2] + epsilon3[i]  
   
}

Now, play with values of \(\phi\).

n <- 100
epsilon1 <- rnorm(n = n,mean = 0,sd = 1)

x7 <- numeric(length = n)

for(i in 3:n){
  x7[i] <- 0.30 * x7[i-1] + 0.60 * x7[i-2] + epsilon1[i]  
   
}

EDA on temperature

Global Land and Ocean Temperature (degrees C) Anomaly Data from NASA GISS

This data is collected monthly (frequency = 12) from Jan 1, 1880 to Nov 1, 2016.

globe <- readRDS("giss.globe.rds")
summary(globe)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.7800 -0.2300 -0.0500  0.0231  0.2250  1.3300

plot(globe)

acf(globe, main="")

pacf(globe, main="")

## What are θ1 and θ2?
p <- pacf(globe, plot=FALSE)
str(p)

## List of 6
##  $ acf   : num [1:32, 1, 1] 0.9328 0.3239 0.1428 0.1334 0.0753 ...
##  $ type  : chr "partial"
##  $ n.used: int 1643
##  $ lag   : num [1:32, 1, 1] 0.0833 0.1667 0.25 0.3333 0.4167 ...
##  $ series: chr "globe"
##  $ snames: NULL
##  - attr(*, "class")= chr "acf"

p$acf[2:4,,1]

## [1] 0.3239385 0.1427720 0.1334480

The ACF plot indicates high first order autocorrelation and high autocorrelation the higher time lags (shown out monthly for 2 and 1/2 yeras). The PACF indicates that these data are positively autocorrelated for three months, but it falls to zero after these time lags. We can see that based on PACF results that \(\phi_1\) is 0.32 and \(\phi_1\) is 0.14.

Linear model of global data to plot residuals

globetime <- time(globe)
globe.ts <- lm(globe ~ globetime)
summary(globe.ts)

## 
## Call:
## lm(formula = globe ~ globetime)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57814 -0.14040 -0.00303  0.13824  0.82365 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.389e+01  2.364e-01  -58.75   <2e-16 ***
## globetime    7.142e-03  1.213e-04   58.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1944 on 1641 degrees of freedom
## Multiple R-squared:  0.6786, Adjusted R-squared:  0.6784 
## F-statistic:  3465 on 1 and 1641 DF,  p-value: < 2.2e-16

## The autocorrelation function and the partial autocorrelation funciton for the residuals of the global data.
acf(residuals.lm(globe.ts), main="")

pacf(residuals.lm(globe.ts), main="")

The Nile River Data - Mean Annual Nile Flow 1871 - 1970

# Read and attach the csv file
nile <- read.csv("nile.csv", header=TRUE)
attach(nile)
class(nile) #this indicates what format this dataset is in

## [1] "data.frame"

# Use the ts function to convert the file to a time series object
nile.ts <- ts(meanflow, start=c(1871, 1), freq=1)

# Check the frequency matches 1 datapoint per year for this dataset
tsp(nile.ts)

## [1] 1871 1971    1

# Plot the data to get a general visual of what it looks like
plot(nile.ts)

# Get the summary statistics to get a better sense of the dataset
summary(nile.ts)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   16.00   40.00   40.96   63.00   87.00

Autocorrelation of the Nile data

It is evident from the ACF and Partial ACF plots that the annual flow rates of the Nile are not autocorrelated. Perhaps if these data were sampled more frequently an autocorrelation would become more evident.

acf(nile.ts, main="")

pacf(nile.ts, main="")

Trend in time for the Nile River

From the plot, it’s pretty evident that there’s not a trend in time, but we can confirm this by looking at a linear model.

As predicted, the linear model indicates that there’s no significant trend of the annual flow of the Nile over this timeframe.

niletime <- time(nile.ts)
nile.ts <- lm(nile.ts ~ niletime)
summary(nile.ts)

## 
## Call:
## lm(formula = nile.ts ~ niletime)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.183 -20.538  -1.518  21.207  52.083 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -242.18851  168.63958  -1.436   0.1541  
## niletime       0.14740    0.08778   1.679   0.0963 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.72 on 99 degrees of freedom
## Multiple R-squared:  0.02769,    Adjusted R-squared:  0.01787 
## F-statistic:  2.82 on 1 and 99 DF,  p-value: 0.09626

Clearwater River at Kamiah, Idaho

Monthly flow data from 1911 - 1965.

clear.c <- read.csv("clearwater.csv", header=TRUE)
attach(clear.c)
class(clear.c)

## [1] "data.frame"

clear <- ts(riverflow, start=c(1911, 1), freq=12)
# Check the frequency matches 1 datapoint per year for this dataset
tsp(clear)

## [1] 1911.000 1961.083   12.000

plot(clear)

summary(clear)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0   147.2   291.5   290.7   432.8   578.0

Decompose Clearwater River Data

clear.d <- decompose(clear)
plot(clear.d)

Autocorrelation of Clearwater River Data

It is evident from the ACF and Partial ACF plots that these data are not autocorrelated.

layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(clear,type="b")
acf(clear,main="")
pacf(clear,main="")

Trend in time of Clearwater River Data

The linear model indicates that there is not a significant trend in the mean flow of Clearwater River during this timeframe (p-value = 0.34).

cleartime <- time(clear)
clear.lm <- lm(clear ~ cleartime)
summary(clear.lm)

## 
## Call:
## lm(formula = clear ~ cleartime)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -294.474 -142.870   -0.946  139.513  296.221 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1164.3014   907.3689   1.283    0.200
## cleartime     -0.4512     0.4687  -0.963    0.336
## 
## Residual standard error: 166.5 on 600 degrees of freedom
## Multiple R-squared:  0.001542,   Adjusted R-squared:  -0.0001216 
## F-statistic: 0.9269 on 1 and 600 DF,  p-value: 0.3361

Australia’s Electricity Production from 1956 - 1995

This is Australia’s monthly electricity production in million kilowatt hours from Jan 1956 to Aug 1995. Source: Time Series Data Library (citing: Australian Bureau of Statistics)

# Read and attach the csv file
elect.c <- read.csv("austelect.csv", header=TRUE)
attach(elect.c)

# Use the ts function to convert the file to a time series object. This data was collected monthly, so we have a frequency of 12.
elect <- ts(kwh, start=c(1956, 1), freq=12)

# Check the frequency matches 1 datapoint per year for this dataset
tsp(elect)

## [1] 1956.000 1995.583   12.000

# Plot the data to get a general visual of what it looks like
plot(elect)

# Get the summary statistics to get a better sense of the dataset
summary(elect)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1254    3147    6380    6903   10220   15360

Decompose electricity data

These data indicate a clear trend and seasonal component and noise from random risiduals.

elect.d <- decompose(elect)
plot(elect.d)

Linear model and moving averages

The one year average removes the seasonal variation and shows the general trend in the data.

# Linear model for data
electtime <- time(elect)
elect.lm <- lm(elect ~ electtime)
summary(elect.lm)

## 
## Call:
## lm(formula = elect ~ electtime)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1456.34  -591.51   -54.18   459.20  2312.57 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.770e+05  5.846e+03  -115.8   <2e-16 ***
## electtime    3.462e+02  2.959e+00   117.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 739.2 on 474 degrees of freedom
## Multiple R-squared:  0.9665, Adjusted R-squared:  0.9665 
## F-statistic: 1.369e+04 on 1 and 474 DF,  p-value: < 2.2e-16

# moving average of one year (ma01)
ma01 <- filter(x=elect, filter=rep(x=1/12,times=12), sides=2)
plot(elect,col="grey", ylab="Kilowatt hours")
lines(ma01,col="red",lwd=2)
# Add a trend line from our linear model
abline(elect.lm, col="black",lwd=2, lty="dashed")

Autocorrelation analysis for electricity data

acf(elect, main="")

pacf(elect, main="")

The ARF plot indicates that these data are autocorrelated. The PACF plot indicates that the data are posivitely autocorrelated every 6 months and negatively correlated ever 1 year, which is most likely evidence of a spike in electricity usage during the colder months of the year.

Homework 2 - Autocorrelation

Brooke McIntyre

4/9/2017