The code below will simulate four different AR(1) time series with different values of ϕ. All of the time series have the exact same residuals (ϵ).
First order time series AR(1) Model:
\[y_t = \phi y_{t-1} + \epsilon_t\]
n <- 100
epsilon <- rnorm(n = n,mean = 0,sd = 1) # epsilon will contain 100 values drawn from the normal distribution that have a mean of 0 and standard deviation of 1.
# Create numberic vectors with length of n (n=100)
x1 <- numeric(length = n)
x2 <- numeric(length = n)
x3 <- numeric(length = n)
x4 <- numeric(length = n)
# Run loop function for each time series with different values of ϕ, but the same risiduals (ϵ).
for(i in 2:n){
x1[i] <- 0.95 * x1[i-1] + epsilon[i]
x2[i] <- 0.75 * x2[i-1] + epsilon[i]
x3[i] <- 0.50 * x3[i-1] + epsilon[i]
x4[i] <- 0.25 * x4[i-1] + epsilon[i]
}
\[y_t = 0.95 y_{t-1} + \epsilon_t\]
layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x1,type="b",ylab="x1")
acf(x1,main="")
pacf(x1,main="")
The ACF plot indicates that there is a high positive autocorrelation at lag 1 that is very close to the phi value we set at 0.95. It also shows that there is autocorrelation for higher lags, which decreases with each lag until it becomes nonsignificant by the time it reaches lag 7 or 8. The Partial ACF plot indicates that once we take the AR(1) model into account, the correlation at subsequent lags falls to zero. This indicates this is an AR(1) process.
\[y_t = 0.75 y_{t-1} + \epsilon_t\]
layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x2,type="b",ylab="x1")
acf(x2,main="")
pacf(x2,main="")
The ACF plot indicates that there is positive autocorrelation (similar to 0.75, which we set for phi) and that there is some autocorrelation at higher lags, but it decays geometrically and falls to zero before lag 5. The Partial ACF plot indicates that this is an AR(1) process, because once we account for the AR(1) the remaining autocorrelation at higher lags falls to zero.
\[y_t = 0.50 y_{t-1} + \epsilon_t\]
layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x3,type="b",ylab="x1")
acf(x3,main="")
pacf(x3,main="")
The ACF plot indicates that there is positive autocorrelation (similar to 0.50, which we set for phi) and that there is autocorrelation at k=2, but it falls to zero for higher lags. The Partial ACF plot indicates that this is an AR(1) process, because once we accoutn for the AR(1) the remaining autocorrelation at higher lags falls to zero.
\[y_t = 0.25 y_{t-1} + \epsilon_t\]
layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x4,type="b",ylab="x1")
acf(x4,main="")
pacf(x4,main="")
The ACF plot indicates that there is not autocorrelation, because the autocorrelation falls to zero by the first lag.
\[y_t = \phi_1y_{t-1} + \phi_2y_{t-2} + \epsilon_t\]
mean(\(\mu\))=0 and variance(\(\sigma\))=1
n <- 100
epsilon2 <- rnorm(n = n,mean = 0,sd = 1)
x5 <- numeric(length = n)
phi1 <- 0.05
phi2 <- 0.80
for(i in 3:n){
x5[i] <- phi1 * x5[i-1] + phi2 * x5[i-2] + epsilon2[i]
}
layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(x5,type="b",ylab="x1")
acf(x5,main="")
pacf(x5,main="")
Now, play with variance.
n <- 100
epsilon3 <- rnorm(n = n,mean = 0,sd = 5)
x6 <- numeric(length = n)
phi1 <- 0.05
phi2 <- 0.80
for(i in 3:n){
x6[i] <- phi1 * x6[i-1] + phi2 * x6[i-2] + epsilon3[i]
}
Now, play with values of \(\phi\).
n <- 100
epsilon1 <- rnorm(n = n,mean = 0,sd = 1)
x7 <- numeric(length = n)
for(i in 3:n){
x7[i] <- 0.30 * x7[i-1] + 0.60 * x7[i-2] + epsilon1[i]
}
This data is collected monthly (frequency = 12) from Jan 1, 1880 to Nov 1, 2016.
globe <- readRDS("giss.globe.rds")
summary(globe)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.7800 -0.2300 -0.0500 0.0231 0.2250 1.3300
plot(globe)
acf(globe, main="")
pacf(globe, main="")
## What are θ1 and θ2?
p <- pacf(globe, plot=FALSE)
str(p)
## List of 6
## $ acf : num [1:32, 1, 1] 0.9328 0.3239 0.1428 0.1334 0.0753 ...
## $ type : chr "partial"
## $ n.used: int 1643
## $ lag : num [1:32, 1, 1] 0.0833 0.1667 0.25 0.3333 0.4167 ...
## $ series: chr "globe"
## $ snames: NULL
## - attr(*, "class")= chr "acf"
p$acf[2:4,,1]
## [1] 0.3239385 0.1427720 0.1334480
The ACF plot indicates high first order autocorrelation and high autocorrelation the higher time lags (shown out monthly for 2 and 1/2 yeras). The PACF indicates that these data are positively autocorrelated for three months, but it falls to zero after these time lags. We can see that based on PACF results that \(\phi_1\) is 0.32 and \(\phi_1\) is 0.14.
globetime <- time(globe)
globe.ts <- lm(globe ~ globetime)
summary(globe.ts)
##
## Call:
## lm(formula = globe ~ globetime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.57814 -0.14040 -0.00303 0.13824 0.82365
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.389e+01 2.364e-01 -58.75 <2e-16 ***
## globetime 7.142e-03 1.213e-04 58.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1944 on 1641 degrees of freedom
## Multiple R-squared: 0.6786, Adjusted R-squared: 0.6784
## F-statistic: 3465 on 1 and 1641 DF, p-value: < 2.2e-16
## The autocorrelation function and the partial autocorrelation funciton for the residuals of the global data.
acf(residuals.lm(globe.ts), main="")
pacf(residuals.lm(globe.ts), main="")
# Read and attach the csv file
nile <- read.csv("nile.csv", header=TRUE)
attach(nile)
class(nile) #this indicates what format this dataset is in
## [1] "data.frame"
# Use the ts function to convert the file to a time series object
nile.ts <- ts(meanflow, start=c(1871, 1), freq=1)
# Check the frequency matches 1 datapoint per year for this dataset
tsp(nile.ts)
## [1] 1871 1971 1
# Plot the data to get a general visual of what it looks like
plot(nile.ts)
# Get the summary statistics to get a better sense of the dataset
summary(nile.ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 16.00 40.00 40.96 63.00 87.00
It is evident from the ACF and Partial ACF plots that the annual flow rates of the Nile are not autocorrelated. Perhaps if these data were sampled more frequently an autocorrelation would become more evident.
acf(nile.ts, main="")
pacf(nile.ts, main="")
From the plot, it’s pretty evident that there’s not a trend in time, but we can confirm this by looking at a linear model.
As predicted, the linear model indicates that there’s no significant trend of the annual flow of the Nile over this timeframe.
niletime <- time(nile.ts)
nile.ts <- lm(nile.ts ~ niletime)
summary(nile.ts)
##
## Call:
## lm(formula = nile.ts ~ niletime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47.183 -20.538 -1.518 21.207 52.083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -242.18851 168.63958 -1.436 0.1541
## niletime 0.14740 0.08778 1.679 0.0963 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.72 on 99 degrees of freedom
## Multiple R-squared: 0.02769, Adjusted R-squared: 0.01787
## F-statistic: 2.82 on 1 and 99 DF, p-value: 0.09626
Monthly flow data from 1911 - 1965.
clear.c <- read.csv("clearwater.csv", header=TRUE)
attach(clear.c)
class(clear.c)
## [1] "data.frame"
clear <- ts(riverflow, start=c(1911, 1), freq=12)
# Check the frequency matches 1 datapoint per year for this dataset
tsp(clear)
## [1] 1911.000 1961.083 12.000
plot(clear)
summary(clear)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 147.2 291.5 290.7 432.8 578.0
clear.d <- decompose(clear)
plot(clear.d)
It is evident from the ACF and Partial ACF plots that these data are not autocorrelated.
layout(matrix(c(1,1,2,3),ncol=2,nrow=2,byrow=T))
plot(clear,type="b")
acf(clear,main="")
pacf(clear,main="")
The linear model indicates that there is not a significant trend in the mean flow of Clearwater River during this timeframe (p-value = 0.34).
cleartime <- time(clear)
clear.lm <- lm(clear ~ cleartime)
summary(clear.lm)
##
## Call:
## lm(formula = clear ~ cleartime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -294.474 -142.870 -0.946 139.513 296.221
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1164.3014 907.3689 1.283 0.200
## cleartime -0.4512 0.4687 -0.963 0.336
##
## Residual standard error: 166.5 on 600 degrees of freedom
## Multiple R-squared: 0.001542, Adjusted R-squared: -0.0001216
## F-statistic: 0.9269 on 1 and 600 DF, p-value: 0.3361
This is Australia’s monthly electricity production in million kilowatt hours from Jan 1956 to Aug 1995. Source: Time Series Data Library (citing: Australian Bureau of Statistics)
# Read and attach the csv file
elect.c <- read.csv("austelect.csv", header=TRUE)
attach(elect.c)
# Use the ts function to convert the file to a time series object. This data was collected monthly, so we have a frequency of 12.
elect <- ts(kwh, start=c(1956, 1), freq=12)
# Check the frequency matches 1 datapoint per year for this dataset
tsp(elect)
## [1] 1956.000 1995.583 12.000
# Plot the data to get a general visual of what it looks like
plot(elect)
# Get the summary statistics to get a better sense of the dataset
summary(elect)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1254 3147 6380 6903 10220 15360
These data indicate a clear trend and seasonal component and noise from random risiduals.
elect.d <- decompose(elect)
plot(elect.d)
The one year average removes the seasonal variation and shows the general trend in the data.
# Linear model for data
electtime <- time(elect)
elect.lm <- lm(elect ~ electtime)
summary(elect.lm)
##
## Call:
## lm(formula = elect ~ electtime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1456.34 -591.51 -54.18 459.20 2312.57
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.770e+05 5.846e+03 -115.8 <2e-16 ***
## electtime 3.462e+02 2.959e+00 117.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 739.2 on 474 degrees of freedom
## Multiple R-squared: 0.9665, Adjusted R-squared: 0.9665
## F-statistic: 1.369e+04 on 1 and 474 DF, p-value: < 2.2e-16
# moving average of one year (ma01)
ma01 <- filter(x=elect, filter=rep(x=1/12,times=12), sides=2)
plot(elect,col="grey", ylab="Kilowatt hours")
lines(ma01,col="red",lwd=2)
# Add a trend line from our linear model
abline(elect.lm, col="black",lwd=2, lty="dashed")
acf(elect, main="")
pacf(elect, main="")
The ARF plot indicates that these data are autocorrelated. The PACF plot indicates that the data are posivitely autocorrelated every 6 months and negatively correlated ever 1 year, which is most likely evidence of a spike in electricity usage during the colder months of the year.