library(fpp2)
library(urca)
library(readxl)
knitr::include_graphics("week8_8111.png")
Series X1, X2, and X3 include 36 random numbers, 360 random numbers, and 1,000 random numbers, respectively. The ACF plots show white noise bounds of different widths that decrease in width as the number of random numbers increases. The plots suggest that each series is essentially white noise, though each series includes at least some correlations that approach or just exceed its bounds.
The width of the bounds for white noise varies based on the 95% critical values \(\pm 1.96 / \sqrt{T}\), where \(T =\) series length (e.g. 36, 360, 1,000). As T increases, the width of the bounds decreases. X3 is longer in length (T), so its bounds are lesser in width than those for X2 and X1. The actual correlations are different for each series given each series consists of random numbers and is white noise.
autoplot(ibmclose, main="Daily closing IBM stock prices")
A plot of the series shows different price levels over time, increasing from approximately \(\$450\) to \(\$600\) before eventually to nearly \(\$300\). The changes in price suggest the series is non-stationary and should be differenced.
Acf(ibmclose, main="ACF: Daily closing IBM stock prices")
Plotting the correlations shows high positive correlations starting from \(r1\) and decreasing steadily with an increase in lags. This pattern indicates a series trend, which suggests the series is non-stationary and should be differenced.
Pacf(ibmclose, main="PACF: Daily closing IBM stock prices")
Plotting partial ACF shows a high, significant initial correlation followed by a series of insignificant ones. This pattern suggests the series is non-stationary and should be differenced.
#?usnetelec - Annual US net electricity generation (billion kwh) for 1949-2003
autoplot(usnetelec, main="Annual US net electricity generation")
Acf(usnetelec, main="ACF: Annual US net electricity generation")
Pacf(usnetelec, main="PACF: Annual US net electricity generation")
Plotting the series and its associated autocorrelation and partial autocorrelation plots reveals an upward trend and thus non-stationary data.
BoxCox.lambda(usnetelec)
## [1] 0.5167714
The suggested Box-Cox \(\lambda\) is approximately 0.52, which suggests a square root transformation (\(\lambda =\) 0.5) could be used to ease interpretation.
usnetelec_bc <- BoxCox(usnetelec, lambda=0.5)
ndiffs(usnetelec_bc)
## [1] 2
Applying a unit root test to the transformed series results in a suggestion of 2 differences.
usnetelec_bc_d <- diff(diff(usnetelec_bc))
autoplot(usnetelec_bc_d, main="Annual US net electricity generation, after Box-Cox and differencing")
Acf(usnetelec_bc_d)
Pacf(usnetelec_bc_d)
After differencing twice, the series trend is removed, and the autocorrelation and partial autocorrelation plots show essentially white noise relative to the correlations for non-transformed, non-differenced series. There are, however, still several correlations that fall outside the bounds.
usnetelec_bc_d %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.0707
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
After transformation and differencing, the KPSS test test-statistic is approximately 0.07, which is well within the critical value bounds. The transformed and differenced series is stationary.
#?usgdp - Quarterly US GDP (1947.1 - 2006.1)
autoplot(usgdp, main="Quarterly US GDP")
Acf(usgdp, main="Quarterly US GDP")
Pacf(usgdp, main="Quarterly US GDP")
Plotting the series and its associated autocorrelation and partial autocorrelation plots reveals an upward trend and thus non-stationary data.
BoxCox.lambda(usgdp)
## [1] 0.366352
The suggested Box-Cox \(\lambda\) is approximately 0.37, which suggests a \(\lambda\) of 0.33 could be used to ease interpretation.
usgdp_bc <- BoxCox(usgdp, lambda=0.33)
ndiffs(usgdp_bc)
## [1] 1
Applying a unit root test to the transformed series results in a suggestion of 1 difference.
usgdp_bc_d <- diff(usgdp_bc)
autoplot(usgdp_bc_d, main="Quarterly US GDP, after Box-Cox and differencing")
Acf(usgdp_bc_d)
Pacf(usgdp_bc_d)
After differencing, the series trend is removed, and the autocorrelation and partial autocorrelation plots show essentially white noise relative to the correlations for non-transformed, non-differenced series. There are, however, still several correlations--notably, lags 1, 2, and 12--that fall on or outside the bounds.
usgdp_bc_d %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.1476
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
After transformation and differencing, the KPSS test test-statistic is approximately 0.15, which is within the critical value bounds. The transformed and differenced series is stationary.
#?mcopper - Monthly copper prices
autoplot(mcopper, main="Monthly copper prices")
Acf(mcopper, main="Monthly copper prices")
Pacf(mcopper, main="Monthly copper prices")
Plotting the series and its associated autocorrelation and partial autocorrelation plots reveals an upward trend--amplified by a large increase in price starting in 2005--and thus non-stationary data. There is also a roughly flat period from 1962 to 1964.
BoxCox.lambda(mcopper)
## [1] 0.1919047
The suggested Box-Cox \(\lambda\) is approximately 0.19, which suggests a \(\lambda\) of 0.20 could be used to ease interpretation.
mcopper_bc <- BoxCox(mcopper, lambda=0.2)
ndiffs(mcopper_bc)
## [1] 1
Applying a unit root test to the transformed series results in a suggestion of 1 difference.
mcopper_bc_d <- diff(mcopper_bc)
autoplot(mcopper_bc_d, main="Monthly copper prices, after Box-Cox and differencing")
Acf(mcopper_bc_d)
Pacf(mcopper_bc_d)
After differencing, the series trend is removed, and the autocorrelation and partial autocorrelation plots show essentially white noise relative to the correlations for non-transformed, non-differenced series. There are, however, still several correlations that fall outside the bounds.
mcopper_bc_d %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0581
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
After transformation and differencing, the KPSS test test-statistic is approximately 0.06, which is well within the critical value bounds. The transformed and differenced series is stationary.
#?enplanements - Monthly US domestic enplanements (millions) (1996-2000)
autoplot(enplanements, main="Monthly US domestic enplanements")
Acf(enplanements, main="Monthly US domestic enplanements")
Pacf(enplanements, main="Monthly US domestic enplanements")
Plotting the series and its associated autocorrelation and partial autocorrelation plots reveals an upward trend, monthly seasonality, and thus non-stationary data. There is a large decrease around 2004 that tempers the trend.
BoxCox.lambda(enplanements)
## [1] -0.2269461
The suggested Box-Cox \(\lambda\) is approximately -0.23, which suggests a \(\lambda\) of -0.20 could be used to ease interpretation.
enplanements_bc <- BoxCox(enplanements, lambda=-0.2)
nsdiffs(enplanements_bc) #seasonal
## [1] 1
ndiffs(diff(enplanements_bc, lag=12))
## [1] 1
Given its monthly seasonality, the transformed series requires a seasonal difference followed by a single difference.
enplanements_bc_d <- diff(diff(enplanements_bc, lag=12))
autoplot(enplanements_bc_d, main="Monthly US domestic enplanements, after Box-Cox and differencing")
Acf(enplanements_bc_d)
Pacf(enplanements_bc_d)
After differencing twice, the series trend is removed, and the autocorrelation and partial autocorrelation plots show essentially white noise relative to the correlations for non-transformed, non-differenced series. There are, however, still several correlations--notably, lags 1 and 12--that fall outside the bounds.
enplanements_bc_d %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0427
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
After transformation and differencing, the KPSS test test-statistic is approximately 0.04, which is well within the critical value bounds. The transformed and differenced series is stationary.
#?visitors - Monthly Australian short-term overseas visitors (May 1985 - April 2005)
autoplot(visitors, main="Monthly Australian short-term overseas visitors")
Acf(visitors, main="Monthly Australian short-term overseas visitors")
Pacf(visitors, main="Monthly Australian short-term overseas visitors")
Plotting the series and its associated autocorrelation and partial autocorrelation plots reveals an upward trend, monthly seasonality, and thus non-stationary data. Variation increases with the series.
BoxCox.lambda(visitors)
## [1] 0.2775249
The suggested Box-Cox \(\lambda\) is approximately 0.28, which suggests a \(\lambda\) of 0.25 could be used to ease interpretation.
visitors_bc <- BoxCox(visitors, lambda=0.25)
nsdiffs(visitors_bc) #seasonal
## [1] 1
ndiffs(diff(visitors_bc, lag=12))
## [1] 1
Given its monthly seasonality, the transformed series requires a seasonal difference followed by a single difference.
visitors_bc_d <- diff(diff(visitors_bc, lag=12))
autoplot(visitors_bc_d, main="Monthly Australian overseas visitors, after Box-Cox and differencing")
Acf(visitors_bc_d)
Pacf(visitors_bc_d)
After differencing twice, the series trend is removed, and the autocorrelation and partial autocorrelation plots show essentially white noise relative to the correlations for non-transformed, non-differenced series. There are, however, several correlations--notably, lags 1, 12, and 24--that fall well outside the bounds.
visitors_bc_d %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0159
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
After transformation and differencing, the KPSS test test-statistic is approximately 0.02, which is well within the critical value bounds. The transformed and differenced series is stationary.
retaildata <- read_excel("retail.xlsx", skip=1) # 381 x 19
myts <- ts(retaildata[,4], frequency=12, start=c(1982,4)) # 4th column ("A3349338X")
autoplot(myts, main="Retail - A3349338x")
The retail series shows both an upward trend and monthly seasonality, meaning that is not stationary.
Acf(myts, main="Retail - A3349338x")
Pacf(myts, main="Retail - A3349338x")
Autocorrelation plots provide further evidence of trend, seasonality, and non-stationarity.
BoxCox.lambda(myts)
## [1] -0.2536078
The suggested Box-Cox \(\lambda\) is approximately -0.25, which is already easily interpretable.
myts_bc <- BoxCox(myts, lambda=-0.25)
nsdiffs(myts_bc) #seasonal
## [1] 1
ndiffs(diff(myts_bc, lag=12))
## [1] 1
The transformed series shows monthly seasonality and requires a seasonal difference, which is then followed by a single difference.
myts_bc_d <- diff(diff(myts_bc, lag=12))
autoplot(myts_bc_d, main="Retail - A3349338x, after Box-Cox and differencing")
Acf(myts_bc_d)
Pacf(myts_bc_d)
After transformation and differencing, the series continues to display significant autocorrelation, particularly at lags of 12 and 24.
myts_bc_d %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0219
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
After transformation and differencing, the KPSS test test-statistic is approximately 0.02, which is well within the critical value bounds. The transformed and differenced series is stationary.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
autoplot(y, main="AR(1) simulation, phi = 0.6")
The simulated series, with \(\phi_1 =\) 0.6 and $_2 =$1, appears stationary, with no apparent trend or seasonality. Variation is inconsistent across the series.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.1*y[i-1] + e[i]
autoplot(y, main="AR(1) simulation, phi = 0.1")
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- -0.6*y[i-1] + e[i]
autoplot(y, main="AR(1) simulation, phi = -0.6")
Changing the \(\phi\) value results in a shift in series pattern. As \(\phi\) approaches zero, the series approximates white noise. As \(\phi\) falls below zero, the series shows greater variation and oscillation about the mean.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
autoplot(y, main="MA(1) simulation, theta = 0.6")
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0*e[i-1] + e[i]
autoplot(y, main="MA(1) simulation, theta = 0")
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- -0.6*e[i-1] + e[i]
autoplot(y, main="MA(1) simulation, theta = -0.6")
As \(\theta_1\) decreases towards -1, the series shows increased variation, and it oscillates with greater frequency about the mean.
y1 <- ts(numeric(100))
e1 <- rnorm(100, sd=1)
for(i in 2:100)
y1[i] <- 0.6*y1[i-1] + 0.6*e1[i-1] + e1[i]
y2 <- ts(numeric(100))
e2 <- rnorm(100)
for(i in 3:100)
y2[i] <- -0.8*y2[i-1] + 0.3*y2[i-2] + e2[i]
autoplot(y1, main="ARMA(1,1) simulation, phi_1 = 0.6, theta_1 = 0.6, sigma^2 = 1")
autoplot(y2, main="AR(2) simulation, phi_1 = -0.8, phi_2 = 0.3, sigma^2 = 1")
The ARMA(1,1) model appears stationary with non-patterned variation about zero. Notably, there is a large dip, past -5.0, at around time 13. The AR(2) model displays a clear pattern of increasing variation--from approximately zero to approximately 4000--as time increases.
#?wmurders - Annual female murder rate (per 100,000 standard population) in the USA (1950-2004)
autoplot(wmurders, main="Annual female murder rate per 100,000 in USA (1950-2004)")
Acf(wmurders)
Pacf(wmurders)
The plots suggest that the series shows a weak upward trend, no seasonality, and apparent stationarity.
BoxCox.lambda(wmurders)
## [1] -0.09529835
ndiffs(wmurders)
## [1] 2
The suggested \(\lambda\) for a Box-Cox transformation is approximately -0.09, or roughly zero, which would mean a logarithmic transformation. Based upon the series plot, the series' limited variation does not seem to warrant a log-transform.
The suggested number of differences is 2.
wmurders_d <- diff(diff(wmurders))
autoplot(wmurders_d, main="Annual female murder rate, after differencing")
Acf(wmurders_d)
Pacf(wmurders_d)
After differencing, both autocorrelation plots show particularly large, significant spikes at lag 1. The ACF (p) and PACF (q), then, suggest p = 1 and q = 1. Considering d = 2, an ARIMA(1,2,1) seems appropriate.
No. Based on the non-differenced series and its slight trend, adding a constant and thus drift to the model could be appropriate. However, the difference of 2 suggests a constant should not be included in order to avoid higher-order polynomials.
Removing the constant c = 0,
\((1 - \phi_1B)(1 - B)^2y_t = (1 + \theta_1B)\epsilon_t\)
(m1 <- Arima(wmurders, order=c(1,2,1)))
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
checkresiduals(m1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
Fitting an ARIMA(1,2,1) model with no constant term results in residuals that are not significantly different from zero per a Ljung-Box test (p-value ~.1335 on 8 degrees of freedom) and thus white noise. The residuals are also relatively symmetrically distributed, if not quite normal. The model appears satisfactory to move forward.
f1 <- forecast(m1, h=3)
f1$mean
## Time Series:
## Start = 2005
## End = 2007
## Frequency = 1
## [1] 2.470660 2.363106 2.252833
Using a reduced and simplified version of the equation above, my forecast calculations are approximately the same as those produced by R.
autoplot(f1)
Visualized, the forecast shows a relatively linear decrease in murder rate. The linearity reflects, in the short-term, that long-term forecasts follow a straight line when c = 0 and d = 2.
(m2 <- auto.arima(wmurders, stepwise=FALSE, approximation=FALSE))
## Series: wmurders
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
checkresiduals(m2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
##
## Model df: 3. Total lags used: 10
The documentation for auto.arima() suggests setting the stepwise and approximation arguments to FALSE given the analysis focuses on a single series. Doing so results in a suggested ARIMA(0,2,3) compared with the chosen ARIMA(1,2,1). The former shows a slightly higher log-likelihood (approximately 7.77 versus 6.44). Its residuals also appear to check out.
f2 <- forecast(m2, h=3)
f2$mean
## Time Series:
## Start = 2005
## End = 2007
## Frequency = 1
## [1] 2.408817 2.365555 2.290976
In addition, its forecast is more of an arc compared to a straight line. Ultimately, the auto.arima() version seems better given my understanding that the set arguments add stringency to the conditions for modeling.