library(fpp)
library(fpp2)
library(ggplot2)
library(knitr)
library(kableExtra)
library(readxl)
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
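For reference, the usual form of the Box-Cox transformation for a positive series is sketched below. The symbols $y_t$ (original observation), $w_t$ (transformed observation) and $\lambda$ (transformation parameter) are notation assumed here; they do not appear in the exercise text.

```latex
w_t =
\begin{cases}
\log(y_t), & \lambda = 0, \\[4pt]
\dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0.
\end{cases}
```

With $\lambda = 1$ the series is only shifted down by one, so its shape is unchanged; values of $\lambda$ below 1 compress large observations more than small ones.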
funcCmpr <- function(data, ylabtext, title, bcttitle){
print(head(data))
print(summary(data))
print(autoplot(data) + ylab(ylabtext) + ggtitle(title))
lambda <- BoxCox.lambda(data)
print(paste0("Lambda: ", lambda))
print(autoplot(BoxCox(data,lambda)) + ggtitle(bcttitle))
}
funcCmpr(usnetelec, "Annual US Electricity Generation (billion kWh)", "Annual US Net Electricity Generation", "Box Cox Transformation of Annual US Net Electricity Generation")
## Time Series:
## Start = 1949
## End = 1954
## Frequency = 1
## [1] 296.1 334.1 375.3 403.8 447.0 476.3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 296.1 889.0 2040.9 1972.1 3002.7 3858.5
## [1] "Lambda: 0.516771443964645"
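The usnetelec plot shows an upward trend and no seasonality. The variation changes little as time progresses, so the Box-Cox transformation (lambda of about 0.52) is expected to have only a minor effect on the series.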
funcCmpr(usgdp, "Quarterly US GDP", "Quarterly US GDP", "Box Cox Transformation of Quarterly US GDP")
## Qtr1 Qtr2 Qtr3 Qtr4
## 1947 1570.5 1568.7 1568.0 1590.9
## 1948 1616.1 1644.6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1568 2632 4552 5168 7130 11404
## [1] "Lambda: 0.366352049520934"
The usgdp plot shows an upward trend and no apparent seasonality. The BoxCox.lambda function was used to choose a value of lambda that stabilises the variance. The value of lambda chosen is about 0.37. The transformed data is closer to linear and has less variation than the original data.
funcCmpr(mcopper, "Monthly Copper Prices", "Monthly Copper Prices", "Box Cox Transformation of Monthly Copper Prices")
## Jan Feb Mar Apr May Jun
## 1960 255.2 259.7 249.3 258.0 244.3 246.8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 216.6 566.0 949.2 997.8 1262.5 4306.0
## [1] "Lambda: 0.191904709003829"
The mcopper plot shows an upward trend and cyclic behavior, with a sharp increase in price around 2007. With lambda of about 0.19, the transformed series shows less variation than the original.
funcCmpr(enplanements, "Domestic Revenue Enplanements (millions)", "Monthly US Domestic Revenue from People Boarding Airplanes", "Box Cox Transformation of Monthly US Domestic Revenue from People Boarding Airplanes")
## Jan Feb Mar Apr May Jun
## 1979 21.12 22.92 25.90 24.38 23.41 26.82
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.14 27.18 34.88 35.67 42.78 56.14
## [1] "Lambda: -0.226946111237065"
The enplanements plot shows an upward trend and annual seasonality. The seasonal variation is smaller in some periods than in the rest of the data set. The BoxCox.lambda function was used to choose a value of lambda that makes the size of the seasonal variation roughly constant; the value chosen is about -0.23.
Why is a Box-Cox transformation unhelpful for the cangas data?
funcCmpr(cangas, "Monthly Canadian Gas Production (billions of cubic meters)", "Canadian Gas Production", "Box Cox Transformation of Canadian Gas Production")
## Jan Feb Mar Apr May Jun
## 1960 1.4306 1.3059 1.4022 1.1699 1.1161 1.0113
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.966 6.453 8.831 9.777 14.429 19.528
## [1] "Lambda: 0.576775938228139"
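A Box-Cox transformation cannot make the seasonal variation uniform for the cangas data. The middle of the series has higher variability than the early and late parts, so the variation does not change steadily with the level of the series. Because a Box-Cox transformation is monotonic, it can only compress or stretch variation in step with the level, and the variance remains unstable after transforming (lambda is about 0.58).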
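The argument can be checked with a small numerical sketch. The numbers below are made up for illustration and are not taken from cangas: three seasonal swings, small at a low level, large at a middle level, and small again at a high level. The helpers `box_cox` and `swing_width` are hypothetical names written in base R, not functions from the forecast package.

```r
# Box-Cox transform for positive data (hypothetical helper, base R only).
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Made-up swings: small, large, small as the level rises.
level     <- c(2, 10, 18)
amplitude <- c(0.2, 1.7, 0.2)

# Peak-minus-trough width of each swing after transforming with lambda.
swing_width <- function(lambda) {
  box_cox(level + amplitude, lambda) - box_cox(level - amplitude, lambda)
}

lambdas <- seq(-1, 2, by = 0.1)
widths  <- t(sapply(lambdas, swing_width))   # one row per lambda

# Largest swing divided by smallest swing; a value of 1 would mean uniform.
spread <- apply(widths, 1, max) / apply(widths, 1, min)
print(round(data.frame(lambda = lambdas, spread = spread), 2))
```

For these illustrative numbers no lambda on the grid brings the ratio close to 1, which is the sense in which a monotonic transformation cannot fix variation that rises and then falls.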
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
retaildata <- read_excel("data/retail.xlsx", skip = 1)
myts <- ts(retaildata[, "A3349398A"], frequency = 12, start = c(1982, 4))
head(myts)
## Apr May Jun Jul Aug Sep
## 1982 408.7 404.9 401.0 414.4 403.8 411.8
funcCmpr(myts, "Retail Clothing Sales", "New South Wales - Clothing Sales", "Box Cox Transformation of Retail Clothing Sales in New South Wales")
## Apr May Jun Jul Aug Sep
## 1982 408.7 404.9 401.0 414.4 403.8 411.8
## A3349398A
## Min. : 401.0
## 1st Qu.: 791.8
## Median :1311.6
## Mean :1420.6
## 3rd Qu.:2025.8
## Max. :3278.2
## [1] "Lambda: 0.123156269082221"
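The retail data shows an upward trend, and the variance increases with time. The selected lambda is about 0.12, which is close to a log transformation. The transformed data has more uniform seasonal variation throughout, which should make it better suited to forecasting.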
For your retail time series (from Exercise 3 in Section 2.10):
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
fc <- snaive(myts.train)
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 73.94114 88.31208 75.13514 6.068915 6.134838 1.000000 0.6312891
## Test set 115.00000 127.92727 115.00000 4.459712 4.459712 1.530576 0.2653013
## Theil's U
## Training set NA
## Test set 0.7267171
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test") +
autolayer(fc, series="prediction")
The mean error (ME) is about 74 for the training set and about 115 for the test set, so the seasonal naive forecasts are biased low in both periods. The root mean square error (RMSE) is about 88 and 128, and the mean absolute error (MAE) is about 75 and 115, so the test-set errors are roughly 45% to 55% larger than the training-set errors. On the test set the ME equals the MAE, which means every test-set forecast fell below the actual value. The mean percentage error (MPE) is about 6% for the training set and 4.5% for the test set.
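To make the measures discussed above concrete, here is a minimal base R sketch that builds a seasonal naive forecast by hand and computes the same quantities. It uses the built-in AirPassengers series purely as a stand-in (an assumption for illustration, not the retail data), and the variable names are hypothetical.

```r
# Built-in monthly series used only for illustration.
y <- AirPassengers

# Hold out the final two years as a test set.
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

# Seasonal naive by hand: repeat the last observed year of the training set.
m         <- frequency(train)
last_year <- tail(as.numeric(train), m)
fc_mean   <- rep(last_year, length.out = length(test))

# Errors are actual minus forecast, so positive errors mean under-forecasting.
e  <- as.numeric(test) - fc_mean
pe <- 100 * e / as.numeric(test)

# MASE scales by the in-sample MAE of the seasonal naive method.
scale <- mean(abs(diff(as.numeric(train), lag = m)))

measures <- c(
  ME   = mean(e),
  RMSE = sqrt(mean(e^2)),
  MAE  = mean(abs(e)),
  MPE  = mean(pe),
  MAPE = mean(abs(pe)),
  MASE = mean(abs(e)) / scale
)
print(round(measures, 2))

# ME equals MAE exactly when no error is negative.
all_nonneg <- all(e >= 0)
```

Because MASE divides by the in-sample seasonal naive MAE, the training-set MASE of a seasonal naive forecast is 1 by construction, which matches the 1.000000 in the output above.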
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 671.41, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
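The residuals are not centered on zero (the training-set mean error is about 74) and do not appear to be normally distributed. They are also correlated with each other: the Ljung-Box test gives Q* = 671.41 on 24 degrees of freedom with a p-value below 2.2e-16, which rejects the hypothesis that the residuals are uncorrelated.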
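The Ljung-Box statistic reported by checkresiduals can be reproduced in a few lines of base R. The sketch below uses simulated series (an assumption for illustration, not the retail residuals) and a hypothetical helper `ljung_box`; it uses 24 lags and zero model degrees of freedom to mirror the output above, and cross-checks against `Box.test` from the stats package.

```r
set.seed(1)  # arbitrary seed so the illustration is reproducible
n <- 200
white  <- rnorm(n)                                       # uncorrelated series
sticky <- as.numeric(arima.sim(list(ar = 0.8), n = n))   # autocorrelated series

# Ljung-Box: Q* = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k), compared with a
# chi-squared distribution on h degrees of freedom (model df assumed to be 0).
ljung_box <- function(x, h) {
  n <- length(x)
  r <- acf(x, lag.max = h, plot = FALSE)$acf[-1]   # drop lag 0
  q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))
  c(Q = q, p.value = pchisq(q, df = h, lower.tail = FALSE))
}

h <- 24
print(ljung_box(white, h))
print(ljung_box(sticky, h))

# Cross-check against the built-in implementation.
builtin <- Box.test(sticky, lag = h, type = "Ljung-Box")
print(builtin)
```

A large Q* with a tiny p-value, as for the autocorrelated series here and for the seasonal naive residuals above, indicates that information remains in the residuals that the forecasting method has not captured.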
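The test-set errors are larger than the training-set errors on the scale-dependent and scaled measures: mean error (115 vs 74), root mean square error (128 vs 88), mean absolute error (115 vs 75) and mean absolute scaled error (1.53 vs 1.00). The percentage measures are lower on the test set: mean percentage error (4.5% vs 6.1%) and mean absolute percentage error (4.5% vs 6.1%), most likely because the series is at a higher level in the test period. The lag-1 autocorrelation of the errors (ACF1) is not an error measure; it is lower on the test set (0.27 vs 0.63).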