Do exercises 3.1, 3.2, 3.3 and 3.8 from the online Hyndman book. Please include your Rpubs link along with your .rmd file.
library(fpp2)
library(ggplot2)
library(knitr)
library(kableExtra)
- usnetelec
- usgdp
- mcopper
- enplanements
Exploring usnetelec
Annual US net electricity generation (billion kwh) for 1949-2003
#help(usnetelec)
head(usnetelec)
## Time Series:
## Start = 1949
## End = 1954
## Frequency = 1
## [1] 296.1 334.1 375.3 403.8 447.0 476.3
summary(usnetelec)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 296.1 889.0 2040.9 1972.1 3002.7 3858.5
Plotting the time series, we can see that the frequency is yearly.
autoplot(usnetelec)
lambda <- BoxCox.lambda(usnetelec)
print(paste0("Lambda for usnetelec: ", lambda))
## [1] "Lambda for usnetelec: 0.516771443964645"
Exmplorin usgdp
Quarterly US GDP. 1947:1 - 2006.1.
#help("usgdp")
head(usgdp)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1947 1570.5 1568.7 1568.0 1590.9
## 1948 1616.1 1644.6
summary(usgdp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1568 2632 4552 5168 7130 11404
lambda <- BoxCox.lambda(usgdp)
print(paste0("Lambda for usgdp: ", lambda))
## [1] "Lambda for usgdp: 0.366352049520934"
autoplot(BoxCox(usgdp,lambda))
Exmplorin mcopper
Monthly copper prices. Copper, grade A, electrolytic wire bars/cathodes,LME,cash (pounds/ton)
Source: UNCTAD (http://stats.unctad.org/Handbook).
#help("mcopper")
head(mcopper)
## Jan Feb Mar Apr May Jun
## 1960 255.2 259.7 249.3 258.0 244.3 246.8
summary(mcopper)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 216.6 566.0 949.2 997.8 1262.5 4306.0
lambda <- BoxCox.lambda(mcopper)
print(paste0("Lambda for mcopper: ", lambda))
## [1] "Lambda for mcopper: 0.191904709003829"
autoplot(BoxCox(mcopper,lambda))
daily_average <- mcopper/monthdays(mcopper)
copper_df <- cbind(Monthly = mcopper,
daily_average = daily_average)
autoplot(copper_df, facet=TRUE) +
xlab("Years") + ylab("Copper Price") +
ggtitle("Monthly Copper Price")
the BoxCox transformation lambda is ~ 0.2 .
Exmplorin enplanements
“Domestic Revenue Enplanements (millions): 1996-2000. SOURCE: Department of Transportation, Bureau of Transportation Statistics, Air Carrier Traffic Statistic Monthly.
#help("enplanements")
head(enplanements)
## Jan Feb Mar Apr May Jun
## 1979 21.12 22.92 25.90 24.38 23.41 26.82
summary(enplanements)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.14 27.18 34.88 35.67 42.78 56.14
lambda <- BoxCox.lambda(enplanements)
print(paste0("Lambda for enplanements: ", lambda))
## [1] "Lambda for enplanements: -0.226946111237065"
autoplot(BoxCox(enplanements,lambda))
daily_average <- mcopper/monthdays(enplanements)
enp_df <- cbind(Monthly = enplanements,
daily_average = daily_average)
autoplot(enp_df, facet=TRUE) +
xlab("Years") + ylab("Domestic Revenue Enplanements (millions)") +
ggtitle("Monthly US domestic enplanements")
ggseasonplot(enplanements)
gglagplot(enplanements)
ggAcf(enplanements)
There appears to be an upward trend on the ACF plot, also, there appears to be some seasonal component with the “scallop”" shape.
Monthly Canadian gas production, billions of cubic metres, January 1960 - February 2005
#help("cangas")
head(cangas)
## Jan Feb Mar Apr May Jun
## 1960 1.4306 1.3059 1.4022 1.1699 1.1161 1.0113
summary(cangas)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.966 6.453 8.831 9.777 14.429 19.528
autoplot(cangas)
lambda <- BoxCox.lambda(cangas)
print(paste0("Lambda for cangas: ", lambda))
## [1] "Lambda for cangas: 0.576775938228139"
autoplot(BoxCox(cangas,lambda))
Based upon the BoxCox.lambda function it appears that a function/ model may suggesting a transformation that it does not necessarily mean. Also, it does not yield simpler model.
retaildata <- readxl::read_excel("C:\\NITEEN\\GitHub\\CUNY_DATA_624\\Dataset\\retail.xlsx", skip=1)
#kable(head(retaildata))
myts <- ts(retaildata$A3349661X, start = c(1982,4),frequency = 12)
head(myts)
## Apr May Jun Jul Aug Sep
## 1982 19.2 21.9 19.9 19.3 19.6 19.9
autoplot(myts)
lambda <- BoxCox.lambda(myts)
print(paste0("Lambda for myts: ", lambda))
## [1] "Lambda for myts: -0.0493897710767389"
autoplot(BoxCox(myts,lambda))
It appears that log transformation is appropriate.
Split the data into two parts using
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
Compare the accuracy of your forecasts against the actual values stored in myts.test
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 3.518619 10.42741 7.244144 5.375707 11.81551 1.000000
## Test set 30.512500 33.10478 30.512500 20.388974 20.38897 4.212023
## ACF1 Theil's U
## Training set 0.7935927 NA
## Test set 0.3613754 2.66608
When choosing models, it is common practice to separate the available data into two portions, training and test data, where the training data is used to estimate any parameters of a forecasting method and the test data is used to evaluate its accuracy. Because the test data is not used in determining the forecasts, it should provide a reliable indication of how well the model is likely to forecast on new data.
Check the residuals.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 1059, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
The residuals seem correlated with each other but it doesn’t look like normally distributed
How sensitive are the accuracy measures to the training/test split?
Per lesson 3.4, the accuracy measure is always sensitive to the training/test split. The following points should be noted.
We can also use cross-validation tsCV() to check the accuracy of a time series. In this procedure, there are a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred prior to the observation that forms the test set. Thus, no future observations can be used in constructing the forecast. Time series cross-validation is implemented with the tsCV() function.
Also, It looks like the Mean Error is highly sensitive by definition. RMSE, MAE, MPE, MASE are sensitive too whereas MAPE and ACF1 are not that sensitive.