5.1 Daily electricity demand for Victoria, Australia, during 2014 is contained in elecdaily. The data for the first 20 days can be obtained as follows.
library(forecast)
library(fpp)
library(fpp2)
library(magrittr)
library(ggplot2)
library(seasonal)
library(dplyr)
library(gridExtra)
daily20 <- head(elecdaily,20)
daily20 <- head(elecdaily,20)
#Plot data
autoplot(daily20[,"Demand"]) + ylab("Demand") + xlab("temperature")
fit = lm(Demand~Temperature, daily20)
summary(fit)
##
## Call:
## lm(formula = Demand ~ Temperature, data = daily20)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.060 -7.117 -1.437 17.484 27.102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.2117 17.9915 2.179 0.0428 *
## Temperature 6.7572 0.6114 11.052 1.88e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22 on 18 degrees of freedom
## Multiple R-squared: 0.8716, Adjusted R-squared: 0.8644
## F-statistic: 122.1 on 1 and 18 DF, p-value: 1.876e-09
daily20 %>%
as.data.frame() %>%
ggplot(aes(x=Temperature, y=Demand)) +
ylab("Demand") +
xlab("Temperature") +
geom_point() +
geom_smooth(method="lm", se=FALSE)
Temperature increases, airconditioning increases, people are out more in warmer weather, dining, shopping etc increases
checkresiduals(fit)
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: Residuals
## LM test = 3.8079, df = 5, p-value = 0.5774
cbind(Fitted = fitted(fit),
Residuals=residuals(fit)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
Autocorrelation is the phenomonin whereby the random errors in the model are often positively correlated over time, so that each random error is more likely to be similar to the previous random error that it would be if the random errors were independent of one another. Residual plot shows a somewhat U-shaped pattern, with more values falling below 0 - meaning at demand below 250, the model predicted higher than the actual; while at demand > 300, the model predicted lower than actuals. the histogram of the residual count has 2 significant peaks which supports this.
new.temp <- data.frame(Temperature = c(15,35))
predict(fit, newdata = new.temp)
## 1 2
## 140.5701 275.7146
# 1 2
#140.5701 275.7146
We see from teh residual plots that the model is variable at teh lower and higher ends, hence these forecastss are not believable.
predict(fit, newdata = new.temp, interval = "prediction")
## fit lwr upr
## 1 140.5701 90.21166 190.9285
## 2 275.7146 227.57056 323.8586
autoplot(elecdaily, facets = TRUE)
elecdaily %>%
as.data.frame() %>%
ggplot(aes(x=Temperature, y=Demand)) +
geom_point() +
geom_smooth(method="lm", se=FALSE)
the model is underestimating demand at higher temperatures.
5.2. Data set mens400 contains the winning times (in seconds) for the men’s 400 meters final in each Olympic Games from 1896 to 2016.
autoplot(mens400) + ylab("Winning Time in sec") + xlab("Year")
The winning time has decreased from 54 seconds to 43 seconds over the time period 1896 - 2016. There are breaks in the trend, for those years when the Olympics did not take place due to the First and Second World Wars.
time.mens<- time(mens400)
fit.mens <- tslm(mens400 ~ time.mens, data=mens400)
autoplot(mens400)+
geom_abline(slope=fit.mens$coefficients[2], intercept=fit.mens$coefficients[1], color="red")
fit.mens$coefficients[2]
## time.mens
## -0.06457385
Winning times rate of change, as shown by the slope of teh regression line, is -0.065 seconds per year.
checkresiduals(fit.mens)
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals from Linear regression model
## LM test = 3.6082, df = 6, p-value = 0.7295
cbind(Time = time(mens400),
Residuals = fit.mens$residuals) %>%
as.data.frame() %>%
ggplot(aes(x = Time, y = Residuals)) +
geom_point() +
ylab("Residuals of Regression Line")
## Warning: Removed 3 rows containing missing values (geom_point).
t <- seq(2020,2020, 4)
t
## [1] 2020
fcast.mens400 = forecast(fit.mens, newdata = t)
## Warning in forecast.lm(fit.mens, newdata = t): newdata column names not
## specified, defaulting to first variable required.
autoplot(mens400) + autolayer(fcast.mens400) + ggtitle("Forecast of Winning Time")
#Prediction Intervals
fcast.mens400$lower
## Time Series:
## Start = 2020
## End = 2020
## Frequency = 0.25
## Series 1 Series 2
## 2020 40.44975 39.55286
fcast.mens400$upper
## Time Series:
## Start = 2020
## End = 2020
## Frequency = 0.25
## Series 1 Series 2
## 2020 43.63487 44.53176
The prediction interval for teh 2020 forecasted winning time is 40.45 s
This is a dummy variable where a value of “1” means “Easter” and “0” means “not Easter”, in order to account for Easter seasonlaity factor in the ausbeer dataset. The fractional split between quarter occurs when Easter Weekend starts at the end of one quarter and ends in the beginning of the next.
5.5. The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.
autoplot(fancy)
The time series shows distinct peaks at each year end, and a smaller peak in first quarter. 1991 was the only year where the peak was lower in both these 2 periods. The spike in 1993 was steeper than in past years, perhaps due to store and business expansion.
The increasing seasonal fluctuations in the data make it is necessary to take logarithms in order to get an additive model and to stabilize the variance so it does not increase with the level of the series.
log_fancy = log(fancy)
dummy_fest = rep(0,length(fancy)) #festival dummy variable
dummy_fest[seq_along(dummy_fest)%%12 ==3]<- 1
dummy_fest[3]<- 0
#dummy_fest = ts(dummy_fest_mat, freq = 12, start=c(1987,1))
fit.sales = tslm(log_fancy ~ trend + season + dummy_fest)
summary(fit.sales)
##
## Call:
## tslm(formula = log_fancy ~ trend + season + dummy_fest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6196670 0.0742471 102.626 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## season2 0.2514168 0.0956790 2.628 0.010555 *
## season3 0.2660828 0.1934044 1.376 0.173275
## season4 0.3840535 0.0957075 4.013 0.000148 ***
## season5 0.4094870 0.0957325 4.277 5.88e-05 ***
## season6 0.4488283 0.0957647 4.687 1.33e-05 ***
## season7 0.6104545 0.0958039 6.372 1.71e-08 ***
## season8 0.5879644 0.0958503 6.134 4.53e-08 ***
## season9 0.6693299 0.0959037 6.979 1.36e-09 ***
## season10 0.7473919 0.0959643 7.788 4.48e-11 ***
## season11 1.2067479 0.0960319 12.566 < 2e-16 ***
## season12 1.9622412 0.0961066 20.417 < 2e-16 ***
## dummy_fest 0.5015151 0.1964273 2.553 0.012856 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
checkresiduals(fit.sales)
##
## Breusch-Godfrey test for serial correlation of order up to 17
##
## data: Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494
cbind(Fitted = fitted(fit.sales),
Residuals=residuals(fit.sales)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
The residuals do not appear to be randomly distributed, which shows some bias in the model. The residuals show the model over-estimating sales, especially during the period when Observed sales were lower.
boxplot(resid(fit.sales) ~ cycle(resid(fit.sales)))
The boxplot shows a non-linearity to the mean residuals for each month. the variances are very large in January, March, August, September and October.
coefficients(fit.sales)
## (Intercept) trend season2 season3 season4 season5
## 7.61966701 0.02201983 0.25141682 0.26608280 0.38405351 0.40948697
## season6 season7 season8 season9 season10 season11
## 0.44882828 0.61045453 0.58796443 0.66932985 0.74739195 1.20674790
## season12 dummy_fest
## 1.96224123 0.50151509
The coefficients are all positive: season 12 (Christmas time) has the highest positive impact, and trend having the least. Except for season 3, all coefficients are significant, and increasing with the warmer months. The dummy variable for the surfing festival in March
#install.packages("lmtest")
library(lmtest)
bgtest(fit.sales, order = 1)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: fit.sales
## LM test = 25.031, df = 1, p-value = 5.642e-07
The small p-value indicates there is significant autocorrelation remaining in the residuals.
fore_fancy <- data.frame(dummy_fest = rep(0, 36))
fcast.fancy<- forecast(fit.sales, newdata=fore_fancy)
fcast.fancy
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.491352 9.238522 9.744183 9.101594 9.88111
## Feb 1994 9.764789 9.511959 10.017620 9.375031 10.15455
## Mar 1994 9.801475 9.461879 10.141071 9.277961 10.32499
## Apr 1994 9.941465 9.688635 10.194296 9.551707 10.33122
## May 1994 9.988919 9.736088 10.241749 9.599161 10.37868
## Jun 1994 10.050280 9.797449 10.303110 9.660522 10.44004
## Jul 1994 10.233926 9.981095 10.486756 9.844168 10.62368
## Aug 1994 10.233456 9.980625 10.486286 9.843698 10.62321
## Sep 1994 10.336841 10.084010 10.589671 9.947083 10.72660
## Oct 1994 10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994 10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994 11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995 9.755590 9.499844 10.011336 9.361338 10.14984
## Feb 1995 10.029027 9.773281 10.284773 9.634775 10.42328
## Mar 1995 10.065713 9.722498 10.408928 9.536620 10.59481
## Apr 1995 10.205703 9.949957 10.461449 9.811451 10.59996
## May 1995 10.253157 9.997411 10.508903 9.858904 10.64741
## Jun 1995 10.314518 10.058772 10.570264 9.920265 10.70877
## Jul 1995 10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995 10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995 10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995 10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995 11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995 11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996 10.019828 9.760564 10.279093 9.620151 10.41951
## Feb 1996 10.293265 10.034000 10.552530 9.893588 10.69294
## Mar 1996 10.329951 9.982679 10.677222 9.794605 10.86530
## Apr 1996 10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996 10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996 10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996 10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996 10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996 10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996 10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996 11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996 12.224288 11.965023 12.483552 11.824611 12.62396
trans_fancy <- fcast.fancy %>% as.data.frame() %>% exp()
trans_fancy
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 13244.70 10285.82 17054.73 8969.583 19557.43
## Feb 1994 17409.81 13520.45 22418.00 11790.284 25707.73
## Mar 1994 18060.36 12860.02 25363.61 10699.594 30484.96
## Apr 1994 20774.16 16133.21 26750.16 14068.696 30675.62
## May 1994 21783.73 16917.24 28050.15 14752.395 32166.37
## Jun 1994 23162.27 17987.81 29825.24 15685.969 34201.95
## Jul 1994 27831.56 21613.98 35837.72 18848.111 41096.73
## Aug 1994 27818.48 21603.82 35820.87 18839.249 41077.41
## Sep 1994 30848.42 23956.87 39722.43 20891.193 45551.50
## Oct 1994 34095.57 26478.61 43903.67 23090.230 50346.32
## Nov 1994 55176.84 42850.31 71049.28 37366.903 81475.41
## Dec 1994 120067.79 93244.59 154607.08 81312.400 177294.90
## Jan 1995 17250.40 13357.65 22277.59 11629.938 25587.08
## Feb 1995 22675.20 17558.28 29283.31 15287.252 33633.55
## Mar 1995 23522.50 16688.88 33154.31 13858.024 39926.92
## Apr 1995 27057.06 20951.33 34942.16 18241.435 40133.06
## May 1995 28371.96 21969.51 36640.25 19127.918 42083.42
## Jun 1995 30167.42 23359.80 38958.95 20338.387 44746.58
## Jul 1995 36248.88 28068.91 46812.70 24438.412 53767.06
## Aug 1995 36231.84 28055.72 46790.69 24426.922 53741.78
## Sep 1995 40178.16 31111.50 51887.06 27087.467 59595.26
## Oct 1995 44407.37 34386.35 57348.77 29938.733 65868.34
## Nov 1995 71864.42 55647.40 92807.48 48449.831 106594.69
## Dec 1995 156380.86 121091.75 201954.08 105429.448 231955.81
## Jan 1996 22467.57 17336.40 29117.46 15065.329 33506.86
## Feb 1996 29533.04 22788.25 38274.14 19802.984 44043.89
## Mar 1996 30636.60 21648.24 43356.94 17936.708 52328.52
## Apr 1996 35240.15 27191.96 45670.42 23629.808 52555.15
## May 1996 36952.72 28513.41 47889.88 24778.151 55109.18
## Jun 1996 39291.20 30317.82 50920.48 26346.183 58596.65
## Jul 1996 47211.93 36429.60 61185.57 31657.322 70409.18
## Aug 1996 47189.73 36412.48 61156.80 31642.439 70376.07
## Sep 1996 52329.57 40378.47 67817.91 35088.887 78041.33
## Oct 1996 57837.85 44628.77 74956.52 38782.394 86256.08
## Nov 1996 93598.96 72222.70 121302.09 62761.521 139588.15
## Dec 1996 203676.38 157160.50 263959.89 136572.460 303751.35
trans_ts<- ts(trans_fancy,freq=12,start=1994, end = 1996)
par(mfrow=c(2,1))
autoplot(log_fancy) + autolayer(fcast.fancy) + ggtitle("Sales Forecast: 1994-1996")
autoplot(fancy) + autolayer(trans_ts) + ggtitle("Sales Forecast: 1994-1996")
5.6. The gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day”. Consider only the data to the end of 2004.
str(gasoline)
## Time-Series [1:1355] from 1991 to 2017: 6.62 6.43 6.58 7.22 6.88 ...
gas2004 <- window(gasoline, end=2004)
autoplot(gas2004) + xlab("Year") + ylab("Million Barrels/Day")
require(forecast)
gas2004 <- window(gasoline, end = 2004)
aic_vals_temp<- NULL
aic_vals<- NULL
for(num in 1:26){
fit<- tslm(gas2004 ~ trend + fourier(gas2004, K=num))
aic_value= CV(fit)[["AICc"]]
aic_vals_temp = cbind(num, aic_value)
aic_vals = rbind(aic_vals, aic_vals_temp)
}
print(aic_vals)
## num aic_value
## [1,] 1 -1659.655
## [2,] 2 -1664.679
## [3,] 3 -1701.618
## [4,] 4 -1701.261
## [5,] 5 -1709.933
## [6,] 6 -1734.857
## [7,] 7 -1745.389
## [8,] 8 -1744.791
## [9,] 9 -1743.020
## [10,] 10 -1743.760
## [11,] 11 -1744.000
## [12,] 12 -1750.161
## [13,] 13 -1746.232
## [14,] 14 -1743.785
## [15,] 15 -1739.520
## [16,] 16 -1737.468
## [17,] 17 -1736.849
## [18,] 18 -1744.511
## [19,] 19 -1740.465
## [20,] 20 -1736.294
## [21,] 21 -1732.044
## [22,] 22 -1727.891
## [23,] 23 -1724.455
## [24,] 24 -1721.903
## [25,] 25 -1719.715
## [26,] 26 -1715.017
colnames(aic_vals)<- c('Fourier Terms', 'AICValue')
aic_vals<- data.frame(aic_vals)
minAICval<- min(aic_vals$AICValue)
minvals<- aic_vals[which(aic_vals$AICValue == minAICval),]
minvals
## Fourier.Terms AICValue
## 12 12 -1750.161
12 Fourier pairs give the lowest AICc value of -1750.161.
fit_gas<- tslm(gas2004 ~ trend + fourier(gas2004 , K = 12))
checkresiduals(fit_gas)
##
## Breusch-Godfrey test for serial correlation of order up to 104
##
## data: Residuals from Linear regression model
## LM test = 149.09, df = 104, p-value = 0.002496
fc <- forecast(fit, newdata=data.frame(fourier(x,K,h))) where fit is the fitted model using tslm(), K is the number of Fourier terms used in creating fit, and h is the forecast horizon required.
Forecast the next year of data.e. Plot the forecasts along with the actual data for 2005. What do you find?
fc <- forecast(fit_gas, newdata=data.frame(fourier(gas2004,12,52)))
gas2005<- window(gasoline, end = 2005)
autoplot(gas2004) + autolayer(fc) + ggtitle("Forecast for 2005: Million Barrels")
The model does a pretty good job of forecasting 2005 production of gasoline.
5.7. Data set huron gives the water level of Lake Huron in feet from 1875 to 1972.
Plot the data and comment on its features. Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915. Generate forecasts from these two models for the period up to 1980 and comment on these.
plot(huron)
str(huron)
## Time-Series [1:98] from 1875 to 1972: 10.38 11.86 10.97 10.8 9.79 ...
#time(huron)
#View(huron)
Water levels show a downward trend till mid-1920s, then appear to be more stable with similar peak and trough values.
h = 1
fith_lin<- tslm(huron ~ trend)
fc_lin<- forecast(fith_lin, h = h)
t<- time(huron)
t.break<- 1920
tb<- ts(pmax(0,t-t.break), start = 1875)
fith_pw<- tslm(huron ~ t+tb)
t.new<- t[length(t)] + seq(h)
tb.new<- tb[length(tb)] + seq(h)
newdata<- cbind(t=t.new, tb = tb.new) %>% as.data.frame()
fc_pw<- forecast(fith_pw, newdata = newdata)
autoplot(huron) +
autolayer(fitted(fith_lin), series = "Linear")+
autolayer(fitted(fith_pw), series = "Piecewise")+
autolayer(fc_lin, series = "Linear", PI = FALSE)+
autolayer(fc_pw, series = "Piecewise", PI = FALSE)+
xlab("Year") + ylab("Water Level")+
ggtitle("Water Level Forecast: Linear Regression vs Piecewise")+
guides(color = guide_legend(title = " "))
The piecewise linear trend gives a better forecast, taking into account the stabilization after 1930, while the linear regression gives a consistent downward trend.
6.1
Show that a 3?5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
6.2. The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.
autoplot(plastics) + xlab("Years")
autoplot(decompose(plastics, type = "multiplicative"))
##Below is the naive forecast of the seasonally adjusted flow data
decomp_m<- decompose(plastics, type = "multiplicative")
autoplot(plastics, series="Data") +
autolayer(trendcycle(decomp_m), series="Trend") +
autolayer(seasadj(decomp_m), series="Seasonally Adjusted") +
xlab("Year") + ylab("Sales in $") +
ggtitle("Sales of Product A") +
scale_colour_manual(values=c("gray","blue","red"),
breaks=c("Data","Seasonally Adjusted","Trend"))
## Warning: Removed 12 rows containing missing values (geom_path).
plastics2<- plastics
plastics2[30]<- plastics2[30]+500
decomp_o<- decompose(plastics2, type = "multiplicative")
autoplot(plastics, series="Data") +
autolayer(trendcycle(decomp_o), series="Trend") +
autolayer(seasadj(decomp_o), series="Seasonally Adjusted") +
xlab("Year") + ylab("Sales in $") +
ggtitle("Sales of Product A") +
scale_colour_manual(values=c("gray","blue","red"),
breaks=c("Data","Seasonally Adjusted","Trend"))
## Warning: Removed 12 rows containing missing values (geom_path).
plastics3<- plastics
plastics3[55]<- plastics3[55]+500
decomp_e<- decompose(plastics3, type = "multiplicative")
autoplot(plastics, series="Data") +
autolayer(trendcycle(decomp_e), series="Trend") +
autolayer(seasadj(decomp_e), series="Seasonally Adjusted") +
xlab("Year") + ylab("Sales in $") +
ggtitle("Sales of Product A") +
scale_colour_manual(values=c("gray","blue","red"),
breaks=c("Data","Seasonally Adjusted","Trend"))
## Warning: Removed 12 rows containing missing values (geom_path).
decomp_m %>% seasadj() %>% naive() %>%
autoplot() +
ggtitle("Sales of Product A: Seasonally Adjusted Data")
We can conclude that an outlier causes a spike in the month it is present by increasing seasonality index of that month and hence the deseasonalized data for that month observes a spike.
6.3.
Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
#names(retaildata)
ts_ret <- ts(retaildata[,"A3349566A"],
frequency=12, start=c(1982,4))
autoplot(ts_ret)
The trend grows non-linearly from 2000-2010. From 2010-2013, the trend shifts down but there is much greater variability.
ggseasonplot(ts_ret)
Over time, especially after 2008, seasonal variation has grown larger.
ggsubseriesplot(ts_ret)
ggAcf(ts_ret)
decomp_ret<- decompose(ts_ret)
autoplot(decomp_ret)
autoplot(ts_ret, series="Data") +
autolayer(trendcycle(decomp_ret), series="Trend") +
autolayer(seasadj(decomp_ret), series="Seasonally Adjusted") +
xlab("Year") + ylab("Sales in $") +
ggtitle("Retail Sales") +
scale_colour_manual(values=c("gray","blue","red"),
breaks=c("Data","Seasonally Adjusted","Trend"))
## Warning: Removed 12 rows containing missing values (geom_path).
The decomposition plot shows a lot more leakage of seasonality into remainder from 2010 onwards.
6.4 Figures 6.16 and 6.17 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Isolating the trend and seasonal component from the overall timeseries shows that the trend has increased throught the majority of the time frame, with a few stationary periods occuring in the early 90s. The monthly breakdown of the seasonal component shows that a few months esp March and December show greater peaks in the labor force than other months.
6.5 This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 - February 2005).
autoplot(cangas)
ggsubseriesplot(cangas)
ggseasonplot(cangas)
can_stl<- cangas %>% stl(s.window = "periodic", robust = TRUE)
autoplot(can_stl)
c. Compare the results with those obtained using SEATS and X11. How are they different?
can_seat<- cangas %>% seas()
autoplot(can_seat) + ggtitle("SEATS Decomposition of Cangas")
can_x<- cangas %>% seas(x11="")
autoplot(can_x) + ggtitle("X11 Decomposition of Cangas")
6.6 We will use the bricksq data (Australian quarterly clay brick production. 1956-1994) for this exercise.
brick_fstl<- bricksq %>% stl(s.window = "periodic")
autoplot(brick_fstl)
brick_cstl<- bricksq %>% stl(s.window = 7)
autoplot(brick_cstl)
autoplot(bricksq, series="Data") +
autolayer(trendcycle(brick_fstl), series="Trend") +
autolayer(seasadj(brick_fstl), series="Seasonally Adjusted") +
xlab("Year") + ylab("Brick Production") +
scale_colour_manual(values=c("gray","blue","red"),
breaks=c("Data","Seasonally Adjusted","Trend"))
brick_cstl %>% seasadj() %>% naive() %>%
autoplot() +
ylab("Brick Prod") + ggtitle("Naive Forecast of Seasonally Adjusted Brick Production")
d. Use stlf() to reseasonalise the results, giving forecasts for the original data.
fcast<- stlf(bricksq, method = "naive")
autoplot(fcast)
checkresiduals(fcast)
## Warning in checkresiduals(fcast): The fitted degrees of freedom is based on the
## model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + Random walk
## Q* = 40.829, df = 8, p-value = 2.244e-06
##
## Model df: 0. Total lags used: 8
There appears to be some correlation at Qtr 4 and 8. Howevere, this is acceptable for a naive forecast.
bricksq.train<-window(bricksq, end=c(1992,3))
bricksq.test<-window(bricksq, start=c(1992,4), end=c(1994,3))
bricksq_naive<-snaive(bricksq.train)
bricksq_stlf<-stlf(bricksq.train)
bricksq.fc_naive<-forecast(bricksq_naive,h=8)
bricksq.fc_stlf<-forecast(bricksq_stlf,h=8)
autoplot(bricksq, series = "Original data") +
geom_line(size = 1) +
autolayer(bricksq.fc_naive, PI = FALSE, size = 1,
series = "stlf") +
autolayer(bricksq.fc_stlf, PI = FALSE, size = 1,
series = "snaive") +
scale_color_manual(values = c("gray50", "blue", "red"),
breaks = c("Original data", "stlf", "snaive")) +
scale_x_continuous(limits = c(1990, 1994.5)) +
guides(colour = guide_legend(title = "Data")) +
ggtitle("Forecast from stlf and snaive functions") +
annotate(
"rect",
xmin=1992.75,xmax=1994.5,ymin=-Inf,ymax=Inf,
fill="lightgreen",alpha = 0.3
)
## Warning: Removed 136 rows containing missing values (geom_path).
Both stlf and naive underestimate the data. Naive follows the trend and variability a bit better.
p1_wr <- autoplot(writing, main = NULL) + xlab("Month") + ylab("Sales")
p2_wr <- ggsubseriesplot(writing, main = NULL) + xlab("Month") + ylab("Sales")
??grid.arrange
grid.arrange(p1_wr, p2_wr, ncol = 2, top = "Paper Sales (Jan 1963 - Dec 1972)")
Here the drift method seems more appropriate.
fit_wr <- stlf(writing, t.window = 13, s.window = "periodic", robust = TRUE, method = "rwdrift")
pfit_wr <- autoplot(fit_wr, main = "Paper Forecast without Box-Cox Transformation") + xlab("Month") + ylab("Sales")
fit_wrt <- stlf(writing, t.window = 13, s.window = "periodic", robust = TRUE,lambda = BoxCox.lambda(writing) )
pfit_wrt <- autoplot(fit_wrt, main = "Paper Forecast with Box-Cox Transformation") + xlab("Month") + ylab("Sales")
grid.arrange(pfit_wr, pfit_wrt, ncol = 1)
fit_fan <- stlf(fancy, t.window = 13, s.window = "periodic", robust = TRUE, method = "rwdrift")
pfit_fan <- autoplot(fit_fan, main = "Fancy Forecast without Box-Cox Transformation") + xlab("Month") + ylab("Sales")
fit_ftr <- stlf(fancy, t.window = 13, s.window = "periodic", robust = TRUE,lambda = BoxCox.lambda(fancy) )
pfit_ftr <- autoplot(fit_ftr, main = "Fancy Forecast with Box-Cox Transformation") + xlab("Month") + ylab("Sales")
grid.arrange(pfit_fan, pfit_ftr, ncol = 1)