First, let’s install the packages
library(fpp2)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.6.2
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 3.6.2
## Registered S3 method overwritten by 'xts':
## method from
## as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Loading required package: fma
## Loading required package: expsmooth
library(BETS)
##
## Attaching package: 'BETS'
## The following object is masked from 'package:stats':
##
## predict
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.6.2
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Chapter 5
Q1.
daily20 <- head(elecdaily,20)
autoplot(daily20, facets = TRUE)
reg1 <- tslm(Demand~Temperature, data = daily20)
reg1
##
## Call:
## tslm(formula = Demand ~ Temperature, data = daily20)
##
## Coefficients:
## (Intercept) Temperature
## 39.212 6.757
When the temperature goes up, people will use more refrigerates and air-conditions, which requires more electricity.
checkresiduals(reg1)
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774
Here we can see in the residual plot that the residuals are left-skewed. Moreover, there are several outliers are smaller than others.The residual is correlated with temperature, which infers it is a model which omits necessary variables.
newTem <- data.frame(Temperature = c(15,35))
felec <- forecast(reg1, newdata = newTem)
I can partially believe the result. It shows a reasonable result.
felec
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 3.857143 140.5701 108.6810 172.4591 90.21166 190.9285
## 4.000000 275.7146 245.2278 306.2014 227.57056 323.8586
autoplot(felec)
as.data.frame(elecdaily) %>%
ggplot(aes(x = Temperature, y = Demand))+
geom_point()
The relationship between Demand and Temperature is non-linear. The truth is when temperature is below 20, the demand of electricity is negatively correlated with temperature. While when the temperature is above 20, the demand of electricity is positively correlated with temperature. A better way is to use the piecewise model.
Q2.
autoplot(mens400)
We can see that there are two interrupted intervals because of world war. The time spent decreases as time goes by.
reg2<-tslm(mens400 ~ trend, mens400)
reg2
##
## Call:
## tslm(formula = mens400 ~ trend, data = mens400)
##
## Coefficients:
## (Intercept) trend
## 50.3078 -0.2583
autoplot(mens400)+
geom_smooth(method = 'lm', se =FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
Average decreasing rate per year is 0.2583.
checkresiduals(reg2)
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals from Linear regression model
## LM test = 3.6082, df = 6, p-value = 0.7295
We can find a outlier in the residual plot. The mean of residual is 0, so the fitted line does a good job.
h=2020-2016
forecast(reg2, h)
## Warning in forecast.lm(reg2, h): newdata column names not specified,
## defaulting to first variable required.
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2020 42.04231 40.44975 43.63487 39.55286 44.53176
I assume that the linear decreasing trend remains. But we all know that the time is not going to be negative. So, this assumption is not useful in long term.
Q3.
easter(ausbeer)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00
The result shows which quarter contains easter in that year. There are two exceptions: If Easter is on Apirl 1st, then Qtr1 = 0.67 and Qtr2 = 0.33; If Easter is on Apirl 2nd, then Qtr1 = 0.33 and Qtr2 = 0.67.
y = e^(β0+β1*logx+ε)
logy = β0+β1*logx+ε
Take the derivatives for both sides,
1/y*dy/dx = β1/x
so, β1=(x/dx)/(y/dy) = (dy/dx)*(x/y)
autoplot(fancy)
ggseasonplot(fancy,polar=TRUE)
ggsubseriesplot(fancy)
I notice that the monthly sale increases dramatically during Christmas and moderately increases in every March.
Because monthly sales in December is much larger than other months.
Time <- time(fancy)
festival <- c()
for(i in 1:length(Time)){
month <- round(12*(Time[i] - floor(Time[i]))) + 1
year <- floor(Time[i])
if(year >= 1988 & month == 3){
festival[i] <- 1
} else {
festival[i] <- 0
}
}
reg5<-tslm(log(fancy) ~ trend+season+festival)
reg5
##
## Call:
## tslm(formula = log(fancy) ~ trend + season + festival)
##
## Coefficients:
## (Intercept) trend season2 season3 season4
## 7.61967 0.02202 0.25142 0.26608 0.38405
## season5 season6 season7 season8 season9
## 0.40949 0.44883 0.61045 0.58796 0.66933
## season10 season11 season12 festival
## 0.74739 1.20675 1.96224 0.50152
autoplot(reg5$residuals)
cbind(residuals = reg5$residuals,fitted_value=fitted(reg5)) %>%
as.data.frame() %>%
ggplot(aes(x=fitted_value,y=residuals))+
geom_point()
In these two plots, we find that the residuals are correlated with both time and fitted value.
month = factor(
month.abb[round(12*(Time - floor(Time)) + 1)],
labels = month.abb,
ordered = TRUE
)
cbind.data.frame(month=month,residuals = reg5$residuals) %>%
ggplot(aes(x=month,y=residuals))+
geom_boxplot()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
The residual is correlated with month
reg5$coefficients
## (Intercept) trend season2 season3 season4 season5
## 7.61966701 0.02201983 0.25141682 0.26608280 0.38405351 0.40948697
## season6 season7 season8 season9 season10 season11
## 0.44882828 0.61045453 0.58796443 0.66932985 0.74739195 1.20674790
## season12 festival
## 1.96224123 0.50151509
As predicted, season 12 has the highest coefficient, and festival dummy has positive coefficient.
checkresiduals(reg5)
##
## Breusch-Godfrey test for serial correlation of order up to 17
##
## data: Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494
P-value is 0.002494, which is lower than 0.01. So, we can tell that the model is not free of autocorrelation.
h <- rep(0, 36)
f_fancy<-forecast(reg5,h)
## Warning in forecast.lm(reg5, h): newdata column names not specified,
## defaulting to first variable required.
f_fancy
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.491352 9.238522 9.744183 9.101594 9.88111
## Feb 1994 9.764789 9.511959 10.017620 9.375031 10.15455
## Mar 1994 9.801475 9.461879 10.141071 9.277961 10.32499
## Apr 1994 9.941465 9.688635 10.194296 9.551707 10.33122
## May 1994 9.988919 9.736088 10.241749 9.599161 10.37868
## Jun 1994 10.050280 9.797449 10.303110 9.660522 10.44004
## Jul 1994 10.233926 9.981095 10.486756 9.844168 10.62368
## Aug 1994 10.233456 9.980625 10.486286 9.843698 10.62321
## Sep 1994 10.336841 10.084010 10.589671 9.947083 10.72660
## Oct 1994 10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994 10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994 11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995 9.755590 9.499844 10.011336 9.361338 10.14984
## Feb 1995 10.029027 9.773281 10.284773 9.634775 10.42328
## Mar 1995 10.065713 9.722498 10.408928 9.536620 10.59481
## Apr 1995 10.205703 9.949957 10.461449 9.811451 10.59996
## May 1995 10.253157 9.997411 10.508903 9.858904 10.64741
## Jun 1995 10.314518 10.058772 10.570264 9.920265 10.70877
## Jul 1995 10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995 10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995 10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995 10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995 11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995 11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996 10.019828 9.760564 10.279093 9.620151 10.41951
## Feb 1996 10.293265 10.034000 10.552530 9.893588 10.69294
## Mar 1996 10.329951 9.982679 10.677222 9.794605 10.86530
## Apr 1996 10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996 10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996 10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996 10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996 10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996 10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996 10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996 11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996 12.224288 11.965023 12.483552 11.824611 12.62396
autoplot(f_fancy)
raw_fancy<-f_fancy[c('mean','lower','upper')] %>%
as.data.frame() %>%
exp()
raw_fancy
## mean lower.1 lower.2 upper.1 upper.2
## 1 13244.70 10285.82 8969.583 17054.73 19557.43
## 2 17409.81 13520.45 11790.284 22418.00 25707.73
## 3 18060.36 12860.02 10699.594 25363.61 30484.96
## 4 20774.16 16133.21 14068.696 26750.16 30675.62
## 5 21783.73 16917.24 14752.395 28050.15 32166.37
## 6 23162.27 17987.81 15685.969 29825.24 34201.95
## 7 27831.56 21613.98 18848.111 35837.72 41096.73
## 8 27818.48 21603.82 18839.249 35820.87 41077.41
## 9 30848.42 23956.87 20891.193 39722.43 45551.50
## 10 34095.57 26478.61 23090.230 43903.67 50346.32
## 11 55176.84 42850.31 37366.903 71049.28 81475.41
## 12 120067.79 93244.59 81312.400 154607.08 177294.90
## 13 17250.40 13357.65 11629.938 22277.59 25587.08
## 14 22675.20 17558.28 15287.252 29283.31 33633.55
## 15 23522.50 16688.88 13858.024 33154.31 39926.92
## 16 27057.06 20951.33 18241.435 34942.16 40133.06
## 17 28371.96 21969.51 19127.918 36640.25 42083.42
## 18 30167.42 23359.80 20338.387 38958.95 44746.58
## 19 36248.88 28068.91 24438.412 46812.70 53767.06
## 20 36231.84 28055.72 24426.922 46790.69 53741.78
## 21 40178.16 31111.50 27087.467 51887.06 59595.26
## 22 44407.37 34386.35 29938.733 57348.77 65868.34
## 23 71864.42 55647.40 48449.831 92807.48 106594.69
## 24 156380.86 121091.75 105429.448 201954.08 231955.81
## 25 22467.57 17336.40 15065.329 29117.46 33506.86
## 26 29533.04 22788.25 19802.984 38274.14 44043.89
## 27 30636.60 21648.24 17936.708 43356.94 52328.52
## 28 35240.15 27191.96 23629.808 45670.42 52555.15
## 29 36952.72 28513.41 24778.151 47889.88 55109.18
## 30 39291.20 30317.82 26346.183 50920.48 58596.65
## 31 47211.93 36429.60 31657.322 61185.57 70409.18
## 32 47189.73 36412.48 31642.439 61156.80 70376.07
## 33 52329.57 40378.47 35088.887 67817.91 78041.33
## 34 57837.85 44628.77 38782.394 74956.52 86256.08
## 35 93598.96 72222.70 62761.521 121302.09 139588.15
## 36 203676.38 157160.50 136572.460 263959.89 303751.35
Maybe we can improve by using multiplicative method (ie,season*trend) to forecast.
a.b.
gasoline_until_2004<-window(gasoline, end=2005)
reg6<-tslm(gasoline_until_2004~trend)
cbind(gasoline = gasoline_until_2004, fitted_value = fitted(reg6))%>%
as.data.frame()%>%
ggplot(aes(x=fitted_value,y=gasoline))+
geom_point()
#Harmonic regression is too difficult for me...
for(num in c(1:10)){
var_name <- paste("tslm_ft",
as.character(num),
"_gasoline_until_2004",
sep = "")
assign(var_name,
tslm(gasoline_until_2004 ~ trend + fourier(
gasoline_until_2004,
K = num
))
)
print(
autoplot(gasoline_until_2004) +
autolayer(get(var_name)$fitted.values,
series = as.character(num)) +
ggtitle(var_name) +
ylab("gasoline") +
guides(colour = guide_legend(title = "Number of Fourier Transform pairs"))
)
}
autoplot(gasoline_until_2004) +
autolayer(tslm_ft1_gasoline_until_2004$fitted.values, series = "1") +
autolayer(tslm_ft2_gasoline_until_2004$fitted.values, series = "2") +
autolayer(tslm_ft3_gasoline_until_2004$fitted.values, series = "3") +
autolayer(tslm_ft5_gasoline_until_2004$fitted.values, series = "5") +
autolayer(tslm_ft10_gasoline_until_2004$fitted.values, series = "10") +
guides(colour = guide_legend(title = "Fourier Transform pairs")) +
scale_color_discrete(breaks = c(1:10))
for(i in c(1:10)){
tslm_ft_gasoline_until_2004.name <- paste(
"tslm_ft", as.character(i), "_gasoline_until_2004",
sep = ""
)
writeLines(
paste(
"\n", tslm_ft_gasoline_until_2004.name, "\n"
)
)
print(CV(get(tslm_ft_gasoline_until_2004.name)))
}
##
## tslm_ft1_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 8.201921e-02 -1.813292e+03 -1.813208e+03 -1.790354e+03 8.250858e-01
##
## tslm_ft2_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 8.136854e-02 -1.819113e+03 -1.818957e+03 -1.787001e+03 8.269569e-01
##
## tslm_ft3_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 7.658846e-02 -1.863085e+03 -1.862834e+03 -1.821797e+03 8.375702e-01
##
## tslm_ft4_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 7.648608e-02 -1.864112e+03 -1.863742e+03 -1.813649e+03 8.382404e-01
##
## tslm_ft5_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 7.553646e-02 -1.873234e+03 -1.872723e+03 -1.813596e+03 8.406928e-01
##
## tslm_ft6_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 0.0725333 -1902.7534284 -1902.0773721 -1833.9401782 0.8474536
##
## tslm_ft7_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 0.0714734 -1913.5362459 -1912.6718391 -1835.5478956 0.8501073
##
## tslm_ft8_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 7.147431e-02 -1.913619e+03 -1.912542e+03 -1.826455e+03 8.505267e-01
##
## tslm_ft9_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 7.161451e-02 -1.912292e+03 -1.910980e+03 -1.815954e+03 8.506543e-01
##
## tslm_ft10_gasoline_until_2004
##
## CV AIC AICc BIC AdjR2
## 7.135754e-02 -1.915014e+03 -1.913441e+03 -1.809500e+03 8.516102e-01
Therefore, we know that K=7 has the lowest CV and AICc value.
gasoline_until_2004<-window(gasoline, end=2005)
tslm_ft7_gasoline_until_2004 <- tslm(
gasoline_until_2004 ~ trend + fourier(
gasoline_until_2004,
K = 7
)
)
checkresiduals(tslm_ft7_gasoline_until_2004)
##
## Breusch-Godfrey test for serial correlation of order up to 104
##
## data: Residuals from Linear regression model
## LM test = 149.31, df = 104, p-value = 0.002404
tslm_forecast7 <- forecast(
tslm_ft7_gasoline_until_2004,
newdata = data.frame(fourier(
gasoline_until_2004,
K = 7,
h=52
))
)
#Since we already know K=7 has the lowest CV and AICc value.
autoplot(tslm_forecast7)+
autolayer(window(
gasoline,
start = 2005,
end=2006
)
)+
scale_x_continuous(limits=c(2005,2006))
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## Warning: Removed 726 row(s) containing missing values (geom_path).
# This forecast does a not bad job. Its interval fits 90% of the actual value.
autoplot(huron)
# It decreases over time
#linear
fit.lin<- tslm(huron~trend)
#piecewise
t <- time(huron)
t.break <- 1915
tb1 <- ts(pmax(0,t-t.break), start=1875)
fit.pw <- tslm(huron ~ t+tb1)
h<-1980-t[length(t)]
t.new<- t[length(t)]+seq(h)
tb1.new<- tb1[length(tb1)]+seq(h)
newdata <- cbind(t= t.new, tb1= tb1.new) %>%
as.data.frame()
fc_linear <- forecast(fit.lin, h=h)
fc_piecewise <- forecast(fit.pw, newdata=newdata)
autoplot(huron)+
autolayer(fc_linear, series='Linear', PI=FALSE)+
autolayer(fc_piecewise, series='Piecewise',PI=FALSE)
# These two model is not continuous and smooth, which infers they are not good forecasts.
Chapter 6.
T = 1/3(1/5Yt-3 + 1/5Yt-2 + 1/5Yt-1 + 1/5Yt + 1/5Yt+1)+ 1/3(1/5Yt-2 + 1/5Yt-1 + 1/5Yt + 1/5Yt+1 + 1/5Yt+2)+ 1/3(1/5Yt-1 + 1/5Yt + 1/5Yt+1 + 1/5Yt+2 + 1/5Yt+3) = 0.067Yt-3 + 0.133Yt-2 + 0.200Yt-1 + 0.200Yt + 0.200Yt+1 + 0.133Yt+2 + 0.133Yt+3
autoplot(plastics)
I can identify a significant seasonal component and positive trend.
dc_plas<-decompose(plastics, 'multiplicative')
autoplot(dc_plas)
Yes. It shows a positive trend and seasonal effect.
autoplot(plastics, series='Data')+
autolayer(trendcycle(dc_plas),series='Trend')+
autolayer(seasadj(dc_plas), series='Seasonally Adjusted')+
xlab("Year") + ylab("plastics manufacture") +
ggtitle("product A for a plastics manufacturer for five years") +
scale_colour_manual(values=c("gray","blue","red"))
## Warning: Removed 12 row(s) containing missing values (geom_path).
plastics1 <- plastics
plastics1[15]<-plastics1[15]+500
dc_plas<-decompose(plastics1, 'multiplicative')
autoplot(dc_plas)
autoplot(plastics1, series='Data')+
autolayer(trendcycle(dc_plas),series='Trend')+
autolayer(seasadj(dc_plas), series='Seasonally Adjusted')+
xlab("Year") + ylab("plastics manufacture") +
ggtitle("product A for a plastics manufacturer for five years") +
scale_colour_manual(values=c("gray","blue","red"))
## Warning: Removed 12 row(s) containing missing values (geom_path).
The seasonally adjusted data changes dramatically at the time point of outlier. But it affects trend line little.
plastics2 <- plastics
plastics2[55]<-plastics2[55]+500
dc_plas<-decompose(plastics2, 'multiplicative')
autoplot(dc_plas)
autoplot(plastics2, series='Data')+
autolayer(trendcycle(dc_plas),series='Trend')+
autolayer(seasadj(dc_plas), series='Seasonally Adjusted')+
xlab("Year") + ylab("plastics manufacture") +
ggtitle("product A for a plastics manufacturer for five years") +
scale_colour_manual(values=c("gray","blue","red"))
## Warning: Removed 12 row(s) containing missing values (geom_path).
If the outlier is near the end, the effect to trend will decrease.
library(seasonal)
## Warning: package 'seasonal' was built under R version 3.6.2
retail<- read.csv('/Users/jiahaolin/Downloads/Forecasting/HW1/tute1.csv')
ts_retail <- ts(retail[,"Sales"],
frequency=12,
start=c(1982,4))
ts_retail %>% seas(x11="") -> fit
autoplot(fit) +
ggtitle("X11 decomposition of retail")
There are trend and seasonlity
Overall, the number of persons in the civilian labour force in Australia is increasing. The size of seasonal effect is increasing overtime. There is a positive lonbg-term trend. Several residual outlier exist in period 1991-1992
We can find that the seasonal component is increase in the period of 1991/1992, so it is visible. But not so significant.
autoplot(cangas)
ggsubseriesplot(cangas)
ggseasonplot(cangas)
The gas production increases in winter and decreases in summer. The possible reason is that Canada is high latitude. Therefore, in winter the demand of gas for heating increases.
stl_cangas <- stl(cangas, s.window = 13)
autoplot(stl_cangas)
x11_cangas <- seas(cangas, x11 = "")
seats_cangas <- seas(cangas)
autoplot(cangas, series = "Data") +
autolayer(seasadj(stl_cangas), series = "Seasonally Adjusted") +
autolayer(trendcycle(stl_cangas), series = "Trend-cycle") +
scale_color_manual(values = c("gray", "blue", "red"),
breaks = c("Data", "Seasonally Adjusted", "Trend-cycle"))
autoplot(cangas, series = "Data") +
autolayer(seasadj(x11_cangas), series = "Seasonally Adjusted") +
autolayer(trendcycle(x11_cangas), series = "Trend-cycle") +
scale_color_manual(values = c("gray", "blue", "red"),
breaks = c("Data", "Seasonally Adjusted", "Trend-cycle"))
autoplot(cangas, series = "Data") +
autolayer(seasadj(seats_cangas), series = "Seasonally Adjusted") +
autolayer(trendcycle(seats_cangas), series = "Trend-cycle") +
scale_color_manual(values = c("gray", "blue", "red"),
breaks = c("Data", "Seasonally Adjusted", "Trend-cycle"))
The trend for stl_decomposition method is more smooth, and the seasonally adjusted curve is more variable.
stl_changing_bricksq <- stl(bricksq, s.window = 13)
stl_fixed_bricksq <- stl(bricksq, s.window = 'periodic')
autoplot(stl_changing_bricksq)
autoplot(stl_fixed_bricksq)
autoplot(bricksq, series = "Data") +
autolayer(trendcycle(stl_fixed_bricksq),
series = "Trend-cycle") +
autolayer(seasadj(stl_fixed_bricksq),
series = "Seasonally Adjusted Data") +
scale_color_manual(values = c("gray", "red", "blue"),
breaks = c("Data", "Trend-cycle", "Seasonally Adjusted Data"))
autoplot(bricksq, series = "Data") +
autolayer(trendcycle(stl_changing_bricksq),
series = "Trend-cycle") +
autolayer(seasadj(stl_changing_bricksq),
series = "Seasonally Adjusted Data") +
scale_color_manual(values = c("gray", "red", "blue"),
breaks = c("Data", "Trend-cycle", "Seasonally Adjusted Data"))
stl_fixed_bricksq %>% seasadj() %>% naive() %>% autoplot()
stl_changing_bricksq %>% seasadj() %>% naive() %>% autoplot()
bricksq %>% stlf() %>% autoplot()
bricksq %>% stlf() %>% checkresiduals()
## Warning in checkresiduals(.): The fitted degrees of freedom is based on the
## model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(M,N,N)
## Q* = 41.128, df = 6, p-value = 2.733e-07
##
## Model df: 2. Total lags used: 8
They look correlated.
stlf_brick_robust <- stlf(bricksq, robust = TRUE)
autoplot(stlf_brick_robust)
checkresiduals(stlf_brick_robust)
## Warning in checkresiduals(stlf_brick_robust): The fitted degrees of freedom
## is based on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(M,N,N)
## Q* = 28.163, df = 6, p-value = 8.755e-05
##
## Model df: 2. Total lags used: 8
It does not make a big difference.
trainset_brick <- subset(bricksq,
end = length(bricksq) - 8)
testset_brick <- subset(bricksq,
start = length(bricksq) - 7)
snaive_brick <- snaive(trainset_brick)
stlf_brick_part <- stlf(trainset_brick, robust = TRUE)
autoplot(bricksq, series = "Original data") +
geom_line(size = 1) +
autolayer(stlf_brick_part, PI = FALSE, size = 1,
series = "stlf") +
autolayer(snaive_brick, PI = FALSE, size = 1,
series = "snaive") +
scale_color_manual(values = c("gray50", "blue", "red"),
breaks = c("Original data", "stlf", "snaive")) +
scale_x_continuous(limits = c(1990, 1994.5)) +
scale_y_continuous(limits = c(300, 600)) +
guides(colour = guide_legend(title = "Data")) +
ggtitle("Forecast from stlf and snaive functions") +
annotate(
"rect",
xmin=1992.75,xmax=1994.5,ymin=-Inf,ymax=Inf,
fill="lightgreen",alpha = 0.3
)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## Warning: Removed 136 row(s) containing missing values (geom_path).
Can see that the stlf curve is closer to the original data
stlf_writing_rwdridt <- stlf(writing,
s.window = 13,
robust = TRUE,
lambda = BoxCox.lambda(writing),
method = "rwdrift")
autoplot(stlf_writing_rwdridt)
stlf_writing_naive <- stlf(writing,
s.window = 13,
robust = TRUE,
lambda = BoxCox.lambda(writing),
method = "naive")
autoplot(stlf_writing_naive)
Can see it is better to use rwdrift
stlf_fancy_rwdridt <- stlf(fancy,
s.window = 13,
robust = TRUE,
lambda = BoxCox.lambda(fancy),
method = "rwdrift")
autoplot(stlf_fancy_rwdridt)
stlf_fancy_naive <- stlf(fancy,
s.window = 13,
robust = TRUE,
lambda = BoxCox.lambda(fancy),
method = "naive")
autoplot(stlf_fancy_naive)
It is better to use rwdrift