Libraries
library(knitr)
library(fpp2)
## Warning: package 'fpp2' was built under R version 3.6.2
## Registered S3 method overwritten by 'xts':
## method from
## as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2 3.3.2 ✓ fma 2.4
## ✓ forecast 8.13 ✓ expsmooth 2.3
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'forecast' was built under R version 3.6.2
##
library(seasonal)
## Warning: package 'seasonal' was built under R version 3.6.2
daily20 <- head(elecdaily,20)
autoplot(daily20)
regq1 <- tslm(formula = Demand~Temperature,data = daily20)
summary(regq1)
##
## Call:
## tslm(formula = Demand ~ Temperature, data = daily20)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.060 -7.117 -1.437 17.484 27.102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.2117 17.9915 2.179 0.0428 *
## Temperature 6.7572 0.6114 11.052 1.88e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22 on 18 degrees of freedom
## Multiple R-squared: 0.8716, Adjusted R-squared: 0.8644
## F-statistic: 122.1 on 1 and 18 DF, p-value: 1.876e-09
As the temperature increases by 1C, on average, the demand for energy increases by 6.75 units. There is a positive relationship because people in Australia demand energy to cool their houses when the temperature increases.
residuals(regq1)
## Time Series:
## Start = c(1, 4)
## End = c(4, 2)
## Frequency = 7
## 1 2 3 4 5 6
## -40.0032629 -6.0369497 -0.3052144 -2.5691341 -46.0601058 24.0754737
## 7 8 9 10 11 12
## 25.5466767 -19.0221433 -30.0675910 -10.3589677 11.2234590 -3.6194867
## 13 14 15 16 17 18
## 12.7351505 -3.4857856 24.3568150 16.5137374 2.4974049 20.3937519
## 19 20
## -2.9162723 27.1024444
checkresiduals(regq1)
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774
It is apparent that autocorrelation shows a singificant spike at Lag 4, but it stays within the determined bounds. Also, the histogram exhibits a slight left skew. These are factors to note, but it does not significantly impact the model’s efficacy.
data.1c= data.frame(Temperature = 15)
fcast.1c = forecast(regq1, h=1, newdata = data.1c)
autoplot(fcast.1c)
data.1cc= data.frame(Temperature = 35)
fcast.1cc = forecast(regq1, h=1, newdata = data.1cc)
autoplot(fcast.1cc)
The forecast for the range of energy consumption at 15C (60F) is reasonably accurate. It’s likely that houses will have light heat running, especially during the night. The forecast for 35C is also reasonably accurate. Considering that it gets up to 45C in Victoria (110F!), we can expect that the highest energy consumption would be during the very hottest days of the year. 35C appears to be normal summer weather and would not reflect peak energy consumption.
autoplot(daily20, facets=TRUE)
daily20 %>%
as.data.frame() %>%
ggplot(aes(x=Temperature, y=Demand)) +
geom_point() +
geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
fit <- tslm(Demand ~ Temperature, data=daily20)
checkresiduals(fit)
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774
forecast(fit, newdata=data.frame(Temperature=c(15,35)))
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 4.285714 140.5701 108.6810 172.4591 90.21166 190.9285
## 4.428571 275.7146 245.2278 306.2014 227.57056 323.8586
Using the given code:
Thus 15C at 80% lies at (108.6810, 172.4591). At 95%, (90.21166, 190.9285)
Thus 35C at 80% lies at (245.2278, 306.2014). At 95%, (227.57056, 323.8586)
plot(Demand~Temperature, data=elecdaily)
From the full dataset, it’s apparent that energy consumption tends to increase when Temperature is less than 20 and greater than 40. Considering that the data in daily20 had many observations at >40C, it exhibited a strong increase in energy consumption as Temperature increased.
autoplot(mens400)
From the plot, it is clear that competition for the Men’s 400M event has become more competitive over the years. Overall the data shows a general downnward tend. There is also missing data for years 1916, 1940, and 1944 due to World Wars.
timemens400 <- time(mens400)
regq2 <- tslm(mens400~timemens400, data = mens400)
summary(regq2)
##
## Call:
## tslm(formula = mens400 ~ timemens400, data = mens400)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6002 -0.5747 -0.2858 0.5751 4.1505
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 172.481477 11.487522 15.02 2.52e-14 ***
## timemens400 -0.064574 0.005865 -11.01 2.75e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.136 on 26 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.8234, Adjusted R-squared: 0.8166
## F-statistic: 121.2 on 1 and 26 DF, p-value: 2.752e-11
As the time increases by 1 year, on average, the winning time for the event decreases by about 0.06 seconds. In other words, every Olympic Games, Men’s 400 winning times decreases by about 0.26 seconds.
autoplot(mens400) +
geom_abline(slope = regq2$coefficients[2],intercept = regq2$coefficients[1],color="violet")
cbind(Time = timemens400, Residuals = regq2$residuals) %>%
as.data.frame() %>%
ggplot(aes(x = Time, y = Residuals)) +
geom_point()
## Warning: Removed 3 rows containing missing values (geom_point).
residuals(regq2)
## Time Series:
## Start = 1896
## End = 2016
## Frequency = 0.25
## 1 2 3 4 5 6
## 4.15053282 -0.39117180 -0.33287642 0.72541896 -0.81628566 NA
## 7 8 9 10 11 12
## 1.10030510 -0.64139952 -0.18310414 -1.52480876 -0.96651338 NA
## 13 14 15 16 17 18
## NA -0.49162724 -0.53333186 0.52496352 -1.01674110 -0.55844572
## 19 20 21 22 23 24
## -1.60015034 -0.48185496 -0.62355958 -0.02526420 -0.09696882 -0.23867344
## 25 26 27 28 29 30
## -0.35037806 -0.10208268 0.50621270 0.92450808 0.93280346 1.38109884
## 31
## 0.72939422
checkresiduals(regq2)
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals from Linear regression model
## LM test = 3.6082, df = 6, p-value = 0.7295
It’s clear that there is a significant outlier at 1896. Outside of this, the model appears to suit the data.
fcast.2d = forecast(regq2, newdata = data.frame(timemens400 = 2020))
autoplot(mens400) + autolayer(fcast.2d)
fcast.2d
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2020 42.04231 40.44975 43.63487 39.55286 44.53176
fcast.2d$lower
## Time Series:
## Start = 2020
## End = 2020
## Frequency = 0.25
## Series 1 Series 2
## 2020 40.44975 39.55286
fcast.2d$upper
## Time Series:
## Start = 2020
## End = 2020
## Frequency = 0.25
## Series 1 Series 2
## 2020 43.63487 44.53176
The predicted winning time for 2020 is 42.04. The prediction intervals assume normality for the data.
easter(ausbeer)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00
The command shows a dummy variable representing when the holiday Easter falls - “1” for either in March or April and “0” otherwise. When Easter starts in March but finishes in April, the variable is split proportionately between the two months.
log(y)=β0+β1log(x)+ε
can be transformed into
y = e^(β0+β1log(x)+ε)
to put y in terms of x. Using elasticity, β1 = (dy/dx)×(x/y), the equation may further be transformed into
y = e^(β0+(dy/dx)*(x/y)log(x)+ε)
autoplot(fancy)
It’s clear from the graph that the sales for the souvenir shop is highly seasonal, presumably for holiday gifts. The spikes at the end of 1993 and 1994 are significantly higher than previous years. Because of the seasonal nature of the data, it’s a useful tool to take logs.
Time = time(fancy)
surf.dummy = c()
for(i in 1:length(Time)){
month = round(12*(Time[i]-floor(Time[i]))) + 1
year = floor(Time[i])
if(year>=1998 & month == 3){
surf.dummy[i] = 1
} else {
surf.dummy[i] = 0}
}
regq5 = tslm(log(fancy)~ trend+season+surf.dummy, data=fancy)
summary(regq5)
##
## Call:
## tslm(formula = log(fancy) ~ trend + season + surf.dummy, data = fancy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.41644 -0.12619 0.00608 0.11389 0.38567
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6058604 0.0768740 98.939 < 2e-16 ***
## trend 0.0223930 0.0008448 26.508 < 2e-16 ***
## season2 0.2510437 0.0993278 2.527 0.013718 *
## season3 0.6952066 0.0993386 6.998 1.18e-09 ***
## season4 0.3829341 0.0993565 3.854 0.000252 ***
## season5 0.4079944 0.0993817 4.105 0.000106 ***
## season6 0.4469625 0.0994140 4.496 2.63e-05 ***
## season7 0.6082156 0.0994534 6.116 4.69e-08 ***
## season8 0.5853524 0.0995001 5.883 1.21e-07 ***
## season9 0.6663446 0.0995538 6.693 4.27e-09 ***
## season10 0.7440336 0.0996148 7.469 1.61e-10 ***
## season11 1.2030164 0.0996828 12.068 < 2e-16 ***
## season12 1.9581366 0.0997579 19.629 < 2e-16 ***
## surf.dummy NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1858 on 71 degrees of freedom
## Multiple R-squared: 0.9527, Adjusted R-squared: 0.9447
## F-statistic: 119.1 on 12 and 71 DF, p-value: < 2.2e-16
autoplot(regq5$residuals)
The residuals show no discernable pattern.
cbind.data.frame(
Month = factor(
month.abb[round(12*(Time - floor(Time)) + 1)],
labels = month.abb,
ordered = TRUE
),
Residuals = regq5$residuals
) %>%
ggplot(aes(x = Month,
y = Residuals)) +
geom_boxplot()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
regq5$coefficients
## (Intercept) trend season2 season3 season4 season5
## 7.60586042 0.02239298 0.25104367 0.69520658 0.38293405 0.40799437
## season6 season7 season8 season9 season10 season11
## 0.44696253 0.60821562 0.58535238 0.66634464 0.74403359 1.20301639
## season12 surf.dummy
## 1.95813657 NA
The coefficients reveal a positive trend for increasing sales over time.
#checkresiduals(regq5)
checkresiduals(regq5) in the R Markdown runs but returns an error, ‘element (31, 31) is zero, so the inverse cannot be computed.’ It does not knit.
The Breusch-Godfrey Test reveals significant spikes at 24, 26, 28, 21, 2, 1, and 10, indicating autocorrelation.
gas.04 <- window(gasoline,end=2005)
autoplot(gas.04)
regq6.1 <- tslm(gas.04~trend + fourier(gas.04, K=3))
regq6.2 <- tslm(gas.04~trend + fourier(gas.04, K=8))
regq6.3 <- tslm(gas.04~trend + fourier(gas.04, K=14))
autoplot(gas.04, ylab = "Gas, Millions of Barrels") + autolayer(fitted(regq6.1), series = "K=3") + autolayer(fitted(regq6.2), series = "K=8") + autolayer(fitted(regq6.3), series = "K=14")
AIC(regq6.1)
## [1] 197.2137
AIC(regq6.2)
## [1] 146.6802
AIC(regq6.3)
## [1] 146.0811
Of all of the regressions, regq6.3 with K=14 has the lowest AIC which indicates the best fit.
checkresiduals(regq6.3)
##
## Breusch-Godfrey test for serial correlation of order up to 104
##
## data: Residuals from Linear regression model
## LM test = 154.88, df = 104, p-value = 0.0009033
fc.gas <- forecast(regq6.3, newdata=data.frame(fourier(gas.04,14,52)))
fc.gas
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005.014 8.688467 8.344908 9.032027 8.162623 9.214312
## 2005.033 8.652196 8.308515 8.995877 8.126166 9.178227
## 2005.052 8.681805 8.338152 9.025458 8.155817 9.207793
## 2005.071 8.646715 8.303063 8.990367 8.120729 9.172701
## 2005.090 8.662688 8.319198 9.006178 8.136950 9.188427
## 2005.110 8.802433 8.459097 9.145769 8.276931 9.327935
## 2005.129 8.916500 8.573165 9.259834 8.390999 9.442000
## 2005.148 8.915439 8.572131 9.258747 8.389980 9.440898
## 2005.167 8.923797 8.580485 9.267108 8.398332 9.449262
## 2005.186 9.014273 8.670958 9.357587 8.488803 9.539743
## 2005.205 9.075322 8.732011 9.418633 8.549858 9.600786
## 2005.225 9.045933 8.702627 9.389238 8.520477 9.571388
## 2005.244 9.023517 8.680207 9.366828 8.498054 9.548980
## 2005.263 9.070471 8.727159 9.413782 8.545006 9.595935
## 2005.282 9.126463 8.783157 9.469769 8.601006 9.651920
## 2005.301 9.140407 8.797100 9.483714 8.614949 9.665865
## 2005.320 9.133874 8.790564 9.477184 8.608411 9.659337
## 2005.340 9.148640 8.805332 9.491949 8.623180 9.674101
## 2005.359 9.215731 8.872426 9.559036 8.690275 9.741186
## 2005.378 9.309807 8.966499 9.653115 8.784347 9.835267
## 2005.397 9.326552 8.983242 9.669861 8.801090 9.852013
## 2005.416 9.236585 8.893278 9.579891 8.711127 9.762042
## 2005.435 9.205722 8.862416 9.549028 8.680266 9.731179
## 2005.455 9.365368 9.022059 9.708677 8.839907 9.890829
## 2005.474 9.560867 9.217559 9.904176 9.035407 10.086327
## 2005.493 9.573649 9.230344 9.916954 9.048193 10.099105
## 2005.512 9.469964 9.126657 9.813271 8.944505 9.995423
## 2005.531 9.455877 9.112568 9.799186 8.930415 9.981338
## 2005.550 9.525789 9.182482 9.869096 9.000331 10.051247
## 2005.570 9.539059 9.195753 9.882364 9.013603 10.064515
## 2005.589 9.505437 9.162128 9.848745 8.979976 10.030897
## 2005.608 9.523581 9.180272 9.866890 8.998120 10.049042
## 2005.627 9.563090 9.219784 9.906396 9.037634 10.088546
## 2005.646 9.531682 9.188375 9.874988 9.006224 10.057139
## 2005.665 9.439316 9.096007 9.782625 8.913854 9.964778
## 2005.685 9.339393 8.996085 9.682700 8.813933 9.864852
## 2005.704 9.237800 8.894495 9.581106 8.712345 9.763256
## 2005.723 9.159260 8.815951 9.502569 8.633799 9.684721
## 2005.742 9.164571 8.821261 9.507880 8.639109 9.690033
## 2005.761 9.244484 8.901178 9.587790 8.719027 9.769941
## 2005.780 9.319073 8.975767 9.662380 8.793616 9.844530
## 2005.800 9.349678 9.006367 9.692988 8.824214 9.875141
## 2005.819 9.348996 9.005687 9.692306 8.823535 9.874458
## 2005.838 9.318053 8.974748 9.661358 8.792598 9.843508
## 2005.857 9.256287 8.912977 9.599597 8.730824 9.781750
## 2005.876 9.182031 8.838718 9.525343 8.656564 9.707497
## 2005.895 9.116522 8.773214 9.459831 8.591062 9.641983
## 2005.915 9.114523 8.771216 9.457830 8.589065 9.639981
## 2005.934 9.246152 8.902832 9.589471 8.720674 9.771629
## 2005.953 9.430333 9.087014 9.773652 8.904856 9.955809
## 2005.972 9.423424 9.080112 9.766735 8.897959 9.948888
## 2005.991 9.149290 8.805908 9.492671 8.623717 9.674862
autoplot(fc.gas)
gas.05 <- window(gasoline, start=2005, end=2006)
autoplot(gas.05, series="Observed", color = "brown") + autolayer(fc.gas$mean, series="Forecast", color="blue")
autoplot(huron)
The data shows the level of Lake Huron in feet from 1875-1972. There appears to be some seasonal changes in the level of the lake. This could be due to the water expanding and contracting with temperature changes or due to icy conditions.
regq7 <- tslm(huron~trend, data=huron)
summary(regq7)
##
## Call:
## tslm(formula = huron ~ trend, data = huron)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.50997 -0.72726 0.00083 0.74402 2.53565
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.202037 0.230111 44.335 < 2e-16 ***
## trend -0.024201 0.004036 -5.996 3.55e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared: 0.2725, Adjusted R-squared: 0.2649
## F-statistic: 35.95 on 1 and 96 DF, p-value: 3.545e-08
Time <- (time(huron))
knot <- ts(pmax(0, Time-1915), start = 1875)
piecewise.huron <- tslm(huron~Time+knot)
summary(piecewise.huron)
##
## Call:
## tslm(formula = huron ~ Time + knot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.49626 -0.66240 -0.07139 0.85163 2.39222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 132.90870 19.97687 6.653 1.82e-09 ***
## Time -0.06498 0.01051 -6.181 1.58e-08 ***
## knot 0.06486 0.01563 4.150 7.26e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared: 0.3841, Adjusted R-squared: 0.3711
## F-statistic: 29.62 on 2 and 95 DF, p-value: 1.004e-10
autoplot(huron) + autolayer(fitted(piecewise.huron), series = "Piecewise", color = "green") + autolayer(fitted(regq7), series = "Linear", color="orange")
The multiple R2 and the adjusted R2 is higher in the piecewise model. The piecewise plot reveals that the general trend since the 1910’s is, the lake tends to fluctuate between 6-11 but with no general trend up or down.
3x5 MA = [(Y1 + Y2 + Y3 + Y4 + Y5) / 5 + (Y1 + Y2 + Y3 + Y4 + Y5 + Y6) / 5 + (Y1 + Y2 + Y3 + Y4 + Y5 + Y6 + Y7) / 5] / 3
= 1/15Y1 + 2/15Y2 + 3/15Y3 + 3/15Y4 + 3/15Y5 + 2/15Y6 + 1/15Y7
Weights = (0.067, 0.133, 0.200, 0.200, 0.200, 0.133, 0.067)
0.067 is roughly equal to 1/15.
autoplot(plastics)
There is a general upward trend with apparent seasonality.
decompq1 <- decompose(plastics, type = "multiplicative")
decompq1
## $x
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
##
## $seasonal
## Jan Feb Mar Apr May Jun Jul
## 1 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 2 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 3 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 4 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 5 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## Aug Sep Oct Nov Dec
## 1 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 2 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 3 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 4 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 5 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
##
## $trend
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500 NA
## Aug Sep Oct Nov Dec
## 1 977.0417 977.0833 978.4167 982.7083 990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5 NA NA NA NA NA
##
## $random
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 1.0247887
## 2 0.9656005 0.9745267 0.9750081 0.9894824 1.0061175 1.0024895 1.0401641
## 3 1.0454117 0.9953920 1.0079773 1.0142083 0.9990100 0.9854384 0.9567618
## 4 1.0257400 0.9924762 0.9807020 0.9798704 0.9684851 0.9627557 0.9917766
## 5 0.9767392 1.0510964 1.0498039 1.0299302 1.0398787 1.0628077 NA
## Aug Sep Oct Nov Dec
## 1 1.0157335 1.0040354 0.9724119 0.9961368 0.9489762
## 2 1.0230774 1.0040674 0.9962088 0.9735577 0.9721203
## 3 0.9969907 1.0132932 1.0314752 0.9910657 1.0258002
## 4 0.9776897 0.9920952 1.0133954 1.0527311 1.0665946
## 5 NA NA NA NA NA
##
## $figure
## [1] 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## [8] 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
##
## $type
## [1] "multiplicative"
##
## attr(,"class")
## [1] "decomposed.ts"
autoplot(decompq1)
Yes; the decomposition also shows a general upward trend with seasonality.
autoplot(plastics, series="Observed") +
autolayer(trendcycle(decompq1), series="Trend") +
autolayer(seasadj(decompq1), series="Seasonal")
## Warning: Removed 12 row(s) containing missing values (geom_path).
plastics[15] = plastics[15] + 500
multdecomp1 = decompose(plastics, type= "multiplicative")
autoplot(multdecomp1)
The outlier adds a significant spike to the remainder graph. It also impacts the trend line data and adds irregularity to the seasonal data. The point in time at which the outlier lies makes a singificant difference; it could possibly accelerate the trend line or demonstrate abnormality for otherwise typical seasonal data.
retaildata <- readxl::read_excel("/Users/nelsonwhite/Documents/ms applied economics/Predictive Analytics:Forecasting/assignment 1/retail.xlsx", skip=1)
head(retaildata)
## # A tibble: 6 x 190
## `Series ID` A3349335T A3349627V A3349338X A3349398A A3349468W
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1982-04-01 00:00:00 303. 41.7 63.9 409. 65.8
## 2 1982-05-01 00:00:00 298. 43.1 64 405. 65.8
## 3 1982-06-01 00:00:00 298 40.3 62.7 401 62.3
## 4 1982-07-01 00:00:00 308. 40.9 65.6 414. 68.2
## 5 1982-08-01 00:00:00 299. 42.1 62.6 404. 66
## 6 1982-09-01 00:00:00 305. 42 64.4 412. 62.3
## # … with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## # A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## # A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## # A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## # A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## # A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## # A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## # A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## # A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## # A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## # A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## # A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## # A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## # A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## # A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## # A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## # A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## # A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## # A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## # A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## # A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## # A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## # A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## # A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## # A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## # A3349365F <dbl>, A3349746K <dbl>, …
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
decompq3 <- seas(myts, x11 = "")
autoplot(decompq3)
As with the previous decomposition, the outliers are apparent in the remainder graph. They correspond with hills, valleys, and other irregulatiries in the trend line.
Overall, the graph reveals a general upward trend and seasonality for the data. During the early 1990’s, the trend paused when there was a dip in the labor force; this is apparent through the trend and data graphs. The remainder graph reinforces this because it dips all the way to -400. The recession of 1991/1992 correlate with these phenomena; from looking at the graphs, one could guess that there was a shock to the economy during these years.
autoplot(cangas)
ggsubseriesplot(cangas)
ggseasonplot(cangas)
It’s apparent that the demand for Canadian gas has steadily increased over time, with a strong seasonal component. Considering Canada’s economy has also grown at a steady rate over time, it follows that the demand for energy will also increase in tandem. Energy demand also declines during the summer months, but increases in the winter when people heat their homes.
stlq5 <- stl(cangas, s.window = 12)
autoplot(stlq5)
seasq5 <- seas(cangas)
autoplot(seasq5)
x11q5 <- seas(cangas, x11="")
autoplot(x11q5)
It appears that X11 has a smoother trend line than SEATS, and a more sporradic remainder graph.
stlq6 <- stl(bricksq, s.window = "periodic", t.window = 12)
autoplot(stlq6)
stlq6.1 <- stl(bricksq,s.window=12)
autoplot(stlq6.1)
seaq6 <- seasadj(stlq6)
autoplot(seaq6)
naiveq6 <- naive(seaq6)
autoplot(naiveq6)
stlf(bricksq)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 465.7326 437.4903 493.9748 422.5398 508.9254
## 1995 Q1 424.6727 384.7119 464.6335 363.5580 485.7874
## 1995 Q2 483.9914 435.0232 532.9595 409.1010 558.8817
## 1995 Q3 493.9988 437.4242 550.5734 407.4754 580.5222
## 1995 Q4 465.7326 402.4454 529.0198 368.9431 562.5220
## 1996 Q1 424.6727 355.3067 494.0387 318.5865 530.7589
## 1996 Q2 483.9914 409.0259 558.9568 369.3416 598.6411
## 1996 Q3 493.9988 413.8128 574.1848 371.3650 616.6326
autoplot(stlf(bricksq))
checkresiduals(stlf(bricksq))
## Warning in checkresiduals(stlf(bricksq)): The fitted degrees of freedom is
## based on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(M,N,N)
## Q* = 41.128, df = 6, p-value = 2.733e-07
##
## Model df: 2. Total lags used: 8
There are significant spikes at Lags 1 and 8. Overall the model fits appropriately well.
robustq6 <- stlf(bricksq, robust = TRUE)
autoplot(robustq6)
checkresiduals(robustq6)
## Warning in checkresiduals(robustq6): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(M,N,N)
## Q* = 28.163, df = 6, p-value = 8.755e-05
##
## Model df: 2. Total lags used: 8
The residuals appear similar for both the robust and non-robust models. There are significant spikes at Lag 5 and 8 instead of 1 and 8.
brick.train <- window(bricksq, end=1992)
brick.test <- window(bricksq, start = 1992, end = 1995)
## Warning in window.default(x, ...): 'end' value not changed
brick.stlf <- stlf(brick.train)
brick.stlf.fc <- forecast(brick.stlf)
brick.stlf.fc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1992 Q2 455.7403 429.3780 482.1027 415.4226 496.0581
## 1992 Q3 459.7767 422.4756 497.0778 402.7296 516.8238
## 1992 Q4 432.5131 386.8038 478.2223 362.6068 502.4193
## 1993 Q1 386.9997 334.1900 439.8095 306.2342 467.7653
## 1993 Q2 455.7403 396.6643 514.8163 365.3914 546.0893
## 1993 Q3 459.7767 395.0260 524.5275 360.7490 558.8044
## 1993 Q4 432.5131 362.5351 502.4911 325.4909 539.5352
## 1994 Q1 386.9997 312.1481 461.8514 272.5241 501.4754
autoplot(brick.stlf.fc)
brick.naive <- snaive(brick.train)
brick.naive.fc <- forecast(brick.naive)
brick.naive.fc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1992 Q2 436 371.9545 500.0455 338.0509 533.9491
## 1992 Q3 424 359.9545 488.0455 326.0509 521.9491
## 1992 Q4 430 365.9545 494.0455 332.0509 527.9491
## 1993 Q1 387 322.9545 451.0455 289.0509 484.9491
## 1993 Q2 436 345.4260 526.5740 297.4790 574.5210
## 1993 Q3 424 333.4260 514.5740 285.4790 562.5210
## 1993 Q4 430 339.4260 520.5740 291.4790 568.5210
## 1994 Q1 387 296.4260 477.5740 248.4790 525.5210
autoplot(brick.naive.fc)
brick.test
## Qtr1 Qtr2 Qtr3 Qtr4
## 1992 387 413 451 420
## 1993 394 462 476 443
## 1994 421 472 494
The Naive forecast appears to perform more accurately than the STLF.
forecastq7 <- stlf(writing, method = 'rwdrift')
autoplot(forecastq7)
forecastq8 <- stlf(fancy, method='naive')
autoplot(forecastq8)