Hyndman Ch. 5, 6

Hyndman Ch. 5 Exercises

Libraries

library(knitr)
library(fpp2)

## Warning: package 'fpp2' was built under R version 3.6.2

## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## ── Attaching packages ──────────────────────────────────────────────────────────────────────────── fpp2 2.4 ──

## ✓ ggplot2   3.3.2     ✓ fma       2.4  
## ✓ forecast  8.13      ✓ expsmooth 2.3

## Warning: package 'ggplot2' was built under R version 3.6.2

## Warning: package 'forecast' was built under R version 3.6.2

##

library(seasonal)

## Warning: package 'seasonal' was built under R version 3.6.2

Q. 1

daily20 <- head(elecdaily,20)
autoplot(daily20)

regq1 <- tslm(formula = Demand~Temperature,data = daily20)
summary(regq1)

## 
## Call:
## tslm(formula = Demand ~ Temperature, data = daily20)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.060  -7.117  -1.437  17.484  27.102 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  39.2117    17.9915   2.179   0.0428 *  
## Temperature   6.7572     0.6114  11.052 1.88e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22 on 18 degrees of freedom
## Multiple R-squared:  0.8716, Adjusted R-squared:  0.8644 
## F-statistic: 122.1 on 1 and 18 DF,  p-value: 1.876e-09

As the temperature increases by 1C, on average, the demand for energy increases by 6.75 units. There is a positive relationship because people in Australia demand energy to cool their houses when the temperature increases.

residuals(regq1)

## Time Series:
## Start = c(1, 4) 
## End = c(4, 2) 
## Frequency = 7 
##           1           2           3           4           5           6 
## -40.0032629  -6.0369497  -0.3052144  -2.5691341 -46.0601058  24.0754737 
##           7           8           9          10          11          12 
##  25.5466767 -19.0221433 -30.0675910 -10.3589677  11.2234590  -3.6194867 
##          13          14          15          16          17          18 
##  12.7351505  -3.4857856  24.3568150  16.5137374   2.4974049  20.3937519 
##          19          20 
##  -2.9162723  27.1024444

checkresiduals(regq1)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

It is apparent that autocorrelation shows a singificant spike at Lag 4, but it stays within the determined bounds. Also, the histogram exhibits a slight left skew. These are factors to note, but it does not significantly impact the model’s efficacy.

data.1c= data.frame(Temperature = 15)
fcast.1c = forecast(regq1, h=1, newdata = data.1c)
autoplot(fcast.1c)

data.1cc= data.frame(Temperature = 35)
fcast.1cc = forecast(regq1, h=1, newdata = data.1cc)
autoplot(fcast.1cc)

The forecast for the range of energy consumption at 15C (60F) is reasonably accurate. It’s likely that houses will have light heat running, especially during the night. The forecast for 35C is also reasonably accurate. Considering that it gets up to 45C in Victoria (110F!), we can expect that the highest energy consumption would be during the very hottest days of the year. 35C appears to be normal summer weather and would not reflect peak energy consumption.

autoplot(daily20, facets=TRUE)

daily20 %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

fit <- tslm(Demand ~ Temperature, data=daily20)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

forecast(fit, newdata=data.frame(Temperature=c(15,35)))

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 4.285714       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.428571       275.7146 245.2278 306.2014 227.57056 323.8586

Using the given code:

Thus 15C at 80% lies at (108.6810, 172.4591). At 95%, (90.21166, 190.9285)

Thus 35C at 80% lies at (245.2278, 306.2014). At 95%, (227.57056, 323.8586)

plot(Demand~Temperature, data=elecdaily)

From the full dataset, it’s apparent that energy consumption tends to increase when Temperature is less than 20 and greater than 40. Considering that the data in daily20 had many observations at >40C, it exhibited a strong increase in energy consumption as Temperature increased.

Q. 2

autoplot(mens400)

From the plot, it is clear that competition for the Men’s 400M event has become more competitive over the years. Overall the data shows a general downnward tend. There is also missing data for years 1916, 1940, and 1944 due to World Wars.

timemens400 <- time(mens400)
regq2 <- tslm(mens400~timemens400, data = mens400)
summary(regq2)

## 
## Call:
## tslm(formula = mens400 ~ timemens400, data = mens400)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6002 -0.5747 -0.2858  0.5751  4.1505 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 172.481477  11.487522   15.02 2.52e-14 ***
## timemens400  -0.064574   0.005865  -11.01 2.75e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.136 on 26 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.8234, Adjusted R-squared:  0.8166 
## F-statistic: 121.2 on 1 and 26 DF,  p-value: 2.752e-11

As the time increases by 1 year, on average, the winning time for the event decreases by about 0.06 seconds. In other words, every Olympic Games, Men’s 400 winning times decreases by about 0.26 seconds.

autoplot(mens400) +
  geom_abline(slope = regq2$coefficients[2],intercept = regq2$coefficients[1],color="violet")

cbind(Time = timemens400, Residuals = regq2$residuals) %>%
  as.data.frame() %>%
  ggplot(aes(x = Time, y = Residuals)) +
    geom_point()

## Warning: Removed 3 rows containing missing values (geom_point).

residuals(regq2)

## Time Series:
## Start = 1896 
## End = 2016 
## Frequency = 0.25 
##           1           2           3           4           5           6 
##  4.15053282 -0.39117180 -0.33287642  0.72541896 -0.81628566          NA 
##           7           8           9          10          11          12 
##  1.10030510 -0.64139952 -0.18310414 -1.52480876 -0.96651338          NA 
##          13          14          15          16          17          18 
##          NA -0.49162724 -0.53333186  0.52496352 -1.01674110 -0.55844572 
##          19          20          21          22          23          24 
## -1.60015034 -0.48185496 -0.62355958 -0.02526420 -0.09696882 -0.23867344 
##          25          26          27          28          29          30 
## -0.35037806 -0.10208268  0.50621270  0.92450808  0.93280346  1.38109884 
##          31 
##  0.72939422

checkresiduals(regq2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals from Linear regression model
## LM test = 3.6082, df = 6, p-value = 0.7295

It’s clear that there is a significant outlier at 1896. Outside of this, the model appears to suit the data.

fcast.2d = forecast(regq2, newdata = data.frame(timemens400 = 2020))
autoplot(mens400) + autolayer(fcast.2d)

fcast.2d

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2020       42.04231 40.44975 43.63487 39.55286 44.53176

fcast.2d$lower

## Time Series:
## Start = 2020 
## End = 2020 
## Frequency = 0.25 
##      Series 1 Series 2
## 2020 40.44975 39.55286

fcast.2d$upper

## Time Series:
## Start = 2020 
## End = 2020 
## Frequency = 0.25 
##      Series 1 Series 2
## 2020 43.63487 44.53176

The predicted winning time for 2020 is 42.04. The prediction intervals assume normality for the data.

Q. 3

easter(ausbeer)

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00

The command shows a dummy variable representing when the holiday Easter falls - “1” for either in March or April and “0” otherwise. When Easter starts in March but finishes in April, the variable is split proportionately between the two months.

Q. 4

log(y)=β0+β1log(x)+ε

can be transformed into

y = e^(β0+β1log(x)+ε)

to put y in terms of x. Using elasticity, β1 = (dy/dx)×(x/y), the equation may further be transformed into

y = e^(β0+(dy/dx)*(x/y)log(x)+ε)

Q. 5

autoplot(fancy)

It’s clear from the graph that the sales for the souvenir shop is highly seasonal, presumably for holiday gifts. The spikes at the end of 1993 and 1994 are significantly higher than previous years. Because of the seasonal nature of the data, it’s a useful tool to take logs.

Time = time(fancy)
surf.dummy = c()
for(i in 1:length(Time)){
  month = round(12*(Time[i]-floor(Time[i]))) + 1
  year = floor(Time[i])
  if(year>=1998 & month == 3){
  surf.dummy[i] = 1 
  } else {
  surf.dummy[i] = 0}
}

regq5 = tslm(log(fancy)~ trend+season+surf.dummy, data=fancy)

summary(regq5)

## 
## Call:
## tslm(formula = log(fancy) ~ trend + season + surf.dummy, data = fancy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41644 -0.12619  0.00608  0.11389  0.38567 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.6058604  0.0768740  98.939  < 2e-16 ***
## trend       0.0223930  0.0008448  26.508  < 2e-16 ***
## season2     0.2510437  0.0993278   2.527 0.013718 *  
## season3     0.6952066  0.0993386   6.998 1.18e-09 ***
## season4     0.3829341  0.0993565   3.854 0.000252 ***
## season5     0.4079944  0.0993817   4.105 0.000106 ***
## season6     0.4469625  0.0994140   4.496 2.63e-05 ***
## season7     0.6082156  0.0994534   6.116 4.69e-08 ***
## season8     0.5853524  0.0995001   5.883 1.21e-07 ***
## season9     0.6663446  0.0995538   6.693 4.27e-09 ***
## season10    0.7440336  0.0996148   7.469 1.61e-10 ***
## season11    1.2030164  0.0996828  12.068  < 2e-16 ***
## season12    1.9581366  0.0997579  19.629  < 2e-16 ***
## surf.dummy         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1858 on 71 degrees of freedom
## Multiple R-squared:  0.9527, Adjusted R-squared:  0.9447 
## F-statistic: 119.1 on 12 and 71 DF,  p-value: < 2.2e-16

autoplot(regq5$residuals)

The residuals show no discernable pattern.

  cbind.data.frame(
    Month = factor(
      month.abb[round(12*(Time - floor(Time)) + 1)],
      labels = month.abb,
      ordered = TRUE
    ),
    Residuals = regq5$residuals
    ) %>%
  ggplot(aes(x = Month,
             y = Residuals)) +
    geom_boxplot()

## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

regq5$coefficients

## (Intercept)       trend     season2     season3     season4     season5 
##  7.60586042  0.02239298  0.25104367  0.69520658  0.38293405  0.40799437 
##     season6     season7     season8     season9    season10    season11 
##  0.44696253  0.60821562  0.58535238  0.66634464  0.74403359  1.20301639 
##    season12  surf.dummy 
##  1.95813657          NA

The coefficients reveal a positive trend for increasing sales over time.

#checkresiduals(regq5)

checkresiduals(regq5) in the R Markdown runs but returns an error, ‘element (31, 31) is zero, so the inverse cannot be computed.’ It does not knit.

The Breusch-Godfrey Test reveals significant spikes at 24, 26, 28, 21, 2, 1, and 10, indicating autocorrelation.

Q. 6

gas.04 <- window(gasoline,end=2005)
autoplot(gas.04)

regq6.1 <- tslm(gas.04~trend + fourier(gas.04, K=3))
regq6.2 <- tslm(gas.04~trend + fourier(gas.04, K=8))
regq6.3 <- tslm(gas.04~trend + fourier(gas.04, K=14))

autoplot(gas.04, ylab = "Gas, Millions of Barrels") + autolayer(fitted(regq6.1), series = "K=3")  + autolayer(fitted(regq6.2), series = "K=8") + autolayer(fitted(regq6.3), series = "K=14")

AIC(regq6.1)

## [1] 197.2137

AIC(regq6.2)

## [1] 146.6802

AIC(regq6.3)

## [1] 146.0811

Of all of the regressions, regq6.3 with K=14 has the lowest AIC which indicates the best fit.

checkresiduals(regq6.3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 104
## 
## data:  Residuals from Linear regression model
## LM test = 154.88, df = 104, p-value = 0.0009033

fc.gas <- forecast(regq6.3, newdata=data.frame(fourier(gas.04,14,52)))
fc.gas

##          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2005.014       8.688467 8.344908 9.032027 8.162623  9.214312
## 2005.033       8.652196 8.308515 8.995877 8.126166  9.178227
## 2005.052       8.681805 8.338152 9.025458 8.155817  9.207793
## 2005.071       8.646715 8.303063 8.990367 8.120729  9.172701
## 2005.090       8.662688 8.319198 9.006178 8.136950  9.188427
## 2005.110       8.802433 8.459097 9.145769 8.276931  9.327935
## 2005.129       8.916500 8.573165 9.259834 8.390999  9.442000
## 2005.148       8.915439 8.572131 9.258747 8.389980  9.440898
## 2005.167       8.923797 8.580485 9.267108 8.398332  9.449262
## 2005.186       9.014273 8.670958 9.357587 8.488803  9.539743
## 2005.205       9.075322 8.732011 9.418633 8.549858  9.600786
## 2005.225       9.045933 8.702627 9.389238 8.520477  9.571388
## 2005.244       9.023517 8.680207 9.366828 8.498054  9.548980
## 2005.263       9.070471 8.727159 9.413782 8.545006  9.595935
## 2005.282       9.126463 8.783157 9.469769 8.601006  9.651920
## 2005.301       9.140407 8.797100 9.483714 8.614949  9.665865
## 2005.320       9.133874 8.790564 9.477184 8.608411  9.659337
## 2005.340       9.148640 8.805332 9.491949 8.623180  9.674101
## 2005.359       9.215731 8.872426 9.559036 8.690275  9.741186
## 2005.378       9.309807 8.966499 9.653115 8.784347  9.835267
## 2005.397       9.326552 8.983242 9.669861 8.801090  9.852013
## 2005.416       9.236585 8.893278 9.579891 8.711127  9.762042
## 2005.435       9.205722 8.862416 9.549028 8.680266  9.731179
## 2005.455       9.365368 9.022059 9.708677 8.839907  9.890829
## 2005.474       9.560867 9.217559 9.904176 9.035407 10.086327
## 2005.493       9.573649 9.230344 9.916954 9.048193 10.099105
## 2005.512       9.469964 9.126657 9.813271 8.944505  9.995423
## 2005.531       9.455877 9.112568 9.799186 8.930415  9.981338
## 2005.550       9.525789 9.182482 9.869096 9.000331 10.051247
## 2005.570       9.539059 9.195753 9.882364 9.013603 10.064515
## 2005.589       9.505437 9.162128 9.848745 8.979976 10.030897
## 2005.608       9.523581 9.180272 9.866890 8.998120 10.049042
## 2005.627       9.563090 9.219784 9.906396 9.037634 10.088546
## 2005.646       9.531682 9.188375 9.874988 9.006224 10.057139
## 2005.665       9.439316 9.096007 9.782625 8.913854  9.964778
## 2005.685       9.339393 8.996085 9.682700 8.813933  9.864852
## 2005.704       9.237800 8.894495 9.581106 8.712345  9.763256
## 2005.723       9.159260 8.815951 9.502569 8.633799  9.684721
## 2005.742       9.164571 8.821261 9.507880 8.639109  9.690033
## 2005.761       9.244484 8.901178 9.587790 8.719027  9.769941
## 2005.780       9.319073 8.975767 9.662380 8.793616  9.844530
## 2005.800       9.349678 9.006367 9.692988 8.824214  9.875141
## 2005.819       9.348996 9.005687 9.692306 8.823535  9.874458
## 2005.838       9.318053 8.974748 9.661358 8.792598  9.843508
## 2005.857       9.256287 8.912977 9.599597 8.730824  9.781750
## 2005.876       9.182031 8.838718 9.525343 8.656564  9.707497
## 2005.895       9.116522 8.773214 9.459831 8.591062  9.641983
## 2005.915       9.114523 8.771216 9.457830 8.589065  9.639981
## 2005.934       9.246152 8.902832 9.589471 8.720674  9.771629
## 2005.953       9.430333 9.087014 9.773652 8.904856  9.955809
## 2005.972       9.423424 9.080112 9.766735 8.897959  9.948888
## 2005.991       9.149290 8.805908 9.492671 8.623717  9.674862

autoplot(fc.gas)

gas.05 <- window(gasoline, start=2005, end=2006)
autoplot(gas.05, series="Observed", color = "brown") + autolayer(fc.gas$mean, series="Forecast", color="blue")

Q. 7

autoplot(huron)

The data shows the level of Lake Huron in feet from 1875-1972. There appears to be some seasonal changes in the level of the lake. This could be due to the water expanding and contracting with temperature changes or due to icy conditions.

regq7 <- tslm(huron~trend, data=huron)

summary(regq7)

## 
## Call:
## tslm(formula = huron ~ trend, data = huron)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50997 -0.72726  0.00083  0.74402  2.53565 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.202037   0.230111  44.335  < 2e-16 ***
## trend       -0.024201   0.004036  -5.996 3.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared:  0.2725, Adjusted R-squared:  0.2649 
## F-statistic: 35.95 on 1 and 96 DF,  p-value: 3.545e-08

Time <- (time(huron))
knot <- ts(pmax(0, Time-1915), start = 1875)
piecewise.huron <- tslm(huron~Time+knot)

summary(piecewise.huron)

## 
## Call:
## tslm(formula = huron ~ Time + knot)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.49626 -0.66240 -0.07139  0.85163  2.39222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 132.90870   19.97687   6.653 1.82e-09 ***
## Time         -0.06498    0.01051  -6.181 1.58e-08 ***
## knot          0.06486    0.01563   4.150 7.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared:  0.3841, Adjusted R-squared:  0.3711 
## F-statistic: 29.62 on 2 and 95 DF,  p-value: 1.004e-10

autoplot(huron) + autolayer(fitted(piecewise.huron), series = "Piecewise", color = "green") + autolayer(fitted(regq7), series = "Linear", color="orange")

The multiple R2 and the adjusted R2 is higher in the piecewise model. The piecewise plot reveals that the general trend since the 1910’s is, the lake tends to fluctuate between 6-11 but with no general trend up or down.

Hyndman Ch. 6 Exercises

Q. 1

3x5 MA = [(Y1 + Y2 + Y3 + Y4 + Y5) / 5 + (Y1 + Y2 + Y3 + Y4 + Y5 + Y6) / 5 + (Y1 + Y2 + Y3 + Y4 + Y5 + Y6 + Y7) / 5] / 3

= 1/15Y1 + 2/15Y2 + 3/15Y3 + 3/15Y4 + 3/15Y5 + 2/15Y6 + 1/15Y7

Weights = (0.067, 0.133, 0.200, 0.200, 0.200, 0.133, 0.067)

0.067 is roughly equal to 1/15.

Q. 2

autoplot(plastics)

There is a general upward trend with apparent seasonality.

decompq1 <- decompose(plastics, type = "multiplicative")
decompq1

## $x
##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1030 1107 1165 1216 1208 1131  971  783
## 2  741  700  774  932 1099 1223 1290 1349 1341 1296 1066  901
## 3  896  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
## 
## $seasonal
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 2 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 3 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 4 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 5 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
##         Aug       Sep       Oct       Nov       Dec
## 1 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 2 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 3 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 4 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 5 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 
## $trend
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1        NA        NA        NA        NA        NA        NA  976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500        NA
##         Aug       Sep       Oct       Nov       Dec
## 1  977.0417  977.0833  978.4167  982.7083  990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5        NA        NA        NA        NA        NA
## 
## $random
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1        NA        NA        NA        NA        NA        NA 1.0247887
## 2 0.9656005 0.9745267 0.9750081 0.9894824 1.0061175 1.0024895 1.0401641
## 3 1.0454117 0.9953920 1.0079773 1.0142083 0.9990100 0.9854384 0.9567618
## 4 1.0257400 0.9924762 0.9807020 0.9798704 0.9684851 0.9627557 0.9917766
## 5 0.9767392 1.0510964 1.0498039 1.0299302 1.0398787 1.0628077        NA
##         Aug       Sep       Oct       Nov       Dec
## 1 1.0157335 1.0040354 0.9724119 0.9961368 0.9489762
## 2 1.0230774 1.0040674 0.9962088 0.9735577 0.9721203
## 3 0.9969907 1.0132932 1.0314752 0.9910657 1.0258002
## 4 0.9776897 0.9920952 1.0133954 1.0527311 1.0665946
## 5        NA        NA        NA        NA        NA
## 
## $figure
##  [1] 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
##  [8] 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 
## $type
## [1] "multiplicative"
## 
## attr(,"class")
## [1] "decomposed.ts"

autoplot(decompq1)

Yes; the decomposition also shows a general upward trend with seasonality.

autoplot(plastics, series="Observed") +
  autolayer(trendcycle(decompq1), series="Trend") +
  autolayer(seasadj(decompq1), series="Seasonal")

## Warning: Removed 12 row(s) containing missing values (geom_path).

plastics[15] = plastics[15] + 500
multdecomp1 = decompose(plastics, type= "multiplicative")
autoplot(multdecomp1)

The outlier adds a significant spike to the remainder graph. It also impacts the trend line data and adds irregularity to the seasonal data. The point in time at which the outlier lies makes a singificant difference; it could possibly accelerate the trend line or demonstrate abnormality for otherwise typical seasonal data.

Q.4

retaildata <- readxl::read_excel("/Users/nelsonwhite/Documents/ms applied economics/Predictive Analytics:Forecasting/assignment 1/retail.xlsx", skip=1)
head(retaildata)

## # A tibble: 6 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # … with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## #   A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## #   A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## #   A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## #   A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## #   A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## #   A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## #   A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## #   A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## #   A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## #   A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## #   A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## #   A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## #   A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## #   A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## #   A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## #   A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## #   A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## #   A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## #   A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## #   A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## #   A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## #   A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## #   A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## #   A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## #   A3349365F <dbl>, A3349746K <dbl>, …

myts <- ts(retaildata[,"A3349873A"],
  frequency=12, start=c(1982,4))

decompq3 <- seas(myts, x11 = "")

autoplot(decompq3)

As with the previous decomposition, the outliers are apparent in the remainder graph. They correspond with hills, valleys, and other irregulatiries in the trend line.

Q. 4

Overall, the graph reveals a general upward trend and seasonality for the data. During the early 1990’s, the trend paused when there was a dip in the labor force; this is apparent through the trend and data graphs. The remainder graph reinforces this because it dips all the way to -400. The recession of 1991/1992 correlate with these phenomena; from looking at the graphs, one could guess that there was a shock to the economy during these years.

Q. 5

autoplot(cangas)

ggsubseriesplot(cangas)

ggseasonplot(cangas)

It’s apparent that the demand for Canadian gas has steadily increased over time, with a strong seasonal component. Considering Canada’s economy has also grown at a steady rate over time, it follows that the demand for energy will also increase in tandem. Energy demand also declines during the summer months, but increases in the winter when people heat their homes.

stlq5 <- stl(cangas, s.window = 12)

autoplot(stlq5)

seasq5 <- seas(cangas)
autoplot(seasq5)

x11q5 <- seas(cangas, x11="")

autoplot(x11q5)

It appears that X11 has a smoother trend line than SEATS, and a more sporradic remainder graph.

Q. 6

stlq6 <- stl(bricksq, s.window = "periodic", t.window = 12)
autoplot(stlq6)

stlq6.1 <- stl(bricksq,s.window=12)
autoplot(stlq6.1)

seaq6 <- seasadj(stlq6)
autoplot(seaq6)

naiveq6 <- naive(seaq6)
autoplot(naiveq6)

stlf(bricksq)

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       465.7326 437.4903 493.9748 422.5398 508.9254
## 1995 Q1       424.6727 384.7119 464.6335 363.5580 485.7874
## 1995 Q2       483.9914 435.0232 532.9595 409.1010 558.8817
## 1995 Q3       493.9988 437.4242 550.5734 407.4754 580.5222
## 1995 Q4       465.7326 402.4454 529.0198 368.9431 562.5220
## 1996 Q1       424.6727 355.3067 494.0387 318.5865 530.7589
## 1996 Q2       483.9914 409.0259 558.9568 369.3416 598.6411
## 1996 Q3       493.9988 413.8128 574.1848 371.3650 616.6326

autoplot(stlf(bricksq))

checkresiduals(stlf(bricksq))

## Warning in checkresiduals(stlf(bricksq)): The fitted degrees of freedom is
## based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 41.128, df = 6, p-value = 2.733e-07
## 
## Model df: 2.   Total lags used: 8

There are significant spikes at Lags 1 and 8. Overall the model fits appropriately well.

robustq6 <- stlf(bricksq, robust = TRUE)
autoplot(robustq6)

checkresiduals(robustq6)

## Warning in checkresiduals(robustq6): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 28.163, df = 6, p-value = 8.755e-05
## 
## Model df: 2.   Total lags used: 8

The residuals appear similar for both the robust and non-robust models. There are significant spikes at Lag 5 and 8 instead of 1 and 8.

brick.train <- window(bricksq, end=1992)
brick.test <- window(bricksq, start = 1992, end = 1995)

## Warning in window.default(x, ...): 'end' value not changed

brick.stlf <- stlf(brick.train)
brick.stlf.fc <- forecast(brick.stlf)
brick.stlf.fc

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1992 Q2       455.7403 429.3780 482.1027 415.4226 496.0581
## 1992 Q3       459.7767 422.4756 497.0778 402.7296 516.8238
## 1992 Q4       432.5131 386.8038 478.2223 362.6068 502.4193
## 1993 Q1       386.9997 334.1900 439.8095 306.2342 467.7653
## 1993 Q2       455.7403 396.6643 514.8163 365.3914 546.0893
## 1993 Q3       459.7767 395.0260 524.5275 360.7490 558.8044
## 1993 Q4       432.5131 362.5351 502.4911 325.4909 539.5352
## 1994 Q1       386.9997 312.1481 461.8514 272.5241 501.4754

autoplot(brick.stlf.fc)

brick.naive <- snaive(brick.train)
brick.naive.fc <- forecast(brick.naive)
brick.naive.fc

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1992 Q2            436 371.9545 500.0455 338.0509 533.9491
## 1992 Q3            424 359.9545 488.0455 326.0509 521.9491
## 1992 Q4            430 365.9545 494.0455 332.0509 527.9491
## 1993 Q1            387 322.9545 451.0455 289.0509 484.9491
## 1993 Q2            436 345.4260 526.5740 297.4790 574.5210
## 1993 Q3            424 333.4260 514.5740 285.4790 562.5210
## 1993 Q4            430 339.4260 520.5740 291.4790 568.5210
## 1994 Q1            387 296.4260 477.5740 248.4790 525.5210

autoplot(brick.naive.fc)

brick.test

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1992  387  413  451  420
## 1993  394  462  476  443
## 1994  421  472  494

The Naive forecast appears to perform more accurately than the STLF.

Q. 7

forecastq7 <- stlf(writing, method = 'rwdrift')
autoplot(forecastq7)

Q. 8

forecastq8 <- stlf(fancy, method='naive')
autoplot(forecastq8)

Hyndman Ch. 5, 6

Nelson White

Due 11/08/2020

Hyndman Ch. 5 Exercises

Q. 1

Q. 2

Q. 3

Q. 4

Q. 5

Q. 6

Q. 7

Hyndman Ch. 6 Exercises

Q. 1

Q. 2

Q.4

Q. 4

Q. 5

Q. 6

Q. 7

Q. 8