Setup
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(fpp2)
## Loading required package: ggplot2
## Loading required package: fma
## Loading required package: expsmooth
library(seasonal)
library(e1071)
Chapter 5 Exercises
1A. Plot and Regression
daily20 = head(elecdaily,20)
autoplot(daily20, facets = TRUE, main = 'elecdaily First 20 Data Points')
mod = tslm(Demand~Temperature, data = daily20)
summary(mod)
##
## Call:
## tslm(formula = Demand ~ Temperature, data = daily20)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.060 -7.117 -1.437 17.484 27.102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.2117 17.9915 2.179 0.0428 *
## Temperature 6.7572 0.6114 11.052 1.88e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22 on 18 degrees of freedom
## Multiple R-squared: 0.8716, Adjusted R-squared: 0.8644
## F-statistic: 122.1 on 1 and 18 DF, p-value: 1.876e-09
There is a positive relationship between electricity demand and temperature because as temperature rises, people use their air conditioners more and therefore demand more energy. This data uses the first 20 days of 2014 in Victoria, Australia, which means this data is for summer, so we do not have to consider possible electricity demand associated with heating systems.
1B. Residual plot
daily20 %>%
as.data.frame() %>%
ggplot(aes(x=Temperature, y=Demand)) +
geom_point() +
geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
checkresiduals(mod)
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774
skewness(mod$residuals)
## [1] -0.58939
Yes, there do seem to be two points that are outliers. The two residuals beyond -40 are much further from zero than the rest. One thing to be careful of in this chart is that the y axis is not symmetric around zero, so large negative values can appear comparable to smaller positive values. The model seems to be adequate: the ACF plot is not troublesome and the Breusch-Godfrey test shows no evidence of serial correlation. The residuals are slightly left-skewed, but not egregiously so.
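To pin down which observations those are, here is a minimal sketch using the mod object fitted above (the -40 cutoff is just read off the residual plot, not a formal outlier test):
which(residuals(mod) < -40)   # indices of the two large negative residuals
sort(residuals(mod))[1:3]     # the most negative residuals, for context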
1C/1D Forecast and prediction intervals
forecast1 = forecast(mod, newdata=data.frame(Temperature=c(15,35)))
summary(forecast1)
##
## Forecast method: Linear regression model
##
## Model Information:
##
## Call:
## tslm(formula = Demand ~ Temperature, data = daily20)
##
## Coefficients:
## (Intercept) Temperature
## 39.212 6.757
##
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 4.263603e-15 20.87457 16.44449 -0.8271304 7.763681 0.2872158
## ACF1
## Training set 0.0332374
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 3.857143 140.5701 108.6810 172.4591 90.21166 190.9285
## 4.000000 275.7146 245.2278 306.2014 227.57056 323.8586
While these forecasts could be improved, based on the answer to part 1B I do somewhat believe them, as the model is adequate. We can also see from these results that the model is unbiased on the training data (the mean error is essentially zero). The forecast output also includes prediction intervals around each point forecast, giving the range within which demand is expected to fall with 80% or 95% probability.
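If needed, the interval bounds can be pulled directly out of the forecast object. A sketch using forecast1 from above, assuming the default 80% and 95% levels (forecast objects store the bounds in $lower and $upper, one column per level):
cbind(point = forecast1$mean,
      lo95 = forecast1$lower[, "95%"],
      hi95 = forecast1$upper[, "95%"])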
1E. Plot all data
elecdaily %>%
as.data.frame() %>%
ggplot(aes(x=Temperature, y=Demand)) +
geom_point() +
xlab("Temperature")+
ylab("Demand") +
geom_smooth(method="lm", se=TRUE)
## `geom_smooth()` using formula 'y ~ x'
This tells us that the model created on the first 20 days of data is not a good model for the full year. The relationship over the whole year is not linear, so modelling it with a single straight line is not ideal. As noted in 1A, when we only use the first 20 days we leave out the temperatures (and the associated demand, such as winter heating) seen at other times of the year. Over the course of the year, temperature fluctuates far more than in any 20-day window, so our model does not work well for the entire data set.
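As a sketch going beyond the exercise, one way to capture the U-shape over the full year is to add a quadratic temperature term (modFull is a name introduced here for illustration):
modFull = tslm(Demand ~ Temperature + I(Temperature^2), data = elecdaily)
CV(modFull)   # cross-validation and information criteria for the quadratic fit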
2A. Plot
autoplot(mens400, ylab = "Winning Time", main = 'Mens 400 Olympic Winner Times')
The trend is downward over time, though the winning time does not fall at every single Olympics. We can also see that times have largely stabilized since around 1992; the rate at which they declined was much higher earlier in the data set. There is also some missing data. The data description notes this is due to the Olympics not being held (during the World Wars), and not due to issues with the data set itself.
2B. Fit regression line
getYears = time(mens400)
print(getYears)
## Time Series:
## Start = 1896
## End = 2016
## Frequency = 0.25
## [1] 1896 1900 1904 1908 1912 1916 1920 1924 1928 1932 1936 1940 1944 1948 1952
## [16] 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012
## [31] 2016
mod2 = tslm(mens400~getYears)
mod2
##
## Call:
## tslm(formula = mens400 ~ getYears)
##
## Coefficients:
## (Intercept) getYears
## 172.48148 -0.06457
Times have been decreasing at an average rate of about 0.065 seconds per year. Put another way, each Olympiad the winning time decreases by roughly 0.26 seconds from the previous Games (assuming the usual 4-year interval).
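A quick check of the per-Games change implied by the slope (assuming the usual 4-year spacing between Olympics):
coef(mod2)["getYears"] * 4   # roughly -0.26 seconds per Olympiad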
2C. Plot residuals
newDataFrame = data.frame(cbind(getYears, mod2$residuals))
newDataFrame %>%
ggplot(aes(y=mod2$residuals, x=getYears)) +
geom_point() +
ylab("Residuals")+
xlab("Years") +
geom_smooth(method="lm", se=TRUE)
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## Warning: Removed 3 rows containing missing values (geom_point).
Most of the residuals are negative, so the model seems to be overestimating the winning times. Also, the residuals start to become larger (in absolute terms) when the data stabilizes around 1990. Despite this, the model does a pretty good job and is suitable for the data, as much of the data has residuals close to zero.
2D. Forecast
mod2
##
## Call:
## tslm(formula = mens400 ~ getYears)
##
## Coefficients:
## (Intercept) getYears
## 172.48148 -0.06457
forecast2 = forecast(mod2, newdata=data.frame(getYears=2020))
summary(forecast2)
##
## Forecast method: Linear regression model
##
## Model Information:
##
## Call:
## tslm(formula = mens400 ~ getYears)
##
## Coefficients:
## (Intercept) getYears
## 172.48148 -0.06457
##
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -2.538059e-16 1.094591 0.7839455 -0.04728455 1.667093 0.01703555
## ACF1
## Training set 0.1461939
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2020 42.04231 40.44975 43.63487 39.55286 44.53176
The prediction for the 2020 winning time is 42.04 seconds. We are 95% confident that the winning time will fall between 39.55 and 44.53, and 80% confident it will fall between 40.45 and 43.63. This assumes that the linear downward trend continues in a predictable way, even if the 2020 time is not itself lower than the 2016 time.
3. easter() applied to ausbeer
easter(ausbeer)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00
This shows in which period Easter took place. It assumes the effect of Easter is not just the single day but is spread out over multiple days. When the effect is wholly contained in a single period, that period gets a 1 and the rest get zero; if the effect spills over from March into April, it is weighted between the two. This output shows the data in terms of quarters rather than months, but the interpretation is the same, as March falls at the end of quarter one and April at the start of quarter two. We can see from this data that most of the time the effect of Easter is wholly contained in a single quarter.
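A small sketch to pick out the quarters where the Easter effect is split across Q1 and Q2 (easterDummy is just a temporary name introduced here):
easterDummy = easter(ausbeer)
time(easterDummy)[easterDummy > 0 & easterDummy < 1]   # the split years: 1956, 1961 and 1972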
4. Elasticity in a log-log model
y = e^(B0 + e) * x^(B1)
dy/dx = e^(B0 + e) * B1 * x^(B1 - 1)
Since x^(B1 - 1) = x^(B1)/x,
we can rewrite the entire expression as
dy/dx = (B1 * e^(B0 + e) * x^(B1)) / x
We then substitute y for e^(B0 + e) * x^(B1):
dy/dx = B1 * y/x
Multiplying both sides by x/y gives
(dy/dx) * (x/y) = B1
Since this is the formula for elasticity, B1 must be the elasticity coefficient.
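As a quick numerical check of the derivation (b0, b1 and x are arbitrary values chosen purely for illustration):
b0 = 1; b1 = 1.5; x = 2
y = exp(b0) * x^b1                 # the model, ignoring the error term
dydx = exp(b0) * b1 * x^(b1 - 1)   # the derivative from the working above
dydx * x / y                       # returns b1, the elasticity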
5A. Plot
autoplot(fancy, ylab = "Monthly Sales")
The data is affected by seasonality, and the size of the seasonal fluctuations increases over time; if we were to decompose this series, we would want to use a multiplicative decomposition. There seems to be a large fluctuation around the end of each year and another, smaller one early in the year.
5B. It is necessary to take the log of this data because we care about percentage changes rather than raw changes. It also lets us see more easily how the seasonal fluctuations within each year vary over time. Working with percentage changes helps interpretation, too, because measuring growth in sales as a percentage provides more context than the raw figures.
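A sketch illustrating the point: on the log scale the seasonal swings are roughly constant in size, which is what the transformation is meant to achieve.
autoplot(log(fancy), ylab = "log(Monthly Sales)")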
5C.
logFancy = log(fancy)
# Dummy variable for the March surfing festival, which (per the exercise) has been
# held every March since 1988; the series starts in January 1987, so i %% 12 == 3
# picks out March and i > 3 skips March 1987.
dummyColumn = c()
for (i in (1:length(logFancy))) {
  if (i %% 12 == 3 & i > 3) {
    dummyColumn[i] = 1
  }
  else {
    dummyColumn[i] = 0
  }
}
fancyMod = tslm(logFancy~trend+season+dummyColumn)
summary(fancyMod)
##
## Call:
## tslm(formula = logFancy ~ trend + season + dummyColumn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6196670 0.0742471 102.626 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## season2 0.2514168 0.0956790 2.628 0.010555 *
## season3 0.2660828 0.1934044 1.376 0.173275
## season4 0.3840535 0.0957075 4.013 0.000148 ***
## season5 0.4094870 0.0957325 4.277 5.88e-05 ***
## season6 0.4488283 0.0957647 4.687 1.33e-05 ***
## season7 0.6104545 0.0958039 6.372 1.71e-08 ***
## season8 0.5879644 0.0958503 6.134 4.53e-08 ***
## season9 0.6693299 0.0959037 6.979 1.36e-09 ***
## season10 0.7473919 0.0959643 7.788 4.48e-11 ***
## season11 1.2067479 0.0960319 12.566 < 2e-16 ***
## season12 1.9622412 0.0961066 20.417 < 2e-16 ***
## dummyColumn 0.5015151 0.1964273 2.553 0.012856 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
5D. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
plot(as.numeric(fancyMod$residuals), ylab = "Residuals", main = 'Residuals vs. Time')
plot(as.numeric(fancyMod$fitted.values),as.numeric(fancyMod$residuals), ylab = "Residuals", xlab = "Fitted", main = "Model Fitted vs Residuals")
checkresiduals(fancyMod)
##
## Breusch-Godfrey test for serial correlation of order up to 17
##
## data: Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494
The residuals exhibit serial correlation, which can invalidate the model.
5E. Do boxplots of the residuals for each month. Does this reveal any problems with the model?
boxplot(fancyMod$residuals~cycle(fancyMod$residuals), ylab = "Residuals", xlab = "Month")
We can see that the model predicts certain months more accurately than others. This indicates heteroskedasticity, as the variation in the residuals is not constant across the months of the year.
5F. What do the values of the coefficients tell you about each variable?
print(fancyMod$coefficients)
## (Intercept) trend season2 season3 season4 season5
## 7.61966701 0.02201983 0.25141682 0.26608280 0.38405351 0.40948697
## season6 season7 season8 season9 season10 season11
## 0.44882828 0.61045453 0.58796443 0.66932985 0.74739195 1.20674790
## season12 dummyColumn
## 1.96224123 0.50151509
We can see that not only does the general trend increase over time, but the seasonal coefficients also increase throughout the year, with the end of the year (especially November and December) seeing large increases in sales. Season3 (March) on its own does not look extraordinary, but once the dummyColumn is added to its interpretation, March stands out as an early-year spike in sales. The positive trend coefficient also means that, all else equal, we expect sales in a given month to be higher than in the same month of the previous year.
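Because the response is log sales, exponentiating a coefficient gives its multiplicative effect on sales. A small sketch (for example, the festival dummy of about 0.50 implies roughly a 65% uplift in March sales from 1988 onward):
round((exp(coef(fancyMod)) - 1) * 100, 1)   # approximate % effect of each term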
5G. What does the Breusch-Godfrey test tell you about your model?
checkresiduals(fancyMod)
##
## Breusch-Godfrey test for serial correlation of order up to 17
##
## data: Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494
The Breusch-Godfrey test indicates that there is serial correlation.
5H. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
# Festival dummies for the 36 forecast months (Jan 1994 - Dec 1996). Note that the
# i > 3 condition, carried over from the fitting loop, also zeroes out i = 3 here,
# so March 1994 is not flagged; this is reflected in the forecasts below.
dummyColumn2 = c()
for (i in (1:36)) {
  if (i %% 12 == 3 & i > 3) {
    dummyColumn2[i] = 1
  }
  else {
    dummyColumn2[i] = 0
  }
}
fancyModfcast = forecast(fancyMod, newdata = dummyColumn2, h=36)
## Warning in forecast.lm(fancyMod, newdata = dummyColumn2, h = 36): newdata column
## names not specified, defaulting to first variable required.
print(fancyModfcast)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.491352 9.238522 9.744183 9.101594 9.88111
## Feb 1994 9.764789 9.511959 10.017620 9.375031 10.15455
## Mar 1994 9.801475 9.461879 10.141071 9.277961 10.32499
## Apr 1994 9.941465 9.688635 10.194296 9.551707 10.33122
## May 1994 9.988919 9.736088 10.241749 9.599161 10.37868
## Jun 1994 10.050280 9.797449 10.303110 9.660522 10.44004
## Jul 1994 10.233926 9.981095 10.486756 9.844168 10.62368
## Aug 1994 10.233456 9.980625 10.486286 9.843698 10.62321
## Sep 1994 10.336841 10.084010 10.589671 9.947083 10.72660
## Oct 1994 10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994 10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994 11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995 9.755590 9.499844 10.011336 9.361338 10.14984
## Feb 1995 10.029027 9.773281 10.284773 9.634775 10.42328
## Mar 1995 10.567228 10.310518 10.823938 10.171489 10.96297
## Apr 1995 10.205703 9.949957 10.461449 9.811451 10.59996
## May 1995 10.253157 9.997411 10.508903 9.858904 10.64741
## Jun 1995 10.314518 10.058772 10.570264 9.920265 10.70877
## Jul 1995 10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995 10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995 10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995 10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995 11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995 11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996 10.019828 9.760564 10.279093 9.620151 10.41951
## Feb 1996 10.293265 10.034000 10.552530 9.893588 10.69294
## Mar 1996 10.831466 10.571566 11.091365 10.430810 11.23212
## Apr 1996 10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996 10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996 10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996 10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996 10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996 10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996 10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996 11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996 12.224288 11.965023 12.483552 11.824611 12.62396
autoplot(fancyModfcast)
5I. Transform your predictions and intervals to obtain predictions and intervals for the raw data.
rawDataPrediction = exp(as.data.frame(fancyModfcast))
rawDataPrediction
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 13244.70 10285.82 17054.73 8969.583 19557.43
## Feb 1994 17409.81 13520.45 22418.00 11790.284 25707.73
## Mar 1994 18060.36 12860.02 25363.61 10699.594 30484.96
## Apr 1994 20774.16 16133.21 26750.16 14068.696 30675.62
## May 1994 21783.73 16917.24 28050.15 14752.395 32166.37
## Jun 1994 23162.27 17987.81 29825.24 15685.969 34201.95
## Jul 1994 27831.56 21613.98 35837.72 18848.111 41096.73
## Aug 1994 27818.48 21603.82 35820.87 18839.249 41077.41
## Sep 1994 30848.42 23956.87 39722.43 20891.193 45551.50
## Oct 1994 34095.57 26478.61 43903.67 23090.230 50346.32
## Nov 1994 55176.84 42850.31 71049.28 37366.903 81475.41
## Dec 1994 120067.79 93244.59 154607.08 81312.400 177294.90
## Jan 1995 17250.40 13357.65 22277.59 11629.938 25587.08
## Feb 1995 22675.20 17558.28 29283.31 15287.252 33633.55
## Mar 1995 38840.85 30046.98 50208.44 26146.972 57697.39
## Apr 1995 27057.06 20951.33 34942.16 18241.435 40133.06
## May 1995 28371.96 21969.51 36640.25 19127.918 42083.42
## Jun 1995 30167.42 23359.80 38958.95 20338.387 44746.58
## Jul 1995 36248.88 28068.91 46812.70 24438.412 53767.06
## Aug 1995 36231.84 28055.72 46790.69 24426.922 53741.78
## Sep 1995 40178.16 31111.50 51887.06 27087.467 59595.26
## Oct 1995 44407.37 34386.35 57348.77 29938.733 65868.34
## Nov 1995 71864.42 55647.40 92807.48 48449.831 106594.69
## Dec 1995 156380.86 121091.75 201954.08 105429.448 231955.81
## Jan 1996 22467.57 17336.40 29117.46 15065.329 33506.86
## Feb 1996 29533.04 22788.25 38274.14 19802.984 44043.89
## Mar 1996 50587.81 39009.73 65602.25 33887.802 75517.62
## Apr 1996 35240.15 27191.96 45670.42 23629.808 52555.15
## May 1996 36952.72 28513.41 47889.88 24778.151 55109.18
## Jun 1996 39291.20 30317.82 50920.48 26346.183 58596.65
## Jul 1996 47211.93 36429.60 61185.57 31657.322 70409.18
## Aug 1996 47189.73 36412.48 61156.80 31642.439 70376.07
## Sep 1996 52329.57 40378.47 67817.91 35088.887 78041.33
## Oct 1996 57837.85 44628.77 74956.52 38782.394 86256.08
## Nov 1996 93598.96 72222.70 121302.09 62761.521 139588.15
## Dec 1996 203676.38 157160.50 263959.89 136572.460 303751.35
5J. How could you improve these predictions by modifying the model?
The best way to improve the model would be to address the remaining serial correlation, for example by adding an autoregressive error term or by modelling first differences of the series rather than the (log) levels.
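One possible sketch of such a modification (a regression with ARIMA errors, a technique covered later in the text; festival is just a renamed copy of the dummy built earlier):
festival = dummyColumn
fitDynamic = auto.arima(logFancy, xreg = cbind(festival = festival))
checkresiduals(fitDynamic)   # check whether the serial correlation is reduced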
6A. Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.
AND
6B. Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.
newGas = window(gasoline, end = 2005)
fourierNums = c(2,4,8,16,24)
for (i in 1:length(fourierNums)) {
x = fourierNums[i]
fourierMod = tslm(newGas~trend + fourier(newGas, x))
chartTitle = paste("Fourier = ", fourierNums[i])
print(autoplot(newGas, main = chartTitle, ylab = "Gasoline") + autolayer(fourierMod$fitted.values))
print(paste("Fourier = ", x))
print(forecast::CV(fourierMod))
}
## [1] "Fourier = 2"
## CV AIC AICc BIC AdjR2
## 8.136854e-02 -1.819113e+03 -1.818957e+03 -1.787001e+03 8.269569e-01
## [1] "Fourier = 4"
## CV AIC AICc BIC AdjR2
## 7.648608e-02 -1.864112e+03 -1.863742e+03 -1.813649e+03 8.382404e-01
## [1] "Fourier = 8"
## CV AIC AICc BIC AdjR2
## 7.147431e-02 -1.913619e+03 -1.912542e+03 -1.826455e+03 8.505267e-01
## [1] "Fourier = 16"
## CV AIC AICc BIC AdjR2
## 7.206894e-02 -1.908817e+03 -1.905165e+03 -1.748253e+03 8.526940e-01
## [1] "Fourier = 24"
## CV AIC AICc BIC AdjR2
## 7.356848e-02 -1.895870e+03 -1.888001e+03 -1.661905e+03 8.531134e-01
We see that as the number of Fourier terms increases, the fitted values more closely resemble the observed values. However, this improvement does not carry over to the cross-validation statistic: the model with the lowest AICc and CV is the one with 8 Fourier terms.
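A more compact version of the same search, storing the CV statistic for each K and picking the minimiser (cvByK is a name introduced here):
cvByK = sapply(fourierNums, function(K) {
  CV(tslm(newGas ~ trend + fourier(newGas, K = K)))["CV"]
})
fourierNums[which.min(cvByK)]   # 8, matching the loop output above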
6C. Check the residuals of the final model using the checkresiduals() function. Even though the residuals fail the correlation tests, the results are probably not severe enough to make much difference to the forecasts and prediction intervals. (Note that the correlations are relatively small, even though they are significant.)
finalMod = tslm(newGas~trend + fourier(newGas, 8))
checkresiduals(finalMod)
##
## Breusch-Godfrey test for serial correlation of order up to 104
##
## data: Residuals from Linear regression model
## LM test = 152.37, df = 104, p-value = 0.001414
6D. To forecast using harmonic regression, you will need to generate the future values of the Fourier terms. This can be done as follows.
fc <- forecast(fit, newdata=data.frame(fourier(x,K,h))) where fit is the fitted model using tslm(), K is the number of Fourier terms used in creating fit, and h is the forecast horizon required.
Forecast the next year of data.
fc = forecast(finalMod, newdata=data.frame(fourier(newGas,8,h=52)))
fc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005.014 8.799321 8.455821 9.142820 8.273575 9.325067
## 2005.033 8.638422 8.294831 8.982013 8.112536 9.164307
## 2005.052 8.580027 8.236399 8.923655 8.054085 9.105969
## 2005.071 8.625418 8.281810 8.969026 8.099506 9.151330
## 2005.090 8.726718 8.383171 9.070266 8.200899 9.252538
## 2005.110 8.824669 8.481180 9.168159 8.298939 9.350399
## 2005.129 8.886509 8.543039 9.229980 8.360808 9.412211
## 2005.148 8.918033 8.574555 9.261512 8.392320 9.443747
## 2005.167 8.945724 8.602245 9.289202 8.420010 9.471437
## 2005.186 8.987569 8.644102 9.331036 8.461873 9.513266
## 2005.205 9.037388 8.693927 9.380848 8.511701 9.563074
## 2005.225 9.073625 8.730165 9.417085 8.547940 9.599310
## 2005.244 9.082270 8.738808 9.425732 8.556582 9.607958
## 2005.263 9.072411 8.728945 9.415877 8.546716 9.598106
## 2005.282 9.070531 8.727062 9.413999 8.544833 9.596228
## 2005.301 9.098580 8.755115 9.442045 8.572887 9.624273
## 2005.320 9.154930 8.811469 9.498391 8.629243 9.680617
## 2005.340 9.214808 8.871348 9.558268 8.689123 9.740493
## 2005.359 9.250329 8.906868 9.593790 8.724642 9.776015
## 2005.378 9.253745 8.910281 9.597209 8.728054 9.779437
## 2005.397 9.244784 8.901318 9.588250 8.719089 9.770478
## 2005.416 9.255879 8.912415 9.599344 8.730187 9.781571
## 2005.435 9.306954 8.963492 9.650415 8.781266 9.832641
## 2005.455 9.389526 9.046066 9.732986 8.863841 9.915212
## 2005.474 9.472231 9.128770 9.815692 8.946545 9.997917
## 2005.493 9.522989 9.179526 9.866453 8.997299 10.048679
## 2005.512 9.530727 9.187261 9.874192 9.005033 10.056420
## 2005.531 9.510936 9.167472 9.854401 8.985244 10.036629
## 2005.550 9.492434 9.148972 9.835896 8.966746 10.018122
## 2005.570 9.496352 9.152892 9.839812 8.970666 10.022037
## 2005.589 9.522500 9.179039 9.865960 8.996814 10.048186
## 2005.608 9.551001 9.207538 9.894463 9.025312 10.076690
## 2005.627 9.555663 9.212198 9.899128 9.029969 10.081356
## 2005.646 9.519035 9.175570 9.862501 8.993342 10.044729
## 2005.665 9.440914 9.097452 9.784377 8.915225 9.966603
## 2005.685 9.338363 8.994902 9.681823 8.812677 9.864049
## 2005.704 9.239828 8.896368 9.583288 8.714143 9.765514
## 2005.723 9.175995 8.832533 9.519457 8.650306 9.701684
## 2005.742 9.168477 8.825011 9.511942 8.642783 9.694171
## 2005.761 9.218398 8.874931 9.561865 8.692703 9.744093
## 2005.780 9.300508 8.957045 9.643972 8.774818 9.826199
## 2005.800 9.369945 9.026484 9.713406 8.844259 9.895631
## 2005.819 9.383406 9.039946 9.726866 8.857721 9.909091
## 2005.838 9.325628 8.982166 9.669090 8.799940 9.851317
## 2005.857 9.223931 8.880463 9.567398 8.698234 9.749628
## 2005.876 9.137010 8.793539 9.480481 8.611307 9.662712
## 2005.895 9.120060 8.776593 9.463528 8.594364 9.645757
## 2005.915 9.186567 8.843105 9.530030 8.660878 9.712256
## 2005.934 9.293371 8.949910 9.636832 8.767684 9.819058
## 2005.953 9.362637 9.019174 9.706100 8.836947 9.888326
## 2005.972 9.329401 8.985912 9.672889 8.803672 9.855130
## 2005.991 9.184109 8.840541 9.527678 8.658258 9.709960
6E. Plot the forecasts along with the actual data for 2005. What do you find?
newGas2 = window(gasoline, end = 2006)
autoplot(fc) + autolayer(newGas2)
We can see that the model with 8 Fourier terms is fairly accurate in predicting 2005's data, except that it fails to anticipate the large drop midway through the year.
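To back up the visual impression, the forecasts can be scored against the actual 2005 observations (a sketch; actual2005 is a name introduced here):
actual2005 = window(gasoline, start = 2005, end = 2006)
accuracy(fc, actual2005)   # the test-set row gives the out-of-sample error measures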
7A. Plot the data and comment on its features.
autoplot(huron, ylab = "Level of Lake Huron (Feet)", main = "Huron Levels Over Time")
We see that the lake levels vary greatly over time. These variations do not appear to be seasonal (that is, within each year), but the data does seem to move up and down somewhat predictably (the issue being that the magnitude of each rise and fall is not so predictable). Overall, there is a downward trend, but this is largely due to the high levels in the late 1800s. The levels have similar peaks and valleys after 1920.
7B. Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915.
linearHuron = tslm(huron~trend)
summary(linearHuron)
##
## Call:
## tslm(formula = huron ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.50997 -0.72726 0.00083 0.74402 2.53565
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.202037 0.230111 44.335 < 2e-16 ***
## trend -0.024201 0.004036 -5.996 3.55e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared: 0.2725, Adjusted R-squared: 0.2649
## F-statistic: 35.95 on 1 and 96 DF, p-value: 3.545e-08
t=time(huron)
knot1 = 1915
piece1 = ts(pmax(0, t - knot1), start = 1875)
piecewiseHuron = tslm(huron~trend+piece1)
summary(piecewiseHuron)
##
## Call:
## tslm(formula = huron ~ trend + piece1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.49626 -0.66240 -0.07139 0.85163 2.39222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.12685 0.30817 36.106 < 2e-16 ***
## trend -0.06498 0.01051 -6.181 1.58e-08 ***
## piece1 0.06486 0.01563 4.150 7.26e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared: 0.3841, Adjusted R-squared: 0.3711
## F-statistic: 29.62 on 2 and 95 DF, p-value: 1.004e-10
The trend coefficient in the linear model (-0.024) is roughly a third the size of the pre-1915 trend in the piecewise model (-0.065). The coefficient on the knot term is positive and almost exactly offsets the piecewise trend, so the fitted slope is essentially flat after 1915. The adjusted R-squared for the piecewise model is higher, 0.37 versus 0.26.
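A sketch comparing the two fits on cross-validation and information criteria as well, using the models fitted above:
rbind(linear = CV(linearHuron), piecewise = CV(piecewiseHuron))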
7C. Generate forecasts from these two models for the period up to 1980 and comment on these.
linForecast = forecast(linearHuron, h = 8)
h=8
t.new <- t[length(t)] + seq(h)
piece1.new <- piece1[length(piece1)] + seq(h)
newdata <- cbind(t=t.new, piece1=piece1.new) %>%
as.data.frame()
pieceForecast <- forecast(piecewiseHuron, newdata = newdata)
autoplot(linForecast, main = "Linear forecast")
autoplot(pieceForecast, main = "Piecewise forecast")
linForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1973 7.806127 6.317648 9.294605 5.516501 10.095752
## 1974 7.781926 6.292536 9.271315 5.490899 10.072952
## 1975 7.757724 6.267406 9.248043 5.465269 10.050179
## 1976 7.733523 6.242259 9.224788 5.439613 10.027434
## 1977 7.709322 6.217094 9.201550 5.413929 10.004715
## 1978 7.685121 6.191912 9.178331 5.388219 9.982024
## 1979 7.660920 6.166712 9.155128 5.362481 9.959359
## 1980 7.636719 6.141494 9.131943 5.336717 9.936721
pieceForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1973 8.455119 7.063583 9.846655 6.314483 10.59575
## 1974 8.454992 7.061518 9.848467 6.311374 10.59861
## 1975 8.454866 7.059398 9.850333 6.308182 10.60155
## 1976 8.454739 7.057225 9.852253 6.304906 10.60457
## 1977 8.454612 7.054998 9.854227 6.301549 10.60768
## 1978 8.454486 7.052717 9.856254 6.298109 10.61086
## 1979 8.454359 7.050384 9.858334 6.294587 10.61413
## 1980 8.454232 7.047997 9.860467 6.290984 10.61748
The piecewise forecast starts at a higher level and projects a mostly flat line, while the linear forecast starts lower and continues to project downwards. This makes sense given the coefficients discussed in the previous section: the knot and trend coefficients in the piecewise model largely cancel each other out after 1915. Neither model seems particularly good; both have wide 80% and 95% prediction intervals.
Chapter 6 Exercises
1. A 3 x 5 MA is equivalent to a 7-term weighted moving average with weights 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, 0.067 because:
T = 1/3 [1/5(y0 + y1 + y2 + y3 + y4) + 1/5(y1 + y2 + y3 + y4 + y5) + 1/5(y2 + y3 + y4 + y5 + y6)]
T = 1/3 [(y0/5) + (2y1/5) + (3y2/5) + (3y3/5) + (3y4/5) + (2y5/5) + (y6/5)]
T = y0/15 + 2(y1)/15 + 3(y2)/15 + 3(y3)/15 + 3(y4)/15 + 2(y5)/15 + y6/15
T = 0.067(y0) + 0.133(y1) + 0.200(y2) + 0.200(y3) + 0.200(y4) + 0.133(y5) + 0.067(y6)
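A quick numerical check of this result (y is an arbitrary simulated series; forecast::ma was loaded in Setup):
set.seed(1)
y = ts(rnorm(20))
ma3x5 = ma(ma(y, order = 5), order = 3)        # the 3 x 5 MA
wts = c(1, 2, 3, 3, 3, 2, 1) / 15              # the weights derived above
weightedMA = stats::filter(y, wts, sides = 2)  # 7-term weighted MA
all.equal(as.numeric(ma3x5), as.numeric(weightedMA))   # TRUE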
2A. Plot
autoplot(plastics, ylab = "Monthly Sales (in thousands)", main = "Time Series of Sales of Product A")
Sales rise during the first half of the year, peak around the middle of the year, and then fall again until the start of the following year. The size of the seasonal swings appears to change steadily along with the level of the series. There is a trend, too: the amount of sales tends to increase year-over-year.
2B. Multiplicative decomposition
decomposedPlastics = plastics %>% decompose(type="multiplicative")
decomposedPlastics
## $x
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
##
## $seasonal
## Jan Feb Mar Apr May Jun Jul
## 1 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 2 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 3 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 4 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 5 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## Aug Sep Oct Nov Dec
## 1 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 2 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 3 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 4 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 5 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
##
## $trend
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500 NA
## Aug Sep Oct Nov Dec
## 1 977.0417 977.0833 978.4167 982.7083 990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5 NA NA NA NA NA
##
## $random
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 1.0247887
## 2 0.9656005 0.9745267 0.9750081 0.9894824 1.0061175 1.0024895 1.0401641
## 3 1.0454117 0.9953920 1.0079773 1.0142083 0.9990100 0.9854384 0.9567618
## 4 1.0257400 0.9924762 0.9807020 0.9798704 0.9684851 0.9627557 0.9917766
## 5 0.9767392 1.0510964 1.0498039 1.0299302 1.0398787 1.0628077 NA
## Aug Sep Oct Nov Dec
## 1 1.0157335 1.0040354 0.9724119 0.9961368 0.9489762
## 2 1.0230774 1.0040674 0.9962088 0.9735577 0.9721203
## 3 0.9969907 1.0132932 1.0314752 0.9910657 1.0258002
## 4 0.9776897 0.9920952 1.0133954 1.0527311 1.0665946
## 5 NA NA NA NA NA
##
## $figure
## [1] 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## [8] 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
##
## $type
## [1] "multiplicative"
##
## attr(,"class")
## [1] "decomposed.ts"
C. These results largely support the graphical interpretation from part A: the seasonal indices rise through the middle of the year and fall towards the end. However, the largest index is actually in September (with August close behind) rather than July; October's index remains high before the drop through November, December and January, and the lowest index comes in February rather than January, as I surmised in part A.
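A sketch confirming the peak and trough directly from the seasonal indices stored in the decomposition:
round(decomposedPlastics$figure, 3)
month.abb[which.max(decomposedPlastics$figure)]   # "Sep"
month.abb[which.min(decomposedPlastics$figure)]   # "Feb"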
D. Compute and plot the seasonally adjusted data.
autoplot(seasadj(decomposedPlastics), main = "Seasonally Adjusted Plastics Data", ylab = "Monthly Sales (in thousands)")
E. Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
newPlastics = plastics
newPlastics[32] = newPlastics[32] + 500
decomp2 = decompose(newPlastics, type = "multiplicative")
autoplot(seasadj(decomp2))
The outlier dramatically changes the shape of the seasonally adjusted data: it creates a spike in the middle of year three and then a sudden fall back to the trend. This indicates that the seasonal adjustment is not robust to outliers.
newPlastics1 = plastics
newPlastics1[50] = newPlastics1[50] + 500
decomp3 = decompose(newPlastics1, type = "multiplicative")
autoplot(seasadj(decomp3))
newPlastics2 = plastics
newPlastics2[10] = newPlastics2[10] + 500
decomp3 = decompose(newPlastics2, type = "multiplicative")
autoplot(seasadj(decomp3))
From these graphs we can see again that outliers change the shape of the seasonally adjusted series regardless of where they are positioned. However, an outlier early in the data does not disrupt the trend as much as an outlier towards the end. The outlier at the end of the data set makes the curve look almost foreign to the original, while the outlier at the beginning shows up in the series but leaves the curve still recognizable.
3. Set up time series and X11 decomposition
retailGrocery = ts(retail[,"A3349335T"], frequency = 12, start = c(1982,4))
x11Retail = retailGrocery %>% seas(x11="")
autoplot(x11Retail)
There appear to be some unusual data points with large remainders at the beginning and end of the data. There is also a patch around 1990 with consistently larger remainders. The seasonal component also varies slightly over the course of the time series.
4A. The decomposition shows a seasonal component that becomes more extreme over time. This is especially true for the seasonal lows, which start at around -50 and end up past -100 by the end of the series; the highs get slightly higher but are relatively consistent. It also shows a general trend that flattened out around 1991 after steadily increasing up until then. The subseries plot shows dramatic changes in each month's seasonal component over the course of the data: August and March see large drops, while July, September, and December all increase over time.
4B. The recession of 1991/1992 is visible in the remainder component, which has extreme negative values for that time period. The seasonal, trend, and subseries plots do not show this change.
5A. Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?
autoplot(cangas, main = "Monthly Canadian Gas Production", ylab = "Gas production (billions of cubic metres)")
ggsubseriesplot(cangas, main = "Subseries plot", ylab = "Gas production (billions of cubic metres)")
ggseasonplot(cangas, ylab = "Gas production (billions of cubic metres)")
The rate of change in the seasonality decreases until the mid-1980s and then again after that period, which could be causing some of the large changes we see in the seasonal component over time. With the months becoming more similar to one another as the data progresses, we can expect the estimated seasonal effects to change over time as well, and that is what appears to be happening here. In the subseries plot we can see that, midway through the data, the seasonality flattens out before rising again; that corresponds to the mid-1980s.
5B. Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.
cangas %>%
stl(t.window=13, s.window="periodic", robust=TRUE) %>%
autoplot()
5C. Compare the results with those obtained using SEATS and X11. How are they different?
x11cangas = cangas %>% seas(x11="")
autoplot(x11cangas, main = "x11")
SEATScangas = cangas %>% seas()
autoplot(SEATScangas, main = "SEATS")
The X11 and SEATS results show the changing seasonality over the course of the series, while the STL model does not (the s.window = "periodic" setting forces a fixed seasonal pattern). The remainders in the SEATS decomposition appear much larger than in the X11 decomposition, but this is just an artefact of the different y-axis scales; the X11 decomposition actually has more extreme outliers than the SEATS decomposition, which is largely consistent. The STL model has a wide range of remainders, with the most extreme coming in the mid-1980s. All three have relatively similar trend lines.
6A. Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)
bricksqDecomp = bricksq %>% stl(t.window=13, s.window="periodic", robust=FALSE)
autoplot(bricksqDecomp, main = "Bricksq stl decomp")
6B. Compute and plot the seasonally adjusted data.
seasAdjbricksq = seasadj(bricksqDecomp)
autoplot(seasAdjbricksq, main = "Seasonally Adjusted bricksq Data", ylab = "Clay Brick Production")
6C. Use a naïve method to produce forecasts of the seasonally adjusted data.
naiveFC = (naive(seasAdjbricksq, h = 30))
autoplot(naiveFC)
naiveFC
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 469.591 440.5726 498.6093 425.2113 513.9707
## 1995 Q1 469.591 428.5528 510.6291 406.8286 532.3534
## 1995 Q2 469.591 419.3297 519.8522 392.7230 546.4589
## 1995 Q3 469.591 411.5543 527.6276 380.8315 558.3504
## 1995 Q4 469.591 404.7040 534.4779 370.3549 568.8270
## 1996 Q1 469.591 398.5109 540.6711 360.8833 578.2986
## 1996 Q2 469.591 392.8157 546.3663 352.1733 587.0087
## 1996 Q3 469.591 387.5147 551.6672 344.0662 595.1158
## 1996 Q4 469.591 382.5360 556.6460 336.4518 602.7301
## 1997 Q1 469.591 377.8269 561.3550 329.2500 609.9319
## 1997 Q2 469.591 373.3480 565.8339 322.4001 616.7818
## 1997 Q3 469.591 369.0685 570.1134 315.8551 623.3268
## 1997 Q4 469.591 364.9639 574.2181 309.5776 629.6043
## 1998 Q1 469.591 361.0143 578.1676 303.5373 635.6447
## 1998 Q2 469.591 357.2034 581.9785 297.7091 641.4729
## 1998 Q3 469.591 353.5176 585.6643 292.0721 647.1098
## 1998 Q4 469.591 349.9453 589.2366 286.6087 652.5732
## 1999 Q1 469.591 346.4766 592.7053 281.3038 657.8781
## 1999 Q2 469.591 343.1030 596.0790 276.1443 663.0377
## 1999 Q3 469.591 339.8170 599.3649 271.1189 668.0631
## 1999 Q4 469.591 336.6122 602.5697 266.2176 672.9644
## 2000 Q1 469.591 333.4829 605.6990 261.4317 677.7503
## 2000 Q2 469.591 330.4239 608.7580 256.7533 682.4286
## 2000 Q3 469.591 327.4307 611.7512 252.1757 687.0063
## 2000 Q4 469.591 324.4993 614.6826 247.6924 691.4895
## 2001 Q1 469.591 321.6259 617.5560 243.2979 695.8840
## 2001 Q2 469.591 318.8073 620.3747 238.9872 700.1947
## 2001 Q3 469.591 316.0404 623.1416 234.7556 704.4263
## 2001 Q4 469.591 313.3224 625.8595 230.5989 708.5830
## 2002 Q1 469.591 310.6510 628.5309 226.5133 712.6687
6D. Use stlf() to reseasonalise the results, giving forecasts for the original data.
STLFbricksq = stlf(seasAdjbricksq, method='naive')
autoplot(STLFbricksq)
6E. Do the residuals look uncorrelated?
checkresiduals(STLFbricksq)
## Warning in checkresiduals(STLFbricksq): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + Random walk
## Q* = 40.829, df = 8, p-value = 2.244e-06
##
## Model df: 0. Total lags used: 8
There seems to be some autocorrelation. There are a couple of spikes beyond the significance bounds in the ACF plot, and the Ljung-Box test indicates autocorrelation in the residuals. The autocorrelation is not terrible, but it is not ideal.
6F. Repeat with a robust STL decomposition. Does it make much difference?
bricksqDecomp1 = bricksq %>% stl(t.window=13, s.window="periodic", robust=TRUE)
seasAdjbricksq1 = seasadj(bricksqDecomp1)
STLFbricksqRobust = stlf(seasAdjbricksq1, method='naive')
autoplot(STLFbricksqRobust)
checkresiduals(STLFbricksqRobust)
## Warning in checkresiduals(STLFbricksqRobust): The fitted degrees of freedom is
## based on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + Random walk
## Q* = 40.829, df = 8, p-value = 2.244e-06
##
## Model df: 0. Total lags used: 8
The robust model does not make much of a difference. The forecasts look similar, and there still is some degree of autocorrelation.
6G. Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?
bricksqtrain = window(bricksq, end = c(1992,3))
bricksqtest = window(bricksq, start = c(1992,4))
bricksq_snaive = snaive(bricksqtrain,h=8)
bricksq_stlf = stlf(bricksqtrain, method='naive', h=8)
autoplot(bricksq_snaive) + autolayer(bricksqtest) + xlim(c(1990,1994))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 136 row(s) containing missing values (geom_path).
## Warning: Removed 2 row(s) containing missing values (geom_path).
autoplot(bricksq_stlf) + autolayer(bricksqtest) +xlim(c(1990,1994))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 136 row(s) containing missing values (geom_path).
## Warning: Removed 2 row(s) containing missing values (geom_path).
The stlf model appears to produce more accurate results than the seasonal naive model: it has narrower 80% and 95% prediction intervals and does a better job of tracking the shape of the data, even though it consistently underestimates the values.
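To back this up numerically, the two sets of forecasts can be scored against the held-out two years (a sketch using the objects defined above):
accuracy(bricksq_stlf, bricksqtest)["Test set", c("RMSE", "MAE", "MAPE")]
accuracy(bricksq_snaive, bricksqtest)["Test set", c("RMSE", "MAE", "MAPE")]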
7. Forecast the writing series using stlf()
lambdaWriting <- BoxCox.lambda(writing)
writing_naive = stlf(writing, method='naive', lambda = lambdaWriting)
writing_rwf = stlf(writing, method='rwdrift', lambda = lambdaWriting)
autoplot(writing_naive)
autoplot(writing_rwf)
The naive method appears to be more appropriate here, as it has narrower prediction intervals around otherwise similar forecasts.
8. Forecast the fancy series using stlf()
autoplot(fancy, main = "Fancy")
lambdaFancy <- BoxCox.lambda(fancy)
fancy_naive = stlf(fancy, method='naive', lambda = lambdaFancy)
fancy_rwf = stlf(fancy, method='rwdrift', lambda = lambdaFancy)
autoplot(fancy_naive)
autoplot(fancy_rwf)
Here we can see that the Box-Cox transformation has a large effect. It also appears that the rwdrift method is better, due to narrower prediction intervals.