Stemmerich_Homework1

Chapter 5

a)Plot the data and find the regression model for Demand with temperature as an explanatory variable. Why is there a positive relationship?

library(fpp2)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## -- Attaching packages ---------------------------------------------- fpp2 2.4 --

## v ggplot2   3.3.5     v fma       2.4  
## v forecast  8.15      v expsmooth 2.3

##

daily20 = head(elecdaily,20)
daily20_ts = ts(daily20)
plot(Demand~Temperature, data = daily20_ts)

Answer: As temperature increases, the demand of electricity increases, which is why we see a positive relationship between Demand and Temperature.

Produce residual plot. Is the model adequate? Are there any outliers or influential observations?

daily_tslm = tslm(Demand~Temperature, data = daily20_ts)
checkresiduals(daily_tslm$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

Answer: there seems to be no correlation with the residuals, so it seems the model will be fairly accurate

Use the model to forecast the demand you would expect for the next day if the max temperature was 15 degrees and compare it with the forecast if the maximum temperature was 35 degrees. Do you believe the forecasts?

forecast15 = forecast(daily_tslm, newdata = data.frame(Temperature = 15))
forecast35 = forecast(daily_tslm, newdata = data.frame(Temperature = 35))
forecast15

##    Point Forecast   Lo 80    Hi 80    Lo 95    Hi 95
## 21       140.5701 108.681 172.4591 90.21166 190.9285

forecast35

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 21       275.7146 245.2278 306.2014 227.5706 323.8586

Answer: Forecasts seem realistic, if the temperature is higher we would expect
a higher demand of electricity.

Give prediction intervals for your forecasts.

autoplot(daily20_ts, facets = TRUE)

daily20 %>%
  as.data.frame( ) %>%
  ggplot(aes(x=Temperature, y=Demand)) + 
  geom_point( ) + 
  geom_smooth(method = "lm", se = F)

## `geom_smooth()` using formula 'y ~ x'

fit = tslm(Demand~Temperature, data = daily20_ts)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

forecast(fit, newdata = data.frame(Temperature = c(15,35)))

##    Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 21       140.5701 108.6810 172.4591  90.21166 190.9285
## 22       275.7146 245.2278 306.2014 227.57056 323.8586

Plot Demand vs Temperature for all of the available data in elecdaily

plot(Demand ~ Temperature, data = elecdaily)

2. a. Plot the winning time against the year. Describe main features of the plot

autoplot(mens400, xlab = "Time", ylab = "Year")

Answer: The winning time has decreased from the late 1800s until 2016. There are some gaps in history where I assume the even did not take place, but there is a clear decrease in winning time over the century.

Fit a regression line to the data. What is the average rate per year decrease in time?

fit = tslm(mens400~time(mens400), data = mens400)
fit

## 
## Call:
## tslm(formula = mens400 ~ time(mens400), data = mens400)
## 
## Coefficients:
##   (Intercept)  time(mens400)  
##     172.48148       -0.06457

Answer: we can see a decrease of about 0.065 seconds per year.

autoplot(mens400, xlab = "Time", ylab = "Year") + 
  geom_abline(slope = fit$coefficients[2], intercept = fit$coefficients[1])

c) Plot the residuals against the year. What does this indicate about the suitability of the fitted line?

checkresiduals(fit$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

Answer: Residuals seem to have a fair fit. Not too many fall outside the scopes.

Predict the winning time for the men’s 400 meters final in the 2020 Olympics. give a prediction inverval for your forecasts. What assumptions have you made in these calculations?

time = time(mens400)
forecast(fit, newdata = data.frame(time=time + 1))

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2020       50.04947 48.45492 51.64401 47.55691 52.54202
## 2024       49.79117 48.20567 51.37668 47.31275 52.26959
## 2028       49.53288 47.95586 51.10989 47.06772 51.99803
## 2032       49.27458 47.70549 50.84367 46.82182 51.72734
## 2036       49.01629 47.45456 50.57801 46.57503 51.45754
## 2040       48.75799 47.20304 50.31294 46.32734 51.18864
## 2044       48.49969 46.95095 50.04844 46.07873 50.92066
## 2048       48.24140 46.69826 49.78454 45.82920 50.65360
## 2052       47.98310 46.44497 49.52124 45.57873 50.38748
## 2056       47.72481 46.19108 49.25854 45.32732 50.12230
## 2060       47.46651 45.93657 48.99645 45.07495 49.85807
## 2064       47.20822 45.68146 48.73498 44.82163 49.59481
## 2068       46.94992 45.42573 48.47412 44.56734 49.33251
## 2072       46.69163 45.16937 48.21388 44.31208 49.07118
## 2076       46.43333 44.91239 47.95427 44.05584 48.81082
## 2080       46.17504 44.65479 47.69528 43.79863 48.55144
## 2084       45.91674 44.39656 47.43692 43.54044 48.29304
## 2088       45.65845 44.13771 47.17918 43.28127 48.03562
## 2092       45.40015 43.87823 46.92207 43.02112 47.77918
## 2096       45.14185 43.61813 46.66558 42.76000 47.52371
## 2100       44.88356 43.35740 46.40972 42.49791 47.26921
## 2104       44.62526 43.09606 46.15447 42.23485 47.01568
## 2108       44.36697 42.83410 45.89984 41.97083 46.76311
## 2112       44.10867 42.57153 45.64581 41.70585 46.51149
## 2116       43.85038 42.30836 45.39240 41.43993 46.26082
## 2120       43.59208 42.04458 45.13958 41.17308 46.01109
## 2124       43.33379 41.78022 44.88736 40.90529 45.76229
## 2128       43.07549 41.51527 44.63572 40.63659 45.51440
## 2132       42.81720 41.24973 44.38466 40.36698 45.26741
## 2136       42.55890 40.98363 44.13417 40.09648 45.02132
## 2140       42.30061 40.71697 43.88425 39.82510 44.77611

Answer: The forecasted winning time is 50.05 seconds with the 80% interval of 48.45 to 51.64 seconds. And a 95% interval of 47.56 seconds to 52.54 seconds. I assumed the trend remained the same from the previous years.

Type easter(ausbeer) and interpret what you see.

easter(ausbeer)

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00

Answer: The easter() function tells your what quarter Easter weekend falls on. If there is a number between 0 and 1, Easter weekend falls inbetween March and April.

An elasticity coefficient is the ratio of the percentage change in the forecast variable (y) to the percentage change in the predictor variable (x). Mathematically, the elasticity is defined as (dy/dx)*(x/y). Consider the log-log model, logy=β0+β1logx+ε. Express y as a function of x and show that the coefficient β1 is the elasticity coefficient.

Produce a time plot of the data and describe the patterens in the graph. Identify and unusual or unexpected fluctuations in the time series.

autoplot(fancy)

Answer: There seems to be some seasonality in the time series. It also seems that each year, there is a bigger spike in the season.

Explain why it is necessary to take logarithms of these data before fitting a model. Answer: The logarithms will smooth out the extreme increases of each year. This will make the data easier to work with and have less bias.
Use R to fit a regression model to the lagarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

fit = tslm(BoxCox(fancy, 0)~season + trend)
fit

## 
## Call:
## tslm(formula = BoxCox(fancy, 0) ~ season + trend)
## 
## Coefficients:
## (Intercept)      season2      season3      season4      season5      season6  
##     7.60586      0.25104      0.69521      0.38293      0.40799      0.44696  
##     season7      season8      season9     season10     season11     season12  
##     0.60822      0.58535      0.66634      0.74403      1.20302      1.95814  
##       trend  
##     0.02239

Plot the residuals against time and the fitted values. Do these plots reveal any problems with the model?

autoplot(fit$residuals)

Answer: There seems to be a slight correlation of the residuals and time.

Do boxplots of the residuals for each month. Does this reveal any problems with the model?

time = time(fancy)
cbind.data.frame(month = factor(month.abb[round(12*(time - floor(time))+1)],
                                labels = month.abb,
                                ordered = T),
                 Residuals = fit$residuals) %>%
  ggplot(aes(x = month, y = Residuals)) + geom_boxplot()

## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

Answer: There are some irregularities per month, which can indicate some bias or correlation in certain months.

What do the coefficients tell you about each variable?

fit$coefficients

## (Intercept)     season2     season3     season4     season5     season6 
##  7.60586042  0.25104367  0.69520658  0.38293405  0.40799437  0.44696253 
##     season7     season8     season9    season10    season11    season12 
##  0.60821562  0.58535238  0.66634464  0.74403359  1.20301639  1.95813657 
##       trend 
##  0.02239298

Answer: Coefficients are positive telling me there is a positive trend through time.

What does the Breusch-Godfrey test tell you about your model?

checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals from Linear regression model
## LM test = 37.152, df = 17, p-value = 0.003209

Answer: Assuming an alpha value of 0.05, we can say the residuals are significantly correlated.

Regardless of you answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.

fancy_forecast = forecast(fit, newdata = data.frame(time=time + 1))
fancy_forecast

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 1994       9.509264  9.246995  9.771532  9.105002  9.913525
## Feb 1994       9.782700  9.520432 10.044969  9.378439 10.186962
## Mar 1994      10.249256  9.986988 10.511525  9.844995 10.653518
## Apr 1994       9.959377  9.697108 10.221645  9.555115 10.363638
## May 1994      10.006830  9.744562 10.269098  9.602569 10.411091
## Jun 1994      10.068191  9.805923 10.330459  9.663930 10.472452
## Jul 1994      10.251837  9.989569 10.514105  9.847576 10.656099
## Aug 1994      10.251367  9.989099 10.513635  9.847106 10.655628
## Sep 1994      10.354752 10.092484 10.617020  9.950491 10.759014
## Oct 1994      10.454834 10.192566 10.717102 10.050573 10.859095
## Nov 1994      10.936210 10.673942 11.198478 10.531949 11.340471
## Dec 1994      11.713723 11.451455 11.975991 11.309462 12.117984
## Jan 1995       9.777979  9.512777 10.043182  9.369195 10.186763
## Feb 1995      10.051416  9.786214 10.316618  9.642632 10.460200
## Mar 1995      10.517972 10.252770 10.783174 10.109188 10.926756
## Apr 1995      10.228092  9.962890 10.493295  9.819308 10.636876
## May 1995      10.275546 10.010343 10.540748  9.866762 10.684330
## Jun 1995      10.336907 10.071704 10.602109  9.928123 10.745691
## Jul 1995      10.520553 10.255351 10.785755 10.111769 10.929337
## Aug 1995      10.520083 10.254880 10.785285 10.111299 10.928867
## Sep 1995      10.623468 10.358266 10.888670 10.214684 11.032252
## Oct 1995      10.723550 10.458347 10.988752 10.314766 11.132334
## Nov 1995      11.204926 10.939723 11.470128 10.796142 11.613710
## Dec 1995      11.982439 11.717236 12.247641 11.573655 12.391223
## Jan 1996      10.046695  9.777950 10.315440  9.632451 10.460940
## Feb 1996      10.320132 10.051387 10.588877  9.905887 10.734376
## Mar 1996      10.786688 10.517943 11.055433 10.372443 11.200932
## Apr 1996      10.496808 10.228063 10.765553 10.082564 10.911053
## May 1996      10.544261 10.275516 10.813007 10.130017 10.958506
## Jun 1996      10.605623 10.336878 10.874368 10.191378 11.019867
## Jul 1996      10.789269 10.520524 11.058014 10.375024 11.203513
## Aug 1996      10.788798 10.520053 11.057543 10.374554 11.203043
## Sep 1996      10.892184 10.623439 11.160929 10.477939 11.306428
## Oct 1996      10.992266 10.723521 11.261011 10.578021 11.406510
## Nov 1996      11.473641 11.204896 11.742386 11.059397 11.887886
## Dec 1996      12.251155 11.982409 12.519900 11.836910 12.665399
## Jan 1997      10.315411 10.042538 10.588283  9.894804 10.736018
## Feb 1997      10.588848 10.315975 10.861720 10.168241 11.009454
## Mar 1997      11.055403 10.782531 11.328276 10.634797 11.476010
## Apr 1997      10.765524 10.492651 11.038396 10.344917 11.186131
## May 1997      10.812977 10.540105 11.085850 10.392371 11.233584
## Jun 1997      10.874338 10.601466 11.147211 10.453732 11.294945
## Jul 1997      11.057984 10.785112 11.330857 10.637378 11.478591
## Aug 1997      11.057514 10.784642 11.330387 10.636907 11.478121
## Sep 1997      11.160899 10.888027 11.433772 10.740293 11.581506
## Oct 1997      11.260981 10.988109 11.533854 10.840375 11.681588
## Nov 1997      11.742357 11.469485 12.015230 11.321750 12.162964
## Dec 1997      12.519870 12.246998 12.792743 12.099264 12.940477
## Jan 1998      10.584127 10.306568 10.861685 10.156297 11.011957
## Feb 1998      10.857563 10.580005 11.135122 10.429733 11.285393
## Mar 1998      11.324119 11.046561 11.601678 10.896289 11.751949
## Apr 1998      11.034240 10.756681 11.311798 10.606410 11.462070
## May 1998      11.081693 10.804134 11.359252 10.653863 11.509523
## Jun 1998      11.143054 10.865495 11.420613 10.715224 11.570884
## Jul 1998      11.326700 11.049142 11.604259 10.898870 11.754530
## Aug 1998      11.326230 11.048671 11.603789 10.898400 11.754060
## Sep 1998      11.429615 11.152056 11.707174 11.001785 11.857445
## Oct 1998      11.529697 11.252138 11.807256 11.101867 11.957527
## Nov 1998      12.011073 11.733514 12.288632 11.583243 12.438903
## Dec 1998      12.788586 12.511027 13.066145 12.360756 13.216416
## Jan 1999      10.852842 10.570067 11.135618 10.416971 11.288714
## Feb 1999      11.126279 10.843503 11.409055 10.690407 11.562151
## Mar 1999      11.592835 11.310059 11.875611 11.156963 12.028707
## Apr 1999      11.302955 11.020180 11.585731 10.867084 11.738827
## May 1999      11.350409 11.067633 11.633185 10.914537 11.786280
## Jun 1999      11.411770 11.128994 11.694546 10.975898 11.847642
## Jul 1999      11.595416 11.312640 11.878192 11.159544 12.031288
## Aug 1999      11.594946 11.312170 11.877722 11.159074 12.030817
## Sep 1999      11.698331 11.415555 11.981107 11.262459 12.134203
## Oct 1999      11.798413 11.515637 12.081189 11.362541 12.234285
## Nov 1999      12.279789 11.997013 12.562564 11.843917 12.715660
## Dec 1999      13.057302 12.774526 13.340078 12.621430 13.493174
## Jan 2000      11.121558 10.833063 11.410053 10.676871 11.566246
## Feb 2000      11.394995 11.106500 11.683490 10.950307 11.839682
## Mar 2000      11.861551 11.573056 12.150046 11.416863 12.306238
## Apr 2000      11.571671 11.283176 11.860166 11.126984 12.016359
## May 2000      11.619124 11.330629 11.907620 11.174437 12.063812
## Jun 2000      11.680486 11.391991 11.968981 11.235798 12.125173
## Jul 2000      11.864132 11.575637 12.152627 11.419444 12.308819
## Aug 2000      11.863661 11.575166 12.152157 11.418974 12.308349
## Sep 2000      11.967047 11.678552 12.255542 11.522359 12.411734
## Oct 2000      12.067129 11.778633 12.355624 11.622441 12.511816
## Nov 2000      12.548504 12.260009 12.837000 12.103817 12.993192
## Dec 2000      13.326018 13.037522 13.614513 12.881330 13.770705

autoplot(fancy_forecast)

j) How could you improve these predictions by modifying the model? Answer: We can test more transformations of the variables to see which one improves our results the best. We could also find different combinations of variables to use in our model.

Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.

gas = window(gasoline, end = 2005)
autoplot(gas)

gas1 = tslm(gas ~ trend + fourier(gas, K=5))
gas2 = tslm(gas ~ trend + fourier(gas, K=10))
gas3 = tslm(gas ~ trend + fourier(gas, K=15))
autoplot(gas) + autolayer(fitted(gas1)) + autolayer(fitted(gas2)) + 
  autolayer(fitted(gas3))

ANswer: the trend for each of the different Fourier terms are very similar, from first glance, it seems Fourier term = 15 (gas3) seems to be the best out of gas1, gas2, and gas3.

Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.

CV(gas1)

##            CV           AIC          AICc           BIC         AdjR2 
##  7.553646e-02 -1.873234e+03 -1.872723e+03 -1.813596e+03  8.406928e-01

CV(gas2)

##            CV           AIC          AICc           BIC         AdjR2 
##  7.135754e-02 -1.915014e+03 -1.913441e+03 -1.809500e+03  8.516102e-01

CV(gas3)

##            CV           AIC          AICc           BIC         AdjR2 
##  7.190862e-02 -1.910231e+03 -1.906988e+03 -1.758841e+03  8.525942e-01

Answer: Looks like if we have 10 Fourier terms, we have the lowest CV and AICc values.

check the residuals of the final model using the checkresiduals() function. Even though the residuals fail in the correlation tests, the results are probably not severe enough to make much difference to the forecasts and prediction intervals.

checkresiduals(gas2$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

d) Forecast using harmonic regression.

fit = forecast(gas2, newdata = data.frame(fourier(gas, 10, 52)))
fit

##          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2005.014       8.745595 8.402358 9.088832 8.220249  9.270941
## 2005.033       8.604075 8.260737 8.947413 8.078574  9.129575
## 2005.052       8.607195 8.263839 8.950551 8.081666  9.132724
## 2005.071       8.683215 8.339879 9.026550 8.157718  9.208712
## 2005.090       8.746645 8.403404 9.089887 8.221293  9.271998
## 2005.110       8.783003 8.439850 9.126156 8.257785  9.308221
## 2005.129       8.833517 8.490370 9.176664 8.308309  9.358725
## 2005.148       8.917498 8.574348 9.260647 8.392285  9.442710
## 2005.167       8.997861 8.654729 9.340992 8.472676  9.523046
## 2005.186       9.029128 8.686005 9.372252 8.503956  9.554301
## 2005.205       9.018507 8.675381 9.361633 8.493330  9.543684
## 2005.225       9.017572 8.674439 9.360705 8.492385  9.542760
## 2005.244       9.056052 8.712918 9.399187 8.530863  9.581242
## 2005.263       9.105523 8.762395 9.448651 8.580344  9.630703
## 2005.282       9.121048 8.777925 9.464172 8.595876  9.646221
## 2005.301       9.105879 8.762754 9.449005 8.580703  9.631055
## 2005.320       9.112513 8.769382 9.455644 8.587329  9.637697
## 2005.340       9.175080 8.831948 9.518211 8.649895  9.700265
## 2005.359       9.259000 8.915873 9.602127 8.733822  9.784178
## 2005.378       9.296291 8.953167 9.639415 8.771118  9.821464
## 2005.397       9.268208 8.925083 9.611334 8.743032  9.793385
## 2005.416       9.234527 8.891397 9.577657 8.709345  9.759710
## 2005.435       9.270246 8.927115 9.613376 8.745062  9.795429
## 2005.455       9.381554 9.038427 9.724681 8.856376  9.906731
## 2005.474       9.497887 9.154762 9.841011 8.972713 10.023060
## 2005.493       9.546877 9.203751 9.890002 9.021700 10.072053
## 2005.512       9.524722 9.181592 9.867851 8.999539 10.049904
## 2005.531       9.487091 9.143960 9.830221 8.961908 10.012274
## 2005.550       9.482387 9.139260 9.825513 8.957209 10.007564
## 2005.570       9.509124 9.165999 9.852248 8.983950 10.034297
## 2005.589       9.536544 9.193418 9.879670 9.011367 10.061720
## 2005.608       9.546848 9.203718 9.889978 9.021665 10.072030
## 2005.627       9.541950 9.198820 9.885081 9.016767 10.067133
## 2005.646       9.517407 9.174280 9.860533 8.992229 10.042584
## 2005.665       9.453438 9.110314 9.796562 8.928265  9.978612
## 2005.685       9.344405 9.001279 9.687531 8.819228  9.869581
## 2005.704       9.226892 8.883762 9.570023 8.701709  9.752076
## 2005.723       9.160315 8.817184 9.503446 8.635131  9.685499
## 2005.742       9.174078 8.830951 9.517205 8.648901  9.699255
## 2005.761       9.241612 8.898488 9.584736 8.716439  9.766785
## 2005.780       9.310432 8.967306 9.653559 8.785255  9.835609
## 2005.800       9.348790 9.005658 9.691922 8.823604  9.873976
## 2005.819       9.354217 9.011085 9.697350 8.829031  9.879404
## 2005.838       9.326591 8.983464 9.669718 8.801413  9.851768
## 2005.857       9.258087 8.914964 9.601211 8.732915  9.783260
## 2005.876       9.163285 8.820158 9.506412 8.638107  9.688463
## 2005.895       9.102531 8.759394 9.445669 8.577338  9.627725
## 2005.915       9.142445 8.799307 9.485583 8.617250  9.667640
## 2005.934       9.276104 8.932975 9.619234 8.750923  9.801286
## 2005.953       9.396214 9.053089 9.739340 8.871039  9.921390
## 2005.972       9.374939 9.031803 9.718075 8.849747  9.900131
## 2005.991       9.184286 8.841055 9.527518 8.658948  9.709624

Plot the forecasts along with the actual data for 2005. What do you find?

gas_actual = window(gasoline, start = 2005, end = 2006)
autoplot(gas_actual) + autolayer(fit$mean)

Answer: There is a huge dip in the later half of 2005 that the forecast does not completely catch, but in the beginning of the year, the forecast catches majority of the trend.

Plot the data and comment on its features.

plot(huron)

Answer: Some seasonality changes throughout the years. Looks like the water level has been fairly constant throughout the years.

Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915.

lm1 = tslm(huron~trend)
summary(lm1)

## 
## Call:
## tslm(formula = huron ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50997 -0.72726  0.00083  0.74402  2.53565 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.202037   0.230111  44.335  < 2e-16 ***
## trend       -0.024201   0.004036  -5.996 3.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared:  0.2725, Adjusted R-squared:  0.2649 
## F-statistic: 35.95 on 1 and 96 DF,  p-value: 3.545e-08

time = time(huron)
t2 = ts(pmax(0, time-1915), start = 1875)
pw1 = tslm(huron~time+t2)
summary(pw1)

## 
## Call:
## tslm(formula = huron ~ time + t2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.49626 -0.66240 -0.07139  0.85163  2.39222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 132.90870   19.97687   6.653 1.82e-09 ***
## time         -0.06498    0.01051  -6.181 1.58e-08 ***
## t2            0.06486    0.01563   4.150 7.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared:  0.3841, Adjusted R-squared:  0.3711 
## F-statistic: 29.62 on 2 and 95 DF,  p-value: 1.004e-10

Generate forecasts from these two moelds for the period up to 1980 and comment on these.

fit1 = forecast(lm1, newdata = data.frame(time = time))
newdata = cbind(time = time[length(time)] + seq(8), 
                t2 = t2[length(t2)] + seq(8)) %>% as.data.frame()
fit2 = forecast(pw1, newdata = newdata)
fit1

##      Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 1973       7.806127 6.317648 9.294605 5.516501 10.095752
## 1974       7.781926 6.292536 9.271315 5.490899 10.072952
## 1975       7.757724 6.267406 9.248043 5.465269 10.050179
## 1976       7.733523 6.242259 9.224788 5.439613 10.027434
## 1977       7.709322 6.217094 9.201550 5.413929 10.004715
## 1978       7.685121 6.191912 9.178331 5.388219  9.982024
## 1979       7.660920 6.166712 9.155128 5.362481  9.959359
## 1980       7.636719 6.141494 9.131943 5.336717  9.936721
## 1981       7.612518 6.116259 9.108776 5.310925  9.914110
## 1982       7.588317 6.091007 9.085626 5.285107  9.891526
## 1983       7.564116 6.065738 9.062494 5.259263  9.868968
## 1984       7.539914 6.040451 9.039378 5.233391  9.846438
## 1985       7.515713 6.015146 9.016280 5.207493  9.823933
## 1986       7.491512 5.989825 8.993200 5.181569  9.801456
## 1987       7.467311 5.964486 8.970136 5.155618  9.779004
## 1988       7.443110 5.939130 8.947090 5.129640  9.756580
## 1989       7.418909 5.913757 8.924061 5.103637  9.734181
## 1990       7.394708 5.888367 8.901048 5.077606  9.711809
## 1991       7.370507 5.862960 8.878053 5.051550  9.689463
## 1992       7.346306 5.837536 8.855075 5.025468  9.667144
## 1993       7.322104 5.812095 8.832114 4.999359  9.644850
## 1994       7.297903 5.786636 8.809170 4.973224  9.622582
## 1995       7.273702 5.761161 8.786243 4.947064  9.600341
## 1996       7.249501 5.735670 8.763333 4.920877  9.578125
## 1997       7.225300 5.710161 8.740439 4.894665  9.555935
## 1998       7.201099 5.684636 8.717562 4.868427  9.533771
## 1999       7.176898 5.659093 8.694702 4.842163  9.511633
## 2000       7.152697 5.633535 8.671859 4.815873  9.489520
## 2001       7.128496 5.607959 8.649032 4.789558  9.467433
## 2002       7.104294 5.582367 8.626222 4.763217  9.445372
## 2003       7.080093 5.556759 8.603428 4.736851  9.423335
## 2004       7.055892 5.531134 8.580651 4.710460  9.401325
## 2005       7.031691 5.505492 8.557890 4.684043  9.379339
## 2006       7.007490 5.479834 8.535146 4.657601  9.357379
## 2007       6.983289 5.454160 8.512418 4.631134  9.335444
## 2008       6.959088 5.428469 8.489706 4.604642  9.313534
## 2009       6.934887 5.402763 8.467011 4.578125  9.291649
## 2010       6.910686 5.377040 8.444331 4.551582  9.269789
## 2011       6.886484 5.351300 8.421668 4.525015  9.247953
## 2012       6.862283 5.325545 8.399021 4.498424  9.226143
## 2013       6.838082 5.299774 8.376391 4.471807  9.204357
## 2014       6.813881 5.273986 8.353776 4.445166  9.182596
## 2015       6.789680 5.248183 8.331177 4.418500  9.160860
## 2016       6.765479 5.222364 8.308594 4.391810  9.139148
## 2017       6.741278 5.196529 8.286027 4.365096  9.117460
## 2018       6.717077 5.170678 8.263475 4.338357  9.095797
## 2019       6.692876 5.144811 8.240940 4.311594  9.074158
## 2020       6.668674 5.118929 8.218420 4.284806  9.052543
## 2021       6.644473 5.093031 8.195916 4.257995  9.030952
## 2022       6.620272 5.067117 8.173427 4.231159  9.009385
## 2023       6.596071 5.041188 8.150954 4.204300  8.987842
## 2024       6.571870 5.015243 8.128497 4.177417  8.966323
## 2025       6.547669 4.989283 8.106055 4.150510  8.944828
## 2026       6.523468 4.963307 8.083628 4.123579  8.923356
## 2027       6.499267 4.937316 8.061217 4.096625  8.901909
## 2028       6.475066 4.911310 8.038821 4.069647  8.880484
## 2029       6.450864 4.885289 8.016440 4.042646  8.859083
## 2030       6.426663 4.859252 7.994075 4.015621  8.837706
## 2031       6.402462 4.833200 7.971724 3.988573  8.816351
## 2032       6.378261 4.807133 7.949389 3.961502  8.795020
## 2033       6.354060 4.781051 7.927069 3.934408  8.773712
## 2034       6.329859 4.754954 7.904764 3.907290  8.752427
## 2035       6.305658 4.728842 7.882473 3.880150  8.731166
## 2036       6.281457 4.702716 7.860198 3.852987  8.709926
## 2037       6.257256 4.676574 7.837937 3.825801  8.688710
## 2038       6.233054 4.650418 7.815691 3.798592  8.667517
## 2039       6.208853 4.624247 7.793460 3.771361  8.646346
## 2040       6.184652 4.598061 7.771243 3.744107  8.625197
## 2041       6.160451 4.571861 7.749041 3.716831  8.604071
## 2042       6.136250 4.545646 7.726854 3.689532  8.582968
## 2043       6.112049 4.519417 7.704681 3.662211  8.561886
## 2044       6.087848 4.493173 7.682522 3.634868  8.540827
## 2045       6.063647 4.466915 7.660378 3.607503  8.519790
## 2046       6.039446 4.440643 7.638248 3.580116  8.498775
## 2047       6.015244 4.414356 7.616133 3.552707  8.477782
## 2048       5.991043 4.388055 7.594032 3.525276  8.456811
## 2049       5.966842 4.361740 7.571944 3.497823  8.435862
## 2050       5.942641 4.335411 7.549871 3.470348  8.414934
## 2051       5.918440 4.309068 7.527812 3.442852  8.394028
## 2052       5.894239 4.282711 7.505767 3.415335  8.373143
## 2053       5.870038 4.256340 7.483736 3.387796  8.352280
## 2054       5.845837 4.229955 7.461719 3.360235  8.331438
## 2055       5.821636 4.203556 7.439715 3.332654  8.310618
## 2056       5.797434 4.177143 7.417726 3.305051  8.289818
## 2057       5.773233 4.150717 7.395750 3.277427  8.269040
## 2058       5.749032 4.124277 7.373787 3.249782  8.248282
## 2059       5.724831 4.097824 7.351839 3.222116  8.227546
## 2060       5.700630 4.071357 7.329903 3.194430  8.206830
## 2061       5.676429 4.044876 7.307982 3.166722  8.186136
## 2062       5.652228 4.018382 7.286073 3.138994  8.165461
## 2063       5.628027 3.991875 7.264179 3.111246  8.144808
## 2064       5.603826 3.965354 7.242297 3.083477  8.124174
## 2065       5.579624 3.938820 7.220429 3.055687  8.103562
## 2066       5.555423 3.912273 7.198573 3.027878  8.082969
## 2067       5.531222 3.885713 7.176731 3.000048  8.062397
## 2068       5.507021 3.859140 7.154902 2.972198  8.041845
## 2069       5.482820 3.832554 7.133086 2.944328  8.021313
## 2070       5.458619 3.805954 7.111284 2.916437  8.000800

fit2

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1973       8.455119 7.063583 9.846655 6.314483 10.59575
## 1974       8.454992 7.061518 9.848467 6.311374 10.59861
## 1975       8.454866 7.059398 9.850333 6.308182 10.60155
## 1976       8.454739 7.057225 9.852253 6.304906 10.60457
## 1977       8.454612 7.054998 9.854227 6.301549 10.60768
## 1978       8.454486 7.052717 9.856254 6.298109 10.61086
## 1979       8.454359 7.050384 9.858334 6.294587 10.61413
## 1980       8.454232 7.047997 9.860467 6.290984 10.61748

Answer: The linear model forecasts lower water levels than the piecewise function.

Chapter 6 1. Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067. Answer: 1/15(y + 2y + 3y + 3y + 2y + y). 1/15 = 0.067

Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(plastics)

Answer: The plot shows an increasing seasonal trend on the sales over time.

Use a classical muliplicative decompostion to calculate the trend-cycle and seasonal indices.

mult = decompose(plastics, type = "multiplicative")
autoplot(mult)

c) Do the results support the graphical interpretation from part a? Answer: yes, the trend is shoing the increase we discussed and the seasonal chart is showing the seasonality trends.

compute and plot the seasonally adjusted data

adjusted = mult$x/mult$seasonal
autoplot(adjusted)

e) Change one observation to be an outlier, and recompute the seasonally adjusted data. What is the effect of the outlier?

out_plastic = plastics
out_plastic[15] = out_plastic[15]+500
out_mult = decompose(out_plastic, type = "multiplicative")
out_adjusted = out_mult$x/out_mult$seasonal
autoplot(out_adjusted)

Answer: The addition of the outlier to the data is obvious in the plot. It also takes away from the upward trend slightly.

Does it make any difference if the outlier is near the end rather than in the middle of the time series? Answer: Yes, if the outlier was added at the end, it would show a greater increase in the later months of the time series. Whereas with the outlier being in the middle or the beginning, it decreases the amount of increase throughout time.

Recall your retail time series data. Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)
retaildata <- readxl::read_excel("C:/Users/Jake/Documents/PredAnalytics/retail.xlsx", skip=1)
retail_ts = ts(retaildata[,"A3349627V"], frequency = 12, start = c(1982,4))
x11_test = seas(retail_ts, x11 = "")
autoplot(x11_test)

Answer: The irregular graph highlights some outliers in the data. Also, we can still see a slight increase in the data and trend.

Write 3-5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making interpretations. Answer: In the data and the trend graphs, we can see a clear positive trend. The remainder graph shows a few outliers that can cause the trend to dip a few times. Whereas the seasonal graph shows seasonality in the data, but it stays fairly constant throughout the years.
Is the recession of 1991/1992 visible in the estimated components? Answer: Yes, you can see a clear dip in the data, trend, and remainder graphs.

Plot the data using autoplot(), ggsubseriesplot(), and ggseasonplot() to look the the effect of the changing seasonality over time. What do you think is cause it to change so much?

autoplot(cangas)

ggsubseriesplot(cangas)

ggseasonplot(cangas)

Answer: Over the years, we can see a growing increase in the production of gas, which tells me there must be a demend increase. We can also see there is less of a demand for gas in the summer months, using the same logic. I find this interesting, because the summer is when people travel more, so I would think there would be a higher demand for gas. Whereas, in the winder months, we see an increase of gas production so that makes me think that people in Canada are using gas to stay warm during these times.

Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.

cangas_stl = stl(cangas, s.window = 15)
autoplot(cangas_stl)

c) Compare the results with those obtained using SEATS and X11. How are they different?

cangas_ts = ts(cangas, frequency = 12, start = c(1960,45))

x11_cangas = seas(cangas_ts, x11 = "")

autoplot(x11_cangas)

Answer: With STL, we can see clear seasonality in the 1980s, where with the X11 and SEATS techniques, we see more movement in the seasonality in the 1960s.

Use an STL decomposition to calculate the trend-cycle and seasonal indicies.

stl1 = stl(bricksq, s.window = 35, t.window = 15)
autoplot(stl1)

stl2 = stl(bricksq, s.window = "periodic", t.window = 15)
autoplot(stl2)

b) Compute and plot the seasonally adjusted data

season = seasadj(stl2)
autoplot(season)

c) Use a naive method to produce forecasts of the seasonally adjusted data.

autoplot(naive(season))

d) Use stlf() to reasonalise the results, giving forecast for the orginal data.

stlf_brick = stlf(bricksq, method = "naive")
forecast_stlf = forecast(stlf_brick)
forecast_stlf

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       467.0649 441.7635 492.3662 428.3698 505.7599
## 1995 Q1       426.0748 390.2933 461.8563 371.3517 480.7979
## 1995 Q2       484.0357 440.2125 527.8590 417.0139 551.0576
## 1995 Q3       494.0000 443.3973 544.6027 416.6098 571.3902
## 1995 Q4       467.0649 410.4893 523.6404 380.5400 553.5897
## 1996 Q1       426.0748 364.0994 488.0502 331.2916 520.8580
## 1996 Q2       484.0357 417.0946 550.9768 381.6582 586.4133
## 1996 Q3       494.0000 422.4370 565.5630 384.5538 603.4462

Do the residuals look uncorrelated?

checkresiduals(forecast_stlf$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

Answer: Some correlation issues seen in the ACF chart at Lag=5 & 8.

Repeat with a robust STL decompostion. Does it make much difference?

stl_robust = stl(bricksq, t.window = 15, s.window = "periodic", robust = T)
stl_robust1 = seasadj(stl_robust)
stl_robust2 = naive(stl_robust1)
forecast_robust = forecast(stl_robust2)
forecast_robust

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4        471.673 442.1422 501.2037 426.5095 516.8364
## 1995 Q1        471.673 429.9101 513.4358 407.8022 535.5437
## 1995 Q2        471.673 420.5242 522.8218 393.4476 549.8983
## 1995 Q3        471.673 412.6114 530.7345 381.3461 561.9998
## 1995 Q4        471.673 405.6401 537.7058 370.6845 572.6614
## 1996 Q1        471.673 399.3376 544.0083 361.0456 582.3003
## 1996 Q2        471.673 393.5419 549.8040 352.1818 591.1641
## 1996 Q3        471.673 388.1473 555.1986 343.9315 599.4144
## 1996 Q4        471.673 383.0806 560.2653 336.1827 607.1632
## 1997 Q1        471.673 378.2885 565.0575 328.8537 614.4922

checkresiduals(forecast_robust$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

Answer: Not much of a difference sighted when performing robust forecast.

Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?

train = window(bricksq, end = c(1992,3))
test = window(bricksq, start = c(1992,4), end = c(1994,3))
train_naive = snaive(train)
train_stlf = stlf(train)
train_fc_naive = forecast(train_naive, h=8)
train_fc_stlf = forecast(train_stlf, h = 8)
train_fc_naive

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1992 Q4            430 366.2905 493.7095 332.5647 527.4353
## 1993 Q1            387 323.2905 450.7095 289.5647 484.4353
## 1993 Q2            413 349.2905 476.7095 315.5647 510.4353
## 1993 Q3            451 387.2905 514.7095 353.5647 548.4353
## 1993 Q4            430 339.9011 520.0989 292.2056 567.7944
## 1994 Q1            387 296.9011 477.0989 249.2056 524.7944
## 1994 Q2            413 322.9011 503.0989 275.2056 550.7944
## 1994 Q3            451 360.9011 541.0989 313.2056 588.7944

train_fc_stlf

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1992 Q4       424.7340 398.8399 450.6280 385.1325 464.3354
## 1993 Q1       380.2198 343.5813 416.8584 324.1860 436.2537
## 1993 Q2       441.8892 396.9916 486.7867 373.2243 510.5540
## 1993 Q3       450.9971 399.1249 502.8693 371.6655 530.3287
## 1993 Q4       424.7340 366.7065 482.7614 335.9887 513.4792
## 1994 Q1       380.2198 316.6182 443.8215 282.9495 477.4902
## 1994 Q2       441.8892 373.1527 510.6256 336.7658 547.0125
## 1994 Q3       450.9971 377.4732 524.5210 338.5520 563.4422

test

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1992                 420
## 1993  394  462  476  443
## 1994  421  472  494

Answer: When compring to the test data, the STLF technique produces better results.

Use stlf() to produce forecasts of the writing series either method = “naive” or method = “rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(writing)

We see a lot of seasonality here with a slight increase in the trend. The best method here is naive.

writing_stlf = stlf(writing, s.window = "periodic", robust = T, method = "naive")
autoplot(writing_stlf)

8. Do the same thing for the fancy dataset.

autoplot(fancy)

There seems to be increasing seasonality, we are going to try a Box-Cox transformation and method“rwdrift”

fancy_stlf = stlf(fancy, s.window = "periodic", lambda = BoxCox.lambda(fancy),
                  method = "rwdrift", robust = T)

autoplot(fancy_stlf)