Homework Hyndmna Chapter 5 & 6 Exercises

library(fpp2)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## ── Attaching packages ────────────────────────────────────────────── fpp2 2.4 ──

## ✓ ggplot2   3.3.3     ✓ fma       2.4  
## ✓ forecast  8.14      ✓ expsmooth 2.3

##

library(ggplot2)
library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

Chapter 5

1. Daily electricity demand for Victoria, Australia, during 2014 is contained in elecdaily. The data for the first 20 days can be obtained as follows.

daily20 = head(elecdaily,20)

A. Plot the data and find the regression model for Demand with temperature as an explanatory variable. Why is there a positive relationship?

autoplot(daily20, main="Daily Electricity Demand for Victoria, Australia during 2014")

lm(formula = Demand~Temperature, data=elecdaily)

## 
## Call:
## lm(formula = Demand ~ Temperature, data = elecdaily)
## 
## Coefficients:
## (Intercept)  Temperature  
##    212.3856       0.4182

summary(lm(formula = Demand~Temperature, data=elecdaily))

## 
## Call:
## lm(formula = Demand ~ Temperature, data = elecdaily)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -55.474 -15.719  -0.336  15.767 117.184 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 212.3856     5.0080  42.409   <2e-16 ***
## Temperature   0.4182     0.2263   1.848   0.0654 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.55 on 363 degrees of freedom
## Multiple R-squared:  0.009322,   Adjusted R-squared:  0.006592 
## F-statistic: 3.416 on 1 and 363 DF,  p-value: 0.0654

There is a slight postive relatiosnhip between temperature and demand as indicated by the 0.4182 temperature coefficient. Temperature only exhibits a p-value of 0.0654, suggesting low correlation between temperature and demand.

B. Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?

plot(lm(Demand~Temperature, data=elecdaily))

Observations 15, 16, and 17 lie well outside of Cook’s distance. The model appears to perform adequetly, extreme outliers should be investigated further.

C. Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was 15∘ and compare it with the forecast if the with maximum temperature was 35∘. Do you believe these forecasts?

Demand_TSLM = tslm(Demand ~ Temperature, data=daily20)
Temp_Range = data.frame(Temperature=c(15,35))
Forecast_Demand = forecast(Demand_TSLM, Temp_Range)
Forecast_Demand

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 4.285714       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.428571       275.7146 245.2278 306.2014 227.57056 323.8586

D. Give prediction intervals for your forecasts. The following R code will get you started:

  autoplot(daily20, facets=TRUE)

daily20 %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

fit <- tslm(Demand ~ Temperature, data=daily20)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

forecast(fit, newdata=data.frame(Temperature=c(15,35)))

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 4.285714       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.428571       275.7146 245.2278 306.2014 227.57056 323.8586

E. Plot Demand vs Temperature for all of the available data in elecdaily. What does this say about your model?

  plot(Demand ~ Temperature, data=elecdaily, main="Demand vs. Temperature of Elecdaily dataset")

The Demand vs Temperature for all of the available data in elecdaily plot looks similar to the plotted model results, including the observed outliers. This results suggests that themodel performed adequetly.

2. Data set mens400 contains the winning times (in seconds) for the men’s 400 meters final in each Olympic Games from 1896 to 2016.

A. Plot the winning time against the year. Describe the main features of the plot.

autoplot(mens400, main="Winning times for the Men's 400 meter Olympic Games Final", ylab="Time (Seconds)", xlab="Year")

Overtime, the winning time for the Men’s 400 meter has decreased. There are gaps in the plot than can be atrributed to the Olympic Games not being held during the first and second World Wars.

B. Fit a regression line to the data. Obviously the winning times have been decreasing, but at what average rate per year?

Time_400 = time(mens400)
tslm(mens400 ~ Time_400, data=mens400)

## 
## Call:
## tslm(formula = mens400 ~ Time_400, data = mens400)
## 
## Coefficients:
## (Intercept)     Time_400  
##   172.48148     -0.06457

#Winning time decreases by roughly 0.06457 seconds per year

autoplot(mens400, main="Winning times for the Men's 400 meter Olympic Games Final", ylab="Time (Seconds)", xlab="Year") + geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (stat_smooth).

C. Plot the residuals against the year. What does this indicate about the suitability of the fitted line?

Mens_400_Residuals = lm(mens400 ~ Time_400, data=mens400)
plot(Mens_400_Residuals)

The Residuals vs Fitted graph displays a mild negative slope, which aligns with the decrease in winning Men’s 400 meter times over time. The fitted line generally predicts that the winning time at the next Olympics will decrease, but the line will not accuractely predict the exact time.

D. Predict the winning time for the men’s 400 meters final in the 2020 Olympics. Give a prediction interval for your forecasts. What assumptions have you made in these calculations?

Time_Mens_400 = time(mens400)
Mens_400_Model = lm(mens400 ~ Time_Mens_400, data=mens400, na.action=na.exclude) ## Removing the 3 NA values from the Regression

Mens_400_2020 = forecast(Mens_400_Model, newdata=data.frame(Time_Mens_400=2020))
Mens_400_2020

##   Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1       42.04231 40.44975 43.63487 39.55286 44.53176

3. Type easter(ausbeer) and interpret what you see.

easter(ausbeer)

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00

?ausbeer

ausbeer is a dataset shows the total quarterly beer production in Austrailia from Q1 1956 to Q2 2010.

4. An elasticity coefficient is the ratio of the percentage change in the forecast variable (y) to the percentage change in the predictor variable (x). Express y as a function of x and show that the coefficient β 1 is the elasticity coefficient.

log(y) = β0 + β1log(x) + e β1log(x) = log(y) - β0 - e

β1 = (log(y) - β0 - e) / log(x)

The coefficient β1 is the elasticity coefficient becuase it shows the ratio of the percentage change in the forecast variable (y) to the change in the predictor variable (X).

For each change in the predictor (x), there is a β1 ratio percentage change in the forecast variable (y).

5. The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

A. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.

autoplot(fancy, main="Sales Volume over Time in Queensland, Australia", ylab="Sales Volume")

seasonplot(fancy, main="Seasonal Sales Volume over Time in Queensland, Australia", ylab="Sales Volume")

The time plot reveals a seasonal trend in monthly sales figures, with sales spiking around the holiday season. Long-term, the plot reveals a general increase in sales volume year-over-year.

B. Explain why it is necessary to take logarithms of these data before fitting a model. If the variance of the residuals is not constant, then logarithms of these data may be required before fitting a model.

C. Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

time = time(fancy)

surfingFestival = c()
for(i in 1:length(time)){
  month = round(12*(time[i] - floor(time[i]))) + 1
  year = floor(time[i])
  if(year >= 1988 & month == 3) ## started March, 1998
  {
    surfingFestival[i] = 1 ## Binary - 1 for if surfing festival is in town
  } else {
    surfingFestival[i] = 0 ## 0 if the surfing festival is not in town
  }
}

BoxCox_Transformation = (BoxCox(fancy, 0))
Surfing_Fesitval = tslm(BoxCox_Transformation ~ trend + season + surfingFestival)
summary(Surfing_Fesitval)

## 
## Call:
## tslm(formula = BoxCox_Transformation ~ trend + season + surfingFestival)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33673 -0.12757  0.00257  0.10911  0.37671 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     7.6196670  0.0742471 102.626  < 2e-16 ***
## trend           0.0220198  0.0008268  26.634  < 2e-16 ***
## season2         0.2514168  0.0956790   2.628 0.010555 *  
## season3         0.2660828  0.1934044   1.376 0.173275    
## season4         0.3840535  0.0957075   4.013 0.000148 ***
## season5         0.4094870  0.0957325   4.277 5.88e-05 ***
## season6         0.4488283  0.0957647   4.687 1.33e-05 ***
## season7         0.6104545  0.0958039   6.372 1.71e-08 ***
## season8         0.5879644  0.0958503   6.134 4.53e-08 ***
## season9         0.6693299  0.0959037   6.979 1.36e-09 ***
## season10        0.7473919  0.0959643   7.788 4.48e-11 ***
## season11        1.2067479  0.0960319  12.566  < 2e-16 ***
## season12        1.9622412  0.0961066  20.417  < 2e-16 ***
## surfingFestival 0.5015151  0.1964273   2.553 0.012856 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared:  0.9567, Adjusted R-squared:  0.9487 
## F-statistic:   119 on 13 and 70 DF,  p-value: < 2.2e-16

D. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

autoplot(Surfing_Fesitval$residuals, main="Residuals Plot of Surfing_Fesitval Resgression", ylab="Residuals")

# A seasonal pettern exists even after the data is tranformed.

E. Do boxplots of the residuals for each month. Does this reveal any problems with the model?

boxplot(Surfing_Fesitval$residuals, main="Surfing_Fesitval Regression Residuals")

F. What do the values of the coefficients tell you about each variable?

Surfing_Fesitval$coefficients

##     (Intercept)           trend         season2         season3         season4 
##      7.61966701      0.02201983      0.25141682      0.26608280      0.38405351 
##         season5         season6         season7         season8         season9 
##      0.40948697      0.44882828      0.61045453      0.58796443      0.66932985 
##        season10        season11        season12 surfingFestival 
##      0.74739195      1.20674790      1.96224123      0.50151509

#The coefficients demonstrate that sales increase throughout the year and the Surfing Festival postively impacts sales (0.50151509)

G. What does the Breusch-Godfrey test tell you about your model?

library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

bgtest(Surfing_Fesitval)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  Surfing_Fesitval
## LM test = 25.031, df = 1, p-value = 5.642e-07

The Breusch-Godfrey test indicates that there is serial correlation in our regression model.

H. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.

Time_Surfing_Fesitval = ts(data=time, start=1994, end=c(1996,12), frequency=12)
forecast(Surfing_Fesitval, Time_Surfing_Fesitval)

## Warning in forecast.lm(Surfing_Fesitval, Time_Surfing_Fesitval): newdata column
## names not specified, defaulting to first variable required.

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1994       1006.002 501.0539 1510.950 227.5849 1784.419
## Feb 1994       1006.317 501.3479 1511.286 227.8675 1784.767
## Mar 1994       1006.396 501.6231 1511.168 228.2491 1784.542
## Apr 1994       1006.577 501.5658 1511.589 228.0624 1785.092
## May 1994       1006.667 501.6339 1511.699 228.1190 1785.214
## Jun 1994       1006.770 501.7159 1511.824 228.1896 1785.350
## Jul 1994       1006.995 501.9201 1512.070 228.3823 1785.608
## Aug 1994       1007.036 501.9403 1512.133 228.3910 1785.682
## Sep 1994       1007.182 502.0643 1512.299 228.5036 1785.860
## Oct 1994       1007.324 502.1850 1512.462 228.6128 1786.034
## Nov 1994       1007.847 502.6870 1513.006 229.1033 1786.590
## Dec 1994       1008.666 503.4851 1513.847 229.8900 1787.442
## Jan 1995       1006.768 501.5678 1511.967 227.9624 1785.573
## Feb 1995       1007.083 501.8618 1512.304 228.2450 1785.921
## Mar 1995       1007.161 502.1369 1512.186 228.6266 1785.696
## Apr 1995       1007.343 502.0797 1512.606 228.4399 1786.246
## May 1995       1007.432 502.1478 1512.717 228.4965 1786.368
## Jun 1995       1007.535 502.2298 1512.841 228.5670 1786.504
## Jul 1995       1007.761 502.4340 1513.088 228.7598 1786.762
## Aug 1995       1007.802 502.4542 1513.150 228.7685 1786.836
## Sep 1995       1007.947 502.5782 1513.317 228.8810 1787.014
## Oct 1995       1008.089 502.6989 1513.480 228.9903 1787.188
## Nov 1995       1008.612 503.2009 1514.024 229.4808 1787.744
## Dec 1995       1009.432 503.9990 1514.865 230.2674 1788.596
## Jan 1996       1007.533 502.0817 1512.985 228.3399 1786.727
## Feb 1996       1007.849 502.3757 1513.321 228.6225 1787.075
## Mar 1996       1007.927 502.6508 1513.203 229.0041 1786.850
## Apr 1996       1008.109 502.5936 1513.624 228.8174 1787.400
## May 1996       1008.198 502.6617 1513.734 228.8740 1787.522
## Jun 1996       1008.301 502.7437 1513.859 228.9445 1787.658
## Jul 1996       1008.527 502.9479 1514.105 229.1373 1787.916
## Aug 1996       1008.568 502.9681 1514.168 229.1460 1787.990
## Sep 1996       1008.713 503.0921 1514.334 229.2585 1788.168
## Oct 1996       1008.855 503.2128 1514.497 229.3678 1788.342
## Nov 1996       1009.378 503.7148 1515.042 229.8583 1788.898
## Dec 1996       1010.198 504.5129 1515.882 230.6449 1789.750

I. Transform your predictions and intervals to obtain predictions and intervals for the raw data.

BoxCox_Transformation = (BoxCox(fancy, 0))
forecast(BoxCox_Transformation)

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 1994       9.674641  9.464046  9.885235  9.352565  9.996717
## Feb 1994       9.958925  9.733371 10.184479  9.613970 10.303880
## Mar 1994      10.439733 10.200143 10.679323 10.073312 10.806155
## Apr 1994      10.121042  9.868185 10.373898  9.734331 10.507752
## May 1994      10.157291  9.891822 10.422759  9.751292 10.563289
## Jun 1994      10.227196  9.949682 10.504711  9.802774 10.651618
## Jul 1994      10.403289 10.114223 10.692356  9.961200 10.845378
## Aug 1994      10.382661 10.082480 10.682842  9.923574 10.841748
## Sep 1994      10.518849 10.207944 10.829754 10.043361 10.994337
## Oct 1994      10.612257 10.290980 10.933534 10.120906 11.103608
## Nov 1994      11.108401 10.777070 11.439732 10.601674 11.615129
## Dec 1994      11.898748 11.557653 12.239843 11.377088 12.420408
## Jan 1995       9.985303  9.634703 10.335903  9.449107 10.521499
## Feb 1995      10.269587  9.909735 10.629440  9.719240 10.819934
## Mar 1995      10.750396 10.381517 11.119274 10.186244 11.314547
## Apr 1995      10.431704 10.054009 10.809399  9.854070 11.009338
## May 1995      10.467953 10.081638 10.854268  9.877135 11.058770
## Jun 1995      10.537858 10.143107 10.932610  9.934137 11.141579
## Jul 1995      10.713952 10.310934 11.116969 10.097590 11.330314
## Aug 1995      10.693323 10.282201 11.104445 10.064567 11.322080
## Sep 1995      10.829511 10.410437 11.248586 10.188592 11.470430
## Oct 1995      10.922919 10.496035 11.349803 10.270057 11.575781
## Nov 1995      11.419063 10.984506 11.853621 10.754465 12.083661
## Dec 1995      12.209410 11.767308 12.651512 11.533273 12.885547

J. How could you improve these predictions by modifying the model? The predictions could have been imporved by further normalizing the model through other transformation methods.

6.The gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day.” Consider only the data to the end of 2004.

A. Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see,

Gas_2004 = window(gasoline, end=2005)

fourier_gas = tslm(Gas_2004 ~ trend + fourier(Gas_2004, K=12))
autoplot(Gas_2004, series="Data") +
  autolayer(fitted(fourier_gas), series="Fitted") +
  xlab("Year") + ylab("Gas Supply") +
  ggtitle("Gas Supply (K=12)")

B. Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.

for(j in 1:20){
  fit <- tslm(Gas_2004~trend + fourier(Gas_2004, K = j))
  if (j==1){
    min.cv<-CV(fit)["CV"]
    min.k<-j
  }else{
    if (min.cv>CV(fit)["CV"]){
      min.cv<-CV(fit)["CV"]
      min.k<-j
    }
  }
}
print(min.cv)

##         CV 
## 0.07096945

print(min.k)

## [1] 12

C. Check the residuals of the final model using the checkresiduals() function. Even though the residuals fail the correlation tests, the results are probably not severe enough to make much difference to the forecasts and prediction intervals. (Note that the correlations are relatively small, even though they are significant.)

checkresiduals(tslm(Gas_2004~trend + fourier(Gas_2004, K = min.k)))

## 
##  Breusch-Godfrey test for serial correlation of order up to 104
## 
## data:  Residuals from Linear regression model
## LM test = 154.2, df = 104, p-value = 0.001021

D.To forecast using harmonic regression, you will need to generate the future values of the Fourier terms. This can be done as follows.

#where fit is the fitted model using tslm(), K is the number of Fourier terms used in creating fit, and h is the forecast horizon required.
#Forecast the next year of data.
fit = tslm(Gas_2004~trend + fourier(Gas_2004, K = min.k))
fc = forecast(fit, newdata=data.frame(fourier(Gas_2004,min.k,52)))

fc

##          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2005.014       8.713386 8.371087 9.055684 8.189474  9.237298
## 2005.033       8.642131 8.299720 8.984542 8.118047  9.166215
## 2005.052       8.656942 8.314540 8.999345 8.132871  9.181013
## 2005.071       8.661292 8.318899 9.003684 8.137236  9.185347
## 2005.090       8.686370 8.344109 9.028631 8.162515  9.210225
## 2005.110       8.784129 8.441982 9.126276 8.260448  9.307809
## 2005.129       8.895967 8.553823 9.238111 8.372291  9.419643
## 2005.148       8.937884 8.595756 9.280012 8.414232  9.461535
## 2005.167       8.940875 8.598761 9.282988 8.417246  9.464503
## 2005.186       8.988596 8.646477 9.330716 8.464958  9.512235
## 2005.205       9.062316 8.720191 9.404442 8.538669  9.585963
## 2005.225       9.073794 8.731673 9.415915 8.550153  9.597435
## 2005.244       9.031913 8.689800 9.374027 8.508284  9.555542
## 2005.263       9.041458 8.699342 9.383575 8.517824  9.565092
## 2005.282       9.122877 8.780756 9.464999 8.599236  9.646519
## 2005.301       9.169383 8.827263 9.511503 8.645744  9.693022
## 2005.320       9.132601 8.790487 9.474715 8.608970  9.656231
## 2005.340       9.120782 8.778667 9.462897 8.597150  9.644414
## 2005.359       9.221551 8.879431 9.563671 8.697912  9.745190
## 2005.378       9.335425 8.993306 9.677545 8.811787  9.859064
## 2005.397       9.316637 8.974522 9.658752 8.793006  9.840269
## 2005.416       9.214109 8.871994 9.556224 8.690478  9.737740
## 2005.435       9.218957 8.876838 9.561076 8.695320  9.742595
## 2005.455       9.383861 9.041741 9.725980 8.860222  9.907499
## 2005.474       9.545162 9.203046 9.887278 9.021530 10.068795
## 2005.493       9.559667 9.217553 9.901782 9.036037 10.083298
## 2005.512       9.487053 9.144935 9.829171 8.963416 10.010689
## 2005.531       9.464953 9.122834 9.807073 8.941315  9.988592
## 2005.550       9.508378 9.166261 9.850494 8.984744 10.032011
## 2005.570       9.534922 9.192808 9.877037 9.011292 10.058553
## 2005.589       9.521986 9.179869 9.864104 8.998351 10.045621
## 2005.608       9.522899 9.180779 9.865019 8.999260 10.046538
## 2005.627       9.548443 9.206326 9.890560 9.024808 10.072078
## 2005.646       9.536705 9.194591 9.878819 9.013075 10.060335
## 2005.665       9.451014 9.108897 9.793131 8.927380  9.974649
## 2005.685       9.330630 8.988509 9.672750 8.806990  9.854269
## 2005.704       9.229844 8.887726 9.571962 8.706208  9.753480
## 2005.723       9.170862 8.828748 9.512976 8.647232  9.694492
## 2005.742       9.168084 8.825968 9.510200 8.644451  9.691717
## 2005.761       9.230990 8.888869 9.573110 8.707349  9.754630
## 2005.780       9.320372 8.978252 9.662492 8.796734  9.844010
## 2005.800       9.363916 9.021801 9.706030 8.840285  9.887546
## 2005.819       9.342664 9.000548 9.684779 8.819032  9.866296
## 2005.838       9.304159 8.962036 9.646281 8.780516  9.827801
## 2005.857       9.267511 8.925389 9.609633 8.743869  9.791153
## 2005.876       9.194408 8.852292 9.536523 8.670776  9.718040
## 2005.895       9.100701 8.758587 9.442816 8.577070  9.624332
## 2005.915       9.104667 8.762540 9.446794 8.581017  9.628317
## 2005.934       9.265935 8.923806 9.608064 8.742282  9.789587
## 2005.953       9.436660 9.094538 9.778782 8.913018  9.960302
## 2005.972       9.400410 9.058293 9.742527 8.876776  9.924044
## 2005.991       9.147441 8.805233 9.489649 8.623668  9.671215

E. Plot the forecasts along with the actual data for 2005. What do you find?

autoplot(Gas_2004, series="Data") +
  autolayer(fc, series="Fitted") +
  xlab("Year") + ylab("Gas Supply") +
  ggtitle("Gas Supply (K=12)") +
  scale_x_continuous(limits = c(2000, 2006))

## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Season trends are still relevant in the prediction - supply decreasing in the winter and increasing in the summer.

7. Data set huron gives the water level of Lake Huron in feet from 1875 to 1972.

A. Plot the data and comment on its features.

autoplot(huron)

The plot reveals some nonlinearity in the data. Between 1880 and 1920, there appears to be a general decline in water levels with some extreme shifts. After 1920, water levels appear neither to increase nor decrease regularly - large extremes are common.

B. Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915.

lm.huron = tslm(huron~trend)
#piecewise
t = time(huron)
t.break = 1915
t_piece = ts(pmax(0,t-t.break), start=1875)

fit.pw = tslm(huron ~ t + t_piece)

autoplot(huron) +
  autolayer(fitted(lm.huron), series="Linear") +
  autolayer(fitted(fit.pw), series="Piecewise")+
  xlab("Year") + ylab("Lake Level") +
  ggtitle("Lake Huron") +
  guides(colour = guide_legend(title = " "))

summary(lm.huron)

## 
## Call:
## tslm(formula = huron ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50997 -0.72726  0.00083  0.74402  2.53565 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.202037   0.230111  44.335  < 2e-16 ***
## trend       -0.024201   0.004036  -5.996 3.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared:  0.2725, Adjusted R-squared:  0.2649 
## F-statistic: 35.95 on 1 and 96 DF,  p-value: 3.545e-08

summary(fit.pw)

## 
## Call:
## tslm(formula = huron ~ t + t_piece)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.49626 -0.66240 -0.07139  0.85163  2.39222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 132.90870   19.97687   6.653 1.82e-09 ***
## t            -0.06498    0.01051  -6.181 1.58e-08 ***
## t_piece       0.06486    0.01563   4.150 7.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared:  0.3841, Adjusted R-squared:  0.3711 
## F-statistic: 29.62 on 2 and 95 DF,  p-value: 1.004e-10

C. Generate forecasts from these two models for the period up to 1980 and comment on these.

h = 8
fcasts.lin = forecast(lm.huron, h = h)

t.new = t[length(t)] + seq(h)
tb1.new = t_piece[length(t_piece)] + seq(h)

newdata = cbind(t=t.new, t_piece=tb1.new) %>%
  as.data.frame()
fcasts.pw = forecast(fit.pw, newdata = newdata)

autoplot(huron) +
  autolayer(fitted(lm.huron), series="Linear") +
  autolayer(fcasts.lin, series="Linear") +
  xlab("Year") + ylab("Lake Level") +
  ggtitle("Lake Huron") +
  guides(colour = guide_legend(title = " "))

autoplot(huron) +
  autolayer(fitted(fit.pw), series="Piecewise")+
  autolayer(fcasts.pw, series="Piecewise") +
  xlab("Year") + ylab("Lake Level") +
  ggtitle("Lake Huron") +
  guides(colour = guide_legend(title = " "))

The linear trend suggests that the water level at Lake Heron will continue to drop. The plot suggests the water level should continue to decline, but we understand that lake is not going to completely dry up. The piecewise trend highlights that the water level will flatten after the knot at 1915. Piecewise is more applicable in this instance.

Chapter 6

1. Show that a 3 × 5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

#A 3x5 moving average is a centered moving average and is symmetric #3x5 MA = [((Y1 + Y2 + Y3 + Y4 + Y5)/5) + ((Y2 + Y3 + Y4 + Y5 + Y6)/5) + ((Y3 + Y4 + Y5 + Y6 + Y7)/5)] / 3

#(1/15)Y1 + (2/15)Y2 + (1/5)Y3 + (1/5)Y4 + (1/5)Y5 + (2/15)Y6 + (1/15)Y7 #0.067 Y1 + 0.133 Y2 + 0.2 Y3 + 0.2 y4 + 0.2 Y5 + 0.133 Y6 + 0.067 Y7

2. The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

A. Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(plastics)

#The plot demosntrates both seasonal fluctuations and an upward trend-cycle.

B. Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

plastics %>% decompose(type="multiplicative") %>%
  autoplot() + xlab("Year") +
  ggtitle("Classical multiplicative decomposition
    of plastics")

C. Do the results support the graphical interpretation from part a? Yes, the upward trend in the graoh supports the graphical interpretation from part A.

D.Compute and plot the seasonally adjusted data.

library(seasonal)

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:psych':
## 
##     outlier

fit = decompose(plastics, type="multiplicative")

autoplot(plastics, series="Data") +
  autolayer(seasadj(fit), series="Seasonally Adjusted") +
  xlab("Year") + ylab("Monthly Sales") +
  ggtitle("Plastic Sales") +
  scale_colour_manual(values=c("gray","blue"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

E.Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

plastics.out<-plastics
plastics.out[12]<-plastics.out[12]+500

fit = decompose(plastics.out, type="multiplicative")

autoplot(plastics.out, series="Data") +
  autolayer(seasadj(fit), series="Seasonally Adjusted") +
  xlab("Year") + ylab("Monthly Sales") +
  ggtitle("Plastic Sales") +
  scale_colour_manual(values=c("gray","blue"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

The outlier creates more extremes in the data range, because it is outside the season trend.

F.Does it make any difference if the outlier is near the end rather than in the middle of the time series?

plastics.out<-plastics
plastics.out[50]<-plastics.out[50]+500

fit<-decompose(plastics.out, type="multiplicative")

autoplot(plastics.out, series="Data") +
  autolayer(seasadj(fit), series="Seasonally Adjusted") +
  xlab("Year") + ylab("Monthly Sales") +
  ggtitle("Plastic Sales") +
  scale_colour_manual(values=c("gray","blue"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

The outlier appears to create create spikes near its relative position, but overall does not effect the trend.

3. Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)
Retail = read.csv("/Users/austin/Desktop/Graduate\ School\ /Boston\ College/Spring\ Semster\ 2021/Predictive\ Analytics/Homework/tute1.csv")

Retail_Time_Series = ts(Retail$Sales, frequency = 12, start = c(1982, 4))
autoplot(Retail_Time_Series)

Retail_Time_Series %>% seas(x11="") -> fit
autoplot(fit) +
  ggtitle("X11 decomposition of retail")

4. Figures 6.16 and 6.17 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

A. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. The decomposition shows an upward trend in the number of persons in the civilian labour force in Australia. The laborforce also exhibits a strogn seasonal feature highlited in Figures 6.16. Around 1991, there is a large negative remainder, which suggests a large dcrease in the laborforce. In Figure 6.17, strong hiring trends are highlighted in December, likely during the holiday season, coupled with layoff trends in August, end of seasonal work.

#B. Is the recession of 1991/1992 visible in the estimated components? Yes, the recession is clearly visable in the remainder chart in Figure 6.16. In addtion, the trendline shows a slight dip, suggesting a decrease in labor.

5. This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).

A. Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?

autoplot(cangas)

ggsubseriesplot(cangas)

ggseasonplot(cangas)

Monthly Canadian gas production increases year-over-year. This trend is likely an increase in population and, correspondingly, an increase in demand.

B. Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.

cangas %>%
  mstl(robust=TRUE) %>%
  autoplot()

C. Compare the results with those obtained using SEATS and X11. How are they different?

#SEATS
cangas %>% seas() %>%
autoplot() +
  ggtitle("SEATS decomposition of cangas")

#X-11
cangas %>% seas(x11="") -> fit
autoplot(fit) +
  ggtitle("X11 decomposition of cangas")

The SEATS and X11 decomposiiton charts both highlight a wider range of seasonaility early in the time series. The X11 decomposition exhibits more remainders closer to 1 with more extremes than SEATS.

6. We will use the bricksq data (Australian quarterly clay brick production. 1956–1994) for this exercise.

A. Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)

bricksq %>%
  stl(t.window=30, s.window="periodic", robust=TRUE) %>%
  autoplot()

bricksq %>%
  stl(t.window=30, s.window=4, robust=TRUE) %>%
  autoplot()

bricksq %>%
  stl(t.window=30, s.window=20, robust=TRUE) %>%
  autoplot()

B. Compute and plot the seasonally adjusted data.

bricksq %>%
  stl(t.window=30, s.window="periodic", robust=FALSE) %>% seasadj() %>%
  autoplot()

C. Use a naïve method to produce forecasts of the seasonally adjusted data.

fit = stl(bricksq, t.window=30, s.window="periodic", robust=FALSE)
fit %>% seasadj() %>% naive() %>%
  autoplot() + ylab("Production") +
  ggtitle("Naive Forecasts of Seasonally Adjusted Data")

D. Use stlf() to reseasonalise the results, giving forecasts for the original data.

ft_stlf = stlf(bricksq, method='naive')
autoplot(forecast(ft_stlf,h=8)) + ylab("Production") +
  ggtitle("Naive forecasts of seasonally adjusted data")

E. Do the residuals look uncorrelated?

checkresiduals(forecast(ft_stlf,h=8))

## Warning in checkresiduals(forecast(ft_stlf, h = 8)): The fitted degrees of
## freedom is based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  Random walk
## Q* = 40.829, df = 8, p-value = 2.244e-06
## 
## Model df: 0.   Total lags used: 8

The residual plot reveals some correlation, but not a significant amount.

F. Repeat with a robust STL decomposition. Does it make much difference?

bricksq %>%
  stl(t.window=30, s.window="periodic", robust=FALSE) %>% seasadj() %>%
  autoplot()

bricksq %>%
  stl(t.window=30, s.window="periodic", robust=TRUE) %>% seasadj() %>%
  autoplot()

A robust STL decomposition does not make much of a difference.

G. Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?

bricksq.1992 = window(bricksq, end=1992)
fcast.stlf = stlf(bricksq.1992, method='naive')
fcast.snaive = snaive(bricksq.1992, method='naive')

autoplot(fcast.stlf, series="Forecast") + autolayer(bricksq, series="Data") + ylab("Production") +
  ggtitle("Naive forecasts STLF")

autoplot(fcast.snaive, series="Forecast") + autolayer(bricksq, series="Data") + ylab("Production") +
  ggtitle("Naive forecasts STLF")

The STLf performs better, because it matches seasonality.

7. Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(writing)

writing %>%
  stl(t.window=30, s.window="periodic", robust=TRUE) %>%
  autoplot()

fcast.stlf <- stlf(writing, method='rwdrift')
autoplot(fcast.stlf, series="Forecast") + ylab("Writing") +
  ggtitle("Random Walk forecasts STLF")

checkresiduals(forecast(fcast.stlf,h=8))

## Warning in checkresiduals(forecast(fcast.stlf, h = 8)): The fitted degrees of
## freedom is based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  Random walk with drift
## Q* = 52.365, df = 23, p-value = 0.0004466
## 
## Model df: 1.   Total lags used: 24

A Box-Cox transformation was not required, because the residuals were nornally distributed.

8.Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

fancy.bc<-BoxCox(fancy,BoxCox.lambda(fancy))

autoplot(fancy.bc)

fancy.bc %>%
  stl(t.window=30, s.window="periodic", robust=TRUE) %>%
  autoplot()

fcast.stlf <- stlf(fancy.bc, method='naive')
autoplot(fcast.stlf, series="Forecast") + ylab("Writing") +
  ggtitle("Random Walk forecasts STLF")

checkresiduals(forecast(fcast.stlf,h=8))

## Warning in checkresiduals(forecast(fcast.stlf, h = 8)): The fitted degrees of
## freedom is based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  Random walk
## Q* = 53.379, df = 17, p-value = 1.244e-05
## 
## Model df: 0.   Total lags used: 17

A Box-Cox transformation was required, because the trend was nonlinear. Distribution of the residuals suggests the estimates appropriate.