####################
#Chapter 5 Exercises
####################
library(fpp2)
## Warning: package 'fpp2' was built under R version 4.0.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## -- Attaching packages ---------------------------------- fpp2 2.4 --
## v ggplot2   3.3.0     v fma       2.4  
## v forecast  8.12      v expsmooth 2.3
## Warning: package 'forecast' was built under R version 4.0.2
## Warning: package 'fma' was built under R version 4.0.3
## Warning: package 'expsmooth' was built under R version 4.0.3
## 
library(knitr)

Question 1.

Daily electricity demand for Victoria, Australia, during 2014 is contained in elecdaily. The data for the first 20 days can be obtained as follows. daily20 = head(elecdaily,20)

1a. Plot the data and find the regression model for Demand with temperature as an explanatory variable. Why is there a positive relationship?

daily20 = head(elecdaily,20) #data for first 20 days
plot.ts(daily20)

autoplot(daily20)

RegMod = tslm(Demand ~ Temperature, data = daily20)
summary(RegMod)
## 
## Call:
## tslm(formula = Demand ~ Temperature, data = daily20)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.060  -7.117  -1.437  17.484  27.102 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  39.2117    17.9915   2.179   0.0428 *  
## Temperature   6.7572     0.6114  11.052 1.88e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22 on 18 degrees of freedom
## Multiple R-squared:  0.8716, Adjusted R-squared:  0.8644 
## F-statistic: 122.1 on 1 and 18 DF,  p-value: 1.876e-09

There is a positive relationship likely because higher temperatures increase the use of cooling (such as air conditioning), which in turn increases the demand for electricity.
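
A quick numerical check (a sketch, not part of the exercise code): the sample correlation confirms a strong positive association.

cor(daily20[, "Demand"], daily20[, "Temperature"])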

1b. Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?

checkresiduals(RegMod)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

The residuals show no significant autocorrelation, so the model appears adequate. There is, however, one clear outlier (the large negative residual of about -46).
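
To locate the outlier, one could inspect the largest absolute residual (a quick sketch):

which.max(abs(residuals(RegMod)))                     # position of the outlier
residuals(RegMod)[which.max(abs(residuals(RegMod)))]  # its value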

1c. Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was 15∘, and compare it with the forecast if the maximum temperature was 35∘. Do you believe these forecasts?

FRegMod = forecast(RegMod,newdata=data.frame(Temperature=c(15,35)))
FRegMod
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 4.285714       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.428571       275.7146 245.2278 306.2014 227.57056 323.8586

Yes, I believe the forecasts; both are close to the range of demand values observed in the data.
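
One sanity check (a sketch): compare the two forecast temperatures with the range observed in the 20-day sample; if 15∘ lies outside that range, the first forecast is an extrapolation and deserves extra caution.

range(daily20[, "Temperature"])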

1d. Give prediction intervals for your forecasts. The following R code will get you started:

autoplot(daily20, facets=TRUE)

daily20 %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

fit = tslm(Demand ~ Temperature, data=daily20)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774
forecast(fit, newdata=data.frame(Temperature=c(15,35)))
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 4.285714       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.428571       275.7146 245.2278 306.2014 227.57056 323.8586

1e. Plot Demand vs Temperature for all of the available data in elecdaily. What does this say about your model?

elecdaily %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

This plot shows that the model is not adequate for the full year: fitted to only 20 days, the straight line fails to account for much of the data in elecdaily, where the relationship between demand and temperature is clearly nonlinear.

Question 2.

Data set mens400 contains the winning times (in seconds) for the men’s 400 meters final in each Olympic Games from 1896 to 2016.

2a. Plot the winning time against the year. Describe the main features of the plot.

autoplot(mens400)

There is a trend - the winning times have decreased over time. Another feature is that there are missing values.

2b. Fit a regression line to the data. Obviously the winning times have been decreasing, but at what average rate per year?

Tmens400 = time(mens400)
Regmens400 = tslm(mens400 ~ Tmens400, data = mens400)
Regmens400
## 
## Call:
## tslm(formula = mens400 ~ Tmens400, data = mens400)
## 
## Coefficients:
## (Intercept)     Tmens400  
##   172.48148     -0.06457
Regmens400$coefficients
##  (Intercept)     Tmens400 
## 172.48147736  -0.06457385
autoplot(mens400) + 
  geom_abline(slope = Regmens400$coefficients[2],
              intercept = Regmens400$coefficients[1],
              color = "blue")

The winning times have been decreasing at an average rate of about 0.0646 seconds per year.
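
Since the Games are held every four years, the slope can also be expressed per Olympiad (a quick sketch):

Regmens400$coefficients[2]      # average change per year (seconds)
Regmens400$coefficients[2] * 4  # roughly -0.26 seconds per four-year Olympiad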

2c. Plot the residuals against the year. What does this indicate about the suitability of the fitted line?

cbind(Time = Tmens400, 
  Residuals = Regmens400$residuals) %>%
  as.data.frame() %>%
  ggplot(aes(x = Time, y = Residuals)) +
  geom_point() +
  ylab("Residuals of Regression Line(Unit:s)")
## Warning: Removed 3 rows containing missing values (geom_point).

checkresiduals(Regmens400$residuals)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

Generally speaking, the model fits the data fairly well.

2d. Predict the winning time for the men’s 400 meters final in the 2020 Olympics. Give a prediction interval for your forecasts. What assumptions have you made in these calculations?

Fmens400 = lm(mens400 ~ Tmens400,data = mens400,na.action = na.exclude)

F2mens400 = forecast(Fmens400,newdata = data.frame(Tmens400 = 2020))

autoplot(mens400) +
  autolayer(F2mens400, PI = TRUE)

F2mens400
##   Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1       42.04231 40.44975 43.63487 39.55286 44.53176

The assumptions were that the linear downward trend will continue to 2020 and that the residuals are normally distributed; the residual plot suggests the normality assumption does not hold.
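
One way to inspect that assumption (a sketch; checkresiduals() also accepts lm objects):

checkresiduals(Fmens400)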

Question 3.

  1. Type easter(ausbeer) and interpret what you see.
easter(ausbeer)
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00
?easter
## starting httpd help server ... done
?ausbeer
str(ausbeer)
##  Time-Series [1:218] from 1956 to 2010: 284 213 227 308 262 228 236 320 272 233 ...
library(psych)
## Warning: package 'psych' was built under R version 4.0.2
## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
describe(ausbeer)
##    vars   n   mean    sd median trimmed   mad min max range  skew kurtosis   se
## X1    1 218 415.37 85.88    422  418.06 64.49 213 599   386 -0.31    -0.25 5.82
plot.ts(ausbeer)

The ausbeer series contains 218 quarterly observations of Australian beer production from 1956 to 2010. easter(ausbeer) returns, for each of those quarters, the proportion of the Easter holiday period (Good Friday to Easter Sunday inclusive, plus Easter Monday if easter.mon = TRUE) that falls in that quarter. Since Easter always falls in March or April, the values are nonzero only in Q1 and Q2: a value of 1 means the holiday fell entirely within that quarter, while fractions such as 0.67/0.33 mean it was split between the two quarters.

Question 4.

  1. An elasticity coefficient is the ratio of the percentage change in the forecast variable (y) to the percentage change in the predictor variable (x). Mathematically, the elasticity is defined as (dy/dx) × (x/y). Consider the log-log model, log y = β0 + β1 log x + ε. Express y as a function of x and show that the coefficient β1 is the elasticity coefficient.

Solving for y: y = e^(β0 + β1 log x + ε) = e^(β0 + ε) · x^β1. Differentiating with respect to x gives dy/dx = β1 · e^(β0 + ε) · x^(β1 − 1) = β1 · (y/x). Therefore the elasticity is (dy/dx) × (x/y) = β1.
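
A numerical sketch in R, using hypothetical coefficients b0 = 1 and b1 = 0.8 (chosen only for illustration), confirms that the elasticity equals β1 at any x:

b0 = 1; b1 = 0.8                      # hypothetical coefficients
f = function(x) exp(b0) * x^b1        # y as a function of x (error term omitted)
x0 = 5
dydx = (f(x0 + 1e-6) - f(x0)) / 1e-6  # numerical derivative dy/dx
dydx * x0 / f(x0)                     # elasticity: approximately 0.8 = b1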

Question 5.

  1. The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

5a. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.

autoplot(fancy)

ggseasonplot(fancy)

For the most part, sales increase from January to December, with a much larger December peak each year as time goes on, except for a slight decrease in 1991. There also seems to be a small increase every March.

This seems to be consistent with the description which states that there is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988.

5b. Explain why it is necessary to take logarithms of these data before fitting a model.

We must take logarithms because the seasonal variation grows with the level of the series. For a regression model to fit well, the size of the seasonal variation should be roughly constant over time, and the log transformation achieves this.
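
Plotting the raw and log series side by side makes this visible (a sketch):

autoplot(fancy) + ggtitle("Raw sales: seasonal swings grow with the level")
autoplot(log(fancy)) + ggtitle("Log sales: roughly constant seasonal swings")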

5c. Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

Tfancy = time(fancy)

# Dummy variable for the surfing festival: 1 in March of each year from 1988
surfing_festival = c()
for(i in 1:length(Tfancy)){
  month = round(12*(Tfancy[i] - floor(Tfancy[i]))) + 1
  year = floor(Tfancy[i])
  if(year >= 1988 & month == 3){
    surfing_festival[i] = 1
  } else {
    surfing_festival[i] = 0
  }
}

# Regress log sales (Box-Cox with lambda = 0) on a linear trend, monthly
# seasonal dummies, and the festival dummy
Tfancy_log = tslm(BoxCox(fancy, 0) ~ trend + season + surfing_festival)
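
An equivalent, more compact construction (a sketch; surfing_festival2 is just an illustrative name) uses cycle(), which returns each observation's month number:

surfing_festival2 = as.numeric(cycle(fancy) == 3 & time(fancy) >= 1988)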

5d. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

autoplot(Tfancy_log$residuals)

There is a pattern in the residuals over time: they are autocorrelated rather than random, which is a problem for the model.

cbind(Residuals = Tfancy_log$residuals,
      Fitted_values = Tfancy_log$fitted.values) %>%
  as.data.frame() %>%
  ggplot(aes(x = Fitted_values,
             y = Residuals)) +
    geom_point()

The scatterplot of residuals against fitted values shows heteroscedasticity: the residual variance is not constant.

5e. Do boxplots of the residuals for each month. Does this reveal any problems with the model?

cbind.data.frame(
  Month = factor(
    month.abb[round(12*(Tfancy - floor(Tfancy)) + 1)],
    levels = month.abb,  # calendar order; using labels= alone mislabels the alphabetically sorted levels
    ordered = TRUE
  ),
  Residuals = as.numeric(Tfancy_log$residuals)  # drop the ts class so ggplot can pick a scale
) %>%
  ggplot(aes(x = Month,
             y = Residuals)) +
  geom_boxplot()

5f. What do the values of the coefficients tell you about each variable?

Tfancy_log$coefficients
##      (Intercept)            trend          season2          season3 
##       7.61966701       0.02201983       0.25141682       0.26608280 
##          season4          season5          season6          season7 
##       0.38405351       0.40948697       0.44882828       0.61045453 
##          season8          season9         season10         season11 
##       0.58796443       0.66932985       0.74739195       1.20674790 
##         season12 surfing_festival 
##       1.96224123       0.50151509

There is a positive trend: sales increase over time. Among the months, the minimum occurred in January (the baseline month) and the maximum occurred in December. The surfing_festival coefficient is about 0.50, so the festival is associated with higher sales in March.
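
Because the response is log sales, exponentiating the coefficients gives approximate multiplicative effects (a sketch); for example, exp(1.96) is about 7.1, suggesting December sales are roughly seven times the January baseline, holding the trend fixed.

exp(Tfancy_log$coefficients)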

5g. What does the Breusch-Godfrey test tell you about your model?

checkresiduals(Tfancy_log)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494

The p-value (0.0025) is below 0.05, so the Breusch-Godfrey test rejects the null hypothesis of no serial correlation: the residuals are autocorrelated.

5h. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.

# Surfing festival dummy for 1994-1996: 1 in each March (positions 3, 15, 27)
Ffancy = rep(0, 36)
for(i in 1:36){
  if(i %% 12 == 3){
    Ffancy[i] = 1
  }
}

Ffancy = ts(data = Ffancy, start = 1994, end = c(1996, 12), frequency = 12)

# trend and season are generated automatically by forecast(); only the
# surfing_festival dummy needs to be supplied in newdata
Ffancy_log = forecast(Tfancy_log, newdata = data.frame(surfing_festival = Ffancy))

autoplot(Ffancy_log)

Ffancy_log$upper
##               [,1]     [,2]
## Jan 1994  9.744183  9.88111
## Feb 1994 10.017620 10.15455
## Mar 1994 10.557120 10.69475
## Apr 1994 10.194296 10.33122
## May 1994 10.241749 10.37868
## Jun 1994 10.303110 10.44004
## Jul 1994 10.486756 10.62368
## Aug 1994 10.486286 10.62321
## Sep 1994 10.589671 10.72660
## Oct 1994 10.689753 10.82668
## Nov 1994 11.171129 11.30806
## Dec 1994 11.948642 12.08557
## Jan 1995 10.011336 10.14984
## Feb 1995 10.284773 10.42328
## Mar 1995 10.823938 10.96297
## Apr 1995 10.461449 10.59996
## May 1995 10.508903 10.64741
## Jun 1995 10.570264 10.70877
## Jul 1995 10.753910 10.89242
## Aug 1995 10.753440 10.89195
## Sep 1995 10.856825 10.99533
## Oct 1995 10.956907 11.09541
## Nov 1995 11.438282 11.57679
## Dec 1995 12.215796 12.35430
## Jan 1996 10.279093 10.41951
## Feb 1996 10.552530 10.69294
## Mar 1996 11.091365 11.23212
## Apr 1996 10.729206 10.86962
## May 1996 10.776659 10.91707
## Jun 1996 10.838021 10.97843
## Jul 1996 11.021667 11.16208
## Aug 1996 11.021196 11.16161
## Sep 1996 11.124582 11.26499
## Oct 1996 11.224664 11.36508
## Nov 1996 11.706039 11.84645
## Dec 1996 12.483552 12.62396
Ffancy_log$lower
##               [,1]      [,2]
## Jan 1994  9.238522  9.101594
## Feb 1994  9.511959  9.375031
## Mar 1994 10.048860  9.911228
## Apr 1994  9.688635  9.551707
## May 1994  9.736088  9.599161
## Jun 1994  9.797449  9.660522
## Jul 1994  9.981095  9.844168
## Aug 1994  9.980625  9.843698
## Sep 1994 10.084010  9.947083
## Oct 1994 10.184092 10.047165
## Nov 1994 10.665468 10.528541
## Dec 1994 11.442981 11.306054
## Jan 1995  9.499844  9.361338
## Feb 1995  9.773281  9.634775
## Mar 1995 10.310518 10.171489
## Apr 1995  9.949957  9.811451
## May 1995  9.997411  9.858904
## Jun 1995 10.058772  9.920265
## Jul 1995 10.242418 10.103911
## Aug 1995 10.241948 10.103441
## Sep 1995 10.345333 10.206826
## Oct 1995 10.445415 10.306908
## Nov 1995 10.926791 10.788284
## Dec 1995 11.704304 11.565797
## Jan 1996  9.760564  9.620151
## Feb 1996 10.034000  9.893588
## Mar 1996 10.571566 10.430810
## Apr 1996 10.210677 10.070264
## May 1996 10.258130 10.117718
## Jun 1996 10.319491 10.179079
## Jul 1996 10.503137 10.362725
## Aug 1996 10.502667 10.362254
## Sep 1996 10.606052 10.465640
## Oct 1996 10.706134 10.565722
## Nov 1996 11.187510 11.047097
## Dec 1996 11.965023 11.824611

5i. Transform your predictions and intervals to obtain predictions and intervals for the raw data.

Ffancy2 = Ffancy_log

# Invert the log transformation (Box-Cox with lambda = 0) on each
# component of the forecast object
log_trans = c('x', 'mean', 'lower', 'upper', 'residuals', 'fitted')

Ffancy2[log_trans] = lapply(
  Ffancy_log[log_trans],
  InvBoxCox,
  lambda = 0
)

autoplot(Ffancy2)

# Also back-transform the historical data stored inside the model so
# that autoplot() shows the history on the original scale
Ffancy2[['model']][['model']][1] = lapply(
  Ffancy_log[['model']][['model']][1],
  InvBoxCox,
  lambda = 0
)

autoplot(Ffancy2)

Ffancy2$upper
##               [,1]      [,2]
## Jan 1994  17054.73  19557.43
## Feb 1994  22418.00  25707.73
## Mar 1994  38450.24  44123.68
## Apr 1994  26750.16  30675.62
## May 1994  28050.15  32166.37
## Jun 1994  29825.24  34201.95
## Jul 1994  35837.72  41096.73
## Aug 1994  35820.87  41077.41
## Sep 1994  39722.43  45551.50
## Oct 1994  43903.67  50346.32
## Nov 1994  71049.28  81475.41
## Dec 1994 154607.08 177294.90
## Jan 1995  22277.59  25587.08
## Feb 1995  29283.31  33633.55
## Mar 1995  50208.44  57697.39
## Apr 1995  34942.16  40133.06
## May 1995  36640.25  42083.42
## Jun 1995  38958.95  44746.58
## Jul 1995  46812.70  53767.06
## Aug 1995  46790.69  53741.78
## Sep 1995  51887.06  59595.26
## Oct 1995  57348.77  65868.34
## Nov 1995  92807.48 106594.69
## Dec 1995 201954.08 231955.81
## Jan 1996  29117.46  33506.86
## Feb 1996  38274.14  44043.89
## Mar 1996  65602.25  75517.62
## Apr 1996  45670.42  52555.15
## May 1996  47889.88  55109.18
## Jun 1996  50920.48  58596.65
## Jul 1996  61185.57  70409.18
## Aug 1996  61156.80  70376.07
## Sep 1996  67817.91  78041.33
## Oct 1996  74956.52  86256.08
## Nov 1996 121302.09 139588.15
## Dec 1996 263959.89 303751.35
Ffancy2$lower
##               [,1]       [,2]
## Jan 1994  10285.82   8969.583
## Feb 1994  13520.45  11790.284
## Mar 1994  23129.40  20155.412
## Apr 1994  16133.21  14068.696
## May 1994  16917.24  14752.395
## Jun 1994  17987.81  15685.969
## Jul 1994  21613.98  18848.111
## Aug 1994  21603.82  18839.249
## Sep 1994  23956.87  20891.193
## Oct 1994  26478.61  23090.230
## Nov 1994  42850.31  37366.903
## Dec 1994  93244.59  81312.400
## Jan 1995  13357.65  11629.938
## Feb 1995  17558.28  15287.252
## Mar 1995  30046.98  26146.972
## Apr 1995  20951.33  18241.435
## May 1995  21969.51  19127.918
## Jun 1995  23359.80  20338.387
## Jul 1995  28068.91  24438.412
## Aug 1995  28055.72  24426.922
## Sep 1995  31111.50  27087.467
## Oct 1995  34386.35  29938.733
## Nov 1995  55647.40  48449.831
## Dec 1995 121091.75 105429.448
## Jan 1996  17336.40  15065.329
## Feb 1996  22788.25  19802.984
## Mar 1996  39009.73  33887.802
## Apr 1996  27191.96  23629.808
## May 1996  28513.41  24778.151
## Jun 1996  30317.82  26346.183
## Jul 1996  36429.60  31657.322
## Aug 1996  36412.48  31642.439
## Sep 1996  40378.47  35088.887
## Oct 1996  44628.77  38782.394
## Nov 1996  72222.70  62761.521
## Dec 1996 157160.50 136572.460

5j. How could you improve these predictions by modifying the model?

The Breusch-Godfrey test showed that the residuals are autocorrelated, so the model could be improved by allowing for autocorrelated errors (for example, dynamic regression with ARIMA errors) or by trying other predictors or transformations.

Question 6.

6. The gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day”. Consider only the data to the end of 2004.

6a. Fit a harmonic regression with trend to the data. Experiment with changing the number of Fourier terms. Plot the observed gasoline series and fitted values and comment on what you see.

gas = window(gasoline, end = 2005)
autoplot(gas)

fgas1 = tslm(gas ~ trend + fourier(gas, K=5))
fgas2 = tslm(gas ~ trend + fourier(gas, K=10))
fgas3 = tslm(gas ~ trend + fourier(gas, K=20))

autoplot(gas, ylab = "Gas", main = "Fourier") +
  autolayer(fitted(fgas1), series = "K = 5") +
  autolayer(fitted(fgas2), series = "K = 10") +
  autolayer(fitted(fgas3), series = "K = 20")

As the number of Fourier terms increases, the fitted values track the seasonal pattern more closely, although the improvement diminishes for larger K.

6b. Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.

CV(fgas1)
##            CV           AIC          AICc           BIC         AdjR2 
##  7.553646e-02 -1.873234e+03 -1.872723e+03 -1.813596e+03  8.406928e-01
CV(fgas2)
##            CV           AIC          AICc           BIC         AdjR2 
##  7.135754e-02 -1.915014e+03 -1.913441e+03 -1.809500e+03  8.516102e-01
CV(fgas3)
##            CV           AIC          AICc           BIC         AdjR2 
##  7.223834e-02 -1.908017e+03 -1.902469e+03 -1.710753e+03  8.540588e-01

fgas2 (K = 10) is the best model to use here, as it has the lowest CV, AIC, and AICc values.
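
Rather than trying only K = 5, 10, 20, one could scan every K and keep the CV minimiser (a sketch; it refits one model per K, so it takes a moment to run):

cv_values = sapply(1:25, function(K) {
  fit = tslm(gas ~ trend + fourier(gas, K = K))
  CV(fit)["CV"]
})
which.min(cv_values)  # K with the smallest CV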

6c. Check the residuals of the final model using the checkresiduals() function. Even though the residuals fail the correlation tests, the results are probably not severe enough to make much difference to the forecasts and prediction intervals. (Note that the correlations are relatively small, even though they are significant.)

checkresiduals(fgas2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 104
## 
## data:  Residuals from Linear regression model
## LM test = 155.45, df = 104, p-value = 0.0008135

6d. To forecast using harmonic regression, you will need to generate the future values of the Fourier terms. This can be done as follows.

fc <- forecast(fgas2, newdata=data.frame(fourier(gas,10,52)))
fc
##          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2005.014       8.745595 8.402358 9.088832 8.220249  9.270941
## 2005.033       8.604075 8.260737 8.947413 8.078574  9.129575
## 2005.052       8.607195 8.263839 8.950551 8.081666  9.132724
## 2005.071       8.683215 8.339879 9.026550 8.157718  9.208712
## 2005.090       8.746645 8.403404 9.089887 8.221293  9.271998
## 2005.110       8.783003 8.439850 9.126156 8.257785  9.308221
## 2005.129       8.833517 8.490370 9.176664 8.308309  9.358725
## 2005.148       8.917498 8.574348 9.260647 8.392285  9.442710
## 2005.167       8.997861 8.654729 9.340992 8.472676  9.523046
## 2005.186       9.029128 8.686005 9.372252 8.503956  9.554301
## 2005.205       9.018507 8.675381 9.361633 8.493330  9.543684
## 2005.225       9.017572 8.674439 9.360705 8.492385  9.542760
## 2005.244       9.056052 8.712918 9.399187 8.530863  9.581242
## 2005.263       9.105523 8.762395 9.448651 8.580344  9.630703
## 2005.282       9.121048 8.777925 9.464172 8.595876  9.646221
## 2005.301       9.105879 8.762754 9.449005 8.580703  9.631055
## 2005.320       9.112513 8.769382 9.455644 8.587329  9.637697
## 2005.340       9.175080 8.831948 9.518211 8.649895  9.700265
## 2005.359       9.259000 8.915873 9.602127 8.733822  9.784178
## 2005.378       9.296291 8.953167 9.639415 8.771118  9.821464
## 2005.397       9.268208 8.925083 9.611334 8.743032  9.793385
## 2005.416       9.234527 8.891397 9.577657 8.709345  9.759710
## 2005.435       9.270246 8.927115 9.613376 8.745062  9.795429
## 2005.455       9.381554 9.038427 9.724681 8.856376  9.906731
## 2005.474       9.497887 9.154762 9.841011 8.972713 10.023060
## 2005.493       9.546877 9.203751 9.890002 9.021700 10.072053
## 2005.512       9.524722 9.181592 9.867851 8.999539 10.049904
## 2005.531       9.487091 9.143960 9.830221 8.961908 10.012274
## 2005.550       9.482387 9.139260 9.825513 8.957209 10.007564
## 2005.570       9.509124 9.165999 9.852248 8.983950 10.034297
## 2005.589       9.536544 9.193418 9.879670 9.011367 10.061720
## 2005.608       9.546848 9.203718 9.889978 9.021665 10.072030
## 2005.627       9.541950 9.198820 9.885081 9.016767 10.067133
## 2005.646       9.517407 9.174280 9.860533 8.992229 10.042584
## 2005.665       9.453438 9.110314 9.796562 8.928265  9.978612
## 2005.685       9.344405 9.001279 9.687531 8.819228  9.869581
## 2005.704       9.226892 8.883762 9.570023 8.701709  9.752076
## 2005.723       9.160315 8.817184 9.503446 8.635131  9.685499
## 2005.742       9.174078 8.830951 9.517205 8.648901  9.699255
## 2005.761       9.241612 8.898488 9.584736 8.716439  9.766785
## 2005.780       9.310432 8.967306 9.653559 8.785255  9.835609
## 2005.800       9.348790 9.005658 9.691922 8.823604  9.873976
## 2005.819       9.354217 9.011085 9.697350 8.829031  9.879404
## 2005.838       9.326591 8.983464 9.669718 8.801413  9.851768
## 2005.857       9.258087 8.914964 9.601211 8.732915  9.783260
## 2005.876       9.163285 8.820158 9.506412 8.638107  9.688463
## 2005.895       9.102531 8.759394 9.445669 8.577338  9.627725
## 2005.915       9.142445 8.799307 9.485583 8.617250  9.667640
## 2005.934       9.276104 8.932975 9.619234 8.750923  9.801286
## 2005.953       9.396214 9.053089 9.739340 8.871039  9.921390
## 2005.972       9.374939 9.031803 9.718075 8.849747  9.900131
## 2005.991       9.184286 8.841055 9.527518 8.658948  9.709624

6e. Plot the forecasts along with the actual data for 2005. What do you find?

gas2 = window(gasoline, start = 2004, end = 2006)
autoplot(gas2, series = "Actual") +
  autolayer(fc$mean, series = "Forecast") +
  ylab("Gasoline") +
  ggtitle("Actual vs Forecast")

The forecasts fit the data reasonably well, except for the sharp decrease in the latter half of 2005, where the forecasts do not fall as far as the actual values.
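
To quantify the visual impression, the forecasts can be scored against the 2005 actuals (a sketch):

accuracy(fc, window(gasoline, start = 2005, end = 2006))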

Question 7. Data set huron gives the water level of Lake Huron in feet from 1875 to 1972. (The equivalent base-R series LakeHuron is used in the code below.)

7a. Plot the data and comment on its features.

autoplot(LakeHuron)

head(LakeHuron)
## Time Series:
## Start = 1875 
## End = 1880 
## Frequency = 1 
## [1] 580.38 581.86 580.97 580.80 579.79 580.39
library(psych)
describe(LakeHuron)
##    vars  n mean   sd median trimmed mad    min    max range  skew kurtosis   se
## X1    1 98  579 1.32 579.12  579.01 1.3 575.96 581.86   5.9 -0.14    -0.55 0.13

There is somewhat of a downward trend, at least until around 1915, and, since the observations are annual, there is no seasonal pattern.

7b. Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915.

h = 8
Thuron = tslm(LakeHuron ~ trend)
summary(Thuron)
## 
## Call:
## tslm(formula = LakeHuron ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50997 -0.72726  0.00083  0.74402  2.53565 
## 
## Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 580.202037   0.230111 2521.398  < 2e-16 ***
## trend        -0.024201   0.004036   -5.996 3.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared:  0.2725, Adjusted R-squared:  0.2649 
## F-statistic: 35.95 on 1 and 96 DF,  p-value: 3.545e-08
Timehuron = time(LakeHuron)
h_break = 1915
h_piece = ts(pmax(0,Timehuron-h_break), start=1875)
PWhuron = tslm(LakeHuron ~ Timehuron + h_piece)
summary(PWhuron)
## 
## Call:
## tslm(formula = LakeHuron ~ Timehuron + h_piece)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.49626 -0.66240 -0.07139  0.85163  2.39222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 702.90870   19.97687  35.186  < 2e-16 ***
## Timehuron    -0.06498    0.01051  -6.181 1.58e-08 ***
## h_piece       0.06486    0.01563   4.150 7.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared:  0.3841, Adjusted R-squared:  0.3711 
## F-statistic: 29.62 on 2 and 95 DF,  p-value: 1.004e-10

7c. Generate forecasts from these two models for the period up to 1980 and comment on these.

Fhuron = forecast(Thuron, h=h)

# Future values of the regressors for 1973-1980; the column names must
# match the variables used when fitting PWhuron
Thuron2 = Timehuron[length(Timehuron)] + seq(h)
h_piece2 = h_piece[length(h_piece)] + seq(h)
huron2 = cbind(Timehuron = Thuron2, h_piece = h_piece2) %>%
  as.data.frame()
fpwhuron = forecast(PWhuron, newdata = huron2)

autoplot(LakeHuron) +
  autolayer(fitted(Thuron), series = "Linear") +
  autolayer(Fhuron, series = "Linear")

autoplot(LakeHuron) +
  autolayer(fitted(PWhuron), series = "Piecewise") +
  autolayer(fpwhuron, series = "Piecewise")

The linear model simply continues the downward trend. The piecewise model projects an almost flat level after the 1915 knot (the two slope estimates nearly cancel: -0.06498 + 0.06486 ≈ 0), which looks more consistent with the behaviour of the series since 1915.

####################
#Chapter 6 Exercises
####################

Question 1.

  1. Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

A 3×5 MA first takes 5-term moving averages and then averages three consecutive ones:

= (1/3)[(V1 + V2 + V3 + V4 + V5)/5 + (V2 + V3 + V4 + V5 + V6)/5 + (V3 + V4 + V5 + V6 + V7)/5]

= (1/15)(V1 + 2V2 + 3V3 + 3V4 + 3V5 + 2V6 + V7)

The weights are therefore 1/15 ≈ 0.067, 2/15 ≈ 0.133, 3/15 = 0.200, 3/15 = 0.200, 3/15 = 0.200, 2/15 ≈ 0.133, and 1/15 ≈ 0.067.
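
A quick numerical check in R (a sketch with simulated data): a 5-MA followed by a 3-MA matches the 7-term weighted MA computed directly with stats::filter().

set.seed(1)
x = ts(rnorm(30))
ma3x5 = ma(ma(x, order = 5), order = 3)                        # 3x5 MA
wma7 = stats::filter(x, c(1, 2, 3, 3, 3, 2, 1)/15, sides = 2)  # 7-term weighted MA
max(abs(ma3x5 - wma7), na.rm = TRUE)                           # effectively zero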

Question 2.

  1. The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

2a. Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(plastics)

Yes, there are seasonal fluctuations and an upward trend.

2b. Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

decplastics = decompose(plastics, "multiplicative")

autoplot(decplastics)

2c. Do the results support the graphical interpretation from part a?

Yes, the results support the interpretation from a.

2d. Compute and plot the seasonally adjusted data.

adjplastics = decplastics$x/decplastics$seasonal  # divide out the seasonal component (multiplicative decomposition)

autoplot(adjplastics)
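
The forecast package's seasadj() helper gives the same series and picks the right operation (subtract or divide) for the decomposition type automatically (a sketch):

autoplot(seasadj(decplastics))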

2e. Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

outplastics = plastics
outplastics[25] = outplastics[25] + 500
decout = decompose(outplastics, "multiplicative")
outadj =  decout$x / decout$seasonal
autoplot(outadj)

The outlier appears as a large spike in the seasonally adjusted data, since classical decomposition is not robust to outliers.

2f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

No, it does not make much difference; in either case the outlier appears as a spike in the seasonally adjusted data.

Question 3. Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(readxl)
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.0.3
## 
## Attaching package: 'seasonal'
## The following object is masked from 'package:psych':
## 
##     outlier
retail= read_excel("C:/Users/burtkb/Downloads/retail.xlsx", skip=1)
head(retail)
## # A tibble: 6 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # ... with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## #   A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## #   A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## #   A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## #   A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## #   A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## #   A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## #   A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## #   A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## #   A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## #   A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## #   A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## #   A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## #   A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## #   A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## #   A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## #   A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## #   A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## #   A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## #   A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## #   A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## #   A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## #   A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## #   A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## #   A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## #   A3349365F <dbl>, A3349746K <dbl>, ...
retail.ts = ts(retail[,"A3349627V"], frequency = 12, start = c(1982,4))

autoplot(retail.ts)

retailx11 = seas(retail.ts, x11 = "")

autoplot(retailx11)

Yes, we do see outliers in the remainder component, most notably in the late 1980s; these were not obvious in the original plot.

Question 4. Figures 6.16 and 6.17 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

4a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

There is seasonality and an upward trend. However, the trend flattens during the early 1990s, and we also see a brief dip in the number of workers at that time.

4b. Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession is visible, most clearly in the remainder component.

Question 5. This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).

5a. Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?

autoplot(cangas)

ggsubseriesplot(cangas)

ggseasonplot(cangas)

Gas demand is driven by heating in the winter months, while Canada has relatively few air conditioners to drive summer demand, so consumption is strongly seasonal. The upward trend is likely due to increased energy use over time.

5b. Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.

cstl = stl(cangas, s.window = 12)
autoplot(cstl)

5c. Compare the results with those obtained using SEATS and X11. How are they different?

Compared with SEATS and X11, the STL decomposition shows more variation in the seasonal component over time.

Question 6. We will use the bricksq data (Australian quarterly clay brick production, 1956–1994) for this exercise.

6a. Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)

#fixed
stlbricksq1 = stl(bricksq, 
                  s.window = "periodic",
                  robust = TRUE)
autoplot(stlbricksq1)

#changing
stlbricksq2 = stl(bricksq,
                  s.window = 5,
                  robust = TRUE)
autoplot(stlbricksq2)

6b. Compute and plot the seasonally adjusted data.

bricksqseas = seasadj(stlbricksq1)
autoplot(bricksqseas, ylab = "Production", xlab = "Time", main = "Seasonally Adjusted Production")

6c. Use a naïve method to produce forecasts of the seasonally adjusted data.

fbricksq = naive(bricksqseas)
autoplot(fbricksq)

6d. Use stlf() to reseasonalise the results, giving forecasts for the original data.

bricksqstlf = stlf(bricksq,method="naive")
autoplot(bricksqstlf)

fore_bricksqstlf = forecast(bricksqstlf)
fore_bricksqstlf
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       465.7338 440.1878 491.2798 426.6645 504.8030
## 1995 Q1       424.6739 388.5464 460.8014 369.4217 479.9261
## 1995 Q2       483.9926 439.7456 528.2395 416.3227 551.6624
## 1995 Q3       494.0000 442.9080 545.0920 415.8616 572.1384
## 1995 Q4       465.7338 408.6112 522.8563 378.3723 553.0952
## 1996 Q1       424.6739 362.0993 487.2485 328.9742 520.3736
## 1996 Q2       483.9926 416.4042 551.5809 380.6251 587.3600
## 1996 Q3       494.0000 421.7450 566.2550 383.4956 604.5044

6e. Do the residuals look uncorrelated?

checkresiduals(bricksqstlf)
## Warning in checkresiduals(bricksqstlf): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  Random walk
## Q* = 40.829, df = 8, p-value = 2.244e-06
## 
## Model df: 0.   Total lags used: 8

No, they do not look uncorrelated: the Ljung-Box test gives p ≈ 2.2e-06, so there is significant autocorrelation in the residuals.

6f. Repeat with a robust STL decomposition. Does it make much difference?

robustbricksq = stlf(bricksq, robust = TRUE)
autoplot(robustbricksq)

checkresiduals(robustbricksq)
## Warning in checkresiduals(robustbricksq): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 28.163, df = 6, p-value = 8.755e-05
## 
## Model df: 2.   Total lags used: 8

The robust decomposition reduces the residual autocorrelation somewhat (Q* falls from 40.8 to 28.2), but the residuals still fail the Ljung-Box test, so it does not make a large difference.

6g. Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?

train_bricksq = window(bricksq, end=c(1992,3))
test_bricksq = window(bricksq, start=c(1992,4), end=c(1994,3))


snaive_bricksq = snaive(train_bricksq)
stlf_bricksq1 = stlf(train_bricksq, robust = TRUE)

snaive_bricksq
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1992 Q4            430 366.2905 493.7095 332.5647 527.4353
## 1993 Q1            387 323.2905 450.7095 289.5647 484.4353
## 1993 Q2            413 349.2905 476.7095 315.5647 510.4353
## 1993 Q3            451 387.2905 514.7095 353.5647 548.4353
## 1993 Q4            430 339.9011 520.0989 292.2056 567.7944
## 1994 Q1            387 296.9011 477.0989 249.2056 524.7944
## 1994 Q2            413 322.9011 503.0989 275.2056 550.7944
## 1994 Q3            451 360.9011 541.0989 313.2056 588.7944
stlf_bricksq1
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1992 Q4       425.1610 398.1223 452.1997 383.8089 466.5131
## 1993 Q1       378.5145 340.2544 416.7747 320.0007 437.0284
## 1993 Q2       443.2827 396.3956 490.1697 371.5751 514.9902
## 1993 Q3       450.9970 396.8234 505.1705 368.1457 533.8482
## 1993 Q4       425.1610 364.5560 485.7660 332.4737 517.8483
## 1994 Q1       378.5145 312.0843 444.9448 276.9182 480.1108
## 1994 Q2       443.2827 371.4856 515.0797 333.4785 553.0868
## 1994 Q3       450.9970 374.1953 527.7986 333.5390 568.4550

The stlf() forecasts are more accurate than the seasonal naïve forecasts.
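
This can be backed up numerically with test-set error measures (a sketch); the method with the lower test-set RMSE and MAE is the better one:

accuracy(snaive_bricksq, test_bricksq)
accuracy(stlf_bricksq1, test_bricksq)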

Question 7. Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(writing)

str(writing)
##  Time-Series [1:120] from 1968 to 1978: 563 599 669 598 580 ...
stflwriting = stlf(writing, 
                     s.window = 13, 
                     robust = TRUE,
                     lambda = BoxCox.lambda(writing),
                     method = "rwdrift")

autoplot(stflwriting)

method=“rwdrift” was chosen because the series has a clear upward trend that a naïve method would ignore.

Question 8. Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(fancy)

str(fancy)
##  Time-Series [1:84] from 1987 to 1994: 1665 2398 2841 3547 3753 ...
stflfancy = stlf(fancy, 
                     s.window = 13, 
                     robust = TRUE,
                     lambda = BoxCox.lambda(fancy),
                     method = "rwdrift")

autoplot(stflfancy)

Again method=“rwdrift” is appropriate because of the upward trend, and the Box-Cox lambda helps stabilise the strongly growing seasonal variation.