Homework 2

First, let’s install the packages

library(fpp2)

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 3.6.2

## Loading required package: forecast

## Warning: package 'forecast' was built under R version 3.6.2

## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Loading required package: fma

## Loading required package: expsmooth

library(BETS)

## 
## Attaching package: 'BETS'

## The following object is masked from 'package:stats':
## 
##     predict

library(lubridate)

## Warning: package 'lubridate' was built under R version 3.6.2

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Chapter 5

Q1.

daily20 <- head(elecdaily,20)
autoplot(daily20, facets = TRUE)

reg1 <- tslm(Demand~Temperature, data = daily20)
reg1

## 
## Call:
## tslm(formula = Demand ~ Temperature, data = daily20)
## 
## Coefficients:
## (Intercept)  Temperature  
##      39.212        6.757

When the temperature goes up, people will use more refrigerates and air-conditions, which requires more electricity.

checkresiduals(reg1)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

Here we can see in the residual plot that the residuals are left-skewed. Moreover, there are several outliers are smaller than others.The residual is correlated with temperature, which infers it is a model which omits necessary variables.

newTem <- data.frame(Temperature = c(15,35))
felec <- forecast(reg1, newdata = newTem)

I can partially believe the result. It shows a reasonable result.

felec

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 3.857143       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.000000       275.7146 245.2278 306.2014 227.57056 323.8586

autoplot(felec)

as.data.frame(elecdaily) %>% 
  ggplot(aes(x = Temperature, y = Demand))+
  geom_point()

The relationship between Demand and Temperature is non-linear. The truth is when temperature is below 20, the demand of electricity is negatively correlated with temperature. While when the temperature is above 20, the demand of electricity is positively correlated with temperature. A better way is to use the piecewise model.

Q2.

autoplot(mens400)

We can see that there are two interrupted intervals because of world war. The time spent decreases as time goes by.

reg2<-tslm(mens400 ~ trend, mens400)
reg2

## 
## Call:
## tslm(formula = mens400 ~ trend, data = mens400)
## 
## Coefficients:
## (Intercept)        trend  
##     50.3078      -0.2583

autoplot(mens400)+
 geom_smooth(method = 'lm', se =FALSE)

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (stat_smooth).

Average decreasing rate per year is 0.2583.

checkresiduals(reg2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals from Linear regression model
## LM test = 3.6082, df = 6, p-value = 0.7295

We can find a outlier in the residual plot. The mean of residual is 0, so the fitted line does a good job.

h=2020-2016
forecast(reg2, h)

## Warning in forecast.lm(reg2, h): newdata column names not specified,
## defaulting to first variable required.

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2020       42.04231 40.44975 43.63487 39.55286 44.53176

I assume that the linear decreasing trend remains. But we all know that the time is not going to be negative. So, this assumption is not useful in long term.

Q3.

easter(ausbeer)

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00

The result shows which quarter contains easter in that year. There are two exceptions: If Easter is on Apirl 1st, then Qtr1 = 0.67 and Qtr2 = 0.33; If Easter is on Apirl 2nd, then Qtr1 = 0.33 and Qtr2 = 0.67.

y = e^(β0+β1*logx+ε)

logy = β0+β1*logx+ε

Take the derivatives for both sides,

1/y*dy/dx = β1/x

so, β1=(x/dx)/(y/dy) = (dy/dx)*(x/y)

autoplot(fancy)

ggseasonplot(fancy,polar=TRUE)

ggsubseriesplot(fancy)

I notice that the monthly sale increases dramatically during Christmas and moderately increases in every March.

Because monthly sales in December is much larger than other months.

Time <- time(fancy)
festival <- c()
for(i in 1:length(Time)){
  month <- round(12*(Time[i] - floor(Time[i]))) + 1
  year <- floor(Time[i])
  if(year >= 1988 & month == 3){
    festival[i] <- 1
  } else {
    festival[i] <- 0
  }
}
reg5<-tslm(log(fancy) ~ trend+season+festival)
reg5

## 
## Call:
## tslm(formula = log(fancy) ~ trend + season + festival)
## 
## Coefficients:
## (Intercept)        trend      season2      season3      season4  
##     7.61967      0.02202      0.25142      0.26608      0.38405  
##     season5      season6      season7      season8      season9  
##     0.40949      0.44883      0.61045      0.58796      0.66933  
##    season10     season11     season12     festival  
##     0.74739      1.20675      1.96224      0.50152

autoplot(reg5$residuals)

cbind(residuals = reg5$residuals,fitted_value=fitted(reg5)) %>%
 as.data.frame() %>%
 ggplot(aes(x=fitted_value,y=residuals))+
 geom_point()

In these two plots, we find that the residuals are correlated with both time and fitted value.

month = factor(
      month.abb[round(12*(Time - floor(Time)) + 1)],
      labels = month.abb,
      ordered = TRUE
    )
    
cbind.data.frame(month=month,residuals = reg5$residuals) %>%
 ggplot(aes(x=month,y=residuals))+
 geom_boxplot()

## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

The residual is correlated with month

reg5$coefficients

## (Intercept)       trend     season2     season3     season4     season5 
##  7.61966701  0.02201983  0.25141682  0.26608280  0.38405351  0.40948697 
##     season6     season7     season8     season9    season10    season11 
##  0.44882828  0.61045453  0.58796443  0.66932985  0.74739195  1.20674790 
##    season12    festival 
##  1.96224123  0.50151509

As predicted, season 12 has the highest coefficient, and festival dummy has positive coefficient.

checkresiduals(reg5)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494

P-value is 0.002494, which is lower than 0.01. So, we can tell that the model is not free of autocorrelation.

h <- rep(0, 36)
f_fancy<-forecast(reg5,h)

## Warning in forecast.lm(reg5, h): newdata column names not specified,
## defaulting to first variable required.

f_fancy

##          Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## Jan 1994       9.491352  9.238522  9.744183  9.101594  9.88111
## Feb 1994       9.764789  9.511959 10.017620  9.375031 10.15455
## Mar 1994       9.801475  9.461879 10.141071  9.277961 10.32499
## Apr 1994       9.941465  9.688635 10.194296  9.551707 10.33122
## May 1994       9.988919  9.736088 10.241749  9.599161 10.37868
## Jun 1994      10.050280  9.797449 10.303110  9.660522 10.44004
## Jul 1994      10.233926  9.981095 10.486756  9.844168 10.62368
## Aug 1994      10.233456  9.980625 10.486286  9.843698 10.62321
## Sep 1994      10.336841 10.084010 10.589671  9.947083 10.72660
## Oct 1994      10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994      10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994      11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995       9.755590  9.499844 10.011336  9.361338 10.14984
## Feb 1995      10.029027  9.773281 10.284773  9.634775 10.42328
## Mar 1995      10.065713  9.722498 10.408928  9.536620 10.59481
## Apr 1995      10.205703  9.949957 10.461449  9.811451 10.59996
## May 1995      10.253157  9.997411 10.508903  9.858904 10.64741
## Jun 1995      10.314518 10.058772 10.570264  9.920265 10.70877
## Jul 1995      10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995      10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995      10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995      10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995      11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995      11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996      10.019828  9.760564 10.279093  9.620151 10.41951
## Feb 1996      10.293265 10.034000 10.552530  9.893588 10.69294
## Mar 1996      10.329951  9.982679 10.677222  9.794605 10.86530
## Apr 1996      10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996      10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996      10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996      10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996      10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996      10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996      10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996      11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996      12.224288 11.965023 12.483552 11.824611 12.62396

autoplot(f_fancy)

raw_fancy<-f_fancy[c('mean','lower','upper')] %>%
  as.data.frame() %>%
  exp()
raw_fancy

##         mean   lower.1    lower.2   upper.1   upper.2
## 1   13244.70  10285.82   8969.583  17054.73  19557.43
## 2   17409.81  13520.45  11790.284  22418.00  25707.73
## 3   18060.36  12860.02  10699.594  25363.61  30484.96
## 4   20774.16  16133.21  14068.696  26750.16  30675.62
## 5   21783.73  16917.24  14752.395  28050.15  32166.37
## 6   23162.27  17987.81  15685.969  29825.24  34201.95
## 7   27831.56  21613.98  18848.111  35837.72  41096.73
## 8   27818.48  21603.82  18839.249  35820.87  41077.41
## 9   30848.42  23956.87  20891.193  39722.43  45551.50
## 10  34095.57  26478.61  23090.230  43903.67  50346.32
## 11  55176.84  42850.31  37366.903  71049.28  81475.41
## 12 120067.79  93244.59  81312.400 154607.08 177294.90
## 13  17250.40  13357.65  11629.938  22277.59  25587.08
## 14  22675.20  17558.28  15287.252  29283.31  33633.55
## 15  23522.50  16688.88  13858.024  33154.31  39926.92
## 16  27057.06  20951.33  18241.435  34942.16  40133.06
## 17  28371.96  21969.51  19127.918  36640.25  42083.42
## 18  30167.42  23359.80  20338.387  38958.95  44746.58
## 19  36248.88  28068.91  24438.412  46812.70  53767.06
## 20  36231.84  28055.72  24426.922  46790.69  53741.78
## 21  40178.16  31111.50  27087.467  51887.06  59595.26
## 22  44407.37  34386.35  29938.733  57348.77  65868.34
## 23  71864.42  55647.40  48449.831  92807.48 106594.69
## 24 156380.86 121091.75 105429.448 201954.08 231955.81
## 25  22467.57  17336.40  15065.329  29117.46  33506.86
## 26  29533.04  22788.25  19802.984  38274.14  44043.89
## 27  30636.60  21648.24  17936.708  43356.94  52328.52
## 28  35240.15  27191.96  23629.808  45670.42  52555.15
## 29  36952.72  28513.41  24778.151  47889.88  55109.18
## 30  39291.20  30317.82  26346.183  50920.48  58596.65
## 31  47211.93  36429.60  31657.322  61185.57  70409.18
## 32  47189.73  36412.48  31642.439  61156.80  70376.07
## 33  52329.57  40378.47  35088.887  67817.91  78041.33
## 34  57837.85  44628.77  38782.394  74956.52  86256.08
## 35  93598.96  72222.70  62761.521 121302.09 139588.15
## 36 203676.38 157160.50 136572.460 263959.89 303751.35

Maybe we can improve by using multiplicative method (ie,season*trend) to forecast.

a.b.

gasoline_until_2004<-window(gasoline, end=2005)
reg6<-tslm(gasoline_until_2004~trend)
cbind(gasoline = gasoline_until_2004, fitted_value = fitted(reg6))%>%
  as.data.frame()%>%
  ggplot(aes(x=fitted_value,y=gasoline))+
  geom_point()

#Harmonic regression is too difficult for me...
for(num in c(1:10)){

  var_name <- paste("tslm_ft",
                    as.character(num),
                    "_gasoline_until_2004",
                    sep = "")
  

  assign(var_name,
         tslm(gasoline_until_2004 ~ trend + fourier(
           gasoline_until_2004,
           K = num
           ))
         )
  

  print(
    autoplot(gasoline_until_2004) +
      autolayer(get(var_name)$fitted.values,
                series = as.character(num)) +
      ggtitle(var_name) +
      ylab("gasoline") +
      guides(colour = guide_legend(title = "Number of Fourier Transform pairs"))
  )
}

autoplot(gasoline_until_2004) +
  autolayer(tslm_ft1_gasoline_until_2004$fitted.values, series = "1") +
  autolayer(tslm_ft2_gasoline_until_2004$fitted.values, series = "2") +
  autolayer(tslm_ft3_gasoline_until_2004$fitted.values, series = "3") +
  autolayer(tslm_ft5_gasoline_until_2004$fitted.values, series = "5") +
  autolayer(tslm_ft10_gasoline_until_2004$fitted.values, series = "10") +
  guides(colour = guide_legend(title = "Fourier Transform pairs")) +
  scale_color_discrete(breaks = c(1:10))

for(i in c(1:10)){
  tslm_ft_gasoline_until_2004.name <- paste(
    "tslm_ft", as.character(i), "_gasoline_until_2004",
    sep = ""
  )
  writeLines(
    paste(
    "\n", tslm_ft_gasoline_until_2004.name, "\n"
    )
  )
  print(CV(get(tslm_ft_gasoline_until_2004.name)))
}

## 
##  tslm_ft1_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  8.201921e-02 -1.813292e+03 -1.813208e+03 -1.790354e+03  8.250858e-01 
## 
##  tslm_ft2_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  8.136854e-02 -1.819113e+03 -1.818957e+03 -1.787001e+03  8.269569e-01 
## 
##  tslm_ft3_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.658846e-02 -1.863085e+03 -1.862834e+03 -1.821797e+03  8.375702e-01 
## 
##  tslm_ft4_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.648608e-02 -1.864112e+03 -1.863742e+03 -1.813649e+03  8.382404e-01 
## 
##  tslm_ft5_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.553646e-02 -1.873234e+03 -1.872723e+03 -1.813596e+03  8.406928e-01 
## 
##  tslm_ft6_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##     0.0725333 -1902.7534284 -1902.0773721 -1833.9401782     0.8474536 
## 
##  tslm_ft7_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##     0.0714734 -1913.5362459 -1912.6718391 -1835.5478956     0.8501073 
## 
##  tslm_ft8_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.147431e-02 -1.913619e+03 -1.912542e+03 -1.826455e+03  8.505267e-01 
## 
##  tslm_ft9_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.161451e-02 -1.912292e+03 -1.910980e+03 -1.815954e+03  8.506543e-01 
## 
##  tslm_ft10_gasoline_until_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.135754e-02 -1.915014e+03 -1.913441e+03 -1.809500e+03  8.516102e-01

Therefore, we know that K=7 has the lowest CV and AICc value.

gasoline_until_2004<-window(gasoline, end=2005)
tslm_ft7_gasoline_until_2004 <- tslm(
  gasoline_until_2004 ~ trend + fourier(
    gasoline_until_2004, 
    K = 7
    )
  )
checkresiduals(tslm_ft7_gasoline_until_2004)

## 
##  Breusch-Godfrey test for serial correlation of order up to 104
## 
## data:  Residuals from Linear regression model
## LM test = 149.31, df = 104, p-value = 0.002404

tslm_forecast7 <- forecast(
  tslm_ft7_gasoline_until_2004,
  newdata = data.frame(fourier(
    gasoline_until_2004, 
    K = 7,
    h=52
    ))
  )

#Since we already know K=7 has the lowest CV and AICc value.

autoplot(tslm_forecast7)+
  autolayer(window(
    gasoline,
    start = 2005,
    end=2006
    )
  )+
  scale_x_continuous(limits=c(2005,2006))

## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.

## Warning: Removed 726 row(s) containing missing values (geom_path).

# This forecast does a not bad job. Its interval fits 90% of the actual value.

autoplot(huron)

# It decreases over time

#linear
fit.lin<- tslm(huron~trend)
#piecewise
t <- time(huron)
t.break <- 1915
tb1 <- ts(pmax(0,t-t.break), start=1875)
fit.pw <- tslm(huron ~ t+tb1)

h<-1980-t[length(t)]
t.new<- t[length(t)]+seq(h)
tb1.new<- tb1[length(tb1)]+seq(h)
newdata <- cbind(t= t.new, tb1= tb1.new) %>%
  as.data.frame()

fc_linear <- forecast(fit.lin, h=h)
fc_piecewise <- forecast(fit.pw, newdata=newdata)

autoplot(huron)+
  autolayer(fc_linear, series='Linear', PI=FALSE)+
  autolayer(fc_piecewise, series='Piecewise',PI=FALSE)

# These two model is not continuous and smooth, which infers they are not good forecasts.

Chapter 6.

T = 1/3(1/5Yt-3 + 1/5Yt-2 + 1/5Yt-1 + 1/5Yt + 1/5Yt+1)+ 1/3(1/5Yt-2 + 1/5Yt-1 + 1/5Yt + 1/5Yt+1 + 1/5Yt+2)+ 1/3(1/5Yt-1 + 1/5Yt + 1/5Yt+1 + 1/5Yt+2 + 1/5Yt+3) = 0.067Yt-3 + 0.133Yt-2 + 0.200Yt-1 + 0.200Yt + 0.200Yt+1 + 0.133Yt+2 + 0.133Yt+3

autoplot(plastics)

I can identify a significant seasonal component and positive trend.

dc_plas<-decompose(plastics, 'multiplicative')
autoplot(dc_plas)

Yes. It shows a positive trend and seasonal effect.

autoplot(plastics, series='Data')+
  autolayer(trendcycle(dc_plas),series='Trend')+
  autolayer(seasadj(dc_plas), series='Seasonally Adjusted')+
  xlab("Year") + ylab("plastics manufacture") +
  ggtitle("product A for a plastics manufacturer for five years") +
  scale_colour_manual(values=c("gray","blue","red"))

## Warning: Removed 12 row(s) containing missing values (geom_path).

plastics1 <- plastics
plastics1[15]<-plastics1[15]+500
dc_plas<-decompose(plastics1, 'multiplicative')
autoplot(dc_plas)

autoplot(plastics1, series='Data')+
  autolayer(trendcycle(dc_plas),series='Trend')+
  autolayer(seasadj(dc_plas), series='Seasonally Adjusted')+
  xlab("Year") + ylab("plastics manufacture") +
  ggtitle("product A for a plastics manufacturer for five years") +
  scale_colour_manual(values=c("gray","blue","red"))

## Warning: Removed 12 row(s) containing missing values (geom_path).

The seasonally adjusted data changes dramatically at the time point of outlier. But it affects trend line little.

plastics2 <- plastics
plastics2[55]<-plastics2[55]+500
dc_plas<-decompose(plastics2, 'multiplicative')
autoplot(dc_plas)

autoplot(plastics2, series='Data')+
  autolayer(trendcycle(dc_plas),series='Trend')+
  autolayer(seasadj(dc_plas), series='Seasonally Adjusted')+
  xlab("Year") + ylab("plastics manufacture") +
  ggtitle("product A for a plastics manufacturer for five years") +
  scale_colour_manual(values=c("gray","blue","red"))

## Warning: Removed 12 row(s) containing missing values (geom_path).

If the outlier is near the end, the effect to trend will decrease.

library(seasonal)

## Warning: package 'seasonal' was built under R version 3.6.2

retail<- read.csv('/Users/jiahaolin/Downloads/Forecasting/HW1/tute1.csv')
ts_retail  <- ts(retail[,"Sales"], 
                frequency=12, 
                start=c(1982,4))
ts_retail %>% seas(x11="") -> fit
autoplot(fit) +
  ggtitle("X11 decomposition of retail")

There are trend and seasonlity

Overall, the number of persons in the civilian labour force in Australia is increasing. The size of seasonal effect is increasing overtime. There is a positive lonbg-term trend. Several residual outlier exist in period 1991-1992

We can find that the seasonal component is increase in the period of 1991/1992, so it is visible. But not so significant.

autoplot(cangas)

ggsubseriesplot(cangas)

ggseasonplot(cangas)

The gas production increases in winter and decreases in summer. The possible reason is that Canada is high latitude. Therefore, in winter the demand of gas for heating increases.

stl_cangas <- stl(cangas, s.window = 13)
autoplot(stl_cangas)

x11_cangas <- seas(cangas, x11 = "")
seats_cangas <- seas(cangas)

autoplot(cangas, series = "Data") +
  autolayer(seasadj(stl_cangas), series = "Seasonally Adjusted") +
  autolayer(trendcycle(stl_cangas), series = "Trend-cycle") +
  scale_color_manual(values = c("gray", "blue", "red"),
                     breaks = c("Data", "Seasonally Adjusted", "Trend-cycle"))

autoplot(cangas, series = "Data") +
  autolayer(seasadj(x11_cangas), series = "Seasonally Adjusted") +
  autolayer(trendcycle(x11_cangas), series = "Trend-cycle") +
  scale_color_manual(values = c("gray", "blue", "red"),
                     breaks = c("Data", "Seasonally Adjusted", "Trend-cycle"))

autoplot(cangas, series = "Data") +
  autolayer(seasadj(seats_cangas), series = "Seasonally Adjusted") +
  autolayer(trendcycle(seats_cangas), series = "Trend-cycle") +
  scale_color_manual(values = c("gray", "blue", "red"),
                     breaks = c("Data", "Seasonally Adjusted", "Trend-cycle"))

The trend for stl_decomposition method is more smooth, and the seasonally adjusted curve is more variable.

stl_changing_bricksq <- stl(bricksq, s.window = 13)
stl_fixed_bricksq <- stl(bricksq, s.window = 'periodic')
autoplot(stl_changing_bricksq)

autoplot(stl_fixed_bricksq)

autoplot(bricksq, series = "Data") +
  autolayer(trendcycle(stl_fixed_bricksq),
            series = "Trend-cycle") +
  autolayer(seasadj(stl_fixed_bricksq),
            series = "Seasonally Adjusted Data") +
  scale_color_manual(values = c("gray", "red", "blue"),
                     breaks = c("Data", "Trend-cycle", "Seasonally Adjusted Data"))

autoplot(bricksq, series = "Data") +
  autolayer(trendcycle(stl_changing_bricksq),
            series = "Trend-cycle") +
  autolayer(seasadj(stl_changing_bricksq),
            series = "Seasonally Adjusted Data") +
  scale_color_manual(values = c("gray", "red", "blue"),
                     breaks = c("Data", "Trend-cycle", "Seasonally Adjusted Data"))

stl_fixed_bricksq %>% seasadj() %>% naive() %>% autoplot()

stl_changing_bricksq %>% seasadj() %>% naive() %>% autoplot()

bricksq %>% stlf() %>% autoplot()

bricksq %>% stlf() %>% checkresiduals()

## Warning in checkresiduals(.): The fitted degrees of freedom is based on the
## model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 41.128, df = 6, p-value = 2.733e-07
## 
## Model df: 2.   Total lags used: 8

They look correlated.

stlf_brick_robust <- stlf(bricksq, robust = TRUE)
autoplot(stlf_brick_robust)

checkresiduals(stlf_brick_robust)

## Warning in checkresiduals(stlf_brick_robust): The fitted degrees of freedom
## is based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 28.163, df = 6, p-value = 8.755e-05
## 
## Model df: 2.   Total lags used: 8

It does not make a big difference.

trainset_brick <- subset(bricksq, 
                        end = length(bricksq) - 8)
testset_brick <- subset(bricksq,
                        start = length(bricksq) - 7)
snaive_brick <- snaive(trainset_brick)
stlf_brick_part <- stlf(trainset_brick, robust = TRUE)

autoplot(bricksq, series = "Original data") +
  geom_line(size = 1) +
  autolayer(stlf_brick_part, PI = FALSE, size = 1,
            series = "stlf") +
  autolayer(snaive_brick, PI = FALSE, size = 1,
            series = "snaive") +
  scale_color_manual(values = c("gray50", "blue", "red"),
                     breaks = c("Original data", "stlf", "snaive")) +
  scale_x_continuous(limits = c(1990, 1994.5)) +
  scale_y_continuous(limits = c(300, 600)) +
  guides(colour = guide_legend(title = "Data")) +
  ggtitle("Forecast from stlf and snaive functions") +
  annotate(
    "rect",
    xmin=1992.75,xmax=1994.5,ymin=-Inf,ymax=Inf,
    fill="lightgreen",alpha = 0.3
    )

## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.

## Warning: Removed 136 row(s) containing missing values (geom_path).

Can see that the stlf curve is closer to the original data

stlf_writing_rwdridt <- stlf(writing, 
                     s.window = 13, 
                     robust = TRUE,
                     lambda = BoxCox.lambda(writing),
                     method = "rwdrift")
autoplot(stlf_writing_rwdridt)

stlf_writing_naive <- stlf(writing, 
                     s.window = 13, 
                     robust = TRUE,
                     lambda = BoxCox.lambda(writing),
                     method = "naive")
autoplot(stlf_writing_naive)

Can see it is better to use rwdrift

stlf_fancy_rwdridt <- stlf(fancy, 
                     s.window = 13, 
                     robust = TRUE,
                     lambda = BoxCox.lambda(fancy),
                     method = "rwdrift")
autoplot(stlf_fancy_rwdridt)

stlf_fancy_naive <- stlf(fancy, 
                     s.window = 13, 
                     robust = TRUE,
                     lambda = BoxCox.lambda(fancy),
                     method = "naive")
autoplot(stlf_fancy_naive)

It is better to use rwdrift

Homework 2

Jiahao Lin

7/11/2020